Summary

Comprehensive final exam review: EVERYTHING you need to know from a student who got 96% in Stats 2244. Includes notes from all prep 101 sessions.

Pages
42
Uploaded on
August 12, 2023
Written in
2023/2024
STATS 2244 FINAL STUDY
REVIEW
December 10 2022

Summarizing and Exploring Data
Data Stage: collect, monitor the quality of, and conduct a preliminary exploration of the data
Does the data collection method need “tweaking” to ensure quality (monitoring)?
Are there patterns, trends, or associations apparent in the data?
Are there any outliers or missing values? If so, how will you handle them?

Selecting a Summary
• How many variables do you have?
  o Univariate: 1 variable
    ▪ Will describe the distribution of this one variable
  o Bivariate: 2 variables
  o Multivariate: 3 or more variables
    ▪ Can explore relationships between variables
• What types of variables do you have?
  o Explanatory / response
  o Quantitative / categorical
• What characteristic(s) or relationship do you want to emphasize?
  o Parameter, Measures of Spread, Relationship

Measures of Spread
Measures of Spread: characterize the variability in a distribution
Range
Range = maximum – minimum
• Inflated by outliers and skew
5-Number Summary
• The 5-number summary splits a distribution into 4 quarters
Minimum, Q1, x̃, Q3, maximum
• Q1 = 25th percentile
• x̃ = median
  o Centermost value: order the dataset smallest → largest, then take the middle value (or the mean of the two middle values when the count is even)
• Q3 = 75th percentile
Interquartile Range (IQR)
IQR = Q3 – Q1
• Q3 = third quartile = 75th percentile
• Q1 = first quartile = 25th percentile
• The IQR contains the middle 50% of the data: the 25% just below the median and the 25% just above it
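
The range, 5-number summary, and IQR above can be sketched in a few lines of Python. This is a minimal illustration, not the course's method: the dataset is made up, `five_number_summary` is a hypothetical helper name, and the quartiles use the "inclusive" interpolation rule from Python's `statistics` module, which can differ slightly from the rule used in R or in a textbook.

```python
import statistics

# Hypothetical dataset; 30 is a deliberate outlier.
data = [2, 4, 4, 5, 7, 9, 10, 12, 30]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


minimum, q1, median, q3, maximum = five_number_summary(data)
data_range = maximum - minimum   # uses only the two most extreme values
iqr = q3 - q1                    # spread of the middle 50% of the data

# Dropping the outlier shrinks the range far more than it would shrink the IQR.
range_without_outlier = max(data[:-1]) - min(data[:-1])

print("5-number summary:", minimum, q1, median, q3, maximum)
print("Range:", data_range, "| Range without outlier:", range_without_outlier)
print("IQR:", iqr)
```

Because the range depends only on the minimum and maximum, the single value 30 nearly triples it, while the IQR ignores both tails.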

Percentiles
Percentile: a value below which a particular percentage of the distribution lies
• Quartiles are percentiles that divide the distribution into 4 equal-sized sections
  o Q1 = first quartile = 25th percentile = 25% of distribution lies below this value
  o Q2 = second quartile = 50th percentile (the median) = 50% of distribution lies below this value
  o Q3 = third quartile = 75th percentile = 75% of distribution lies below this value
• If a value is at the 90th percentile, 90% of the distribution lies below it, so it is in the top 10% of the distribution
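
To make the definition concrete, the sketch below counts what percentage of a dataset lies strictly below a given value. The scores and the helper name `percent_below` are invented for illustration; statistical software often uses interpolation rules that give slightly different answers on small datasets.

```python
def percent_below(values, x):
    """Percentage of the values that lie strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical exam scores.
scores = [55, 60, 62, 68, 70, 74, 78, 81, 88, 95]

# 9 of the 10 scores are below 95, so 95 sits at the 90th percentile
# of this dataset, i.e. in the top 10%.
print(percent_below(scores, 95))
# 4 of the 10 scores are below 70.
print(percent_below(scores, 70))
```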




Variance
• Takes into account all the data we have
Sample variance
• Sample variance (s²) is a statistic
• The larger the s², the more variable the data (the wider the spread)
• Sums the squared differences from the sample mean (x̄) and divides by n – 1, so it is approximately the average squared difference
• R's var() function uses this equation to calculate variance (it assumes we're working with sample data, not population data)
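
For reference, the standard textbook form of the sample variance is written out below. The symbols are assumptions chosen to match the rest of these notes: x̄ is the sample mean, n is the sample size, and x_i is the i-th observation.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```

Dividing by n – 1 rather than n is what distinguishes this from the population formula.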




Population variance
• Population variance (σ²) is a parameter
• The larger the σ², the more variable the data (the wider the spread)
• Calculates the average of the squared differences from the population mean (µ)
  o Subtracts the population mean from every value in the distribution
  o Squares the differences (between values and mean) to get rid of the negatives
  o Adds up the squared differences
  o Divides by the total number of values in the distribution (N)
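
The steps above correspond to the usual formula σ² = Σ(x – µ)² / N. The sketch below follows them one at a time and checks the result against Python's built-in `statistics.pvariance`. The data are made up and `population_variance` is a hypothetical helper; the last lines show how the sample formula (dividing by n – 1) gives a slightly larger number for the same values.

```python
import statistics

# Hypothetical data, treated here as the ENTIRE population.
population = [4, 8, 6, 5, 3, 7]


def population_variance(values):
    """Average squared difference from the population mean."""
    n_total = len(values)                           # N
    mu = sum(values) / n_total                      # population mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # subtract mean, then square
    return sum(squared_diffs) / n_total             # add up, divide by N


sigma_squared = population_variance(population)
print("Population variance:", sigma_squared)
print("statistics.pvariance:", statistics.pvariance(population))

# The same values treated as a sample: divide by n - 1 instead of N.
print("Sample variance:", statistics.variance(population))
```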




Standard Deviation
• Square root of the sample variance (s = √s²)
  o Undoes the squaring and returns the measure of spread to the original units
• Suitable for use with distributions without extreme outliers and/or skew
  o Extreme outliers can make the data appear widely spread when most of that variation comes from the outliers alone
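
A quick way to see the outlier effect is to compute the standard deviation of the same data with and without one extreme value. The numbers below are invented for illustration; `statistics.stdev` is Python's sample standard deviation (it divides by n – 1).

```python
import statistics

# Hypothetical measurements, tightly clustered around 12.
without_outlier = [10, 11, 12, 13, 14]
# The same measurements plus one extreme value.
with_outlier = without_outlier + [60]

sd_without = statistics.stdev(without_outlier)
sd_with = statistics.stdev(with_outlier)

# The standard deviation is the square root of the sample variance,
# so it is expressed in the same units as the data.
variance_without = statistics.variance(without_outlier)

print("SD without outlier:", round(sd_without, 2))
print("SD with outlier:   ", round(sd_with, 2))
```

One extra value makes the standard deviation more than ten times larger, even though five of the six observations are unchanged.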




Measures of Center
Measures of center: tell us the “typical” value of a distribution
Mean
Mean (average): add up all the values and divide by the total number of values
• Affected by outliers: an extreme value pulls the mean toward it
Median
Median: arrange values smallest → largest and take the centermost value
• 50th percentile: 50% of the distribution lies below, 50% lies above
• Resistant to outliers / extreme values
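
The difference between the two measures of center shows up clearly once an extreme value is added. This is a small sketch with made-up numbers, using Python's `statistics` module.

```python
import statistics

# Hypothetical values with no outliers.
values = [3, 5, 6, 7, 9]
# The same values plus one extreme observation.
values_with_outlier = values + [90]

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(values_with_outlier)
median_after = statistics.median(values_with_outlier)

print("Mean:  ", mean_before, "->", mean_after)
print("Median:", median_before, "->", median_after)
```

The mean jumps from 6 to 20, while the median only shifts from 6 to 6.5 (the average of the two middle values, 6 and 7).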

Describing Shape of a Distribution
• Can describe the shape of a distribution when it is represented as a histogram
  o Histogram: shows the frequency distribution for univariate quantitative data
    ▪ Values of the variable on the x-axis; frequency on the y-axis
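
As a rough sketch of what a histogram does behind the scenes, the code below groups made-up heights into bins of equal width and counts the frequency in each bin. The helper name `histogram_counts` and the bin width of 10 are arbitrary choices for illustration; plotting tools such as R's `hist()` pick bins automatically.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Map each bin's starting value to the number of values in that bin."""
    # Integer-divide to find which bin a value falls in, then scale back up.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical heights in cm.
heights = [151, 155, 158, 160, 162, 163, 165, 167, 171, 178]

# Text version of the histogram: bins run along one axis,
# and the number of '#' marks shows the frequency.
for start, freq in histogram_counts(heights, 10).items():
    print(f"{start}-{start + 9}: {'#' * freq}")
```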
Symmetry
Symmetry: the degree to which the distribution looks like a mirror image when split down the center
• The opposite of symmetric is skewed


Seller: oawn18 (University of Western Ontario)