Reading Statistics
Data Preparation
➔ think about what kind of analyses will be conducted and what type of data is needed
➔ identify/create and label the variables
◆ use a codebook: a log of how the data was prepared and how the analyses were conducted
➔ ensure correct values are inputted
➔ screen the data set for errors, missing values, etc.
◆ using graphs [e.g. histograms, scatterplots, etc.] would be beneficial
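The screening step can be sketched in a few lines of Python. This is only an illustration: the codebook, the variable names (age, score), their valid ranges, and the records are all made up for the example and are not from the notes.

```python
# Hypothetical codebook: variable name -> (lowest valid value, highest valid value)
CODEBOOK = {"age": (18, 99), "score": (0, 100)}

# Hypothetical raw records; None marks a missing value
records = [
    {"age": 23, "score": 88},
    {"age": 230, "score": 75},   # likely a data-entry error
    {"age": 41, "score": None},  # missing value
]

def screen(rows, codebook):
    """Return a list of (row index, variable, problem) tuples."""
    problems = []
    for i, row in enumerate(rows):
        for var, (low, high) in codebook.items():
            value = row.get(var)
            if value is None:
                problems.append((i, var, "missing"))
            elif not low <= value <= high:
                problems.append((i, var, "out of range"))
    return problems

issues = screen(records, CODEBOOK)
for issue in issues:
    print(issue)
```

Flagged rows still need a human decision (fix, drop, or keep); the check only points out where to look.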
Distributions
*the kinds of distributions are hyperlinked to google images of what they should look like*
➔ Gaussian
◆ normal distribution
➔ Lognormal
◆ log normal distribution
➔ Skewed
◆ positively/negatively skewed
➔ Lepto
◆ leptokurtic distribution
● very skinny bell curve
➔ Platy
◆ platykurtic distribution
● very flat bell curve
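Skew and kurtosis can also be checked with numbers instead of by eye. The sketch below uses the moment-based formulas (an assumption, since the notes don't name a formula) and two made-up data sets. Positive skewness means a longer right tail, negative skewness a longer left tail; excess kurtosis above 0 points toward leptokurtic and below 0 toward platykurtic.

```python
from statistics import fmean

def skew_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (a normal curve gives 0 and 0)."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

# Hypothetical data sets for illustration only
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

sym_skew, sym_kurt = skew_and_excess_kurtosis(symmetric)
right_skew, right_kurt = skew_and_excess_kurtosis(right_skewed)
print(sym_skew, sym_kurt)
print(right_skew, right_kurt)
```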
Types of Statistics
➔ descriptive stats: organize and describe the data
◆ can’t make conclusions/generalizations based on these stats
◆ look for trends, but they aren’t conclusive
◆ numerical summaries of data
➔ inferential stats: make predictions about the population through observations and analysis of
a sample, using statistical tests
◆ use descriptive stats to explore the data first; run the inferential stats afterwards
◆ must have a good representative sample to make the predictions
● assess if the sample represents the general pop. by testing assumptions
◆ consider sampling error before inference
● the SE should be small relative to the end results
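A small simulation can show what sampling error looks like. Everything here is assumed for illustration: the population is simulated, the sample sizes (25 and 2500) are arbitrary, and the "standard error" computed below (sample standard deviation divided by the square root of n) is one common way to estimate how far a sample mean typically falls from the population mean.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores centred near 100
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = fmean(population)

def summarize(sample):
    """Return (sampling error of the mean, estimated standard error)."""
    sampling_error = fmean(sample) - population_mean
    standard_error = stdev(sample) / len(sample) ** 0.5
    return sampling_error, standard_error

small = summarize(random.sample(population, 25))
large = summarize(random.sample(population, 2500))

print("n=25:   sampling error %.2f, standard error %.2f" % small)
print("n=2500: sampling error %.2f, standard error %.2f" % large)
```

The larger sample should give the smaller standard error, which is why sample size matters before making an inference.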
Histograms
➔ a visual summary of univariate data, w/ minimal loss of info
➔ usually used w/ dependent variable, but can also be used with independent variable
➔ identify the anomalies (factors that could skew the data) that violate assumptions
◆ e.g. outliers, non-normality
➔ histograms vs. bar graphs
◆ Histograms
● bars are touching, depicting that the variable is continuous
● univariate graph: shows the distribution of one continuous variable [the y-axis shows the frequency while the x-axis shows the continuous variable]
● bars can’t be reordered; ascending order only
◆ Bar Graphs
● bars don’t touch, depicting that the variable is discrete
● compares variables
● bars can be reordered; any order is fine
➔ constructing bins (must be careful in order to avoid creating misleading information)
◆ bins are equal-sized; the range per bin needs to be equal
◆ the size/number of bins can change the shape of the graph
◆ formula for # of bins: 2^k ≥ n
● k: the number of bins
● n: the number of data points
● essentially, you’re taking the base-2 logarithm of n (isolating for k) and rounding up; to check whether a given number of bins is enough, compute 2^k and see if it’s greater than or equal to n
◆ formula for bin width: (Max - Min)/k
● divide the range by the number of bins
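The two bin formulas can be worked through in a short sketch. The data values are made up for illustration, and putting the maximum value into the last bin is a choice made here, not something stated in the notes.

```python
def number_of_bins(n):
    """Smallest k such that 2**k >= n (the 2^k rule)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def bin_width(data, k):
    """Range of the data divided by the number of bins."""
    return (max(data) - min(data)) / k

def bin_counts(data, k):
    """Count how many values land in each of k equal-width bins."""
    low = min(data)
    width = bin_width(data, k)
    counts = [0] * k
    for x in data:
        index = min(int((x - low) / width), k - 1)  # max value goes in the last bin
        counts[index] += 1
    return counts

# Hypothetical data set with n = 10 points
data = [2, 5, 7, 8, 11, 13, 14, 17, 20, 22]

k = number_of_bins(len(data))   # 2**3 = 8 < 10, 2**4 = 16 >= 10, so k = 4
width = bin_width(data, k)      # (22 - 2) / 4 = 5.0
counts = bin_counts(data, k)
print(k, width, counts)         # 4 5.0 [2, 3, 2, 3]
```

Trying a different k on the same data is a quick way to see how the number of bins changes the shape of the graph.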
Descriptive Stats
➔ two kinds
◆ measures of central tendency