Overview
- Inferential statistics helps generalize conclusions from a sample to a population, using a) confidence intervals (CI), b) p-values
In a sampling distribution, 1 sample = 1 observation
Chapter 1 DICTIONARY
1. Sampling distribution: many samples, all possible sample statistic values and their
probabilities/ probability densities
2. Sample statistic: (aka. Random variable) number describing a characteristic of the
sample (ex. Number of yellow candies)
3. Sampling statistics: (aka. Relative frequency); proportion-based probability (value 0-1) of each
value in the sampling space; sampling distribution = probability distribution
4. Sampling space: all possible sample statistic values; shown on the x-axis of the graph
5. Expected value: mean of a probability distribution, such as a sampling distribution
(the mean of the sampling distribution)
a. True population value = expected value (for an unbiased estimator)
6. Probability density: a way of getting the probability that a continuous random variable (like a
sample statistic) falls within a particular range
- Continuous variables! (range of values!) (ex. Bag with average candy weight of
AT LEAST 2.8 grams)
- Choose threshold/ range (ex. Between 2.70 grams and 2.85 grams, when the
expected average is 2.8 grams)
- Probability density function: curve of a continuous probability distribution;
its height (y-axis) is a density, NOT A PROBABILITY! Probabilities (0-1) are
areas under the curve
- Right-hand probabilities: probabilities of values above (and
including) a threshold
- Left-hand probabilities: probabilities of values up to (and including)
a threshold value
7. Random variable: variable with values that depend on chance
8. Confidence interval: estimated range of plausible values for the population value, based on
the sample statistic (compare sample drawn vs. expected value of population)
9. Population: large set of observations about which we want to make a statement
10. Sample: smaller set of observations drawn from the population, used to make a statement about the population
11. Population statistic: parameter
12. Probability distribution: when we change frequency in sampling distribution into
proportions
- Tells us
- How many yellow candies to expect in bag of 10 candies
- Probability of specific outcome occurring
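The left-hand, right-hand, and range probabilities from the probability density entry can be sketched in Python. This is a minimal illustration, assuming average candy weight follows a normal distribution with expected value 2.8 grams; the standard deviation of 0.1 grams is a hypothetical value, not one from these notes.

```python
from statistics import NormalDist

# Assumed sampling distribution of average candy weight per bag.
# mu = 2.8 comes from the notes' example; sigma = 0.1 is hypothetical.
weights = NormalDist(mu=2.8, sigma=0.1)

# Probability that the average weight falls between two thresholds
# (area under the density curve between 2.70 and 2.85 grams).
p_range = weights.cdf(2.85) - weights.cdf(2.70)

# Left-hand probability: values up to (and including) the threshold.
p_left = weights.cdf(2.8)

# Right-hand probability: values above (and including) the threshold.
p_right = 1 - weights.cdf(2.8)

print(round(p_range, 3), round(p_left, 3), round(p_right, 3))
```

For a continuous variable, a single exact value has probability zero, so "including" the threshold does not change the left-hand or right-hand probability.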
Population value: 1) draw 1000+ samples, 2) calculate mean of sampling distribution (of sample
statistic), 3) that number = population value
- To calculate population value accurately:
1. Random samples only
2. Unbiased estimator (used throughout course, and assumed in SPSS)
3. Continuous (probability density) vs. discrete (probability)
4. Impractical (time/resources)
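The three steps for finding the population value can be sketched as a small simulation. It is only an illustration: the population proportion of yellow candies (0.2), the bag size (10), and the function name are hypothetical choices, and in practice the population value is unknown.

```python
import random

def simulate_sampling_distribution(population_proportion, sample_size, n_samples, seed=0):
    """Draw many random samples and record the number of yellow candies in each."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        # Each candy is yellow with probability `population_proportion`.
        yellow = sum(rng.random() < population_proportion for _ in range(sample_size))
        counts.append(yellow)
    return counts

# Step 1: draw 1000+ samples (here 5000 bags of 10 candies).
counts = simulate_sampling_distribution(0.2, sample_size=10, n_samples=5000)

# Step 2: mean of the sampling distribution of the sample statistic.
mean_of_sampling_distribution = sum(counts) / len(counts)

# Step 3: that mean (as a proportion) approximates the population value.
estimated_population_proportion = mean_of_sampling_distribution / 10
print(mean_of_sampling_distribution, estimated_population_proportion)
```

With random samples and an unbiased estimator, the estimate should land close to the assumed 0.2.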
Chapter 2: Probability models, sampling distribution
Mean of sampling distribution = expected value = true value in population
Sampling distribution construction from ONE SAMPLE
1. Bootstrapping (NON-CATEGORICAL VARIABLES)
Step 1) Calculate number of yellow candies (%) in original sample
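Step 2) Draw many bootstrap samples from the original sample and calculate the same statistic in
each; if their mean is about the same as in the original sample, their distribution
approximates the true sampling distribution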
Sampling with replacement: bootstrap sample can differ from the original sample
- Pros: Creates meaningful sampling distribution
Sampling without replacement: proportion/ sample statistic of interest = identical to original
sample
- Con: Does not create meaningful sampling distribution
Sample statistic of interest: (ex. Proportion of yellow candies)
Cons of bootstrapping
1) Samples must be drawn randomly
2) Samples must be large
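A minimal bootstrapping sketch, assuming a hypothetical original sample of 50 candies of which 10 are yellow. It contrasts sampling with replacement (bootstrap samples vary) with sampling without replacement (the proportion stays identical to the original sample). The function name and the data are illustrative only.

```python
import random

def bootstrap_proportions(sample, n_bootstrap, seed=0):
    """Resample the original sample with replacement and return the
    proportion of yellow candies in each bootstrap sample."""
    rng = random.Random(seed)
    n = len(sample)
    proportions = []
    for _ in range(n_bootstrap):
        resample = rng.choices(sample, k=n)  # sampling WITH replacement
        proportions.append(resample.count("yellow") / n)
    return proportions

# Hypothetical original sample: 10 yellow candies out of 50.
original_sample = ["yellow"] * 10 + ["other"] * 40
original_proportion = original_sample.count("yellow") / len(original_sample)

boot = bootstrap_proportions(original_sample, n_bootstrap=2000)
boot_mean = sum(boot) / len(boot)

# Sampling WITHOUT replacement (same size) only reorders the original sample,
# so the proportion is always identical to the original.
reordered = random.Random(1).sample(original_sample, k=len(original_sample))
without_replacement_proportion = reordered.count("yellow") / len(reordered)

print(original_proportion, boot_mean, without_replacement_proportion)
```

The spread of the values in `boot` is what serves as the approximate sampling distribution.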
2. Exact approach (ONLY CATEGORICAL & DISCRETE VALUES!)
Aim: to calculate exact probabilities of all possible sample results
Conditions
- Pro: True sampling distribution
- Con: Categorical & discrete variables only
- Con: Computer intensive
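As an illustration of the exact approach, the sketch below calculates the exact probability of every possible number of yellow candies in a bag. It assumes a binomial model with a fixed probability of 0.2 that any candy is yellow and a bag of 10 candies; both numbers and the function name are hypothetical.

```python
from math import comb

def exact_sampling_distribution(sample_size, probability):
    """Exact probability of each possible count (0..sample_size),
    assuming independent draws with a fixed probability (binomial model)."""
    return {
        k: comb(sample_size, k) * probability**k * (1 - probability)**(sample_size - k)
        for k in range(sample_size + 1)
    }

dist = exact_sampling_distribution(sample_size=10, probability=0.2)

# Expected value = mean of the sampling distribution.
expected_value = sum(k * p for k, p in dist.items())

for k, p in dist.items():
    print(k, round(p, 4))
```

Because every possible outcome has to be enumerated, this only works for discrete or categorical outcomes, and it becomes computer intensive as the number of possible outcomes grows.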
3. Theoretical approximation
Theoretical probability distribution: sampling distribution as math function
Normal distribution: larger amount of samples = more accurate (1000+ samples)
Conditions
- Rule of thumb: probability of drawing a sample statistic (proportion) x sample size > 5
Con
- Does not fit the sampling distribution for all kinds of data (the true sampling distribution can be
skewed towards left/right, so the symmetric normal curve does not match it in the graph)
- Approximation of sampling distribution does not equal the true sampling distribution
Other theoretical probability distributions
- T-distribution: tests on means in small samples
- F-distribution: analysis of variance (ANOVA)
- Chi-squared distribution: tests on categorical variables
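The rule of thumb for the normal approximation can be checked numerically. The sketch below compares the exact (binomial) sampling distribution with a normal curve for a small and a large sample. The probability 0.2, the sample sizes, the extra check on (1 - probability), and the continuity correction are assumptions added for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def rule_of_thumb_ok(probability, sample_size):
    """Probability x sample size > 5 (also checked for 1 - probability,
    which is an added assumption, not stated in the notes)."""
    return probability * sample_size > 5 and (1 - probability) * sample_size > 5

def max_cdf_error(probability, sample_size):
    """Largest gap between exact cumulative probabilities and the
    normal approximation with the same mean and standard deviation."""
    mean = sample_size * probability
    sd = sqrt(sample_size * probability * (1 - probability))
    normal = NormalDist(mean, sd)
    exact_cdf = 0.0
    worst = 0.0
    for k in range(sample_size + 1):
        exact_cdf += comb(sample_size, k) * probability**k * (1 - probability)**(sample_size - k)
        approx_cdf = normal.cdf(k + 0.5)  # continuity correction
        worst = max(worst, abs(exact_cdf - approx_cdf))
    return worst

error_small = max_cdf_error(0.2, 10)    # 0.2 x 10 = 2, rule of thumb not met
error_large = max_cdf_error(0.2, 100)   # 0.2 x 100 = 20, rule of thumb met
print(rule_of_thumb_ok(0.2, 10), error_small)
print(rule_of_thumb_ok(0.2, 100), error_large)
```

The approximation error is noticeably smaller when the rule of thumb is met, but it never reaches zero: the approximation is not the true sampling distribution.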
Chapter 3: Estimating a parameter; which population values are possible?
Confidence level/ probability: area under the curve which is not in the rejection area
Precision: width of the interval (ex. width of the 95% confidence interval)
Critical value: (z-value) where the CI ends/starts
- Population value does not have a probability (because it is one exact value)
Z x SE = distance between the sample result and the lower limit/ upper limit
(ex. the exact distance between the sample result and the lowest plausible population value = lower limit)
*SE: standard error = standard deviation of the sampling distribution (calculated by SPSS)
To find lower and upper limit
- Set the sample result as the mean, then subtract and add Z x SE
- This leads to the conclusion that we are 95% confident that the average candy
weight in the population is between X grams and Y grams.
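The lower and upper limit calculation can be sketched as follows, assuming a normal sampling distribution. The sample mean of 2.8 grams and the standard error of 0.05 grams are hypothetical values; in the course the SE is taken from SPSS output.

```python
from statistics import NormalDist

def confidence_interval(sample_mean, standard_error, confidence_level=0.95):
    """Return (lower limit, upper limit) as sample mean -/+ Z x SE."""
    # Critical z-value: leaves (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

lower, upper = confidence_interval(sample_mean=2.8, standard_error=0.05)
print(round(lower, 3), round(upper, 3))
```

For a 95% confidence level the critical value is about 1.96, so the interval here runs from roughly 2.70 to 2.90 grams.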
Chapter 3 Dictionary
1. Point estimate: single guess for population value (based on sample)