Notes 10
- Scatterplot: used to find relationships between 2 quantitative variables
- Correlation coefficient “r”
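- Between -1 and +1 (the closer r is to -1 or +1, the more closely the points follow a straight line; the sign shows whether the line slopes down or up)
- r = 0: no linear relationship (the points do not follow a straight line)
- Outliers: can pull r very strongly
- Correlation does not equal causation
- Just because two things correlate doesn't mean that one impacts the other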
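A minimal sketch of how r can be computed and how a single outlier can pull it, assuming made-up data. The function name `pearson_r` and all the numbers are illustrative, not from the class.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (standard Pearson formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data that falls exactly on a straight line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_line = pearson_r(xs, ys)

# Same data plus one made-up outlier far from the pattern
r_outlier = pearson_r(xs + [6], ys + [0])

print(r_line, r_outlier)
```

With these numbers r is 1 for the straight-line data and drops to about 0.14 once the single outlier is added.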
Notes 11
- Best fit line: line that best describes the dataset (regression line)
- Add up the vertical distances of the points above the line and of the points below the line; the two totals should be the same
- Square each of those distances and add them together; it will be impossible to find a line with a smaller total (least squares)
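The two properties above can be checked with a short sketch, assuming made-up data and the usual least-squares formulas for slope and intercept. The function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares (best fit) line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Total of the squared vertical distances from the points to a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Made-up data, roughly linear
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_line(xs, ys)

# Distances above the line are positive, below are negative,
# so equal totals above and below means the signed sum is about 0
residual_sum = sum(y - (slope * x + intercept) for x, y in zip(xs, ys))

best_ssr = sum_squared_residuals(xs, ys, slope, intercept)
# A slightly different line (hypothetical) gives a larger total
other_ssr = sum_squared_residuals(xs, ys, slope + 0.1, intercept)

print(slope, intercept, residual_sum, best_ssr, other_ssr)
```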
Notes 12
- Normal distribution: bell-shaped curve that is unimodal (one high point on graph) and symmetric, with the mean, median, and mode all the same
- Described by its mean and standard deviation
- Used to compare values within a data set or between data sets
- Z-scores: number given to a value in a data set that indicates how many stdv above or below the mean the value is
- Used to standardize values coming from different normal distributions
- Only use when the data are approximately normally distributed
- Z = (value - mean) / stdv
- Z-score < -2 or > 2: the data value is considered an outlier (rule of thumb)
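A small sketch of the z-score formula, assuming two made-up tests with different means and stdvs. All values and the helper names are hypothetical.

```python
def z_score(value, mean, stdv):
    """How many stdv a value sits above (+) or below (-) the mean."""
    return (value - mean) / stdv

def is_outlier(z, cutoff=2):
    """Rule of thumb from the notes: z below -2 or above 2."""
    return abs(z) > cutoff

# Made-up test A: mean 70, stdv 10. Made-up test B: mean 500, stdv 100.
z_a = z_score(85, 70, 10)       # 1.5 stdv above the mean
z_b = z_score(620, 500, 100)    # 1.2 stdv above the mean
z_low = z_score(45, 70, 10)     # 2.5 stdv below the mean

print(z_a, z_b, is_outlier(z_a), is_outlier(z_low))
```

Because z_a is larger than z_b, the score of 85 on test A is the stronger result relative to its own distribution, even though 620 is the bigger raw number.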
Notes 13
- Distribution: possible values for a variable and how likely they are
- Sampling distribution: possible sample means and how likely different sample means are
- CLT (Central Limit Theorem): describes what sorts of sample means we get from random samples
- Approximately normal (if the sample size is big enough, the sample means form a bell curve)
- Same mean as the population
- Stdv equal to the population stdv divided by the square root of the sample size
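A simulation sketch of the CLT claims above, assuming the "population" is rolls of a fair six-sided die (which is not bell shaped). The sample size of 30, the number of samples, and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # assumed "big enough" sample size
NUM_SAMPLES = 5000    # how many random samples to draw

faces = [1, 2, 3, 4, 5, 6]
population_mean = statistics.mean(faces)      # 3.5
population_stdv = statistics.pstdev(faces)    # about 1.71

# Draw many random samples and record each sample's mean
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.choice(faces) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

observed_mean = statistics.mean(sample_means)
observed_stdv = statistics.pstdev(sample_means)
# CLT prediction: population stdv / sqrt(sample size)
predicted_stdv = population_stdv / math.sqrt(SAMPLE_SIZE)

print(observed_mean, population_mean)
print(observed_stdv, predicted_stdv)
```

The mean of the sample means should land near 3.5, and their stdv should land near the predicted value of roughly 0.31.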
Notes 14
- Probability: proportion of time we expect an outcome to occur in the long run (0.7 probability of rain today: over time, there will be rain on days like today 70% of the time)
- Number between 0 and 1. 0 = impossible, 1 = guaranteed
- Probability of an event not occurring is 1 minus the probability the event does occur
- Empirical probability: use the results of observations or experiments to estimate probabilities
- Theoretical probability: when all outcomes are equally likely, divide the number of outcomes in the event by the total number of outcomes (⅙ chance of rolling a two on a six-sided die)
- Statistically significant: unlikely to have occurred by chance alone (flipping 85 heads and 15 tails with a fair coin)
- Practical significance: describes an effect/event/difference that is “big enough” to matter or be used
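To put a number on the coin example, here is a sketch that counts equally likely outcomes to get the theoretical probability of at least 85 heads in 100 fair flips (100 flips is assumed from 85 heads + 15 tails). The function name is illustrative.

```python
from math import comb

def prob_at_least(heads, flips):
    """P(at least `heads` heads in `flips` flips of a fair coin).

    Every sequence of flips is equally likely, so this is
    (number of sequences with enough heads) / (total sequences).
    """
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    total = 2 ** flips
    return favorable / total

p_85 = prob_at_least(85, 100)   # extremely small
p_55 = prob_at_least(55, 100)   # not unusual for a fair coin
# Complement rule: P(fewer than 85 heads) = 1 - P(at least 85 heads)
p_fewer_than_85 = 1 - p_85

print(p_85, p_55, p_fewer_than_85)
```

A result like 85 heads is so unlikely under a fair coin that chance alone is a poor explanation, while 55 heads happens fairly often.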
Notes 15
- Law of large numbers: if a process is repeated many times, the proportion of the time an event occurs will be close to the probability of that event occurring. More repetitions = observed proportion closer to the theoretical probability
- Expected value: weighted average of all of a variable's possible values. Only relevant when there is a large number of events
- Expected value = ∑ (event value)*(event probability)
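A sketch of the expected value formula and the law of large numbers together, assuming a fair six-sided die. The seed and the number of rolls are arbitrary choices for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def expected_value(outcomes):
    """Sum of (event value) * (event probability) over all outcomes."""
    return sum(value * probability for value, probability in outcomes)

# Fair die: each face 1..6 has probability 1/6
die = [(face, 1 / 6) for face in range(1, 7)]
ev_die = expected_value(die)  # 3.5

# Law of large numbers: with many rolls, the observed results
# settle near the theoretical values
NUM_ROLLS = 100_000
rolls = [random.randint(1, 6) for _ in range(NUM_ROLLS)]
average_roll = sum(rolls) / NUM_ROLLS
proportion_twos = rolls.count(2) / NUM_ROLLS  # theoretical value is 1/6

print(ev_die, average_roll, proportion_twos)
```

No single roll can equal 3.5, which is why the expected value only describes what happens over a large number of events.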