Formulas | Measures of association

Formulas:
- From variance to SD: s = √Var
- Standard error: SE = s / √n
- Confidence interval: Estimate ± z(2) × SE
- From population value to %: % = (x / N) × 100
- From % to population value: x = (% / 100) × N
- SE for the correlation coefficient r: SE = 1 / √(n − 3)

Measures of association:
- If a linear relation is expected: Pearson's r. Affected by outliers, so first create a scatterplot.
- Not necessarily linear: Spearman's (based on ranks). Less vulnerable to outliers; takes monotonic but non-linear relationships into account.
- Cramér's V: 0–1. Pearson's, Spearman's, and Cramér's V are all symmetric (no IV and DV considered).
- Percentage difference: asymmetric (it does distinguish IV and DV).
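As a sketch of the formulas above (in Python rather than R, purely for illustration; the data vector is made up):

```python
import math

data = [4.0, 7.0, 5.0, 6.0, 8.0]                    # made-up sample
n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
s = math.sqrt(var)                                  # s = sqrt(Var)
se = s / math.sqrt(n)                               # SE = s / sqrt(n)
z = 1.96                                            # z(2) for a 95% CI
ci = (mean - z * se, mean + z * se)                 # Estimate ± z(2) × SE
se_r = 1 / math.sqrt(n - 3)                         # SE for a correlation coefficient r
```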
Each test below is laid out as: Steps | Calculate | In R | Interpretation.
Confidence interval for a proportion

Steps:
1. Find p̂
2. Calculate SE
3. Find z(2)
4. Calculate the 95% CI

Calculate:
p̂ = x / N
SE = √( p̂ (1 − p̂) / n )
CI: p̂ ± z(2) × SE

In R:
# 1st argument: number of people who answered yes; 2nd argument: n
prop.test(x, n)
binom.test(x, n)
# CI in a dataset
table(dataset)
table(dataset$vari)   # → counts/proportions for this variable
binom.test(x, n)

Interpretation:
Estimator: a quantity that you compute based on sample data.
Estimate: the actual value that you get when computing the estimator.
We use prop.test or binom.test because the dependent variable is dichotomous.
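The same CI can be computed by hand; here is a minimal sketch in Python (the counts are made up):

```python
import math

x, n = 60, 100                          # 60 of 100 answered yes (made-up numbers)
p_hat = x / n                           # p̂ = x / N
se = math.sqrt(p_hat * (1 - p_hat) / n) # SE = sqrt(p̂(1 − p̂) / n)
z = 1.96                                # z(2) for a 95% CI
ci = (p_hat - z * se, p_hat + z * se)   # p̂ ± z(2) × SE
```

In R, prop.test(60, 100) or binom.test(60, 100) gives the exact/approximate interval directly.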
T-test

Steps:
1. Hypothesis
2. Calculate your t-statistic
3. Find the p-value using R
4. Interpret the t-value
5. CI (by hand)

Calculate:
t = (x̄ − μ) / SE
SE = s / √n
df = n − 1
margin of error = t-value × SE
CI: mean ± z(2) × SE

In R:
# t-value
t <- (x_bar - mu) / (s / sqrt(n))
# p-value (make t negative first)
pt(t, df) * 2

Interpretation:
One-sample t-test of differences: H0: μ2 − μ1 = 0 (no change); HA: μ2 − μ1 ≠ 0 (change).
Two-sample t-test: tests whether the mean of group 1 differs from the mean of group 2.
H0: μ(married) = μ(single) (no difference); HA: μ(married) ≠ μ(single) (or μ(married) > μ(single) if one-sided).
Two-sided hypothesis: H0: μ = 100; HA: μ ≠ 100.
One-sided hypothesis: H0: μ = 100; HA: μ < 100.
➔ When the p-value is < 0.05, we reject H0: the average in this sample differs significantly from the population value.
➔ When the p-value is > 0.05, we cannot reject H0.
t.test compares the mean of a single group to a known value. The one-sample t-test and the paired t-test are similar: a paired test measures the same units over time, and the differences are then analysed as a one-sample t-test. The independent two-sample t-test asks whether two groups differ from each other.
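The one-sample calculation above can be sketched as follows (Python rather than R, with made-up summary numbers; the p-value here uses a normal approximation via the standard library, standing in for R's pt(t, df), which is reasonable for larger n):

```python
import math
from statistics import NormalDist

x_bar, mu, s, n = 105.0, 100.0, 15.0, 36     # made-up sample summary
se = s / math.sqrt(n)                        # SE = s / sqrt(n)
t = (x_bar - mu) / se                        # t = (x̄ − μ) / SE
df = n - 1                                   # df = n − 1
# Two-sided p-value; normal approximation instead of the t distribution
p = 2 * NormalDist().cdf(-abs(t))
margin = 1.96 * se                           # margin of error = critical value × SE
ci = (x_bar - margin, x_bar + margin)        # mean ± z(2) × SE
```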
Linear equations

ŷ = β0 + β1 · x

β0 = intercept = starting point: the intercept is the value of y when x = 0.
β1 = slope: indicates how much ŷ changes when x increases by one unit.
slope = (y2 − y1) / (x2 − x1)
For a dummy variable, the coefficient (b2) = the difference (+ or −) with the other group, where the reference group is coded x = 0.
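A minimal worked example of the slope and prediction formulas (Python, with made-up points):

```python
# Slope from two points, then a prediction from ŷ = b0 + b1·x
x1, y1 = 1.0, 3.0           # made-up point 1
x2, y2 = 4.0, 9.0           # made-up point 2
b1 = (y2 - y1) / (x2 - x1)  # slope: change in ŷ per one-unit increase in x
b0 = y1 - b1 * x1           # intercept: y when x = 0
y_hat = b0 + b1 * 10        # predicted ŷ at x = 10
```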
Chi square

Steps:
1. Calculate expected values
2. Calculate chi-square for each cell
3. Add up the cells
4. Calculate df
5. Find the p-value in R

Calculate:
E = (row total) × (column total) / (grand total)
Per cell: (O − E)² / E
χ² = Σ (O − E)² / E
df = (r − 1) × (c − 1)

In R:
# to find the p-value given a chi-square and df
pchisq(chisq, df, lower.tail = FALSE)

Interpretation:
# p-value < 0.05: there is a significant association (reject H0)
# p-value > 0.05: there is no significant association (do not reject H0)
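The by-hand steps above can be sketched for a 2×2 table (Python, with made-up observed counts; the p-value lookup would still be pchisq(chisq, df, lower.tail = FALSE) in R):

```python
# Chi-square by hand for a 2×2 table of made-up observed counts
observed = [[30, 20],
            [10, 40]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chisq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # E = (row × column) / grand total
        chisq += (o - e) ** 2 / e                  # add (O − E)² / E per cell

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r − 1) × (c − 1)
```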
Goodness of fit test

# Goodness of fit test
# H0: the sample proportions are a good representation of the population proportions
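The goodness-of-fit statistic follows the same χ² = Σ (O − E)² / E recipe, with expected counts taken from the hypothesised population proportions and df = (number of categories − 1). A sketch in Python with made-up counts and proportions:

```python
# Goodness-of-fit chi-square: observed counts vs. hypothesised proportions
observed = [18, 55, 27]             # made-up sample counts
expected_props = [0.2, 0.5, 0.3]    # H0: population proportions
n = sum(observed)
expected = [p * n for p in expected_props]
chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
# p-value in R: pchisq(chisq, df, lower.tail = FALSE)
```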