Summary

Advanced Statistics (MAT20306) summary ALL LECTURES

Pages: 73
Uploaded on: 18-01-2026
Written in: 2025/2026

Wageningen University, Advanced Statistics (MAT20306) summary of ALL lectures. Summary with lecture notes and the examples mentioned during the lectures. This document covers: Confidence intervals and Hypothesis Testing; Sample size calculations, Wilcoxon rank tests; One and two proportions; Chi-square test & correlation; Linear models: simple linear regression; Multiple Linear Regression 1 and 2; One-way analysis of variance, pairwise comparisons, non-parametric F-test; Two-way ANOVA aka factorial ANOVA; Block design & relative efficiency (RE); Quantitative and categorical x-variables ANCOVA / General Linear Model.


Document information

Whole book summarized? No
What part of the book is summarized? Everything covered during the course

Content
Lecture 1: Confidence intervals and Hypothesis Testing
Lecture 2: Sample size calculations, Wilcoxon rank tests
Lecture 3: One and two proportions
Lecture 4: Chi-square test & correlation
Lecture 5: Linear models: simple linear regression
Lecture 6: Multiple Linear Regression 1
Lecture 7: Multiple linear regression 2
Lecture 8: One-way analysis of variance, pairwise comparisons, non-parametric F-test
Lecture 9: Two-way ANOVA aka factorial ANOVA
Lecture 10: Block design & relative efficiency (RE)
Lecture 11: Quantitative and categorical x-variables ANCOVA / General Linear Models

Lecture 1: Confidence intervals and Hypothesis Testing


What is a confidence interval? A confidence interval for a population parameter gives a range of
plausible values for that parameter based on the sample. Values inside the interval are plausible
parameter values given the observed sample.

Frequentist interpretation: A 1−α (for example, 95%) confidence interval procedure means: if we
repeated the exact sampling and interval-construction process many times, then about 100×(1−α)%
of those intervals would contain the true population parameter (about 95 out of 100 for a 95% CI).
So for a 95% CI: “We are 95% confident that the true parameter is inside this interval.” This is not the
same as saying there is a 95% probability that the particular interval you computed contains the
parameter; the probability statement refers to the procedure over repeated samples.
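
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is normal with a hypothetical mean and standard deviation, σ is treated as known (so a z-interval is used instead of a t-interval to stay within the Python standard library), and the function name `coverage` is made up for this example.

```python
import random
from statistics import NormalDist, mean

def coverage(n_repeats=2000, n=25, mu=10.0, sigma=2.0, alpha=0.05, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided normal critical value
    se = sigma / n ** 0.5                     # sigma is assumed known here
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        # Count the interval if it covers the true parameter.
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / n_repeats

print(coverage())  # expected to be close to 0.95, not exactly 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the long-run fraction over the repetitions.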

General formula for a two-sided t-based CI for a mean or difference of means
For many t-procedures the two-sided 100(1−α)% confidence interval has the form:

$$\text{estimate} \pm t_{df}(\alpha/2) \times \text{standard error}$$
• estimate = the point estimate (e.g., $\bar{x}$ for a single mean, or $\bar{x}_1 - \bar{x}_2$ for a difference of means).
• $t_{df}(\alpha/2)$ = critical value from the Student’s t distribution with the appropriate degrees of freedom, for
the two-tailed α-level.
• standard error = depends on the problem (see formulas below).
Factors that make a CI narrower (more precise): larger sample size n,
smaller variability in the data (smaller s), and lower confidence level
(smaller 1−α). Lowering the confidence level, however, reduces reliability.
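
As a minimal sketch of the general form estimate ± critical value × standard error, the snippet below computes a 95% and a 90% interval for a single mean. The data are hypothetical, the helper `t_interval` is a made-up name, and the critical values are taken from a t table for df = 9 rather than computed, because the Python standard library has no t distribution.

```python
from statistics import mean, stdev

def t_interval(estimate, se, t_crit):
    """Two-sided CI: estimate +/- t_crit * se."""
    margin = t_crit * se
    return estimate - margin, estimate + margin

# Hypothetical sample of n = 10 measurements (illustration only).
data = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2]
n = len(data)
se = stdev(data) / n ** 0.5          # one-sample standard error s / sqrt(n)

ci95 = t_interval(mean(data), se, 2.262)   # t table: df = 9, alpha/2 = 0.025
ci90 = t_interval(mean(data), se, 1.833)   # t table: df = 9, alpha/2 = 0.05

width95 = ci95[1] - ci95[0]
width90 = ci90[1] - ci90[0]
print(f"95% CI: ({ci95[0]:.3f}, {ci95[1]:.3f}), width {width95:.3f}")
print(f"90% CI: ({ci90[0]:.3f}, {ci90[1]:.3f}), width {width90:.3f}")
```

The 90% interval is narrower than the 95% interval for the same data, which is the trade-off between precision and confidence described above.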

The t distribution and degrees of freedom: The t distribution is
similar to the normal distribution but has heavier tails; it is used
when the population standard deviation σ is unknown and estimated
from the data. As sample size (or degrees of freedom) grows, the t
distribution approaches the normal distribution. The degrees of freedom determine the exact shape of
the t distribution.
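
To make the "heavier tails" statement concrete, the following sketch evaluates the t density at a point far from the center and compares it with the standard normal density. The function `t_pdf` is written here from the textbook density formula of Student's t; it is an illustrative helper, not part of the course material.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so large df values do not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

x = 3.0                               # a point in the tail
normal_density = NormalDist().pdf(x)
for df in (2, 5, 30, 1000):
    # The t density at x shrinks toward the normal density as df grows.
    print(df, round(t_pdf(x, df), 5), round(normal_density, 5))
```

For small df the tail density is several times larger than the normal one, which is why t critical values exceed the corresponding normal critical values.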

Degrees of freedom (df) quantify how well the standard deviation s is estimated; more df → closer to
normal. Typical df:
o One-sample mean or paired differences: df = $n - 1$.
o Two-sample pooled t (equal variances assumed): df = $n_1 + n_2 - 2$.
o Welch’s (unequal variances): a complicated approximation (Welch–Satterthwaite
formula), typically non-integer. See formula below.
Intuitively: df reflect how much independent information you had to estimate variability.

Standard errors: formulas you must know
1. One-sample mean: $SE(\bar{x}) = \dfrac{s}{\sqrt{n}}$
where $s$ is the sample standard deviation and $n$ is the sample size.

2. Paired t (differences)
o Convert paired observations to differences $d_i = x_{i,\text{after}} - x_{i,\text{before}}$.
o Use one-sample formulas on the differences: $\bar{d} = \dfrac{1}{n}\sum d_i$, $SE(\bar{d}) = \dfrac{s_d}{\sqrt{n}}$
o df = $n - 1$, where $n$ is the number of pairs.
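
The paired procedure reduces to a one-sample analysis of the differences. A minimal sketch, using hypothetical before/after measurements on six subjects and a critical value read from a t table (df = 5):

```python
from statistics import mean, stdev

# Hypothetical before/after measurements on the same 6 subjects.
before = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
after = [12.9, 11.8, 13.1, 13.5, 12.0, 13.2]

d = [a - b for a, b in zip(after, before)]   # d_i = after_i - before_i
n = len(d)                                   # number of pairs
d_bar = mean(d)                              # mean difference
se = stdev(d) / n ** 0.5                     # SE(d_bar) = s_d / sqrt(n)
df = n - 1

t_crit = 2.571                               # t table: df = 5, alpha/2 = 0.025
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
print(f"mean difference {d_bar:.3f}, SE {se:.3f}, df {df}")
print(f"95% CI for the mean difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Only the differences enter the calculation, so the variability between subjects drops out of the standard error.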

3. Two-sample t with equal variances (pooled)
o Pool the sample variances to get a pooled standard deviation: $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $s_p = \sqrt{s_p^2}$.
o Standard error of the difference of means: $SE(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}$
o df = $n_1 + n_2 - 2$.
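
The pooled formulas translate directly into code. The sketch below assumes raw data for both groups are available; the function names `pooled_sd` and `pooled_se` and the sample values are illustrative.

```python
import math
from statistics import variance

def pooled_sd(x1, x2):
    """Pooled standard deviation, weighting each sample variance by its df."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return math.sqrt(sp2)

def pooled_se(x1, x2):
    """Standard error of the difference of means under equal variances."""
    return pooled_sd(x1, x2) * math.sqrt(1 / len(x1) + 1 / len(x2))

# Hypothetical samples.
group1 = [1.0, 2.0, 3.0]
group2 = [2.0, 4.0, 6.0, 8.0]
print(pooled_sd(group1, group2), pooled_se(group1, group2))
print("df =", len(group1) + len(group2) - 2)
```

Because the weights are the degrees of freedom, the larger sample contributes more to the pooled estimate.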


4. Two-sample t without equal variances (Welch’s t)
o Do not pool variances. Use: $SE = \sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}$
o Approximate degrees of freedom using Welch–Satterthwaite: $df \approx \dfrac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}$
(This yields a positive real number; statistical software uses this.)
o Welch’s test is the default in R and is safer when variances differ.
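
A short sketch of the Welch standard error and the Welch–Satterthwaite degrees of freedom, computed from raw data. The function name `welch_se_df` and the sample values are assumptions made for illustration.

```python
import math
from statistics import variance

def welch_se_df(x1, x2):
    """Return (SE, approximate df) for the unequal-variance two-sample case."""
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1            # s1^2 / n1
    v2 = variance(x2) / n2            # s2^2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df

# Hypothetical samples with clearly different spread.
group1 = [10.1, 9.8, 10.3, 10.0, 9.9]
group2 = [12.5, 8.0, 15.2, 9.1, 11.7, 13.9, 7.4]
se, df = welch_se_df(group1, group2)
print(f"SE = {se:.3f}, df = {df:.2f}")
```

The resulting df is usually not an integer and lies between the smaller of n1−1 and n2−1 and the pooled value n1+n2−2.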


Sampling distribution of the difference between two sample means
We consider two independent samples:
• Sample 1: size $n_1$, sample mean $\bar{y}_1$, population variance $\sigma_1^2$
• Sample 2: size $n_2$, sample mean $\bar{y}_2$, population variance $\sigma_2^2$
We are interested in the statistic: $\bar{y}_1 - \bar{y}_2$
This is an estimator of the population difference: $\mu_1 - \mu_2$

The sampling distribution of $\bar{y}_1 - \bar{y}_2$ is approximately normal for large samples because of the
Central Limit Theorem (CLT): each sample mean is approximately normal when the sample size is
large or the population is normal, and the difference of two independent normally distributed
variables is also normally distributed. So: $\bar{y}_1 - \bar{y}_2 \approx$ Normal distribution

The expected value (mean) of $\bar{y}_1 - \bar{y}_2$ is: $\mu_{\bar{y}_1 - \bar{y}_2} = \mu_1 - \mu_2$

This makes intuitive sense because, on average, a sample mean estimates its population mean.
Therefore, the difference of two sample means estimates the difference of two population means.





The standard error of the sampling distribution is: $\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}$
Why this formula? The variance of a sample mean is $\sigma^2/n$. Since the samples are independent,
the variances add; taking the square root then gives the standard error. This formula is the general
case, when the variances are not assumed equal.
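
The "variances add" argument can be verified by simulation. The sketch below draws many pairs of independent samples from two normal populations with hypothetical standard deviations and compares the observed spread of the difference in sample means with the theoretical standard error. The function name `simulate_diff_sd` and all parameter values are illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev

def simulate_diff_sd(n1, n2, sigma1, sigma2, reps=5000, seed=7):
    """Empirical standard deviation of (mean of sample 1 - mean of sample 2)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        y1 = mean([rng.gauss(0.0, sigma1) for _ in range(n1)])
        y2 = mean([rng.gauss(0.0, sigma2) for _ in range(n2)])
        diffs.append(y1 - y2)
    return stdev(diffs)

# Theoretical value sqrt(sigma1^2 / n1 + sigma2^2 / n2).
theory = math.sqrt(3.0 ** 2 / 20 + 1.5 ** 2 / 30)
empirical = simulate_diff_sd(20, 30, 3.0, 1.5)
print(f"theory {theory:.4f}, simulation {empirical:.4f}")
```

The two numbers should agree up to simulation noise, which shrinks as the number of repetitions grows.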

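When we assume the two population variances are equal, we simplify: $\sigma_1^2 = \sigma_2^2 = \sigma^2$

In that case: $\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\sigma^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)} = \sigma\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}$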
But we do not know $\sigma^2$; it is a population value, so we must estimate it from the sample data.
That is where the pooled standard deviation comes in.

Since we assume that both populations have the same variance, the best estimate of that
common variance is a pooled (combined) estimate.



Definition shown in the slide: $s_p = \sqrt{\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}$

Meaning:
• We take each sample’s variance $s_1^2$, $s_2^2$
• Weight them by their degrees of freedom $n_i - 1$
• Average them
• Then take the square root
This is a more accurate estimate of a shared variance than using either sample alone.
Degrees of freedom for the pooled variance: $df = n_1 + n_2 - 2$
This matches how many independent pieces of information were used in estimating the common
variance.

Once you have $s_p$, the standard error of the sample difference becomes: $SE(\bar{y}_1 - \bar{y}_2) = s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}$

This is the formula used for a pooled t-test or CI for two means with equal variances.
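
Plugging the pooled standard error into the general form estimate ± critical value × standard error gives an interval for the difference of two means. The sketch below works from summary statistics; the numbers are hypothetical, `pooled_ci` is a made-up helper name, and the critical value is read from a t table for df = 18.

```python
import math

def pooled_ci(mean1, mean2, s1, s2, n1, n2, t_crit):
    """Two-sided CI for mu1 - mu2 assuming equal population variances."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)  # pooled SD
    se = sp * math.sqrt(1 / n1 + 1 / n2)                            # pooled SE
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se, df

# Hypothetical summary statistics for two groups of 10 observations each.
low, high, df = pooled_ci(24.0, 21.5, 3.0, 3.4, 10, 10, t_crit=2.101)
print(f"df = {df}, 95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

If the interval contains 0, a difference of zero between the population means is among the plausible values at that confidence level.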


Confidence interval for $\mu_1 - \mu_2$ (equal variances)
The slide shows the formula:
Where:

Seller: lunafields, HAS Den Bosch