Summary

Advanced Statistics (MAT20306) summary ALL LECTURES

Pages
73
Uploaded on
18-01-2026
Written in
2025/2026

Wageningen University, Advanced Statistics (MAT20306) summary of ALL lectures. Summary with lecture notes and the examples mentioned during the lectures. This document covers Confidence intervals and Hypothesis Testing, Sample size calculations, Wilcoxon rank tests, One and two proportions, Chi square test & correlation, Linear models: simple linear regression, Multiple linear regression (two lectures), One-way analysis of variance, pairwise comparisons, non-parametric F-test, Two-way ANOVA aka factorial ANOVA, Block design & relative efficiency (RE), Quantitative and categorical x-variables ANCOVA / General Linear Model

Summarized whole book?
No
Which chapters are summarized?
Everything covered during the course
Content
Lecture 1: Confidence intervals and Hypothesis Testing (p. 2)
Lecture 2: Sample size calculations, Wilcoxon rank tests (p. 9)
Lecture 3: One and two proportions (p. 18)
Lecture 4: Chi square test & correlation (p. 24)
Lecture 5: Linear models: simple linear regression (p. 31)
Lecture 6: Multiple Linear Regression 1 (p. 41)
Lecture 7: Multiple linear regression 2 (p. 44)
Lecture 8: One-way analysis of variance, pairwise comparisons, non-parametric F-test (p. 48)
Lecture 9: Two-way ANOVA aka factorial ANOVA (p. 53)
Lecture 10: Block design & relative efficiency (RE) (p. 60)
Lecture 11: Quantitative and categorical x-variables ANCOVA / General Linear Models (p. 66)

Lecture 1: Confidence intervals and Hypothesis Testing


What is a confidence interval? A confidence interval for a population parameter gives a range of
plausible values for that parameter based on the sample. Values inside the interval are plausible
parameter values given the observed sample.

Frequentist interpretation: A 1−α (for example, 95%) confidence interval procedure means: if we
repeated the exact sampling and interval-construction process many times (say 100 times), then
about 100×(1−α) of those intervals would contain the true population parameter.
So for a 95% CI: “We are 95% confident that the true parameter is inside this interval.” This is not the
same as saying there is a 95% probability that the particular interval you computed contains the
parameter: the parameter is fixed, so a computed interval either contains it or does not. The
probability statement refers to the procedure over repeated samples.
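
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration under assumed values: the population (normal, mean 50, standard deviation 5), the sample size n = 10, and the function name `coverage` are all hypothetical, and the critical value 2.262 is the two-sided 95% t value for df = 9.

```python
import random
import statistics


def coverage(n_reps=2000, n=10, mu=50.0, sigma=5.0, t_crit=2.262, seed=1):
    """Fraction of t-based 95% intervals that contain the true mean mu.

    All defaults are illustrative assumptions; t_crit = 2.262 is the
    two-sided 95% critical value of the t distribution with df = n - 1 = 9.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Draw one sample and build one interval: mean +/- t * s / sqrt(n)
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        lower, upper = mean - t_crit * se, mean + t_crit * se
        if lower <= mu <= upper:
            hits += 1
    return hits / n_reps


# The printed proportion should be close to 0.95
print(coverage())
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across the repetitions.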

General formula for a two-sided t-based CI for a mean or difference of means
For many t-procedures the two-sided 100(1−α)% confidence interval has the form:

$$\text{estimate} \pm t_{df}(\alpha/2) \times \text{standard error}$$
• estimate = the point estimate (e.g., $\bar{x}$ for a single mean, or $\bar{x}_1 - \bar{x}_2$ for a difference of means).
• $t_{df}(\alpha/2)$ = critical value from the Student’s t distribution with appropriate degrees of freedom, for
the two-tailed α-level.
• standard error = depends on the problem (see formulas below).
Factors that make a CI narrower (more precise): larger sample size $n$,
smaller variability in the data (smaller $s$), and lower confidence level
(smaller 1−α). Lowering the confidence level, however, means the procedure captures the true parameter less often.
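
As a rough numerical illustration of these three factors, the sketch below compares the half-width $t \times s/\sqrt{n}$ of a one-sample interval under made-up values of $s$ and $n$. The helper name `half_width` is hypothetical, and the critical values are passed in by hand (2.262 and 1.833 are the two-sided 95% and 90% values for df = 9; 2.023 is the 95% value for df = 39).

```python
import math


def half_width(s, n, t_crit):
    """Half-width of a one-sample t interval: t_crit * s / sqrt(n)."""
    return t_crit * s / math.sqrt(n)


# Baseline: s = 4, n = 10, 95% confidence (df = 9)
base = half_width(s=4.0, n=10, t_crit=2.262)
# Larger sample: n = 40, 95% confidence (df = 39)
larger_n = half_width(s=4.0, n=40, t_crit=2.023)
# Less variable data: s = 2, same n and confidence level
smaller_s = half_width(s=2.0, n=10, t_crit=2.262)
# Lower confidence level: 90% instead of 95% (df = 9)
lower_conf = half_width(s=4.0, n=10, t_crit=1.833)

for label, value in [("baseline", base), ("larger n", larger_n),
                     ("smaller s", smaller_s), ("90% level", lower_conf)]:
    print(f"{label:>10}: +/- {value:.2f}")
```

All three changes shrink the half-width relative to the baseline; only the last one does so by giving up confidence rather than by adding information.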

The t distribution and degrees of freedom: The t distribution is
similar to the normal distribution but has heavier tails; it is used
when the population standard deviation σ is unknown and estimated
from the data. As sample size (or degrees of freedom) grows, the t
distribution approaches the normal distribution. The degrees of freedom determine the exact shape of the
t distribution.

Degrees of freedom (df) quantify how well the standard deviation $s$ is estimated; more df → closer to
normal. Typical df:
o One-sample mean or paired differences: df = $n - 1$.
o Two-sample pooled t (equal variances assumed): df = $n_1 + n_2 - 2$.
o Welch’s (unequal variances): a complicated approximation (Welch–Satterthwaite
formula), typically non-integer. See formula below.
Intuitively: df reflect how much independent information you had to estimate variability.

Standard errors: formulas you must know
1. One-sample mean: $SE(\bar{x}) = \dfrac{s}{\sqrt{n}}$
where $s$ is the sample standard deviation and $n$ is the sample size.

2. Paired t (differences)
o Convert paired observations to differences $d_i = x_{i,\text{after}} - x_{i,\text{before}}$.
o Use one-sample formulas on differences: $\bar{d} = \dfrac{1}{n}\sum d_i$, $SE(\bar{d}) = \dfrac{s_d}{\sqrt{n}}$
o df = $n - 1$, where $n$ is the number of pairs.

3. Two-sample t with equal variances (pooled)
o Pool the sample variances to get a pooled standard deviation: $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
and $s_p = \sqrt{s_p^2}$.
o Standard error of the difference of means: $SE(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
o df = $n_1 + n_2 - 2$.


4. Two-sample t without equal variances (Welch’s t)
o Do not pool variances. Use: $SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
o Approximate degrees of freedom using Welch–Satterthwaite:
$$df \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
(This yields a positive real number; statistical software uses this.)
o Welch’s test is the default in R’s t.test and is safer when variances differ.
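
The four standard errors and their degrees of freedom can be written out as a minimal sketch using only Python's standard library. The function names and the small data sets are hypothetical and chosen purely for illustration; statistical software would normally do this for you.

```python
import math
import statistics


def se_one_sample(x):
    """One-sample mean: SE = s / sqrt(n), df = n - 1."""
    n = len(x)
    return statistics.stdev(x) / math.sqrt(n), n - 1


def se_paired(before, after):
    """Paired t: apply the one-sample formulas to d_i = after_i - before_i."""
    d = [a - b for a, b in zip(after, before)]
    return se_one_sample(d)


def se_pooled(x1, x2):
    """Equal variances: SE = s_p * sqrt(1/n1 + 1/n2), df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2), n1 + n2 - 2


def se_welch(x1, x2):
    """Unequal variances: SE = sqrt(s1^2/n1 + s2^2/n2), Welch-Satterthwaite df."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1
    v2 = statistics.variance(x2) / n2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return math.sqrt(v1 + v2), df


# Made-up data: the second group is clearly more variable than the first
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
print("one-sample:", se_one_sample(group1))
print("paired:    ", se_paired(before=[1, 2, 3], after=[2, 4, 6]))
print("pooled:    ", se_pooled(group1, group2))
print("Welch:     ", se_welch(group1, group2))
```

With equal group sizes the pooled and Welch standard errors coincide, but the Welch df is smaller than $n_1 + n_2 - 2$ because the two sample variances differ.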


Sampling distribution of the difference between two sample means
We consider two independent samples:
• Sample 1: size $n_1$, sample mean $\bar{y}_1$, population variance $\sigma_1^2$
• Sample 2: size $n_2$, sample mean $\bar{y}_2$, population variance $\sigma_2^2$
We are interested in the statistic: $\bar{y}_1 - \bar{y}_2$
This is an estimator of the population difference: $\mu_1 - \mu_2$

The sampling distribution of $\bar{y}_1 - \bar{y}_2$ is approximately normal for large samples because of the
Central Limit Theorem (CLT): each sample mean is approximately normal when the sample size is
large or the population is normal, and the difference of two independent normally distributed variables is also
normally distributed. So: $\bar{y}_1 - \bar{y}_2 \approx$ Normal distribution

The expected value (mean) of $\bar{y}_1 - \bar{y}_2$ is: $\mu_{\bar{y}_1 - \bar{y}_2} = \mu_1 - \mu_2$

This makes intuitive sense because on average, a sample mean estimates its population mean.

Therefore, the difference of two sample means estimates the difference of two population means.

The standard error of the sampling distribution is: $\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Why this formula? The variance of a sample mean is $\sigma^2/n$. Since the samples are independent, the
variances add: $\mathrm{Var}(\bar{y}_1 - \bar{y}_2) = \sigma_1^2/n_1 + \sigma_2^2/n_2$. Taking the square root gives the
standard error. This formula is the general case, when variances are not assumed equal.
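
A quick simulation can make the "variances add" argument concrete. The sketch below uses hypothetical population values ($\mu_1 = 10$, $\sigma_1 = 3$, $n_1 = 25$; $\mu_2 = 8$, $\sigma_2 = 4$, $n_2 = 16$) and a hypothetical function name; it draws many pairs of samples and compares the observed mean and standard deviation of $\bar{y}_1 - \bar{y}_2$ with the theoretical values.

```python
import math
import random
import statistics


def simulate_differences(mu1=10.0, sigma1=3.0, n1=25,
                         mu2=8.0, sigma2=4.0, n2=16,
                         n_reps=5000, seed=2):
    """Return the mean and standard deviation of simulated ybar1 - ybar2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_reps):
        # Two independent samples, one sample mean each
        m1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        m2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(m1 - m2)
    return statistics.fmean(diffs), statistics.stdev(diffs)


# Theoretical standard error: sqrt(sigma1^2/n1 + sigma2^2/n2)
theoretical_se = math.sqrt(3.0 ** 2 / 25 + 4.0 ** 2 / 16)

mean_diff, sd_diff = simulate_differences()
print("mean of differences:", round(mean_diff, 3), "(theory: 2.0)")
print("sd of differences:  ", round(sd_diff, 3),
      "(theory:", round(theoretical_se, 3), ")")
```

The simulated values should land close to $\mu_1 - \mu_2 = 2$ and to the theoretical standard error, with small deviations due to simulation noise.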

When we assume the two population variances are equal, we simplify: $\sigma_1^2 = \sigma_2^2 = \sigma^2$

In that case: $\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
But we do not know $\sigma^2$; it is a population value. So we must estimate it using sample data. That’s
where the pooled standard deviation comes in.

We assume that both populations have the same variance, so the best estimate of that
common variance is a pooled (combined) estimate.



Definition shown in the slide: $s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$

Meaning:
• We take each sample’s variance $s_1^2$, $s_2^2$
• Weight them by degrees of freedom $n_i - 1$
• Average them
• Then take the square root
This is a more accurate estimate of a shared variance than using either sample alone.
Degrees of freedom for the pooled variance: $df = n_1 + n_2 - 2$
This matches how many independent pieces of information were used in estimating the common
variance.
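
A short worked example may help show what "weighting by degrees of freedom" does. The numbers are invented for illustration only: $n_1 = 10$ with $s_1^2 = 4$, and $n_2 = 20$ with $s_2^2 = 7$.

```latex
\begin{align*}
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
       = \frac{9 \cdot 4 + 19 \cdot 7}{28}
       = \frac{169}{28} \approx 6.04 \\
s_p   &= \sqrt{\frac{169}{28}} \approx 2.46 \\
df    &= n_1 + n_2 - 2 = 28
\end{align*}
```

The pooled variance (about 6.04) lies closer to 7 than the unweighted average 5.5 does, because the larger sample contributes more degrees of freedom.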

Once you have $s_p$, the standard error of the sample difference becomes: $SE(\bar{y}_1 - \bar{y}_2) = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

This is the formula used for a pooled t-test or CI for two means with equal variances.


Confidence interval for $\mu_1 - \mu_2$ (equal variances)
The slide shows the formula:
Where:
