LECTURE NOTES STATISTICAL MODELS
§ I Analysis of Variance
One-way ANOVA model: $Y_{ij} = \mu_i + e_{ij}$
* $Y_{ij}$: $j$-th observation from level $i$
* $\mu_i = E[Y_{ij}]$ is the mean of the response variable (level $i$)
* $e_{ij}$: random error
Model assumption: $E[e_{ij}] = 0$, $\mathrm{Var}[e_{ij}] = \sigma^2$, $\mathrm{Cov}(e_{ij}, e_{kl}) = 0$ if $(i, j) \neq (k, l)$
Let $\mu_i = \mu + \alpha_i$ s.t. $Y_{ij} = \mu + \alpha_i + e_{ij}$
* $\mu$: mean across all levels
* $\alpha_i$: main effect of level $i$ on the response variable
Balanced design: $n_i$ is the same for all levels $i$.
Matrix design: full model $\Omega$: $Y = X\beta + e$, with $\beta = (\mu, \alpha_1, \dots, \alpha_I)^T$
* $X$: design matrix ($n \times (I+1)$), first column $= \mathbf{1}$
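To make the shape of the design matrix concrete, here is a minimal sketch in plain Python. The layout (I = 3 levels with group sizes 2, 3, 2) and the helper name `design_matrix` are assumptions chosen for illustration, not part of the lecture.

```python
# Hypothetical toy layout: I = 3 levels with n_i = (2, 3, 2) observations.
group_sizes = [2, 3, 2]


def design_matrix(sizes):
    """Return the n x (I+1) design matrix as a list of rows:
    an intercept column followed by one indicator column per level."""
    n_levels = len(sizes)
    rows = []
    for i, n_i in enumerate(sizes):
        for _ in range(n_i):
            indicators = [1 if k == i else 0 for k in range(n_levels)]
            rows.append([1] + indicators)
    return rows


X = design_matrix(group_sizes)
for row in X:
    print(row)

# In every row the intercept entry equals the sum of the level indicators,
# so the first column is a linear combination of the other columns.
dependent = all(row[0] == sum(row[1:]) for row in X)
print("first column = sum of level columns:", dependent)
```

Because the first column is the sum of the indicator columns, the I + 1 columns of this matrix are linearly dependent, so its rank is at most I.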
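LSE: minimize the error w.r.t. $\beta$:
$S(\beta) = \sum_i \sum_j e_{ij}^2 = \sum_i \sum_j (Y_{ij} - \mu - \alpha_i)^2 = \|Y - X\beta\|^2$
$\hat{\beta}$ satisfies the normal equation $X^T X \hat{\beta} = X^T Y$ s.t. $\hat{\beta} = (X^T X)^{-1} X^T Y$
For $X^T X$ to be invertible, $X$ must be full rank.
However, this is often not the case (here the first column of $X$ is the sum of the other columns), so we add a constraint: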
① Standard parametrization: $\mu = 0$ s.t. $\alpha_i = \mu_i$
$\hat{\beta} = (0, \hat{\alpha}_1, \dots, \hat{\alpha}_I)^T$ and $\hat{\alpha}_i = \frac{1}{n_i} \sum_j Y_{ij} = \bar{Y}_{i.}$
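② Sum parametrization: $\sum_i \alpha_i = 0$
$\hat{\mu} = \frac{1}{n} \sum_i \sum_j Y_{ij} = \bar{Y}_{..}$ (in a balanced design) and $\hat{\alpha}_i = \bar{Y}_{i.} - \bar{Y}_{..}$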
③ Treatment parametrization: $\alpha_1 = 0$ s.t. $\alpha_i = \mu_i - \mu_1$
$\hat{\mu} = \bar{Y}_{1.}$ and $\hat{\alpha}_i = \bar{Y}_{i.} - \bar{Y}_{1.}$ (default in R)
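The three parametrizations can be compared on a small example. The sketch below uses made-up, balanced data for I = 3 levels (an assumption for illustration only) and computes the estimates from the level means; the variable names are hypothetical.

```python
# Hypothetical balanced data: I = 3 levels, n_i = 3 observations each.
data = {
    1: [3.0, 4.0, 5.0],
    2: [7.0, 8.0, 9.0],
    3: [2.0, 3.0, 4.0],
}


def mean(values):
    return sum(values) / len(values)


group_means = {i: mean(ys) for i, ys in data.items()}        # Ybar_i.
grand_mean = mean([y for ys in data.values() for y in ys])   # Ybar_..

# (1) Standard: mu = 0, alpha_i = Ybar_i.
standard = {"mu": 0.0, "alpha": dict(group_means)}

# (2) Sum: sum_i alpha_i = 0, mu = Ybar_.. (balanced), alpha_i = Ybar_i. - Ybar_..
sum_param = {
    "mu": grand_mean,
    "alpha": {i: m - grand_mean for i, m in group_means.items()},
}

# (3) Treatment: alpha_1 = 0, mu = Ybar_1., alpha_i = Ybar_i. - Ybar_1.
reference = group_means[1]
treatment = {
    "mu": reference,
    "alpha": {i: m - reference for i, m in group_means.items()},
}

# Each parametrization reproduces the same fitted level means mu + alpha_i.
fitted = {}
for name, p in [("standard", standard), ("sum", sum_param), ("treatment", treatment)]:
    fitted[name] = {i: p["mu"] + a for i, a in p["alpha"].items()}
    print(name, p, fitted[name])
```

The constraints only change how the level means are split into $\mu$ and $\alpha_i$; the fitted values are the same in all three cases.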
Assume we choose $\mu = 0$ (standard parametrization).
Within groups sum of squares = residual sum of squares = sum of squared errors (SSE):
$S_\Omega = \|Y - X\hat{\beta}\|^2 = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2$
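* $S_\Omega / \sigma^2 \sim \chi^2_{n-I}$ (for normally distributed errors) s.t. $E[S_\Omega / \sigma^2] = n - I$, and hence $E[S_\Omega / (n - I)] = \sigma^2$, where $n - I = n - \mathrm{rank}(X)$
* So we define $\hat{\sigma}^2 = S_\Omega / (n - I)$ (unbiased)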
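As a small numeric illustration of the within groups sum of squares and the variance estimate, the following sketch uses made-up data for I = 3 levels (an assumption, not data from the lecture).

```python
# Hypothetical data: I = 3 levels with 3 observations each.
data = {
    1: [3.0, 4.0, 5.0],
    2: [7.0, 8.0, 9.0],
    3: [2.0, 3.0, 4.0],
}

n = sum(len(ys) for ys in data.values())  # total number of observations
n_levels = len(data)                      # I

# S_Omega = sum_i sum_j (Y_ij - Ybar_i.)^2
S_Omega = 0.0
for ys in data.values():
    ybar_i = sum(ys) / len(ys)
    S_Omega += sum((y - ybar_i) ** 2 for y in ys)

# Unbiased estimate of the error variance: S_Omega / (n - I)
sigma2_hat = S_Omega / (n - n_levels)
print("S_Omega =", S_Omega, "sigma2_hat =", sigma2_hat)
```

Here each level contributes squared deviations 1 + 0 + 1, so the residual sum of squares is 6 on n - I = 6 degrees of freedom.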
Consider the null model $\omega \subset \Omega$, which implies that the mean response is the same for all factor levels, i.e. the factor has no influence on the response.
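This reduced model is $\omega$: $Y_{ij} = \alpha + e_{ij}$, where $\alpha_1 = \alpha_2 = \dots = \alpha_I = \alpha$ and $\hat{\alpha} = \bar{Y}_{..}$
* $S_\omega = \|Y - \mathbf{1}_n \hat{\alpha}\|^2 = \sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2$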
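* Under $\omega$: $S_\omega / \sigma^2 \sim \chi^2_{n - \mathrm{rank}(X)} = \chi^2_{n-1}$ (the design matrix of $\omega$ is a single column of ones, so $\mathrm{rank}(X) = 1$)
* $\hat{\sigma}^2 = S_\omega / (n - 1)$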
Between groups sum of squares: $S_\omega - S_\Omega = \sum_{i=1}^{I} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$
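The expression for the between groups sum of squares follows from a short expansion. The worked step below is added for clarity and uses only the level means $\bar{Y}_{i.}$ and the grand mean $\bar{Y}_{..}$; it relies on the fact that $\sum_j (Y_{ij} - \bar{Y}_{i.}) = 0$ for every level $i$, which makes the cross term vanish.

```latex
\begin{align*}
\sum_{i=1}^{I} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2
  &= \sum_{i} \sum_{j} \bigl[ (Y_{ij} - \bar{Y}_{i.}) + (\bar{Y}_{i.} - \bar{Y}_{..}) \bigr]^2 \\
  &= \sum_{i} \sum_{j} (Y_{ij} - \bar{Y}_{i.})^2
     + \sum_{i} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2
     + 2 \sum_{i} (\bar{Y}_{i.} - \bar{Y}_{..}) \sum_{j} (Y_{ij} - \bar{Y}_{i.}) \\
  &= \sum_{i} \sum_{j} (Y_{ij} - \bar{Y}_{i.})^2
     + \sum_{i} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 .
\end{align*}
```

The left-hand side is the total sum of squares around the grand mean, and the first term on the right is the within groups sum of squares, so their difference is the between groups term.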
* $S_\Omega$ is independent of $S_\omega - S_\Omega$, and under $\omega$: $(S_\omega - S_\Omega) / \sigma^2 \sim \chi^2_{I-1}$
F-statistic: $F = \dfrac{(S_\omega - S_\Omega) / (I - 1)}{S_\Omega / (n - I)} \sim F_{I-1,\, n-I}$
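The F-statistic can be computed directly from the two residual sums of squares. The sketch below is a minimal plain-Python illustration on made-up data for I = 3 levels (an assumption for illustration); it also checks the between groups sum of squares against its direct formula.

```python
# Hypothetical data: I = 3 levels with 3 observations each.
data = {
    1: [3.0, 4.0, 5.0],
    2: [7.0, 8.0, 9.0],
    3: [2.0, 3.0, 4.0],
}

all_y = [y for ys in data.values() for y in ys]
n = len(all_y)
n_levels = len(data)  # I
grand_mean = sum(all_y) / n

# Residual sum of squares of the full model (around the level means).
S_Omega = 0.0
# Between groups sum of squares, computed directly: sum_i n_i (Ybar_i. - Ybar_..)^2
between_direct = 0.0
for ys in data.values():
    ybar_i = sum(ys) / len(ys)
    S_Omega += sum((y - ybar_i) ** 2 for y in ys)
    between_direct += len(ys) * (ybar_i - grand_mean) ** 2

# Residual sum of squares of the null model (around the grand mean).
S_omega = sum((y - grand_mean) ** 2 for y in all_y)

between = S_omega - S_Omega
F = (between / (n_levels - 1)) / (S_Omega / (n - n_levels))
print("S_omega =", S_omega, "S_Omega =", S_Omega, "between =", between, "F =", F)
```

Converting F into a p-value needs the F distribution with (I - 1, n - I) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(F, n_levels - 1, n - n_levels)` is one way to obtain it.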
* F is the deviation of the $\omega$-fit from the $\Omega$-fit, normalized by the residual variance of the $\Omega$-fit
* If F is big, then the $\omega$-fit is bad
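Hypothesis test:
$H_0$: $\alpha_i = \alpha$ for all $i$ ($\omega$ holds)
$H_1$: $\alpha_i \neq \alpha_j$ for some $(i, j)$ ($\omega$ does not hold)
We reject $H_0$ if $F > F_{I-1,\, n-I;\, \alpha}$ or p-value $< \alpha$, where $\alpha$ here denotes the significance level.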
Model diagnostic:
* Plot $\hat{\mu}_i$ against $\hat{e}_{ij}$ (symmetric + no pattern) + normal QQ-plot of $\hat{e}_{ij}$
* Bartlett test for constancy of error variance
* Shapiro-Wilk test for normality (Kruskal-Wallis if normality does not hold)