Class notes

Notes advanced statistics 2021

Pages
33
Uploaded on
03-12-2021
Written in
2021/2022

Notes of all lectures except surveillance


Document information

Professor(s)
Jos Twisk & Frank van Leth
Contains
All classes

Content preview

Advanced statistics notes

Lecture 1

Assumptions of linear regression




Homoskedasticity

Autocorrelation: a violation that can only occur in time series data, because there needs to be some kind of order in the x-variable. With cross-sectional data (e.g. a survey) you don't have to worry about autocorrelation, because there is no natural order in the x-variable.
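To see why order matters, here is a minimal sketch that computes the lag-1 autocorrelation of a set of residuals, i.e. the correlation between each residual and the one just before it. The function name and the residual values are made up for illustration; they do not come from the lectures.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the residual just before it."""
    n = len(residuals)
    mean = sum(residuals) / n
    deviations = [e - mean for e in residuals]
    # Products of neighbouring deviations: this is where the order comes in.
    numerator = sum(deviations[t] * deviations[t - 1] for t in range(1, n))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


# Hypothetical residuals, listed in time order.
trending = [1, 2, 3, 4, 5]
# Exactly the same values, but in a different order.
reordered = [3, 1, 5, 2, 4]

print(lag1_autocorrelation(trending))
print(lag1_autocorrelation(reordered))
```

Both lists contain the same numbers, yet the result differs. The statistic therefore only means something when the order of the observations is meaningful, which is the case for time series and not for a cross-sectional survey.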




Normality: the residuals should follow a bell-shaped (normal) distribution (but this assumption is not really that important).

Lecture 2: clustered data


Basic principles of mixed model analysis.

Back to the basics of linear regression. You have a scatterplot of all the observations, then you draw a line through the dots. The characteristic of that line is that the distance between the line and the dots is as small as possible (the idea behind linear regression analysis). The regression line is the best way to describe the linear relationship between y and x.




The line has 2 parameters, the regression coefficients b0 and b1.
- b0 is the value of the outcome when the independent variable(s) equal 0.
- b1 indicates how much the outcome differs with each unit difference in the independent variable.
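The two coefficients can be computed by hand for a small data set. The sketch below uses the standard least-squares formulas for one independent variable; the function name and the data points are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates of the intercept b0 and slope b1."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: the fitted line always passes through (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

Here b0 is the predicted outcome at x = 0, and b1 is the difference in predicted outcome between two observations that differ by one unit in x.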




If we want to correct for area by adding the area number as one variable, b2 means the difference in average health between area number 1 and 2. But it is also an estimate of the difference in average health between area number 2 and 3, and between area number 10 and 11. In other words: we assume a linear relationship between the numbering of the area and the outcome variable health. That does not make any sense! You can't do this: area is not a continuous or discrete variable, it is a categorical variable. The alternative is dummy variables in the regression: 49 dummy variables for area. But that is not efficient when you just want to adjust for area. You're interested in the relation between health and PA, and with 49 extra coefficients to estimate you lose power.
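A small sketch of dummy coding makes the cost visible. It assumes 50 areas, which is inferred from the 49 dummy variables mentioned above (one area serves as the reference category); the function name is hypothetical.

```python
def dummy_code(values):
    """Turn a categorical variable into 0/1 dummy columns.

    The first category (in sorted order) is the reference and gets no column,
    so k categories produce k - 1 dummies.
    """
    categories = sorted(set(values))
    reference, others = categories[0], categories[1:]
    rows = [[1 if v == c else 0 for c in others] for v in values]
    return reference, others, rows


# Assumption: areas are numbered 1 to 50, one observation per area here.
areas = list(range(1, 51))
reference, dummy_names, rows = dummy_code(areas)

print("reference area:", reference)
print("number of dummy variables:", len(dummy_names))
print("row for area 1 (all zeros):", sum(rows[0]) == 0)
```

Each dummy gets its own regression coefficient, so adjusting for area this way adds 49 parameters to a model in which only the coefficient for PA is of interest.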

Solution: use mixed model analysis, an efficient way to deal with a categorical variable with many groups.

Behind the scenes, a mixed model uses a three-step method:
1. estimate the intercepts for all groups
2. create a normal distribution over all the intercepts
3. estimate the variance of that normal distribution
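The three steps can be mimicked on a toy data set. This is only a conceptual sketch: it takes the plain group mean as each group's intercept (no independent variable in the model) and the sample variance of those means as the variance, whereas real mixed model software estimates the variance with (restricted) maximum likelihood. The area names and health scores are made up.

```python
from statistics import mean, variance

# Hypothetical health scores for three areas.
health_by_area = {
    "area_1": [6.0, 7.0, 8.0],
    "area_2": [4.0, 5.0, 6.0],
    "area_3": [8.0, 9.0, 10.0],
}

# Step 1: one intercept per group (here simply the group mean).
intercepts = {area: mean(scores) for area, scores in health_by_area.items()}

# Step 2: treat the intercepts as draws from one normal distribution,
# described by its mean ...
overall_intercept = mean(list(intercepts.values()))

# Step 3: ... and its variance.
intercept_variance = variance(list(intercepts.values()))

print(intercepts)
print(overall_intercept, intercept_variance)
```

The point of the approach is that the model ends up reporting one number for area, the variance of the intercepts, instead of a separate coefficient for every area.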
