Statistics II Lecture Notes

Pages: 31
Uploaded on: 21-10-2020
Written in: 2019/2020

This document contains lecture notes from the Statistics II: Applied Quantitative Analysis course, which is mandatory for all International Relations and Organizations students.

I. COMPARING TWO MEANS: Steps of statistical inference
1. Hypothesis
a. Null hypothesis: ∆= 0
b. Alternative hypothesis: ∆≠ 0
2. Test statistic
a. T-test: $t = \hat{\Delta} / SE(\hat{\Delta})$; in this example $\hat{t} = 3.45$
3. Sampling distribution of the test statistic
a. T-distribution with 11202 ($n_{treatment} + n_{control} - 2$ groups) degrees of freedom
4. Look up/calculate the p-value for $\hat{t} = 3.45$; $df = 11202$
a. p=0.0006
5. Conclusion
a. Reject the null hypothesis at the 5% significance level (because p < 0.05)
b. Earnings are different for those who followed the training program
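
The look-up in step 4 can be sketched in a few lines of Python. This is a minimal illustration, not the course's method: it assumes that with 11202 degrees of freedom the t-distribution is practically identical to the standard normal, so the two-sided p-value is approximated with `math.erfc`. The function name `two_sided_p_normal_approx` is made up for this example.

```python
import math

def two_sided_p_normal_approx(t_stat: float) -> float:
    """Two-sided p-value, using the standard normal in place of the t-distribution.

    Assumption: the degrees of freedom are very large (here 11202), so the
    t-distribution and the standard normal are practically indistinguishable.
    """
    # For a standard normal Z: P(|Z| > |t|) = erfc(|t| / sqrt(2))
    return math.erfc(abs(t_stat) / math.sqrt(2))

t_hat = 3.45   # test statistic from the notes
alpha = 0.05   # significance level from the notes

p_value = two_sided_p_normal_approx(t_hat)
reject_null = p_value < alpha
print(f"p = {p_value:.4f}, reject H0: {reject_null}")
```

The approximate p-value rounds to the 0.0006 reported in step 4, which is far below 0.05, so the null hypothesis is rejected.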

II. ANOVA: Comparing more than two means
• If we want to compare more than two means, we cannot use a simple t-test
• ANOVA considers the differences between groups and the differences within groups

EXAMPLE: Is there a statistically significant difference between number of TV appearances for MPs of different parties?
Figure 1. Number of TV show entries




Figure 2. Total sum of squares ($SS_T$) | $SS_T = SS_M + SS_R$

$SS_T = \sum_{i=1}^{n} (x_i - \bar{x}_{grand})^2$

$\bar{x}_{grand} = (3 + 2 + 4 + 7 + 5 + 6 + 8 + 5 + 7) \div 9 = 47 \div 9 = 5.22$

$SS_T = (3 - 5.22)^2 + (2 - 5.22)^2 + (4 - 5.22)^2$
$+ (7 - 5.22)^2 + (5 - 5.22)^2 + (6 - 5.22)^2$
$+ (8 - 5.22)^2 + (5 - 5.22)^2 + (7 - 5.22)^2 = 31.56$

$SS_T = 31.56$



Figure 3. Model sum of squares ($SS_M$) - $SS_{between}$
CDA: $\bar{x}_1 = (3 + 2 + 4) \div 3 = 3$
VVD: $\bar{x}_2 = (7 + 5 + 6) \div 3 = 6$
PvdA: $\bar{x}_3 = (8 + 5 + 7) \div 3 = 6.67$
(With $k$ for the group (here: political party) and $\bar{x}_k$ the mean for that group)

$SS_M = \sum_{k=1}^{K} n_k (\bar{x}_k - \bar{x}_{grand})^2$
$= 3(3 - 5.22)^2 + 3(6 - 5.22)^2 + 3(6.67 - 5.22)^2 = 22.89$

$SS_M = 22.89$


Figure 4. Residual sum of squares ($SS_R$) - $SS_{within}$
$SS_R = \sum (x_{ik} - \bar{x}_k)^2$
$= (3 - 3)^2 + (2 - 3)^2 + (4 - 3)^2$
$+ (7 - 6)^2 + (5 - 6)^2 + (6 - 6)^2$
$+ (8 - 6.67)^2 + (5 - 6.67)^2 + (7 - 6.67)^2 = 8.67$

$SS_R = 8.67$

$SS_M$ answers the question: Which part of the total sum of squares can we explain by using the group means?
$SS_R$ answers the question: Which part of the total sum of squares cannot be explained by using the group means?
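
The three sums of squares can be recomputed from the nine observations in the example. The sketch below uses only the data given in the notes; the variable names (`groups`, `ss_t`, `ss_m`, `ss_r`) are chosen for this illustration.

```python
# TV show entries per MP, grouped by party (data from the worked example)
groups = {
    "CDA": [3, 2, 4],
    "VVD": [7, 5, 6],
    "PvdA": [8, 5, 7],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Total: every observation compared with the grand mean
ss_t = sum((x - grand_mean) ** 2 for x in all_values)
# Model (between groups): group means compared with the grand mean, weighted by group size
ss_m = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Residual (within groups): every observation compared with its own group mean
ss_r = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

print(round(ss_t, 2), round(ss_m, 2), round(ss_r, 2))
```

Because the grand mean is not rounded here, the total comes out at 31.56, which equals the model and residual parts added together (22.89 + 8.67).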


Mean squares
• The model sum of squares ($SS_M$) is based on the difference between 3 group means and the grand mean.
o The degrees of freedom is the number of groups minus 1 for the grand mean
$df_M = 3 - 1 = 2$
$MS_M = \frac{SS_M}{df_M} = \frac{22.89}{2} = 11.44$
• The residual sum of squares ($SS_R$) is based on the difference between each value and its group mean
o The degrees of freedom is based on the number of observations (minus the number of groups)
$df_R = 9 - 3 = 6$
$MS_R = \frac{SS_R}{df_R} = \frac{8.67}{6} = 1.44$
F statistic
• The ratio between the variance explained by the model ($MS_M$) and the variance NOT explained by the model ($MS_R$)
• If $F > 1$, the model can explain more than what it leaves unexplained
$F = \frac{MS_M}{MS_R} = \frac{11.44}{1.44} = 7.92$

Inference: conclusion about population
Null hypothesis: the mean of all groups is the same

We compare this score for the F-test to the F-distribution.
This distribution has two sets of degrees of freedom: $df_M$ and $df_R$. Here: 2 and 6.

The critical value for a significance level (α-level) of 0.05 with 2 and 6 degrees of freedom is 5.14.




$F_{observed}$ compared to $F_{critical}$
• The observed value of F ($F_{observed} = 7.92$) is greater than the corresponding critical value ($F_{critical} = 5.14$)
• Therefore, we reject the null hypothesis (null hypothesis: the mean of all groups is the same)




Reporting: There was a statistically significant difference (at the 5% level) between parties in terms of the average number of TV show entries by their politicians, F(2, 6) = 7.92, p = 0.021.
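
The F statistic and the reported p-value can be checked with a short sketch. It relies on a closed form that holds only when the model has exactly 2 degrees of freedom, as in this example: for an F(2, df_R) distribution, P(F > f) = (1 + 2f/df_R)^(-df_R/2). The function name `f_test_two_model_df` is hypothetical, and the sums of squares are the rounded values from the notes.

```python
def f_test_two_model_df(ss_m, ss_r, df_r):
    """F statistic and p-value for the special case of 2 model degrees of freedom.

    Assumption: df_M = 2 (three groups). The closed-form tail probability
    used below is NOT valid for other values of df_M.
    """
    df_m = 2
    ms_m = ss_m / df_m   # mean square of the model
    ms_r = ss_r / df_r   # mean square of the residuals
    f_stat = ms_m / ms_r
    # Upper-tail probability of F(2, df_r): (1 + 2f/df_r) ** (-df_r/2)
    p_value = (1 + df_m * f_stat / df_r) ** (-df_r / 2)
    return f_stat, p_value

f_stat, p_value = f_test_two_model_df(ss_m=22.89, ss_r=8.67, df_r=6)
print(f"F(2, 6) = {f_stat:.2f}, p = {p_value:.3f}")
```

For designs with more than three groups, a statistics package (SPSS in this course) would be the usual route, since the tail probability then has no such simple form.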

REGRESSION ANALYSIS
Why do we use regression for statistical inference?
• To express uncertainty about our conclusions about the relation between 2 concepts
• Assessing the strength of a relation
• Understand the population (based on a sample)
Why regression?
• What if we are not just interested in the difference between two means, but in how the mean values of a variable change as another variable changes?
• Example: Have available incomes increased in rich and poor countries, or have poor countries remained poor?




• How can we describe the strength of this association? Correlation? r = 0.961

Regression is related to correlation
• But regression can assess the impact of several independent variables on one specific dependent variable
o Not just strength of the association, but size of the effect: the expected change in Y as a result of a 1-unit change in X
• By assuming a linear association exists
• Regression can assess the null hypothesis: incomes are unrelated to incomes in the past

EXAMPLE: What is the relationship between the number of seats a party has in parliament and the number of motions it tables?




‘Line of best fit’
• Minimizing the distances between points and the line; your best guess given the data available

REGRESSION EQUATION: $y = a + bx$
• Intercept (constant): a; if the number of seats is 0, how many motions can we expect (according to the model)?
• Slope: b; if the number of seats increases by 1, what is the expected change in the number of motions (according to the model)?




Intercept: 38.11; Slope: 7.17
• If a party has 30 seats, how many motions can we expect?
o $\widehat{Motions} = a + b \times seats$
o $\widehat{Motions} = 38.11 + 7.17 \times seats$
o $\widehat{Motions} = 38.11 + 7.17 \times 30 = 253.3$ (using the unrounded coefficients; the rounded values shown give 253.21)
• We often use $b_0$ and $b_1$ instead of $a$ and $b$
o $y_i = b_0 + b_1 x_i$
o The subscript $i$ stands for the number of the observation,
$y_1$ is the value of the response variable $y$ for the first observation in the dataset,
$y_i$ is the value of the response variable $y$ for any observation $i$ in the dataset.

ERROR: Not all observations lie on the regression line, so there is error! All models are wrong

Including error in the equation
• $y_i = b_0 + b_1 x_i + \varepsilon_i$ | All models are wrong, but we make assumptions about error (e.g. it is random for all cases)
• $E[y_i \mid x_i] = b_0 + b_1 x_i$ | That’s why we work with the expected value of $y_i$ given a value of $x_i$

HOW DO WE DRAW THE REGRESSION LINE?
• Ordinary Least Squares: Minimizes the residual sum of squares; a residual is the difference between a data point and the regression line




• Squaring these residuals gives us squared residuals, or squares; the sum of the squared residuals is $SS_R = 24680.2$
• The regression line is chosen in such a way that the residual sum of squares is as small as possible, least squares




Calculating the regression line
• $SS_R = \sum (y_i - \hat{y}_i)^2$
• $SS_R = \sum (y_i - b_0 - b_1 x_i)^2$
• $\hat{y}_i = b_0 + b_1 x_i$; $\hat{y}_i$ refers to the predicted value of y according to the regression model
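
To make the residual sum of squares concrete, the sketch below evaluates it for two candidate lines. The (seats, motions) pairs are hypothetical, made up for this illustration and not the data behind the notes; `residual_sum_of_squares` is likewise an illustrative name.

```python
def residual_sum_of_squares(b0, b1, xs, ys):
    """SS_R: sum over all observations of (y_i - b0 - b1 * x_i) squared."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical (seats, motions) pairs, invented for illustration only
seats = [5, 10, 20, 30, 40]
motions = [80, 100, 190, 240, 330]

# Two candidate lines: motions = b0 + b1 * seats
ss_r_line_a = residual_sum_of_squares(40, 7, seats, motions)
ss_r_line_b = residual_sum_of_squares(100, 4, seats, motions)

# The line with the smaller SS_R lies closer to the points overall
print(ss_r_line_a, ss_r_line_b)
```

Line A leaves a far smaller residual sum of squares than line B on these made-up points, so it is the better guess of the two; ordinary least squares picks the line for which no other choice of intercept and slope does better.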

Analyze > Correlate > Bivariate > Select Options > Cross-product deviations and covariances

$\hat{b}_1$ (predicted/estimated $b_1$) in our example

$\hat{b}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{25908}{3612} = 7.17$

$\hat{b}_1 = 7.17$




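$\hat{b}_0$ (predicted/estimated $b_0$) in our example

$\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}$
$\hat{b}_0 = 199.5 - 7.17 \times 22.5 = 38.17$

$\hat{b}_0 = 38.17$ (with the unrounded slope $25908 / 3612$, the intercept is 38.11, the value used in the prediction equation)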
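
The two estimates can be reproduced from the summary quantities in the notes: the sum of cross-products (25908), the sum of squared deviations of x (3612), and the two means (22.5 seats, 199.5 motions). The helper name `ols_from_sums` is an assumption made for this sketch.

```python
def ols_from_sums(sum_cross_products, sum_squares_x, x_bar, y_bar):
    """Least-squares slope and intercept from summary quantities.

    b1 = sum of cross-products / sum of squared deviations of x
    b0 = mean of y - b1 * mean of x
    """
    b1 = sum_cross_products / sum_squares_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Summary quantities taken from the worked example in the notes
b0, b1 = ols_from_sums(25908, 3612, x_bar=22.5, y_bar=199.5)

# Expected number of motions for a party with 30 seats
predicted_30 = b0 + b1 * 30
print(f"b1 = {b1:.2f}, b0 = {b0:.2f}, motions at 30 seats = {predicted_30:.1f}")
```

Keeping the slope unrounded gives an intercept of 38.11 and a prediction of 253.3 motions for 30 seats; rounding the slope to 7.17 before computing the intercept is what produces 38.17.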

Multiple explanatory variables: If you have more than one explanatory variable in your model, you can still calculate the ‘least squares’ solution; this is what SPSS is for!

Regression: Key assumptions
1. It makes sense to treat the relationship between $E[y_i \mid x_i]$ and the x variable as linear and additive
2. $E[\varepsilon_i \mid x_i] = 0$, error exists but is assumed to be random, so not relevant for estimating point-values
$y_i = b_0 + b_1 x_i + \varepsilon_i$
$E[y_i \mid x_i] = b_0 + b_1 x_i$
What variables are suitable for regression?
• Dependent variable: Interval-ratio scale response variables
o Must have the same substantive meaning anywhere on the scale, e.g. profit, GDP
• Otherwise, modification is needed:
o Nominal/Ordinal scale: Logistic regression (e.g. blue/brown; agree/strongly agree)
o Count scale (non-negative integers, e.g. war casualties): Poisson and negative binomial regression models; NOT in this course
• Explanatory variables can be of any type (with modification)
• Variable values must vary (variance cannot be zero)
Notes by polscinotes, Universiteit Leiden