Summary

Summary lectures Quantitative Methods

Pages
36
Uploaded on
14-02-2022
Written in
2021/2022

Comprehensive summary of all lectures Quantitative Methods divided into the six themes that were also used during the lectures. Includes examples and screenshots from the lectures to clearly describe theory.

Content preview

Summary Quantitative Methods:
Theme 1: Intro, Variables and techniques, OLS

Data: a matrix of different observations.
Observation: one unit of analysis (a person, an organisation) together with the data collected on it,
the variables (age, gender, etc.).

SPSS is a program in which you look for relationships between these variables, searching for the
‘ideal’ model.

Dependent variable (or response variable): a variable whose value depends on that of another. A
variable thought to be affected by changes in an independent variable. You can think of this variable
as an outcome.
Independent variable (or explanatory variable): a variable thought to be the cause of some effect,
whose variation does not depend on that of another variable. This term is usually used in
experimental research to describe a variable that the experimenter has manipulated.

Manifest variable: a directly observable variable for which we collect data (gender, income). You can
use it directly in the analysis.
Latent variable: latent means that something is not directly observable (for example globalisation).

Level of measurement:
- Nominal: also known as categorical or qualitative (colour, type of chocolate). Things that are
equivalent in some sense are given the same name (or number), and there are more than two
possibilities.
- Ordinal: the categories are ordered. Ordinal variables have a meaningful order, but the intervals
between the values in the scale may not be equal (rank, satisfaction, fanciness). Example: a
smaller difference between ‘very satisfied’ and ‘satisfied’ but a bigger difference between
‘satisfied’ and ‘unsatisfied’.
- Interval/ratio: this label includes things that can be measured rather than classified or ordered,
such as number of customers. Interval/ratio data is also known as scale, quantitative or
parametric. For ratio data, the ratios of values along the scale should be meaningful. For this to
be true, the scale must have a true and meaningful zero point.

Linear regression analysis (OLS):
Level of measurement of the dependent variable: interval/ratio.




You are trying to explain a particular (dependent = Y) variable (like housing prices).

For example: we want to know how average income differs when we change gender by one
unit.
- A linear regression is adaptive.
- The dots in a model are the combinations of variables. For each observation you have a
unique combination of variables (the dots).
- There is a straight line in the model that approximates these observations as well as possible.
- Variation: the variation between all the dots.

R-squared (goodness of fit): defines how well this model fits the observed data. R2 represents the
amount of variance in the outcome explained by the model relative to how much variation there was to
explain in the first place. You want to obtain a model that is as close as possible to the observations
and that predicts the variation as well as possible. The ideal would be a model where the
dots (the observations) are perfectly on the line. In that case you have an R-squared of 100%.
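
The idea of R-squared can be illustrated with a minimal sketch in plain Python. The data (house size and price) and the function names `fit_line` and `r_squared` are hypothetical and chosen only for illustration; in the course itself SPSS does this calculation.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Hypothetical data: house size (m2) and price (x 1000)
size = [50, 60, 70, 80, 90]
price_on_line = [150, 170, 190, 210, 230]  # every dot lies exactly on a line
price_noisy = [155, 160, 200, 205, 230]    # dots scattered around a line

a1, b1 = fit_line(size, price_on_line)
a2, b2 = fit_line(size, price_noisy)
r2_perfect = r_squared(size, price_on_line, a1, b1)
r2_noisy = r_squared(size, price_noisy, a2, b2)
print(r2_perfect, r2_noisy)
```

When all dots are on the line the R-squared equals 1 (100%); with scattered dots it drops below 1.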

Check model assumptions: to check if the model is ‘good’

1. The sample consists of independent observations: you need to ensure that the observations
are independent of each other (for example, respondents did not collaborate on the survey).
2. A linear model is suitable, that is, the relationship between the dependent variable and the
independent variable is linear: we need to check the linearity assumption.




Models A and C are suitable for a linear model. In model A the relationship is linear, but there is
a spread in the observations around the line. Model C is suitable because the relationship is linear
and the quality of the predictions is equal along the line.
3. The variance of the residuals is equal for all possible values of the independent variables
(constant variance or homoscedasticity): the variance is constant, which means that the
residuals need to be spread evenly around the ‘zero line’. This is important because
otherwise the predictions become less reliable as the dependent variable becomes higher.
4. The residuals are normally distributed: about 2/3 of the residuals need to lie within one
standard deviation of the mean. This is a very important point if we want to draw
conclusions.




Residuals (translated from the Dutch definition): the differences between the observed values of a
random variable and the values predicted by the regression analysis.

Residuals: the differences between what the model predicts and the observed data in a linear model
(same as deviations). The residuals represent the error in the model. To assess the error in a linear
model we use the sum of squared residuals. The residual sum of squares is a measure of how well a
linear model fits the data. If the squared differences are large, the model is not representative of the
data (there is a lot of error in prediction); if the squared differences are small, the line is representative.
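
In formula form, the residuals and the sums of squares can be written as follows. The symbols are assumptions introduced here for illustration (they are not taken from the lecture slides): y_i is the observed value, \hat{y}_i the predicted value and \bar{y} the mean of the observed values.

```latex
e_i = y_i - \hat{y}_i, \qquad
SS_R = \sum_{i=1}^{n} e_i^2, \qquad
SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{SS_R}{SS_T}
```

A small SS_R relative to SS_T means little error in prediction and therefore a high R-squared.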

Outliers: extreme observations that are very different from the rest. They are problematic because
you try to identify the relation between the dependent and the independent variable, and outliers
can give another (not realistic) view of it.

Detect outliers:
- Look at the observations beyond three standard deviations of the mean
- Boxplots, histograms, probability plots, scatter plots
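
The three-standard-deviations rule can be sketched in a few lines of Python. The income values and the function name `flag_outliers` are hypothetical; the sample standard deviation is used here, which is an assumption about how the rule is applied.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - m) > threshold * s]


# Hypothetical incomes (x 1000): 19 ordinary values and one extreme observation
incomes = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
           50, 48, 52, 50, 49, 51, 50, 52, 48, 200]

outliers = flag_outliers(incomes)
print(outliers)
```

Note that in very small samples a single extreme value inflates the standard deviation so much that it may not cross the threshold, which is one reason to also look at the plots.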

Study impact of influential cases:
- Idea is to compare regression outcomes with and without influential cases
- SPSS: influence of case on individual coefficients (DFBETA) and on the model fit (DFFIT)
- Cases with a Cook’s distance > 1 are considered influential
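
As a rough illustration of the Cook’s distance rule, the sketch below computes Cook’s distance for a regression with one explanatory variable, using the standard textbook formula (squared residual, scaled by the residual variance, times a leverage term). The data and the function name `cooks_distances` are hypothetical; SPSS reports this statistic directly.

```python
def cooks_distances(x, y):
    """Cook's distance of every case in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # number of estimated parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (p * s2) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: case 10 breaks the pattern of the other nine cases
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.8, 5.0]

d = cooks_distances(x, y)
influential = [case + 1 for case, di in enumerate(d) if di > 1]  # case numbers start at 1
print(influential)
```

Only the last case exceeds the threshold of 1, so it would be the one to leave out when comparing regression outcomes with and without influential cases.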

Multicollinearity: the problem where the correlation between two (or more) explanatory
(independent) variables is too high (R > 0.8 or 0.9). If the correlation is this high you cannot
identify the effects of these variables separately from each other.
Problems:
- Standard errors of regression coefficients increase → untrustworthy coefficients
- Limits size of R
- Interpretation of relevance of individual explanatory variables becomes impossible
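
A simple first check for multicollinearity is to compute the correlation between pairs of explanatory variables and compare it with the 0.8 threshold from these notes. The sketch below does this in plain Python; the variables `x1`, `x2`, `x3` and the function name `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# Hypothetical explanatory variables
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]  # moves almost in step with x1
x3 = [5, 1, 4, 2, 3]   # only weakly related to x1

r12 = pearson_r(x1, x2)
r13 = pearson_r(x1, x3)
too_high_12 = abs(r12) > 0.8  # True: x1 and x2 should not both be in the model as they are
too_high_13 = abs(r13) > 0.8  # False
print(r12, r13)
```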

Dummy variables: categorical variables that have two values (for example men and women). A dummy
variable takes the value 0 or 1.




When do you need a dummy variable?
- Continuous: not necessary
- Ordinal: not necessary if a linear trend exists, otherwise yes
- Dichotomous (men/women): yes
- Nominal (more than 2 categories): create help variables using dummies (number of dummies =
number of categories minus 1)
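
The rule "number of dummies = number of categories minus 1" can be sketched as follows. The variable `colours`, the choice of reference category and the function name `make_dummies` are hypothetical.

```python
def make_dummies(values, reference):
    """Create one 0/1 dummy per category, except for the reference category."""
    categories = sorted(set(values))
    others = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in others}


# Hypothetical nominal variable with three categories
colours = ["red", "blue", "green", "blue"]

dummies = make_dummies(colours, reference="red")
print(dummies)
```

Three categories give two dummies. A case in the reference category ("red") scores 0 on both dummies, so no information is lost by leaving that category out.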

Interaction variable: we speak of an interaction if the effect of an independent variable is influenced
by a second independent variable.
Example: the effect of study hours on grade is different for students with a high level of prior
education than for students with a low level of prior education (the dummy: high = 0, low = 1).
In other words, the effect of an extra hour of study for students with low prior education is not the
same as for students with high prior education. The lines for the two groups are not parallel
anymore. That is where interaction variables come in.
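
Written as a model, the study-hours example could look like the equation below. This is a sketch: the coefficient names and the variable names `hours` and `low` are assumptions made here for illustration.

```latex
\text{grade}_i = \beta_0 + \beta_1\,\text{hours}_i + \beta_2\,\text{low}_i
               + \beta_3\,(\text{hours}_i \times \text{low}_i) + \varepsilon_i
```

For students with high prior education (low = 0) the effect of one extra study hour is \beta_1; for students with low prior education (low = 1) it is \beta_1 + \beta_3. If \beta_3 differs from 0, the two lines are not parallel.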




Overview conditions of linear regression with OLS:
1. The sample consists of independent observations
2. A linear model is suitable, that is, the relationship between the dependent variable and the
independent variable is linear.
3. The variance of the residuals is equal for all possible values of the independent variables
(constant variance or homoscedasticity)
4. The residuals are normally distributed
Linear regression models that predict non-metric dependent variables fail to meet these
conditions. Therefore we use non-linear regression models… the discrete choice model → next
theme



Theme 2: Discrete Choice Model

Basic terms:
Hypothesis testing: we do this all the time. In this course we usually use the t-test, which tests
the hypothesis that the coefficient is 0. We hope to reject this hypothesis, because we want an
explanatory variable that has an effect on the dependent variable.

Model building is simply the process where we add variables and hope that these variables explain
the dependent variable.

In social science we use the 5% significance level. You always check whether the p-value is below 5%.
If it is, you can reject the null hypothesis.
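
The t-test on a coefficient can be sketched for a regression with one explanatory variable: the t-value is the slope divided by its standard error. The data and names below are hypothetical. Instead of computing a p-value, the sketch compares the t-value with 2.306, the two-sided 5% critical value of the t-distribution with 8 degrees of freedom (10 observations minus 2 estimated parameters); SPSS reports the p-value directly.

```python
from math import sqrt


def slope_t_statistic(x, y):
    """t-value for the hypothesis that the slope of y on x is 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ss_resid / (n - 2)       # residual variance
    se_slope = sqrt(s2 / sxx)     # standard error of the slope
    return slope / se_slope


# Hypothetical data: study hours and grade for 10 students
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
grade = [5.2, 5.9, 6.1, 6.8, 7.1, 7.4, 8.2, 8.3, 9.1, 9.4]

T_CRITICAL_DF8 = 2.306  # two-sided, 5% level, 8 degrees of freedom
t_value = slope_t_statistic(hours, grade)
reject_null = abs(t_value) > T_CRITICAL_DF8
print(t_value, reject_null)
```

An absolute t-value above the critical value corresponds to a p-value below 5%, so the hypothesis that the coefficient is 0 is rejected.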

F-test: a test for the overall significance of the model. If the overall model is not significant, the
model is useless: there is no significant evidence that it helps to predict the dependent variable at
all.

We hope that the coefficient of at least one variable is not equal to 0, because then this variable
explains the dependent variable.
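
In formula form, the overall F-test can be written as below. This is the standard textbook expression, not a formula taken from the lecture slides; k (the number of explanatory variables) and n (the number of observations) are symbols introduced here.

```latex
H_0:\ \beta_1 = \beta_2 = \dots = \beta_k = 0, \qquad
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
```

A large F-value (p-value below 5%) means we reject the hypothesis that all coefficients are 0.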

Normal distribution:
A normal distribution is a requirement for hypothesis testing.
annick51 Radboud Universiteit Nijmegen