Lecture notes

MAT-22306 Lectures Quantitative Research Methodology and Statistics

Author: Nerine

Uploaded on: 09-09-2021
Written in: 2021/2022

Extensive lecture summary of the course Quantitative Research Methodology and Statistics (MAT) at Wageningen University (WUR). Slides included as examples to give an extensive overview.


MAT22306 - Quantitative research methodology and statistics
Lecture 1.1
Data types and distributions:
Variables must be able to vary (take different values), e.g. gender (can be male/female). Male is not a variable, as it
cannot vary; male is a level of the variable gender.

Types of variables:
Categorical/nominal: there’s no order or magnitude; the scale solely distinguishes between levels.
Ordinal: distinguishes between levels with a fixed order. There is a clear order, but no clear magnitude/difference between the values.
Interval: distinguishes between levels and values, with a fixed order and equal distances between values, but without a natural zero.
Ratio: distinguishes between levels and values, with a fixed order and equal distances, and now there is also a natural zero.

Describing findings of variables:
Categorical: reporting in percentages or frequencies (56 oranges, 60 apples)
Ordinal: reporting in percentages or frequencies.
Interval: infinitely many options (infinite categories). Report summary measures of central tendency (e.g. the mean) and
of the width of the distribution.
Ratio: infinitely many options (infinite categories). Report summary measures of central tendency (e.g. the mean) and of
the width of the distribution.
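A minimal sketch of reporting a categorical variable as frequencies and percentages, reusing the 56 oranges / 60 apples example above; the list itself is hypothetical data built from those counts.

```python
from collections import Counter

# Hypothetical categorical data built from the example: 56 oranges and 60 apples
fruit = ["orange"] * 56 + ["apple"] * 60

counts = Counter(fruit)           # frequency per level
n = sum(counts.values())          # total number of observations
percentages = {level: 100 * c / n for level, c in counts.items()}

for level, c in counts.most_common():
    print(f"{level}: {c} ({percentages[level]:.1f}%)")
```

The same code works for an ordinal variable; only the order in which you report the levels matters there.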

Measures of central tendency:
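How do you summarize a group of people with one measure, e.g. describe the typical/average income in a group?
Mode: the most common value (occurrence).
Median: the middle person/observation when all values are sorted.
Mean: the average, i.e. the sum of all values divided by the number of observations.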
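As an illustration (the incomes below are hypothetical), Python's standard `statistics` module computes the three measures of central tendency; one very high income pulls the mean well above the median.

```python
import statistics

# Hypothetical monthly incomes (euros) for a small group
incomes = [1800, 2000, 2000, 2200, 2500, 3000, 9000]

mode = statistics.mode(incomes)      # most frequent value
median = statistics.median(incomes)  # middle value of the sorted list
mean = statistics.mean(incomes)      # sum / number of observations

print(mode, median, round(mean, 1))
```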

In a normal distribution, all three measures of central tendency (mode, median, mean) are the same.

Measures of distribution (spread):
Show the differences/spread within the sample, e.g. with percentiles (%) or percentile ranges.

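Standard deviation: roughly the average distance of the observations from the mean.
Formula: SD = √( sum (each individual observation – overall mean)² / total nr of observations ). So, take the squared
difference between the value of each observation and the mean, sum these, divide by the number of observations, and take the square root.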

Sum of Squares (SS): for every score you have, you calculate the difference to the mean (obs –
mean) and square it. Add all of these up. The more observations, the larger the sum.

Variance: the variation around the mean, made independent of the number of observations (the average squared
deviation). Formula: Sum of squares / total number of observations.
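A small sketch of the three spread formulas; the function names and the data are hypothetical. The `sample` flag divides by N – 1 instead of N, which is the correction for estimating a population value from a sample.

```python
import math

def sum_of_squares(xs):
    """SS: sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def variance(xs, sample=False):
    """SS / N, or SS / (N - 1) when estimating from a sample."""
    ss = sum_of_squares(xs)
    return ss / (len(xs) - 1) if sample else ss / len(xs)

def std_dev(xs, sample=False):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5
print(sum_of_squares(data), variance(data), std_dev(data))  # prints 32.0 4.0 2.0
```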

Normal distribution notation: N(μ, σ)
Standard normal distribution (z-distribution) notation: N(0, 1), i.e. μ = 0 and σ = 1. → Table in Field, p. 995-998.
In the standard normal distribution, a z-value is the number of standard deviations an observation lies from the mean.
The table gives, for each z-value, the proportion of all observations that lies below it.
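Instead of the table in Field, Python's `statistics.NormalDist` can give the area below a z-value. A sketch with hypothetical scale values (mean 100, SD 15):

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical scale with mean 100 and SD 15
z = z_score(130, mu=100, sigma=15)
prop_below = NormalDist(mu=0, sigma=1).cdf(z)   # area left of z, as in the z-table
print(f"z = {z:.2f}, proportion below = {prop_below:.4f}")
```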

Rules of thumb for the normal distribution:
50% of the observations lie below the mean.
About 68% (roughly 2/3 of the sample) lie between –1 and +1 standard deviation from the mean, about 95% between –1.96
and +1.96 SD, and about 99.7% between –3 and +3 SD.
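The rules of thumb can be checked numerically with the standard normal CDF; this sketch only uses Python's `statistics.NormalDist`, and the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within +/- k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

below_mean = std_normal.cdf(0)   # half of the distribution lies below the mean
for k in (1, 1.96, 3):
    print(f"within +/-{k} SD: {share_within(k):.3f}")
```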

Kurtosis: indicates how pointy the distribution is (how high its peak is). Three possibilities:
Leptokurtic = very high peak.
Mesokurtic = normal.
Platykurtic = flattened.

Lack of symmetry: skewness. This can be tricky, as the mean is pulled towards the long tail and can then no longer be
used as the central tendency value of the data.
Positive skewness = longer tail towards positive values.
Negative skewness = longer tail towards negative values.
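A sketch of skewness and (excess) kurtosis using the simple moment-based formulas; this is an assumption about which variant to use, since software such as SPSS may apply a small-sample correction and report slightly different values. The data are hypothetical.

```python
def central_moments(xs):
    """2nd, 3rd and 4th central moments (dividing by N)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m2, m3, m4

def skewness(xs):
    """> 0: longer tail towards positive values; < 0: towards negative values."""
    m2, m3, _ = central_moments(xs)
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """About 0 for mesokurtic, > 0 leptokurtic, < 0 platykurtic."""
    m2, _, m4 = central_moments(xs)
    return m4 / m2 ** 2 - 3

right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]   # hypothetical data with a long right tail
mean = sum(right_skewed) / len(right_skewed)     # 4.7, pulled towards the tail
median = 3.0                                     # middle of the sorted list
print(round(skewness(right_skewed), 2), mean, median)
```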

Checks for normal distribution/normality:
1) Histogram: does it look like a bell-shaped curve/ND?
2) Boxplot: shows the median, with around it a box containing the middle 50% of all observations. Are the box and
whiskers symmetric? The whiskers (ends) should capture about 95% of the values.
3) Q-Q plot: do the observed residuals (deviations from the mean) match the residuals predicted under normality?
Ideally all points lie on the straight line.
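The numbers behind a boxplot and a Q-Q plot can be computed without a plotting library. A sketch with hypothetical data; the plotting positions (i – 0.5)/n are one common convention and an assumption here, not necessarily what SPSS uses.

```python
from statistics import NormalDist, mean, stdev, quantiles

def qq_points(xs):
    """Pairs (z expected under normality, observed standardized value)."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    observed = sorted((x - m) / s for x in xs)
    # plotting positions (i - 0.5) / n: an assumed convention
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

def box_summary(xs):
    """Quartiles: Q1 and Q3 bound the box (middle 50%), Q2 is the median."""
    q1, q2, q3 = quantiles(xs, n=4)
    return q1, q2, q3

data = [4.8, 5.1, 5.3, 5.0, 4.9, 5.6, 4.7, 5.2, 5.0, 5.4]   # hypothetical measurements
pts = qq_points(data)
q1, q2, q3 = box_summary(data)
for exp_z, obs_z in pts:
    print(f"{exp_z:6.2f} {obs_z:6.2f}")   # roughly equal pairs suggest normality
```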

Fixing non-normality:
Many real-world situations have a lowest possible value of 0, e.g. income, distance, time spent on a task. Then you get
a positively skewed distribution (figure above), which is often log-normal. In cases where it makes sense to think
in terms of doubling distances or times (e.g. spending 1 or 2 seconds on a task, or 1 or 2 minutes), you can take the
logarithm of the scale. The skewed data may then transform into a (roughly) normal distribution.
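A simulation sketch of the log transformation: hypothetical task times are drawn from a log-normal distribution, and the skewness is compared before and after taking the logarithm. Note that the log is only defined for values greater than 0.

```python
import math
import random

def skewness(xs):
    """Moment-based skewness (dividing by N)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(1)
# Hypothetical task times in seconds (all > 0), drawn from a log-normal distribution
times = [random.lognormvariate(1.0, 0.8) for _ in range(2000)]
log_times = [math.log(t) for t in times]   # log(0) is undefined, so values must be > 0

print(f"skewness raw: {skewness(times):.2f}, after log: {skewness(log_times):.2f}")
```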

Sample and population:
Population = every case of interest
Sample = part of the population, which we try to generalize to the population at large

Population estimates require random samples. Inferential statistics: making population claims based on sample.

Estimate values for population through sample:
μ: sample mean (M or 𝑥̅ ) is an estimate for population mean (μ)
σ: sample SD (s) is an estimate of population SD (σ). Dividing by N – 1 instead of N corrects for underestimating σ, which matters most in small samples.
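Why N – 1? A simulation sketch under assumed settings (population N(0, 1), so σ² = 1, and samples of size 5): averaged over many samples, SS/N systematically underestimates σ², while SS/(N – 1) is about right.

```python
import random

random.seed(42)
n, reps = 5, 20000   # small samples make the bias visible

biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]   # population variance = 1
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / n)          # divide by N
    unbiased.append(ss / (n - 1))  # divide by N - 1

avg_biased = sum(biased) / reps      # expected around (n - 1) / n = 0.8
avg_unbiased = sum(unbiased) / reps  # expected around 1.0
print(f"mean of SS/N: {avg_biased:.3f}, mean of SS/(N-1): {avg_unbiased:.3f}")
```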

The sampling distribution (bell figure) becomes narrower when the sample is larger. Meaning,
the larger the number of observations, the better the sample mean estimates the population mean.

Standard error of the mean (SE): the standard deviation of the sampling distribution. Larger sample, smaller SE.
Estimator formula: SE = sample standard deviation / √N = s / √N.
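A simulation sketch of the sampling distribution, with a hypothetical population N(50, 10): the SD of many simulated sample means matches σ/√n and shrinks as n grows. The last lines show the everyday estimator s/√N from a single sample.

```python
import math
import random
import statistics

random.seed(7)
MU, SIGMA = 50.0, 10.0   # hypothetical population
REPS = 2000              # simulated samples per sample size

def sample_means(n):
    """Means of REPS random samples of size n."""
    return [statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)]) for _ in range(REPS)]

se = {}
for n in (10, 40, 160):
    se[n] = statistics.stdev(sample_means(n))   # SD of the sampling distribution
    print(f"n={n}: simulated SE {se[n]:.2f}, sigma/sqrt(n) {SIGMA / math.sqrt(n):.2f}")

# In practice sigma is unknown: estimate the SE from one sample as s / sqrt(N)
one_sample = [random.gauss(MU, SIGMA) for _ in range(40)]
se_hat = statistics.stdev(one_sample) / math.sqrt(len(one_sample))
```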

Lecture 1.2
Sampling distribution: the sample means are normally distributed around the population mean, with a SD called the standard error (σ/√n).
Standard error = the standard deviation of the sampling distribution.

When a sample (mean) falls outside, e.g., the 95% range of the sampling distribution under H0 (alpha = 0.05), we conclude
it does not belong to H0. Meaning, it is unlikely that the sample was drawn from a population with the hypothesized population mean μ.

Significance only indicates whether there’s evidence for a difference, however small: we conclude that the sample does
not belong to the general population under H0. It says little/nothing about relevance (the size of the effect).

Transform data to a z-distribution:
z = (sample mean – population mean) / standard deviation of the sampling distribution = (x̄ – μ) / (σ/√n).
Under H0, this z-value follows the N(0, 1) distribution.
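A sketch of a one-sample z-test built on this transformation, assuming σ is known; the numbers (H0: μ = 50, σ = 10, n = 100, sample mean 52) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """z = (sample mean - mu0) / (sigma / sqrt(n)) and its two-sided p-value."""
    se = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

z, p = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")   # p < 0.05, so the sample lies outside the 95% range
```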

[Figure: z-distribution compared with the t-distribution]

When σ is unknown, estimate the SE of the population through the sample: calculate the
standard error by taking the sample standard deviation and dividing by √n. Because of this
extra uncertainty the statistic follows a t-distribution; the smaller the
sample, the flatter the t-distribution.

Difference in critical values: for 95% (two-sided), the critical value of the z-distribution is always ±1.96. In a
t-distribution it depends on the number of observations (df): it is larger for small samples and approaches 1.96 as
that number grows. → book p. 999-1000

df (degrees of freedom): total number of observations – number of parameters estimated (e.g. df = n – 1 when the mean is estimated).

The t-distribution has heavier tails and is a bit flatter than the normal distribution (more probability in the extreme
ranges). How flat it is and how heavy the tails are is determined by the df. The t-distribution becomes the standard normal (z-)distribution as df goes to infinity.
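Python's standard library has no t-distribution, so this sketch integrates the t density numerically (Simpson's rule) and finds the two-sided critical value by bisection; in practice you would use the book's table or a statistics package. The function names are hypothetical. It shows the critical values shrinking towards 1.96 as df grows, and the heavier tails for small df.

```python
import math

def t_pdf(x, df):
    """Density of Student's t (log-gamma avoids overflow for large df)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the Simpson integral of the pdf from 0 to x."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, alpha=0.05):
    """Two-sided critical value: x with P(T <= x) = 1 - alpha/2, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit = {df: t_critical(df) for df in (5, 10, 30, 100)}
for df, c in crit.items():
    print(f"df={df}: critical t = {c:.3f} (z: 1.96)")

# Heavier tails: share of a t(5) distribution beyond +/-1.96 (for z this is 0.05)
tail_df5 = 2 * (1 - t_cdf(1.96, 5))
```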

Assumptions for using the t-distribution:
• Data is measured on interval or ratio scale
• Observations follow the normal distribution
• Based on independent observations.

The more observations (df), the steeper (more peaked) t gets. Especially with a
small group (n < 20), t is really different from z.

Rule of inferential statistics: we can only conclude something at a
given confidence level, never with 100% certainty. We decide the confidence level.

Type 1 and Type 2 error
Type 1 error: in reality the null hypothesis is true, but we
reject it. We incorrectly conclude something is going on, while it’s not (false positive).

Type 2 error: something is going on, but we didn’t see it based on the
sample (false negative). Beta (the probability of a type 2 error) depends on the effect size, the number of observations,
and alpha (the accepted probability of a type 1 error).

Problem: the stricter we are about avoiding false positives (type 1, alpha),
the larger the chance that we miss something (type 2, beta), because we demand more compelling evidence.
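A sketch of this trade-off, assuming a one-sided one-sample z-test with known σ and a hypothetical standardized effect size of 0.3: lowering α raises β, while a larger sample or a larger effect lowers β.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def type2_error(alpha, effect_size, n):
    """Beta for a one-sided z-test; effect_size = (mu_true - mu0) / sigma."""
    z_crit = Z.inv_cdf(1 - alpha)        # rejection threshold under H0
    shift = effect_size * math.sqrt(n)   # centre of the test statistic under Ha
    return Z.cdf(z_crit - shift)         # chance the statistic stays below the threshold

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha}: beta={type2_error(alpha, effect_size=0.3, n=50):.3f}")
for n in (20, 50, 100):
    print(f"n={n}: beta={type2_error(0.05, effect_size=0.3, n=n):.3f}")
```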

In sum:
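α (alpha) = significance level (critical p-value): the proportion of samples under H0 that we accept to lie beyond the
critical point. If a result is so extreme that less than 5% of samples under H0 would lie beyond it, we conclude it is probably not part of the null hypothesis.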

Test statistic = the calculated value (z or t). We compare it to a reference point: the critical value (for t, found with the df).

Confidence interval = range in which the population value (e.g. μ) is likely to lie, with a given confidence. The confidence level is the complement of alpha: 1 – α.
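A sketch of a (1 – α) confidence interval for the mean, assuming σ is known so the z-distribution applies; with σ unknown and a small sample you would replace z by the critical t-value with df = n – 1. The numbers are hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci(sample_mean, sigma, n, alpha=0.05):
    """(1 - alpha) confidence interval for mu with known sigma (z-based)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

lo, hi = mean_ci(50.0, sigma=10.0, n=100)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")   # prints 95% CI: [48.04, 51.96]
```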

Rejection region = outcomes of the test statistic for which we conclude H0 is not true (reject H0, support Ha). This lies
outside the 95% curve (for α = 0.05): the rejection region consists of the test-statistic outcomes that fall beyond the level of significance α. So with α = 0.10 and a
two-sided hypothesis, the rejection region is 0.10 in total: 0.05 on the left side and 0.05 on the right side. One-sided: all 0.10 on that one side.
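A sketch of the critical z-values that bound the rejection region, for the α = 0.10 example; the one-sided case assumes a right-tailed Ha, and the function name is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()

def rejection_region(alpha, two_sided=True):
    """Critical z-value(s): H0 is rejected when the test statistic lies beyond them."""
    if two_sided:
        c = Z.inv_cdf(1 - alpha / 2)   # alpha/2 in each tail
        return -c, c
    return Z.inv_cdf(1 - alpha)        # all of alpha in the upper tail

print(rejection_region(0.10))                   # about (-1.645, 1.645)
print(rejection_region(0.10, two_sided=False))  # about 1.282
```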

Rejecting and accepting H0:
Outcome probability (p-value) > alpha: we accept (do not reject) H0; Ha has not been shown.
Outcome probability (p-value) ≤ alpha: we reject H0; Ha has been shown.
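The decision rule as a small sketch (function names and z-values are hypothetical): a two-sided p-value ≤ α corresponds to the test statistic lying in the rejection region.

```python
from statistics import NormalDist

def decide(p_value, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise H0 is not rejected ('accepted')."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

def two_sided_p(z):
    """Two-sided p-value of a z test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.5, 2.5):   # hypothetical test statistics
    p = two_sided_p(z)
    print(f"z={z}: p={p:.4f} -> {decide(p)}")
```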

Statistical test-procedure:


Written for

Institution: Wageningen University (WUR)
Programme: MSc MME Management, Economics And Consumer Studies
Course: Quantitative Research Methodology and Statistics

Document information

Number of pages: 31
Lecturer(s): Jos Hageman
Contains: All lectures

Topics

spss
statistics
regression analysis
factor analysis
analysis of variance
