
Summary Statistics GSS + cheat sheet

Pages: 66
Uploaded on: 10-09-2025
Written in: 2025/2026

Summary of all statistics GSS lectures, including 2 cheat sheets (1 for the first and 1 for the second exam)

Preview of the content

Statistics GSS – GEO2-2428
Course aims
1. Understand the theoretical and mathematical basis of statistical methods
2. Determine the appropriate statistical analysis method for a research question
3. Conduct the statistical analysis in R
4. Interpret the findings of the statistical analysis
5. Report the results of statistical analyses in a clear and accurate way

Lecture 1 – Introduction to Statistics GSS – 5/2/2025
Assessment →
Exam 1 (35%) → 28/2/2025
Assignment 1 (15%) → 6/3/2025
Exam 2 (35%) → 4/4/2025
Assignment 2 (15%) → 9/4/2025


Lecture 2 – Descriptive statistics and theory estimates – 5/2/2025
Data variables
- Data variables = different types of data
Experimental setups include:
- Response (dependent): what is under observation (Y)
- Explanatory (independent): what is under control (X)
- In an XY plot, the response is typically plotted on the y-axis and the explanatory variable on the x-axis




Why is understanding data types so important?
- The hardest part of any statistical work is choosing the right statistical analysis.
The choice depends on the nature of your data and the particular question you
are trying to answer.
Types of data: dimensions and units are important!
Numeric vs categorical data
Numeric data is recorded as a quantifiable number.
- It can be continuous – able to take any value within a range (including values with many decimals, not only whole numbers) → e.g., time, length, weight, area, etc.
- It can also be discrete – whole-number values → e.g., data collection day, number of individuals, count of an occurrence, etc.

Categorical data is recorded as a qualitative characteristic.
- Ordinal – categories with an ordered relation → e.g., small, medium, large; none, low, moderate, high
- Nominal – categories without an ordered relation → e.g., municipality, color, species
- Binomial (binary) – categories with two possibilities → e.g., yes/no
Organizing our data: how to construct a data frame
- In a data frame, data for each variable should be organized into a column
- The number of rows should be equal to the number of observations (n)
- Data frames provide a clear format (matrix) that data analysis tools such as Excel and RStudio can readily interpret
- Proper data input = proper plotting and statistics
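The data frame layout described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library (the course itself works in R); the variable names and values are made up for the example.

```python
# Hypothetical example data: one column per variable, one row per observation.
columns = {
    "plot_id": [1, 2, 3, 4],                               # discrete numeric
    "height_cm": [12.5, 15.1, 9.8, 14.0],                  # continuous numeric
    "size_class": ["medium", "large", "small", "large"],   # ordinal
    "flowering": ["yes", "no", "no", "yes"],               # two categories
}

# Every column must have the same length: the number of observations n.
lengths = {name: len(values) for name, values in columns.items()}
n = lengths["plot_id"]
assert all(length == n for length in lengths.values()), "columns differ in length"

# Rebuild the rows (one dict per observation) from the columns.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

print(f"n = {n} observations, {len(columns)} variables")
for row in rows:
    print(row)
```

The length check is the point of the sketch: if one variable has a missing or extra entry, the number of rows no longer equals n and later plots or statistics would be misaligned.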
What comes next in a statistical analysis
- Descriptive statistics: what does our data look like?
- Inferential statistics: what can we infer from that?
Descriptive vs inferential statistics
Descriptive statistics describe data using:
- Graphs, e.g., boxplots, histograms, scatterplots
- Tables
- Summary calculations, e.g., median, mean/average, standard deviation
Inferential statistics make general conclusions by analyzing trends within a sample
and comparing them to standard models to (try to) understand:
- How does a sample relate to generalized findings and vice-versa?
- Are any differences more than a coincidence (i.e., are they statistically significant)?
- How can past and current data help to project future outcomes?
Why is central tendency important?
- Mode: most often recorded value
- Median: middle value
- Mean: average value
- In a normal distribution: mode = mean = median
- Central limit theorem: with large enough sample sizes, the distribution of sample means is generally approximately ‘normal’ around the center value
- The range from Q1 to Q3 (-/+ 1 quartile from the median) contains 50% of the observations
- In a normal distribution, -/+ 1 standard deviation from the mean contains approx. 68% of the observations




- Data is often not ‘normal’
- Right skew: mode < median < mean
- Left skew: mean < median < mode
- The first step in stats is to check how ‘normally’ spread your data is around its middle
- Mean = average = (sum of observations) / (total number of observations)
- Median = middle value when the values are ordered from smallest to largest. If there is no single middle value (even number of observations), add up the two middle values and take their average
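As a small check on the definitions above, the three measures of central tendency can be computed for a made-up, right-skewed sample. The sketch is in Python with the standard `statistics` module (the course uses R; the data are invented for illustration).

```python
import statistics

# Made-up, right-skewed sample: one large value pulls the mean upward.
values = [1, 2, 2, 2, 3, 4, 5, 9, 17]

mode = statistics.mode(values)      # most often recorded value
median = statistics.median(values)  # middle value of the ordered data
mean = statistics.mean(values)      # sum of observations / number of observations

# For a right-skewed sample we expect mode < median < mean.
right_skewed = mode < median < mean
print(mode, median, mean, right_skewed)

# With an even number of observations there is no single middle value,
# so the median is the average of the two middle ones: (3 + 5) / 2.
even_median = statistics.median([1, 3, 5, 7])
print(even_median)
```

Because the single large value (17) shifts the mean but not the median, comparing the two is a quick first look at skew before any formal test.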
Dispersion: deviation from the mean
- Deviation (dev) = by how much a data point differs
from the mean
1. Sum of squares
- $SS_x = \sum (x - \bar{x})^2$, where $\bar{x}$ is the average
2. Degrees of freedom and the variance
- $df = n - 1$. Mean = set value for comparison (hence, -1); df = maximum number of values that can vary from the mean
- $S^2 = var_x = \frac{SS_x}{df}$
- A variance of 0 means none of the data points diverge from the mean, there is no
variation
3. Standard deviation
- Variance is a squared metric (var or $S^2$); to bring it back to the original units, we take the square root, which gives the standard deviation (sd or S)
- $sd_x = \sqrt{var_x}$
- The standard deviation from the mean tells us how spread out our data is from
the mean
4. Coefficient of variation
- The ratio of standard deviation over the mean
- Coefficient of variation = $CV = \frac{sd_x}{\bar{x}} \times 100$, where $\bar{x}$ is the average
- Tells how relatively spread out data is from sample mean
- High CV = large spread/variation from the mean, less central tendency, flatter
bell curve
- Low CV = low spread/variation from the mean, more central tendency, steeper
bell curve
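The four steps above (sum of squares, degrees of freedom and variance, standard deviation, coefficient of variation) can be followed by hand on a small sample. This is a minimal sketch in Python (the course uses R); the values are made up so that the arithmetic stays easy to verify.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

n = len(values)
mean = sum(values) / n

# 1. Sum of squares: squared deviations from the mean, added up.
ss = sum((x - mean) ** 2 for x in values)

# 2. Degrees of freedom and the variance.
df = n - 1
variance = ss / df

# 3. Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# 4. Coefficient of variation: sd relative to the mean, in percent.
cv = sd / mean * 100

print(f"mean={mean}, SS={ss}, df={df}, var={variance:.3f}, sd={sd:.3f}, CV={cv:.1f}%")
```

Here the mean is 5 and the sum of squares is 32, so the variance is 32/7 and every later quantity follows from it.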
Data quartiles
1. First quartile, Q1
- $Q_1 = X_{(n+1)/4}$, i.e., the value at position $(n+1)/4$ in the ordered data
2. Second quartile, Q2 (the median)
- $Q_2 = X_{(n+1)/2}$
3. Third quartile, Q3
- $Q_3 = X_{3(n+1)/4}$
Interquartile range (IQR)
- Measures spread of the middle 50% of the data
- IQR = Q3 – Q1
- Large IQR = more dispersed mid-range
- Small IQR = more clustered mid-range
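The quartile positions and the IQR can be computed directly from ordered data. The sketch below is in Python (the course uses R) and assumes the position rule k(n+1)/4 for the k-th quartile; the helper `quartile` and the use of linear interpolation for fractional positions are assumptions for illustration, since the notes do not say how fractional positions are handled.

```python
import statistics

def quartile(values, k):
    """Return Q_k (k = 1, 2, 3) as the value at position k * (n + 1) / 4.

    Positions are 1-based in the ordered data. A fractional position is
    resolved by linear interpolation between the two neighbouring values
    (an assumption; other conventions exist).
    """
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4
    lower = int(position)        # 1-based index of the value at or below the position
    fraction = position - lower
    if lower < 1:
        return float(ordered[0])
    if lower >= n:
        return float(ordered[-1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

values = [7, 2, 13, 5, 10, 4, 8]  # made-up sample, n = 7
q1, q2, q3 = (quartile(values, k) for k in (1, 2, 3))
iqr = q3 - q1
print(q1, q2, q3, iqr)

# Cross-check against the standard library (default method "exclusive").
print(statistics.quantiles(values, n=4))
```

With n = 7 the positions are 2, 4 and 6, so the quartiles are simply the 2nd, 4th and 6th ordered values. Different software may use other quartile conventions, so small differences between tools are expected.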
Outliers
- Values that fall outside the range from Min to Max, defined as:
- Min = Q1 – 1.5 * IQR
- Max = Q3 + 1.5 * IQR
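The Min and Max limits above can be applied to flag outliers in a small sample. This is a minimal sketch in Python (the course uses R); the data are made up, and the quartiles come from `statistics.quantiles`, whose default convention is assumed to be acceptable here.

```python
import statistics

values = [2, 4, 5, 7, 8, 10, 13, 40]  # made-up sample with one suspicious value

q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr  # "Min" in the notes
upper_limit = q3 + 1.5 * iqr  # "Max" in the notes

outliers = [x for x in values if x < lower_limit or x > upper_limit]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"limits: [{lower_limit}, {upper_limit}] -> outliers: {outliers}")
```

Flagging a value this way only marks it for inspection; whether it is removed is a data cleaning decision that should be reported in the methods.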
Statistical Toolbox Part 1
Measures of central tendency
- Mean (average)
- Median
Measures of dispersion (spread)
- The sum of squares (SS)
- Degrees of freedom (df)
- Variance ($S^2$, var)
- Standard deviation (sd)
- Coefficient of variation (CV)
- Inter-quartile range (IQR)
Descriptive statistics in research
Very useful for:
- Data cleaning
- Data preparation
- Providing (initial) insights into the dataset
Where to find/include in a report:
- Methods: data cleaning, preparation, and characterization
- Results: show/use (in part) descriptive statistics
Population vs. sample
- Population = universe of units
- Sample = segment of population selected for research
Why a sample?
- Resources
- Data availability
- The main reason is efficiency, and the disadvantage is uncertainty
Population vs. sample: standard notation
Population parameter | Sample statistic
Size = N (number of observations) | Size = n (number of observations)
Average/mean = $\mu$ | Mean = m or $\bar{y}$
Standard deviation = $\sigma = \sqrt{\sum (x - \mu)^2 / N}$ | Standard deviation = s, sd, or dev = $\sqrt{\sum (y_i - \bar{y})^2 / (n - 1)}$
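The only difference between the two standard deviation formulas is the divisor (N for a population, n - 1 for a sample). A minimal Python sketch (the course uses R) with made-up values shows the effect; the same numbers are treated once as a full population and once as a sample.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

size = len(values)
mean = sum(values) / size
ss = sum((x - mean) ** 2 for x in values)

sigma = math.sqrt(ss / size)    # population formula: divide by N
s = math.sqrt(ss / (size - 1))  # sample formula: divide by n - 1

print(f"population sd = {sigma:.3f}, sample sd = {s:.3f}")

# The standard library offers both versions directly.
print(statistics.pstdev(values), statistics.stdev(values))
```

The sample version is always slightly larger, which reflects the extra uncertainty of estimating the mean from the same data.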
Ensuring adequate sample size: why it is important?
- Central limit theorem: as a rule of thumb, with samples of at least 30 observations the sample mean is generally approximately normally distributed
- More observations = higher n = higher df → more certainty in dataset/results + stronger statistical inference
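The rule of thumb above can be explored with a small simulation. This is a sketch in Python (the course uses R) under stated assumptions: the population is an exponential distribution with mean 1 and standard deviation 1 (clearly right-skewed), and the number of repeated samples and the random seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 30   # the rule-of-thumb minimum from the notes
N_SAMPLES = 2000   # arbitrary number of repeated samples

def draw_sample(size):
    """Draw from a right-skewed population (exponential, mean 1, sd 1)."""
    return [random.expovariate(1.0) for _ in range(size)]

sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE)) for _ in range(N_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5  # population sd / sqrt(n)

# Share of sample means within 1 sd of their centre; roughly 68% would be
# expected if the sample means were normally distributed.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / N_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {expected_sd:.3f})")
print(f"share within 1 sd:    {within_one_sd:.2%}")
```

Even though individual observations are strongly skewed, the sample means cluster symmetrically around the population mean, with a spread that shrinks as n grows.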
Randomization
- The process of assigning participants to treatment and control groups in such a way that each participant has an equal chance of being assigned to any group
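Random assignment as described above can be sketched by shuffling the participant list and splitting it in two. The example is in Python (the course uses R); the participant IDs, group sizes, and seed are hypothetical.

```python
import random

participants = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical participant IDs

rng = random.Random(2025)  # seeded generator so the assignment can be reproduced
shuffled = participants[:]  # copy, so the original order is kept
rng.shuffle(shuffled)       # every ordering is equally likely

half = len(shuffled) // 2
treatment = sorted(shuffled[:half])
control = sorted(shuffled[half:])

print("treatment:", treatment)
print("control:  ", control)
```

Because the split is made after a uniform shuffle, each participant has the same chance of ending up in either group, and recording the seed documents how the assignment was produced.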
Hypothesis testing

Seller: rooslip, Universiteit Utrecht