Summary Statistics GSS + cheat sheet

Pages
66
Uploaded on
10-09-2025
Written in
2025/2026

Summary of all statistics GSS lectures, including 2 cheat sheets (1 for the first and 1 for the second exam)

Statistics GSS – GEO2-2428
Course aims
1. Understand the theoretical and mathematical basis of statistical methods
2. Determine the appropriate statistical analysis method for a research question
3. Conduct the statistical analysis in R
4. Interpret the findings of the statistical analysis
5. Report the results of statistical analyses in a clear and accurate way

Lecture 1 – Introduction to Statistics GSS – 5/2/2025
Assessment →
Exam 1 (35%) → 28/2/2025
Assignment 1 (15%) → 6/3/2025
Exam 2 (35%) → 4/4/2025
Assignment 2 (15%) → 9/4/2025


Lecture 2 – Descriptive statistics and theory estimates – 5/2/2025
Data variables
- Data variables = different types of data
Experimental setups include:
- Response (dependent): What is under observation (Y)
- Explanatory (independent): what is under control (X)
- In an XY-plot, typically, the response is y, the explanatory is x




Why is understanding data types so important?
- The hardest part of any statistical work is choosing the right statistical analysis.
The choice depends on the nature of your data and the particular question you
are trying to answer
Types of data: dimensions and units are important!
Numeric vs categorical data
Numeric data is recorded as a quantifiable number.
- It can be continuous – it can take any value within a range, including values with
many decimals → e.g., time, length, weight, area, etc.
- It can also be discrete – whole number values → e.g., data collection day,
number of individuals, count of an occurrence, etc.

Categorical data is recorded as a qualitative characteristic.
- Ordinal – categories with an ordered relation → e.g., small, medium, large;
none, low, moderate, high
- Nominal – categories without an ordered relation → e.g., municipality, color,
species
- Binary (binomial) – categories with two possibilities → e.g., yes/no
Organizing our data: how to construct a data frame
- In a data frame, data for each variable should be organized into a column
- The number of rows should be equal to the number of observations (n)
- Data frames provide a clear format (matrix) that data analysis tools such as
Excel and RStudio can readily interpret
- Proper data input = proper plotting and statistics
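As a minimal sketch of the layout described above, the snippet below stores a small, made-up data set column by column and checks that every column has n entries. The course itself works in R; Python is used here only for illustration, and all variable names and values are hypothetical.

```python
# Hypothetical data: one column (list) per variable, one entry per observation.
data_frame = {
    "municipality": ["Utrecht", "Zeist", "Houten", "Utrecht"],  # nominal
    "size_class": ["large", "small", "medium", "large"],        # ordinal
    "households": [12, 7, 9, 15],                               # discrete
    "area_km2": [3.4, 1.2, 2.8, 4.1],                           # continuous
}


def n_observations(frame):
    """Return n, after checking that every column has the same length."""
    lengths = {len(column) for column in frame.values()}
    if len(lengths) != 1:
        raise ValueError("columns have different numbers of observations")
    return lengths.pop()


n = n_observations(data_frame)

# Each row collects the values of one observation across all variables.
rows = [
    {name: column[i] for name, column in data_frame.items()}
    for i in range(n)
]
print(n, rows[0])
```

A column that is shorter or longer than the others raises an error here, which is the kind of input problem that otherwise shows up later as a broken plot or statistic.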
What comes next in a statistical analysis
- Descriptive statistics: what does our data look like?
- Inferential statistics: what can we infer from that?
Descriptive vs inferential statistics
Descriptive statistics describe data using:
- Graphs, e.g., boxplots, histograms, scatterplots
- Tables
- Summary calculations, e.g., median, mean/average, standard deviation
Inferential statistics make general conclusions by analyzing trends within a sample
and comparing them to standard models to (try to) understand:
- How does a sample relate to generalized findings and vice-versa?
- Are any differences more than a coincidence (i.e., are they statistically significant)?
- How can past and current data help to project future outcomes?
Why is central tendency important?
- Mode: most often recorded value
- Median: middle value
- Mean: average value
- In a normal distribution: mode = mean = median
- Central limit theorem: with large enough sample sizes, the means of repeated
samples will generally present a ‘normal’ spread around the center value
- The range from one quartile below to one quartile above the median (Q1 to Q3)
contains 50% of the observations
- In a normal distribution, the range within 1 standard deviation of the mean
contains approximately 68% of the observations




- Data is often not ‘normal’
- Right skew: mode < median < mean
- Left skew: mean < median < mode
- The first step in stats is to check how ‘normally’ your data is spread around its
middle
- Mean = average = (sum of observations) / (total number of observations)
- Median = middle value if you reorder the values from smallest to largest. If there
is no single middle value, add up the two middle ones and take their average
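The measures of central tendency above can be sketched with Python's standard `statistics` module (the course itself uses R; Python is only an illustration here). The sample values are made up and chosen to be right-skewed, so the ordering mode < median < mean can be seen directly. The last lines check the 68% figure for a normal distribution.

```python
import statistics
from statistics import NormalDist

# Hypothetical right-skewed sample: one large value pulls the mean upwards.
values = [2, 3, 3, 3, 4, 5, 6, 9, 19]

mode = statistics.mode(values)      # most often recorded value
median = statistics.median(values)  # middle value of the ordered data
mean = statistics.mean(values)      # sum of observations / number of observations
print(mode, median, mean)

# With an even number of observations there is no single middle value,
# so the median is the average of the two middle ones.
even_median = statistics.median([1, 2, 4, 10])
print(even_median)

# Share of a normal distribution within 1 standard deviation of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(round(within_one_sd, 3))
```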
Dispersion: deviation from the mean
- Deviation (dev) = by how much a data point differs
from the mean
1. Sum of squares
- $SS_x = \sum (x - \bar{x})^2$, where $\bar{x}$ is the average
2. Degrees of freedom and the variance
- $df = n - 1$. The mean is a set value for comparison (hence, -1); df is the maximum number
of values that can vary freely once the mean is fixed
- $S^2 = var_x = SS_x / df$
- A variance of 0 means none of the data points diverge from the mean, there is no
variation
3. Standard deviation
- Variance is a squared metric (var or $S^2$); to return to the original units, we take its
square root, the standard deviation (sd or S)
- $sd_x = \sqrt{var_x}$
- The standard deviation from the mean tells us how spread out our data is from
the mean
4. Coefficient of variation
- The ratio of the standard deviation to the mean, expressed as a percentage
- Coefficient of variation = $CV = \frac{sd_x}{\bar{x}} \times 100$, where $\bar{x}$ is the average
- Tells how relatively spread out data is from sample mean
- High CV = large spread/variation from the mean, less central tendency, flatter
bell curve
- Low CV = low spread/variation from the mean, more central tendency, steeper
bell curve
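The four dispersion steps above can be followed by hand on a small sample. The sketch below uses made-up measurements and plain Python (not the course's R) and compares the result with the standard library, which uses the same n - 1 definition.

```python
import math
import statistics

# Hypothetical sample of five measurements.
x = [4.0, 6.0, 7.0, 9.0, 14.0]

n = len(x)
average = sum(x) / n

ss = sum((xi - average) ** 2 for xi in x)  # 1. sum of squares
df = n - 1                                 # 2. degrees of freedom
var = ss / df                              #    variance = SS / df
sd = math.sqrt(var)                        # 3. standard deviation
cv = sd / average * 100                    # 4. coefficient of variation (%)

print(average, ss, df, var, round(sd, 3), round(cv, 1))

# Cross-check against the standard library (sample variance and sd).
assert math.isclose(var, statistics.variance(x))
assert math.isclose(sd, statistics.stdev(x))
```

Because CV is expressed relative to the mean, it allows the spread of variables with different units or magnitudes to be compared.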
Data quartiles
1. First quartile, Q1
- $Q_1 = X_{(n+1)/4}$, the value at position $(n+1)/4$ in the ordered data
2. Second quartile, Q2 (the median)
- $Q_2 = X_{(n+1)/2}$
3. Third quartile, Q3
- $Q_3 = X_{3(n+1)/4}$
Interquartile range (IQR)
- Measures spread of the middle 50% of the data
- IQR = Q3 – Q1
- Large IQR = more dispersed mid-range

- Small IQR = more clustered mid-range
Outliers
- Values that fall below Min or above Max, where:
- Min = Q1 – 1.5 * IQR
- Max = Q3 + 1.5 * IQR
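A minimal sketch of the quartile, IQR, and outlier rules above, using made-up data. It assumes the position-based definition with positions (n+1)/4, (n+1)/2, and 3(n+1)/4, and it interpolates linearly when a position is not a whole number (an assumption; the notes do not say how fractional positions are handled). The helper name `value_at_position` is hypothetical. Python is used for illustration only; R's `quantile` function offers several quartile definitions that can give slightly different values.

```python
import statistics


def value_at_position(sorted_values, position):
    """Value at a 1-based position, interpolating between neighbours.

    Assumes the position falls inside the data (true for quartiles when n >= 3).
    """
    lower = int(position)        # whole part of the position
    fraction = position - lower  # fractional part
    if fraction == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Hypothetical sample with one suspiciously large value.
data = sorted([3, 5, 7, 8, 9, 11, 12, 13, 15, 18, 40])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)
q2 = value_at_position(data, (n + 1) / 2)
q3 = value_at_position(data, 3 * (n + 1) / 4)

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr  # "Min" in the notes
upper_limit = q3 + 1.5 * iqr  # "Max" in the notes
outliers = [v for v in data if v < lower_limit or v > upper_limit]

print(q1, q2, q3, iqr, lower_limit, upper_limit, outliers)

# The standard library's "exclusive" method uses the same (n + 1) positions.
library_quartiles = statistics.quantiles(data, n=4, method="exclusive")
print(library_quartiles)
```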
Statistical Toolbox Part 1
Measures of central tendency
- Mean (average)
- Median
Measures of dispersion (spread)
- The sum of squares (SS)
- Degrees of freedom (df)
- Variance ($S^2$, var)
- Standard deviation (sd)
- Coefficient of variation (CV)
- Inter-quartile range (IQR)
Descriptive statistics in research
Very useful for:
- Data cleaning
- Data preparation
- Providing (initial) insights into the dataset
Where to find/include in a report:
- Methods: data cleaning, preparation, and characterization
- Results: show/use (in part) descriptive statistics
Population vs. sample
- Population = universe of units
- Sample = segment of population selected for research
Why a sample?
- Resources
- Data availability
- The main reason is efficiency, and the disadvantage is uncertainty
Population vs. sample: standard notation
Population parameter vs. sample statistic:
- Size (number of observations): N (population), n (sample)
- Average/mean: $\mu$ (population), m or $\bar{y}$ (sample)
- Standard deviation (population): $\sigma = \sqrt{\sum (x - \mu)^2 / N}$
- Standard deviation (sample): s, sd, or dev $= \sqrt{\sum (y_i - \bar{y})^2 / (n - 1)}$
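The only difference between the two standard deviation formulas is the divisor: N for a population, n - 1 for a sample. A short sketch with made-up values (Python for illustration, not the course's R) shows the effect of treating the same numbers first as a whole population and then as a sample.

```python
import math
import statistics

# Hypothetical values, treated first as a population and then as a sample.
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(y) / len(y)
sum_sq = sum((yi - mean) ** 2 for yi in y)

sigma = math.sqrt(sum_sq / len(y))    # population: divide by N
s = math.sqrt(sum_sq / (len(y) - 1))  # sample: divide by n - 1

print(sigma, round(s, 3))

# The standard library provides both versions.
assert math.isclose(sigma, statistics.pstdev(y))
assert math.isclose(s, statistics.stdev(y))
```

The sample version is always slightly larger, and the gap shrinks as n grows.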
Ensuring adequate sample size: why is it important?
- Central limit theorem: as a rule of thumb, with samples of at least 30 observations
the sample mean should generally follow a normal distribution
- More observations = higher n = higher df → more certainty in dataset/results +
stronger statistical inference
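The central limit theorem can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than part of the lecture: it draws repeated samples of n = 30 from a strongly right-skewed (exponential) population, keeps the mean of each sample, and compares the spread of those means with the theoretical value sigma / sqrt(n). The number of repeats and the seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30  # n, the rule-of-thumb minimum from the notes
N_SAMPLES = 2000  # number of repeated samples (arbitrary choice)

# Population: exponential with mean 1 and standard deviation 1 (right-skewed).
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1 / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

print(round(mean_of_means, 2), round(sd_of_means, 2), round(expected_sd, 2))
```

Although single observations are heavily skewed, the sample means cluster around the population mean, and their spread shrinks as n increases.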
Randomization
- The process of assigning participants to treatment and control groups such
that each participant has an equal chance of being assigned to any group
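One simple way to give every participant an equal chance is to shuffle the list of participants and split it in half. The sketch below assumes two equally sized groups; the function name `randomize` and the participant labels are hypothetical, and Python is used for illustration only.

```python
import random


def randomize(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)     # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy, so the input list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


participants = [f"P{i:02d}" for i in range(1, 21)]  # P01 ... P20
groups = randomize(participants, seed=42)

print(groups["treatment"])
print(groups["control"])
```

Recording the seed in the methods section lets others reproduce the exact assignment.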
Hypothesis testing

rooslip Universiteit Utrecht