Summary

Summary Statistics GSS + cheat sheet

Uploaded on

10-09-2025

Written in

2025/2026

Summary of all statistics GSS lectures, including 2 cheat sheets (1 for the first and 1 for the second exam)

Written for

Institution: Universiteit Utrecht (UU)
Study: Global Sustainability Science
Course: Statistics (GEO22428)

Document information

Uploaded on: September 10, 2025
Number of pages: 66
Written in: 2025/2026
Type: Summary

Subjects

rstudio
variables
numeric
categorical
ordinal
nominal
binominal
descriptive
inferential
mean
median
left skew
right skew
sum of squares
standard deviation
p value
z score
t test
x
y

Content preview

Statistics GSS – GEO2-2428
Course aims
1. Understand the theoretical and mathematical basis of statistical methods
2. Determine the appropriate statistical analysis method for a research question
3. Conduct the statistical analysis in R
4. Interpret the findings of the statistical analysis
5. Report the results of statistical analyses in a clear and accurate way

Lecture 1 – Introduction to Statistics GSS – 5/2/2025
Assessment →
Exam 1 (35%) → 28/2/2025
Assignment 1 (15%) → 6/3/2025
Exam 2 (35%) → 4/4/2025
Assignment 2 (15%) → 9/4/2025

Lecture 2 – Descriptive statistics and theory estimates – 5/2/2025
Data variables
- Data variables = different types of data
Experimental setups include:
- Response (dependent) variable: what is under observation (Y)
- Explanatory (independent) variable: what is under control (X)
- In an XY-plot, the response is typically on the y-axis and the explanatory variable on the x-axis

Why is understanding data types so important?
- The hardest part of any statistical work is choosing the right statistical analysis.
The choice depends on the nature of your data and the particular question you
are trying to answer
Types of data: dimensions and units are important!
Numeric vs categorical data
Numeric data is recorded as a quantifiable number.
- It can be continuous – able to take any value within a range, including values
with many decimals → e.g., time, length, weight, area, etc.
- It can also be discrete – whole number values only → e.g., data collection day,
number of individuals, count of an occurrence, etc.

Categorical data is recorded as a qualitative characteristic.
- Ordinal – categories with an ordered relation → e.g., small, medium, large;
none, low, moderate, high
- Nominal – categories without an ordered relation → e.g., municipality, color,
species
- Binomial (binary) – categories with only two possibilities → e.g., yes/no
Organizing our data: how to construct a data frame
- In a data frame, data for each variable should be organized into a column
- The number of rows should be equal to the number of observations (n)
- Data frames provide a clear format (matrix) that data analysis tools such as
Excel and RStudio can interpret best
- Proper data input = proper plotting and statistics
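The layout rules above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the course's own R workflow; the variable names and values are hypothetical and chosen only to show one column per variable and one row per observation.

```python
# Hypothetical example data: one column per variable, one entry per observation.
columns = {
    "plot_id": [1, 2, 3, 4],                        # discrete numeric
    "tree_height_m": [12.4, 15.1, 9.8, 14.0],       # continuous numeric (response, Y)
    "soil_type": ["clay", "sand", "clay", "peat"],  # nominal categorical (explanatory, X)
    "shade": ["low", "high", "moderate", "low"],    # ordinal categorical
}


def n_observations(cols):
    """Return n, checking that every column has the same number of entries."""
    lengths = {len(values) for values in cols.values()}
    if len(lengths) != 1:
        raise ValueError("every column must have one value per observation")
    return lengths.pop()


n = n_observations(columns)

# Row view: each row is one observation with a value for every variable.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
print(n, rows[0])
```

A column with a missing or extra value raises an error here, which mirrors the point that the number of rows must equal n before any plotting or statistics.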
What comes next in a statistical analysis
- Descriptive statistics: what does our data look like?
- Inferential statistics: what can we infer from that?
Descriptive vs inferential statistics
Descriptive statistics describe data using:
- Graphs, e.g., boxplots, histograms, scatterplots
- Tables
- Summary calculations, e.g., median, mean/average, standard deviation
Inferential statistics make general conclusions by analyzing trends within a sample
and comparing them to standard models to (try to) understand:
- How does a sample relate to generalized findings and vice-versa?
- Are any differences more than a coincidence (i.e., is it statistically significant?)
- How can past and current data help to project future outcomes?
Why is central tendency important?
- Mode: most often recorded value
- Median: middle value
- Mean: average value
- In a normal distribution: mode = mean = median
- Central limit theorem: with a large enough sample size, the means of repeated
samples are approximately normally distributed around the population mean
- The range from Q1 to Q3 (one quartile either side of the median) contains 50% of
the observations
- In a normal distribution, -/+ 1 standard deviation from the mean contains approx.
68% of the observations

- Data is often not ‘normal’
- Right skew: mode < median < mean
- Left skew: mean < median < mode
- The first step in stats is to check how ‘normally’ spread your data is from its
middle
- Mean = average = (sum of observations) / (total number of observations)
- Median = middle value if you reorder the values from smallest to largest. If there
is no single middle value (even n), add up the two middle values and take their average
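As a quick check of these definitions, the sketch below computes the mode, median, and mean for a small made-up data set using Python's standard `statistics` module. The numbers are hypothetical and were picked to be right-skewed, so the ordering mode < median < mean should appear.

```python
import statistics

# Hypothetical right-skewed data: one large value pulls the mean upwards.
values = [2, 3, 3, 4, 5, 6, 13]

mode = statistics.mode(values)      # most often recorded value
median = statistics.median(values)  # middle value of the sorted data
mean = statistics.mean(values)      # sum of observations / number of observations

# With an even number of observations the median is the average
# of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

print(mode, median, round(mean, 2), even_median)
```

Here the mode is 3, the median is 4, and the mean is 36/7 (about 5.14), which matches the right-skew ordering.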
Dispersion: deviation from the mean
- Deviation (dev) = by how much a data point differs
from the mean
1. Sum of squares
- $SS_x = \sum (x - \bar{x})^2$, where $\bar{x}$ is the mean (average)
2. Degrees of freedom and the variance
- $df = n - 1$: the mean is a fixed value for comparison (hence, -1), so df is the
maximum number of values that can vary freely around the mean
- $S^2 = var_x = \frac{SS_x}{df}$
- A variance of 0 means none of the data points diverge from the mean, there is no
variation
3. Standard deviation
- Variance is a squared metric (var or $S^2$); to bring it back to the original units
of the data, we take its square root, the standard deviation (sd or S)
- $sd_x = \sqrt{var_x}$
- The standard deviation from the mean tells us how spread out our data is from
the mean
4. Coefficient of variation
- The ratio of the standard deviation to the mean
- Coefficient of variation = $CV = \frac{sd_x}{\bar{x}} \times 100$
- Tells how spread out the data is relative to the sample mean
- High CV = large spread/variation from the mean, less central tendency, flatter
bell curve
- Low CV = low spread/variation from the mean, more central tendency, steeper
bell curve
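The four steps above chain together, which a short worked example may make clearer. The sketch below uses hypothetical data and plain Python; the variable names are illustrative only.

```python
import math

# Hypothetical sample of five observations.
x = [4, 8, 6, 5, 7]

n = len(x)
mean = sum(x) / n                        # 30 / 5 = 6.0

# 1. Sum of squares: squared deviations from the mean, added up.
ss = sum((xi - mean) ** 2 for xi in x)   # 4 + 4 + 0 + 1 + 1 = 10.0

# 2. Degrees of freedom and variance.
df = n - 1                               # 4
var = ss / df                            # 10 / 4 = 2.5

# 3. Standard deviation: square root of the variance.
sd = math.sqrt(var)

# 4. Coefficient of variation: sd relative to the mean, as a percentage.
cv = sd / mean * 100

print(ss, df, var, round(sd, 3), round(cv, 1))
```

For these values the standard deviation is about 1.58 and the CV about 26.4%, meaning the typical deviation is roughly a quarter of the mean.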
Data quartiles
1. First quartile, Q1
- $Q_1 = X_{(n+1)/4}$, where $X_k$ is the $k$-th value in the sorted data
2. Second quartile, Q2 (the median)
- $Q_2 = X_{(n+1)/2}$
3. Third quartile, Q3
- $Q_3 = X_{3(n+1)/4}$
Interquartile range (IQR)
- Measures spread of the middle 50% of the data
- IQR = Q3 – Q1
- Large IQR = more dispersed mid-range
- Small IQR = more clustered mid-range
Outliers
- Values outside the range from Min to Max, where Min and Max are derived from the quartiles:
- Min = Q1 – 1.5 * IQR
- Max = Q3 + 1.5 * IQR
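The quartile, IQR, and outlier rules can be sketched together as follows. This assumes the quartile positions are (n+1)/4, (n+1)/2, and 3(n+1)/4 in the sorted data, with linear interpolation when a position falls between two values; the interpolation step and the helper name `value_at_position` are assumptions for illustration, and statistical software may use slightly different quartile conventions.

```python
def value_at_position(sorted_x, pos):
    """Value at a 1-based position, interpolating between neighbours."""
    lower = int(pos)                          # whole part of the position
    frac = pos - lower                        # fractional part
    lower = max(1, min(lower, len(sorted_x)))
    upper = min(lower + 1, len(sorted_x))
    return sorted_x[lower - 1] + frac * (sorted_x[upper - 1] - sorted_x[lower - 1])


# Hypothetical data with one suspiciously large value.
x = sorted([7, 1, 40, 4, 3, 8, 5])
n = len(x)

q1 = value_at_position(x, (n + 1) / 4)
q2 = value_at_position(x, (n + 1) / 2)
q3 = value_at_position(x, 3 * (n + 1) / 4)

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr                  # "Min" in the notes
upper_limit = q3 + 1.5 * iqr                  # "Max" in the notes

outliers = [v for v in x if v < lower_limit or v > upper_limit]
print(q1, q2, q3, iqr, lower_limit, upper_limit, outliers)
```

With n = 7 the positions are 2, 4, and 6, so the quartiles are 3, 5, and 8, the IQR is 5, and only the value 40 lies outside the limits of -4.5 and 15.5.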
Statistical Toolbox Part 1
Measures of central tendency
- Mean (average)
- Median
Measures of dispersion (spread)
- The sum of squares (SS)
- Degrees of freedom (df)
- Variance ($S^2$, var)
- Standard deviation (sd)
- Coefficient of variation (CV)
- Inter-quartile range (IQR)
Descriptive statistics in research
Very useful for:
- Data cleaning
- Data preparation
- Providing (initial) insights into the dataset
Where to find/include in a report:
- Methods: data cleaning, preparation, and characterization
- Results: show/use (in part) descriptive statistics
Population vs. sample
- Population = universe of units
- Sample = segment of population selected for research
Why a sample?
- Resources
- Data availability
- The main reason is efficiency, and the disadvantage is uncertainty
Population vs. sample: standard notation
Population parameter | Sample statistic
Size = N (number of observations) | Size = n (number of observations)
Average/mean = $\mu$ | Mean = m or $\bar{y}$
Standard deviation = $\sigma = \sqrt{\sum (x - \mu)^2 / N}$ | Standard deviation = s, sd, or dev = $\sqrt{\sum (y_i - \bar{y})^2 / (n - 1)}$
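The only difference between the two standard deviation formulas is the divisor, N for a population versus n - 1 for a sample. The sketch below applies both to the same hypothetical values and cross-checks them against `pstdev` and `stdev` from Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical values, treated first as a whole population, then as a sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                      # 40 / 8 = 5.0
ss = sum((v - mean) ** 2 for v in data)           # sum of squares = 32.0

sigma = math.sqrt(ss / len(data))                 # population: divide by N
s = math.sqrt(ss / (len(data) - 1))               # sample: divide by n - 1

# The standard library implements the same two definitions.
sigma_lib = statistics.pstdev(data)
s_lib = statistics.stdev(data)

print(sigma, round(s, 3))
```

The sample value (about 2.14) is slightly larger than the population value (2.0), because dividing by n - 1 compensates for estimating the mean from the same data.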
Ensuring adequate sample size: why is it important?
- Central limit theorem: for samples of at least 30 observations, the sample mean is
generally approximately normally distributed
- More observations = higher n = higher df → more certainty in dataset/results +
stronger statistical inference
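A small simulation can illustrate the central limit theorem. The sketch below is only a demonstration under stated assumptions: the population is a right-skewed exponential distribution with mean 1, the sample size is 30, and 2000 repeated samples are drawn with a fixed seed. None of these choices come from the course material.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def sample_mean(n):
    """Mean of one sample of n draws from a skewed population (mean 1)."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])


n = 30
sample_means = [sample_mean(n) for _ in range(2000)]

centre = statistics.mean(sample_means)    # should be close to the population mean
spread = statistics.stdev(sample_means)   # should be close to 1 / sqrt(n)

# Share of sample means within one standard deviation of their centre.
within_1sd = sum(abs(m - centre) <= spread for m in sample_means) / len(sample_means)

print(round(centre, 3), round(spread, 3), round(within_1sd, 3))
```

Although individual observations are strongly skewed, the sample means cluster around 1 with a spread near 1/√30, and roughly two thirds of them fall within one standard deviation, as expected for an approximately normal distribution.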
Randomization
- The process of assigning participants to treatment and control groups in such a way
that each participant has an equal chance of being assigned to any group
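One common way to implement this is to shuffle the participant list and then deal participants into groups in turn. The sketch below is a minimal version of that idea; the function name `randomize`, the group labels, and the participant IDs are hypothetical.

```python
import random


def randomize(participants, groups=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them into the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # every ordering is equally likely
    assignment = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(participant)
    return assignment


# Hypothetical participant IDs 1..20; the seed only makes the run repeatable.
result = randomize(range(1, 21), seed=1)
print({group: sorted(members) for group, members in result.items()})
```

Because the shuffle happens before any group is filled, each participant is equally likely to land in either group, and the group sizes stay balanced.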
Hypothesis testing

Get to know the seller

rooslip
