Summary

2023 Exam summary, Introduction to Statistical Analysis, Week 1-6 (CM1005)

Pages
28
Uploaded on
12-01-2023
Written in
2022/2023

This summary includes all lecture and tutorial information of weeks 1-6.


Summary Statistics

Week 1
Statistics: “The study of how we describe and make inferences from data.” (Sirkin)
Ø An inference is “a conclusion reached on the basis of evidence and reasoning.”
Ø Distinction between descriptive & inferential statistics

Different levels of statistics:
1. Univariate (one variable)
2. Bivariate (two variables)
3. Multivariate (three or more variables)

Descriptive vs inferential statistics: descriptive statistics describe only the specific
sample at hand, whereas inferential statistics are about what a sample says about the whole population.




Unit of analysis: what or who is being studied. Also: the unit that you will be able to
draw conclusions about. Typically, all units in a single data set are the same type of “thing”.
Variable: a measured property of each of the units of analysis.

Levels of measurement
- Nominal: group classifications where no meaningful ranking is possible (e.g., religion,
country)
- Ordinal (think: ORDer): there is a meaningful ranking/ordering, but the distance between
categories is unknown or not equal.
- Interval: like ordinal, there is a meaningful ranking, but now the distances between values
are equal. But: zero is arbitrary and does not mean ‘lack of’ the property (e.g., 0 °C does not mean no temperature).
- Ratio: same as interval but zero is meaningful/ absolute zero point.

We always first need to know the level of measurement in order to know which statistical
techniques we may use for the given variables.
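That decision can be sketched as a lookup table, using the levels these notes give for each measure (mode: nominal and up; median, range, IQR: ordinal and up; mean, variance: interval/ratio). This is a minimal illustration only: `Level`, `MINIMUM_LEVEL`, and `allowed_measures` are hypothetical names, not part of any statistics library.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Hypothetical table: lowest level at which each measure may be used,
# as listed in these notes.
MINIMUM_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "range": Level.ORDINAL,
    "IQR": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "variance": Level.INTERVAL,
}


def allowed_measures(level: Level) -> list[str]:
    """Return the measures that may be used for a variable at the given level."""
    return [name for name, minimum in MINIMUM_LEVEL.items() if level >= minimum]


print(allowed_measures(Level.NOMINAL))  # ['mode']
print(allowed_measures(Level.ORDINAL))  # ['mode', 'median', 'range', 'IQR']
```

Because the levels are ordered, a higher level inherits every measure allowed at the lower levels.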

“A continuous variable is measured along a continuum, whereas a discrete variable is
measured in whole units or categories.” Continuous variables can take decimal values; discrete
variables cannot.

Measures of central tendency (CT): used to describe (univariately) the distribution of variables at
different levels of measurement.
- The mean (interval/ ratio): all values are added up and divided by n, which is the
number of observations in the sample

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Almost the same formula for the population mean:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Characteristics of the mean:
o Changing any score will change the mean
o Adding or removing a score will change the mean (unless that score is equal
to the mean)
o Adding, subtracting, multiplying, or dividing each score by the same ‘constant’
value changes the mean in the same way (e.g., adding 10 to every score raises the mean by 10)
o Sum of differences from the mean is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
o Sum of squared differences from the mean is minimal (smaller than around any other value)
o Most useful for describing (more or less) normally distributed variables.
- The median (ordinal, interval/ratio): the median is the middle case when all cases are
sorted by their value. An equal number of cases lies above and below the median.
Also: 50th percentile.
o The median is not as sensitive to outliers as the mean.
o Whenever n is an even number, the median is the mean value of the two
middle cases.
o Often used for interval/ratio variables that have skewed distributions.
- The mode (nominal, ordinal, interval/ratio): the mode is the category (or value) with the
largest number of cases.
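A short sketch with Python's standard-library `statistics` module checks the properties listed above on a small set of hypothetical interval/ratio scores. The data and the helper `sum_of_squares` are illustrative assumptions, not course material.

```python
from statistics import mean, median, mode

scores = [2, 4, 4, 5, 7, 8]  # hypothetical interval/ratio scores, n = 6

x_bar = mean(scores)
print(x_bar)           # 5  (30 / 6)
print(median(scores))  # 4.5  (n is even: mean of the middle cases 4 and 5)
print(mode(scores))    # 4  (the most frequent value)

# The differences from the mean always sum to zero.
print(sum(x - x_bar for x in scores))  # 0

# Adding or multiplying every score by a constant changes the mean accordingly.
print(mean([x + 10 for x in scores]))  # 15
print(mean([x * 2 for x in scores]))   # 10


def sum_of_squares(values, centre):
    """Hypothetical helper: sum of squared differences from `centre`."""
    return sum((x - centre) ** 2 for x in values)


# The sum of squared differences is smaller around the mean than around the median.
print(sum_of_squares(scores, x_bar) < sum_of_squares(scores, median(scores)))  # True

# One outlier pulls the mean far more than the median.
with_outlier = scores + [100]
print(round(mean(with_outlier), 2), median(with_outlier))  # 18.57 5
```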

Measures of CT and distributions: in a normal distribution, the mean, median, and mode are all
the same. In a skewed distribution the peak of the curve is shifted to the left or right, and the
mean, median, and mode no longer coincide.
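The contrast can be seen on two small hypothetical data sets: one symmetric, one with a long tail of high values (right-skewed). In this particular skewed example the mean is pulled furthest toward the tail; the data are made up purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical symmetric data: mean, median, and mode coincide.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(mean(symmetric), median(symmetric), mode(symmetric))  # 3 3 3

# Hypothetical right-skewed data: a few very high values.
skewed = [20, 25, 25, 30, 35, 40, 120]
print(mode(skewed), median(skewed), round(mean(skewed), 2))  # 25 30 42.14
```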

Week 2 Lecture
Measures of variability: measures of CT alone do not carry enough information to adequately
describe the distribution of a variable, so we need a second type of measure.
E.g., group 1 has 10 people aged 20 and 10 aged 60; group 2 has 10 people aged 39 and
10 aged 41. Both groups have the same mean (40). However, the dispersion/variability
differs.
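The two groups from this example can be compared directly. This sketch uses Python's `statistics` module; the range and the population variance (both defined further on in these notes) show the difference in spread that the mean hides.

```python
from statistics import mean, pvariance

group_1 = [20] * 10 + [60] * 10  # 10 people aged 20, 10 aged 60
group_2 = [39] * 10 + [41] * 10  # 10 people aged 39, 10 aged 41

# Same central tendency: both means are 40.
print(mean(group_1), mean(group_2))

# Very different spread: ranges are 40 and 2.
print(max(group_1) - min(group_1), max(group_2) - min(group_2))

# Population variances are 400 and 1 (squared distances of 20 and 1 from the mean).
print(pvariance(group_1), pvariance(group_2))
```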

The range (ordinal, interval/ratio): the distance between the highest and lowest score. It is always
reported together with the maximum and minimum scores and is sensitive to outliers.

The interquartile range (IQR) (ordinal, interval/ratio): based on ‘quartiles’ that split the data
into four equal-sized groups of cases. The IQR is the distance between Q1 and Q3 and is insensitive
to outliers, since it describes only the middle half of the data.
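A minimal sketch of both measures, assuming the "inclusive" quartile method of Python's `statistics.quantiles` (quartile conventions differ between software packages, so course software may give slightly different Q1 and Q3). The helpers `value_range` and `iqr` and the data are hypothetical.

```python
from statistics import quantiles


def value_range(values):
    """Distance between the highest and lowest score."""
    return max(values) - min(values)


def iqr(values):
    """Distance between Q1 and Q3, using the 'inclusive' quartile method."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


scores = [10, 12, 14, 16, 18, 20, 22, 24, 26]        # hypothetical scores
with_outlier = [10, 12, 14, 16, 18, 20, 22, 24, 260]  # largest score replaced by an outlier

print(value_range(scores), value_range(with_outlier))  # 16 250
print(iqr(scores), iqr(with_outlier))                  # 8.0 8.0
```

The outlier changes the range drastically but leaves the IQR untouched, because Q1 and Q3 depend only on the middle of the sorted data.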

The variance (interval/ratio): based on the Sum of Squares, i.e., the sum of the squared distances from
the mean. For the calculation of the variance, it matters whether we have sample data
or population data (typically we have sample data).
- Variance in a sample is expressed as:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

To calculate the sample variance s² of a given variable:
o For each case, we calculate the distance to the sample mean and square that
distance (removes possible minus sign)
o All those squared distances are then added up and divided by the number of
cases in the sample minus one (n-1)
- Variance in a population is expressed as:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

(μ, Greek mu = population mean)
To calculate the population variance σ² (sigma squared) of a given variable:
o For each case, we calculate the distance to the population mean and square
that distance (removes possible minus sign)
o All those squared distances are then added up and divided by the number of
cases in the population (N)
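Both recipes can be written out step by step and cross-checked against Python's `statistics.variance` (sample, n - 1) and `statistics.pvariance` (population, N). The function names and data below are hypothetical, for illustration only.

```python
import math
from statistics import pvariance, variance


def sample_variance(xs):
    """s^2: sum of squared distances to the sample mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """sigma^2: sum of squared distances to the population mean, divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, sum of squares 32

print(population_variance(values))  # 4.0  (32 / 8)
print(sample_variance(values))      # about 4.571  (32 / 7)

# Cross-check against the standard-library implementations.
assert math.isclose(sample_variance(values), variance(values))
assert math.isclose(population_variance(values), pvariance(values))
```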

Ø How can we interpret the value of the variance? (e.g., 4.67)
• We don’t interpret it directly, but: “everything is meaningful in comparison”
(i.e. when comparing variances across groups, we can make comparative
statements about more/less dispersion around the mean)
• For the purpose of interpretation, we calculate another measure of
variability: the standard deviation
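Ø Why are there two different variance formulas for sample data / population data?
• We often use the sample variance as an ‘estimator’ for the population
variance (which is typically unknown)
• Because the sum of squared differences is minimal around the sample mean, it is on
average smaller than it would be around the population mean. Dividing by n-1 instead of n
corrects for this, so that on average s² equals σ² (s² is an unbiased estimator of σ²).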
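The effect of dividing by n-1 can be checked exactly on a tiny hypothetical population by listing every possible sample. This sketch assumes samples of size 2 drawn with replacement (independent draws) and uses `fractions.Fraction` so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # tiny hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N  # population variance


def sum_of_squares(sample):
    """Sum of squared distances from the sample's own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)


n = 2
samples = list(product(population, repeat=n))  # all 3 ** 2 = 9 possible samples

# Average each candidate estimator over every possible sample.
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)

print(sigma_sq)           # 2/3
print(avg_div_n_minus_1)  # 2/3: on average equal to sigma^2
print(avg_div_n)          # 1/3: on average too small
```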

SS1000 Erasmus Universiteit Rotterdam