
2023 Exam summary, Introduction to Statistical Analysis, Week 1-6 (CM1005)

Pages: 28
Uploaded on: 12-01-2023
Written in: 2022/2023

This summary includes all lecture and tutorial information of week 1-6.


Summary Statistics

Week 1
Statistics: “The study of how we describe and make inferences from data.” (Sirkin)
➢ An inference is “a conclusion reached on the basis of evidence and reasoning.”
➢ Distinction between descriptive & inferential statistics

Different levels of statistics:
1. Univariate (one variable)
2. Bivariate (two variables)
3. Multivariate (more variables)

Descriptive vs inferential statistics: with descriptive statistics one describes only a specific
sample. Inferential statistics is about what a sample says about the whole population.




Unit of analysis: the what or who that is being studied. Also: the unit that you will be able to
draw conclusions about. Typically, all units are the same type of “thing” in a single data set.
Variable: a measured property of each of the units of analysis.

Levels of measurement
- Nominal: group classifications where no meaningful ranking is possible (e.g., religion,
country)
- Ordinal (ORDinal): there is a meaningful ranking/ordering, but the distance between
categories is unknown or not equal.
- Interval: like ordinal in that the values are ranked, but in addition the distances
between values are equal. But: zero is arbitrary and does not mean ‘lack of’ the
property (e.g., 0 °C does not mean ‘no temperature’).
- Ratio: same as interval, but zero is meaningful: an absolute zero point that means
‘lack of’ the property (e.g., age, income).

We always first need to know the level of measurement in order to know which statistical
techniques we may use for the given variables.
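
As a rough illustration of why the level of measurement comes first, the sketch below maps each level to the measures of central tendency that these notes list for it (mode for all levels, median from ordinal upward, mean for interval/ratio only). The names `PERMISSIBLE_CT` and `allowed_measures` are hypothetical and chosen for this example only.

```python
# Measures of central tendency permitted per level of measurement,
# following the levels given in these notes.
PERMISSIBLE_CT = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}


def allowed_measures(level):
    """Return the measures of central tendency usable at the given level."""
    return PERMISSIBLE_CT[level.lower()]


print(allowed_measures("nominal"))   # only the mode
print(allowed_measures("Ordinal"))   # mode and median
print(allowed_measures("ratio"))     # mode, median and mean
```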

“A continuous variable is measured along a continuum, whereas a discrete variable is
measured in whole units or categories.” Continuous variables can take decimal values;
discrete variables cannot.

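Measures of central tendency (CT): To (univariately) describe the distribution of variables on
different levels of measurement.
- The mean (interval/ ratio): all values are added up and divided by n, which is the
number of observations in the sample:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Almost the same formula for the population mean, with μ for the population mean and N
for the number of cases in the population:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
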
Characteristics of the mean:
o Changing any score will change the mean
o Adding or removing a score will change the mean (unless that score is equal
to the mean)
o Adding the same ‘constant’ value to each score, or subtracting, multiplying, or
dividing each score by it, causes the mean to change accordingly
o Sum of differences from the mean is zero:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

o Sum of squared differences from the mean is minimal
o Most useful for describing (more or less) normally distributed variables.
- The median (ordinal, interval/ratio): the median is the middle case when sorting all
cases based on their value. There is an equal number of cases above and below the median.
Also: 50th percentile.
o The median is not as sensitive to outliers as the mean.
o Whenever n is an even number, the median is the mean value of the two
middle cases.
o Often used for interval/ratio variables that have skewed distributions.
- The mode (nominal, ordinal, interval/ratio): the mode is the category with the
largest number of cases.
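
The three measures and the listed characteristics of the mean can be checked on a small example. The sketch below uses Python's standard `statistics` module; the list `scores` is made-up data, and `sum_of_squares` is a hypothetical helper written for this illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up sample, n = 6

mean = statistics.mean(scores)      # (2+3+3+5+7+10) / 6 = 5
median = statistics.median(scores)  # n is even: (3 + 5) / 2 = 4
mode = statistics.mode(scores)      # 3 occurs most often

# Sum of differences from the mean is zero.
sum_of_deviations = sum(x - mean for x in scores)


def sum_of_squares(values, centre):
    """Sum of squared differences between each value and `centre`."""
    return sum((x - centre) ** 2 for x in values)


# The sum of squared differences is smallest around the mean.
ss_around_mean = sum_of_squares(scores, mean)      # 46
ss_around_median = sum_of_squares(scores, median)  # 52

# Adding a constant to every score shifts the mean by that constant.
shifted_mean = statistics.mean(x + 10 for x in scores)  # 15

print(mean, median, mode, sum_of_deviations, shifted_mean)
```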

Measures of CT and distributions: in a normal distribution the mean, median, and mode are all
the same. In a skewed distribution the peak is shifted to the left or right, and the mean is
pulled away from the median toward the long tail.
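
A short sketch of how one extreme score affects the mean but barely moves the median. The `incomes` values are invented for illustration and are not taken from the course.

```python
import statistics

# Invented, right-skewed data: one very large value in the tail.
incomes = [20, 22, 25, 27, 30, 32, 200]

mean_skewed = statistics.mean(incomes)      # 356 / 7, roughly 50.9
median_skewed = statistics.median(incomes)  # middle case: 27

# Without the extreme value, mean and median coincide.
without_outlier = incomes[:-1]
mean_clean = statistics.mean(without_outlier)      # 156 / 6 = 26
median_clean = statistics.median(without_outlier)  # (25 + 27) / 2 = 26

print(mean_skewed, median_skewed, mean_clean, median_clean)
```

With the long tail on the right, the mean lies above the median, which is why the median is often preferred for skewed interval/ratio variables.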

Week 2 Lecture
Measures of variability: measures of CT alone do not carry enough information to adequately
describe distributions of variables; we need a second type of measure.
E.g., group 1 has 10 people aged 20 and 10 aged 60; group 2 has 10 people aged 39 and
10 aged 41. In this case, the mean does not differ (it is 40 in both groups). However, the
dispersion/variability differs.
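
The two-group example can be reproduced directly. This sketch assumes group 2 consists of ten people aged 39 and ten aged 41, and uses the range and the sample standard deviation from Python's `statistics` module as measures of dispersion.

```python
import statistics

group1 = [20] * 10 + [60] * 10  # 10 people aged 20, 10 aged 60
group2 = [39] * 10 + [41] * 10  # assumed: 10 people aged 39, 10 aged 41

mean1 = statistics.mean(group1)  # 40
mean2 = statistics.mean(group2)  # 40

range1 = max(group1) - min(group1)  # 60 - 20 = 40
range2 = max(group2) - min(group2)  # 41 - 39 = 2

sd1 = statistics.stdev(group1)  # roughly 20.5
sd2 = statistics.stdev(group2)  # roughly 1.03

print(mean1, mean2, range1, range2, round(sd1, 2), round(sd2, 2))
```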

The range (ordinal, interval/ratio): distance between highest and lowest score. It is always
reported together with the maximum & minimum score and is sensitive to outliers.

The interquartile range (IQR) (ordinal, interval/ratio): based on ‘quartiles’ that split the data
into four groups of cases. The IQR is the distance between Q1 and Q3 and is insensitive
to outliers since it describes only the middle half of the data.
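
The different outlier sensitivity of the range and the IQR can be seen in a small sketch. The data are made up, and `value_range` and `iqr` are hypothetical helper names. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, so other tools may give slightly different Q1 and Q3.

```python
import statistics


def value_range(values):
    """Distance between the highest and the lowest score."""
    return max(values) - min(values)


def iqr(values):
    """Distance between Q1 and Q3 (inclusive quartile method assumed)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # highest score replaced by 90

print(value_range(scores), value_range(with_outlier))  # 8 and 89
print(iqr(scores), iqr(with_outlier))                  # 4.0 and 4.0
```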

The variance (interval/ ratio): based on the Sum of Squares, i.e., the sum of squared distances
from the mean. For the calculation of the variance, it matters whether we have the sample data
or the population data (typically we have sample data).
- Variance in a sample is expressed as:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

To calculate the sample variance s² of a given variable:
o For each case, we calculate the distance to the sample mean and square that
distance (removes possible minus sign)
o All those squared distances are then added up and divided by the number of
cases in the sample minus one (n-1)
- Variance in a population is expressed as:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

(Greek μ (mu) = population mean)
To calculate the population variance σ² (sigma square) of a given variable:
o For each case, we calculate the distance to the population mean and square
that distance (removes possible minus sign)
o All those squared distances are then added up and divided by the number of
cases in the population (N)
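
The two calculation recipes above can be followed step by step. This is a minimal sketch on made-up data; the same five values are treated first as a sample (divide by n - 1) and then, purely for comparison, as if they were the whole population (divide by N). The results are checked against `statistics.variance` and `statistics.pvariance`.

```python
import math
import statistics

data = [2, 4, 4, 6, 9]  # made-up values

mean = sum(data) / len(data)                        # 25 / 5 = 5.0
squared_distances = [(x - mean) ** 2 for x in data]  # 9, 1, 1, 1, 16
sum_of_squares = sum(squared_distances)             # 28.0

sample_variance = sum_of_squares / (len(data) - 1)  # 28 / 4 = 7.0
population_variance = sum_of_squares / len(data)    # 28 / 5 = 5.6

# The standard library implements the same two formulas.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))

print(sample_variance, population_variance)
```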

➢ How can we interpret the value of the variance? (e.g., 4.67)
• We don’t, but: “everything is meaningful in comparison”
(i.e. when comparing variances across groups, we can make comparative
statements about more/less dispersion around the mean)
• For the purpose of interpretation, we calculate another measure of
variability: the standard deviation
➢ Why are there two different variance formulas for sample data / population data?
• We often use the sample variance as an ‘estimator’ for the population
variance (which is typically unknown)
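
Both points can be illustrated with a small sketch on invented data. The first part takes the standard deviation as the square root of the variance, which brings it back to the original units. The second part is a standard textbook result that is not spelled out in these notes: averaged over all possible samples, dividing by n - 1 recovers the population variance, while dividing by n underestimates it. The tiny population `[1, 2, 3]` and sampling with replacement are assumptions made to keep the enumeration exact.

```python
import itertools
import math
import statistics

# Part 1: standard deviation = square root of the variance.
sample = [2, 4, 4, 6, 9]                # invented sample
s_squared = statistics.variance(sample)  # 7
s = math.sqrt(s_squared)                 # roughly 2.65, in the original units

# Part 2: the sample variance as an estimator of the population variance.
population = [1, 2, 3]
sigma_squared = statistics.pvariance(population)  # 2/3

# All 9 ordered samples of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# Average of the sample variances when dividing by n - 1 ...
avg_divide_by_n_minus_1 = statistics.mean(
    statistics.variance(smp) for smp in samples
)
# ... and when dividing by n instead.
avg_divide_by_n = statistics.mean(
    statistics.pvariance(smp) for smp in samples
)

print(round(s, 2), sigma_squared, avg_divide_by_n_minus_1, avg_divide_by_n)
```

In this enumeration the n - 1 version averages to the population variance of 2/3, whereas the version that divides by n averages to 1/3.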
Seller: SS1000, Erasmus Universiteit Rotterdam