Class notes

Statistics I

Pages
38
Uploaded on
16-02-2025
Written in
2022/2023

This course is intended to introduce students without prior training to the use of quantitative methods in the social sciences. The course introduces students to basic statistics: how to summarise large amounts of information efficiently and how to draw basic inferences. As political scientists, we are interested in answering questions such as “What is the association between country size and civil war?” To answer such questions, we require 1) data and 2) methods and techniques to process such data. In this course, you will learn how to describe data, how to apply simple statistical tests and interpret their results, and how to use the statistical software R.

Document information

Type
Class notes
Professor(s)
Tim Mickler
Contains
All classes

Statistics I
STAT

Grade
Three assignments in the workgroups, accounting for 20%
• Participation accounting for 10%

Exam – 70%
• Open and closed questions
• Minimum grade of five

Lectures – week 1

Variables and levels of measurement
Variable – any characteristic, number, or quantity that can be measured
and can differ across entities or across time

Variables have different scales or levels of measurement

Levels of measurement – the nature of the information carried by the values
assigned to variables

Types of levels
Nominal variable – type of categorical variable that includes two or more
mutually exclusive categories with no natural order

Ordinal variable – type of categorical variable with clear ordering of the
values
• E.g. low <-> high, little <-> much, small <-> large
• Distance between values not the same across the levels
• Relative comparison

Numerical variable – a variable where the measurement is typically
represented by numbers

Continuous variable – a numeric variable that can be measured to
any level of precision
• Alternative levels of measurement – two forms of continuous variables
(Stanley Smith Stevens):
- Interval – numerical variable but the zero is arbitrary/meaningless
- Ratio – like interval but with a meaningful zero

Discrete variables – cannot be measured to any level of precision; only
certain, countable values (usually whole numbers) are possible

Explanatory & response variables
Explanatory (independent) variable
• Cause
• Often written as X

Response (dependent) variable
• Outcome
• Often written as Y

Organizing variables
Common format of dataset
- Each column = particular variable
- Each row = given record of the data set in question
- Each cell = one observation on one element in our dataset
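
The column/row/cell layout above can be illustrated with a small sketch. The records, variable names, and values below are hypothetical and only serve to show the structure; they are not data from the course.

```python
# Hypothetical data set: each dict is one row (one record, here a country),
# and each key is one column (one variable).
dataset = [
    {"country": "A", "population_millions": 5.2, "civil_war": False},
    {"country": "B", "population_millions": 48.0, "civil_war": True},
    {"country": "C", "population_millions": 17.5, "civil_war": False},
]

# A column: the same variable read across all rows.
population_column = [row["population_millions"] for row in dataset]

# A cell: one observation, i.e. one variable for one record.
cell = dataset[1]["civil_war"]

print(population_column)
print(cell)
```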

Measures of central tendency

Distribution
When we collect data, we can show how the values are distributed in
relation to other values

Frequency distribution – display of the pattern of frequencies of a variable
of a data set
• Shows all the possible values (or intervals) of the data and how often
they occur
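
A frequency distribution can be tabulated by counting how often each value occurs. The sketch below uses Python's `collections.Counter` on made-up ordinal responses; the values are an assumption for illustration.

```python
from collections import Counter

# Hypothetical responses on an ordinal scale.
responses = ["low", "high", "medium", "low", "low", "high"]

# Count how often each value occurs.
frequencies = Counter(responses)

# Print the values in their natural order together with their frequency.
for value in ["low", "medium", "high"]:
    print(value, frequencies[value])
```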

Skewness and symmetry
There is an infinite number of distributions – symmetrical, bimodal,
multimodal




Asymmetrical distributions
• Negative (left) skew – mass concentrated on the right; left tail is longer
• Positive (right) skew – mass concentrated on the left; right tail is longer

How can we summarise/describe distributions of variables?
Option 1 – visualize data

Option 2 – calculate measures to summarise data

• Measure of central tendency – a value that describes a set of data by
identifying the central position within that set of data
• Measure of dispersion – how stretched or squeezed is the distribution

Level of measurement    Measures of central tendency
Nominal                 Mode
Ordinal                 Median + mode
Numeric                 Mean + median + mode


Mode – the most frequent score in a data set
• There can be several modes

Median – middle score for a set of data that has been arranged in order of
magnitude
• Even number of scores -> by convention, add the two middle scores
and divide the sum by two
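
As a rough illustration of the mode and the median, the sketch below uses Python's standard `statistics` module on made-up scores. `multimode` returns every most-frequent value, and `median` averages the two middle scores when the number of scores is even.

```python
import statistics

# Hypothetical scores, already in order of magnitude.
scores = [2, 3, 3, 5, 7, 7]

# Both 3 and 7 occur twice, so there are two modes.
modes = statistics.multimode(scores)

# Even number of scores: the two middle scores (3 and 5) are averaged.
median_even = statistics.median(scores)

# Odd number of scores: the median is simply the middle score.
median_odd = statistics.median([2, 3, 5, 7, 9])

print(modes, median_even, median_odd)
```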

(Arithmetic) Mean: the sum of all values divided by the number of values

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

• Mean is sensitive to extreme values (outliers)
- If extreme values are in the data set the median may be more useful
• Median – robust statistic
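
The sensitivity of the mean to outliers can be seen in a small sketch. The numbers are invented for illustration: adding one extreme value moves the mean a lot, while the median barely changes.

```python
import statistics

# Hypothetical values without an outlier.
values = [20, 22, 25, 27, 30]
mean_without = statistics.mean(values)
median_without = statistics.median(values)

# The same values plus one extreme value.
values_with_outlier = values + [500]
mean_with = statistics.mean(values_with_outlier)
median_with = statistics.median(values_with_outlier)

print("mean:", mean_without, "->", mean_with)
print("median:", median_without, "->", median_with)
```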

Measures of dispersion

How stretched or squeezed is the distribution?
Level of measurement    Measure of dispersion
Nominal                 No measure of dispersion possible
Ordinal                 Range, inter-quartile range
Numeric                 Range, inter-quartile range, variance/standard deviation

The range – the difference between the lowest and highest value
• Range = maximum – minimum
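
A minimal sketch of the range on made-up values. The variable is called `value_range` only to avoid shadowing Python's built-in `range`.

```python
# Hypothetical values.
values = [4, 9, 1, 7]

# Range = maximum - minimum.
value_range = max(values) - min(values)

print(value_range)
```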

Range & interquartile range
We can split data into chunks (quantiles)

Many quantiles exist but some are common:
• Percentiles; distribution is divided into 100 parts
• Deciles; distribution is divided into 10 parts
• Quintiles; distribution is divided into 5 parts
• Quartiles; distribution is divided into 4 parts

A common form of range – interquartile range
• The IQR is the range of the middle 50% of the data
• Calculated by subtracting the 1st quartile from the 3rd quartile (IQR = Q3 – Q1)
- First quartile Q1 – median of the 50% smallest entries
- Third quartile Q3 – median of the 50% largest entries
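
The quartile definitions above can be sketched as follows. The helper `quartiles_by_halves` is a hypothetical name. Leaving the middle value out of both halves when the number of entries is odd is an assumption, since the notes do not say how to handle that case. Statistical software, including R, may use a different quantile rule and give slightly different results.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd number of entries the middle value
    is left out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # the 50% smallest entries
    upper = ordered[-half:]   # the 50% largest entries
    return statistics.median(lower), statistics.median(upper)

# Hypothetical data.
data = [1, 3, 4, 6, 8, 9, 12, 15]
q1, q3 = quartiles_by_halves(data)

# IQR = Q3 - Q1, the range of the middle 50% of the data.
iqr = q3 - q1

print(q1, q3, iqr)
```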

Variance and standard deviation
Problem – interquartile range uses only a selection of data (which makes it
robust against outliers)

Measures of spread using all data – deviance = $X_i - \bar{X}$ (difference
between a value and the mean)

Once we have the deviance of every observation we can calculate the sum of
all deviances, the total deviance:

$$\text{Total deviance} = \sum_{i=1}^{n} (X_i - \bar{X})$$

Problem – total deviance is always zero (negative and positive deviations cancel out)
• Not a useful measure of spread
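
A quick check, on made-up values, that the deviances from the mean sum to zero. With floating-point data the sum may come out as a tiny non-zero number, so a comparison against a small tolerance is the safer test in practice.

```python
# Hypothetical values.
values = [2, 4, 6, 8]

# Mean of the values.
mean = sum(values) / len(values)

# Deviance of each value: value minus mean.
deviances = [x - mean for x in values]

# Negative and positive deviances cancel each other out.
total_deviance = sum(deviances)

print(deviances, total_deviance)
```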

Instead we calculate the sum of squared errors (SS):

$$SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$$

Two steps
1) Square the deviances (difference between mean and values)
2) Add the squared deviances
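
The two steps above can be sketched on made-up values as follows.

```python
# Hypothetical values.
values = [2, 4, 6, 8]
mean = sum(values) / len(values)

# Step 1: square the deviances (value minus mean).
squared_deviances = [(x - mean) ** 2 for x in values]

# Step 2: add the squared deviances to get the sum of squared errors (SS).
sum_of_squares = sum(squared_deviances)

print(squared_deviances, sum_of_squares)
```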

Variance (s²)
Problem – as n (the number of observations) increases, the sum of squared
errors increases too
- Not a useful measure for comparing data sets of different sizes

Solution – divide the sum of squared errors by the number of observations (n) minus 1

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

*n – 1 is Bessel's correction
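
A minimal sketch of the variance with the n – 1 divisor, on made-up values. For comparison, `statistics.variance` from Python's standard library also divides by n – 1, while `statistics.pvariance` divides by n.

```python
import statistics

# Hypothetical values.
values = [2, 4, 6, 8]
n = len(values)
mean = sum(values) / n

# Sum of squared errors.
sum_of_squares = sum((x - mean) ** 2 for x in values)

# Sample variance: divide by n - 1 (Bessel's correction).
sample_variance = sum_of_squares / (n - 1)

# Library versions for comparison.
library_sample_variance = statistics.variance(values)       # divides by n - 1
library_population_variance = statistics.pvariance(values)  # divides by n

print(sample_variance, library_sample_variance, library_population_variance)
```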

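Standard deviation (s) – the square root of the variance

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

Larger standard deviation – bigger spread/dispersion around the mean

The standard deviation is dependent on the scale: it is expressed in the
same units as the variable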
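
To illustrate the scale dependence, the sketch below computes the standard deviation of the same made-up heights in metres and in centimetres. `statistics.stdev` uses the n – 1 divisor. The spread of the data is unchanged, but the number is 100 times larger in centimetres.

```python
import math
import statistics

# Hypothetical heights in metres.
heights_m = [1.60, 1.70, 1.80]

# The same heights expressed in centimetres.
heights_cm = [h * 100 for h in heights_m]

# Sample standard deviation on each scale.
sd_m = statistics.stdev(heights_m)
sd_cm = statistics.stdev(heights_cm)

print(sd_m, sd_cm)
print(math.isclose(sd_cm, sd_m * 100))
```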
semwaltman Leiden University College The Hague