Class notes

Notes and Study Guide for Statistics for Premasters DSS

Pages
22
Uploaded on
05-12-2024
Written in
2024/2025

A detailed, well-structured summary covering all the course materials: class slides, final exam example questions, and quiz questions posted by the professors, with clear charts and a neat layout. It is for the course "Statistics for Premasters DSS" at Tilburg University, part of the Pre-Master Data Science and Society, in the first semester of the academic year 2024/2025 (August to December 2024).


Document information

Uploaded on
December 5, 2024
File latest updated on
December 19, 2024
Number of pages
22
Written in
2024/2025
Type
Class notes
Professor(s)
Eriko Fukuda, Sasha Kenjeeva
Contains
All classes


Content preview

Statistics Pre DSS (24fall)



Notes & Study Guide




Green Text: code in R

Red Text: differences worth noting

With sign: very important key points
(referring to given quiz & exam sample questions)

To pass or score well on the final exam, it is strongly
recommended to engage thoroughly with the material
and gain a deep understanding of the concepts and
terms, rather than simply memorizing key points.

Any questions, please email:

Version: 202412190011
By: Alice

Content

Research Methods/Terms
Different Plot in R Part 1
Different Plot in R Part 2
Measures of Data
Population and Sample
Hypothesis Testing
Z Test / P value / Confidence Intervals
Categorical Variable and Pearson Chi-squared test
Continuous Variable and T test
Paired T test and One-Sided T-Test
One-Way ANOVA
F distribution & run ANOVA in R
Effect Size & Further test of One-Way ANOVA
Assumptions of One-Way ANOVA & Factorial ANOVA
Two-Way / Factorial ANOVA
Two-Way / Factorial ANOVA in R and Effect Size
Appendix 1: Basics of R
Appendix 2: Basic Operation of R
Appendix 3: Data Graphing in R
Appendix 4: Tests Function in R

Research Methods/Terms

Types of research
Correlational: observing what naturally goes on in the world without directly interfering with it.
Cross-sectional: data from people at different ages
    => (quasi-experimental, case study, naturalistic observation)
Experimental: one or more variables are systematically manipulated to see their effect
    => (cause and effect statement, random sampling)


Types of reliability (reliability: the ability of a measure to produce the same results under the same conditions)
Test-retest: same entities + two different points in time = consistent result
Inter-rater: across people = same answer
Parallel forms: different measures for the same thing, the result should be the same
    (e.g. four different bathroom scales to measure participants' weight)
Internal consistency: whether the items of a measure that are meant to capture the same construct give consistent results


Types of validity
Internal: the extent to which a causal conclusion about the relationship between variables is correct
    (In an experiment testing a new drug, internal validity ensures that changes in
    health outcomes are due to the drug and not other factors like diet or exercise.)
External: the extent to which the same pattern holds in real life
Construct: whether you are actually measuring what you want to measure
Face: whether a measure "looks like" it is doing what it is supposed to do
    (a math exam with questions about arithmetic has high face validity, while one
    with history-related questions has low face validity)
Ecological: whether the set-up of the study matches a real-world scenario; high ecological validity often
    brings practical, actionable insight outside the research setting.
    (a memory study in a quiet, controlled lab will lack ecological validity)


Confounds: unmeasured variables, related to the variables of interest, that threaten internal validity.
Artefacts: what threatens the external validity or construct validity of results
    => (movement noise in an EEG signal)

dependent variable (DV): "to be explained" / outcome
    (in a study testing the effect of sunlight on plant growth: the plant growth, measured
    in height, number of leaves, or weight)
independent variable (IV): "to do the explaining" / predictor
    (in the same study: the amount of sunlight)
    (see also the Two-Way ANOVA section)




SUMMARY

Different Plot in R Part 1

What are the different types of plots?

Histograms
- identify the shape of a distribution
- show skew and kurtosis
(e.g. visualize the shape of the distribution of weight for people in a weight loss program)

Scatter & Line
- display the relationship between two variables
(e.g. visualize the relation between amount of sleep and the level of grumpiness)
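
The two plot types above can be drawn with base R graphics. The following is a minimal sketch rather than the course's own code: the data are simulated, and the variable names (`weight`, `sleep_hours`, `grumpiness`) are hypothetical.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical data: weights (kg) of 200 people in a weight loss program
weight <- rnorm(200, mean = 85, sd = 12)

# Histogram: shows the shape of the distribution (skew, kurtosis)
h <- hist(weight, main = "Weight distribution", xlab = "Weight (kg)")

# Hypothetical data: less sleep goes with more grumpiness
sleep_hours <- runif(50, min = 4, max = 9)
grumpiness  <- 100 - 8 * sleep_hours + rnorm(50, sd = 5)

# Scatter plot: shows the relationship between two variables
plot(sleep_hours, grumpiness,
     xlab = "Sleep (hours)", ylab = "Grumpiness")

# Add the least-squares line to make the trend visible
fit <- lm(grumpiness ~ sleep_hours)
abline(fit)
```

The histogram needs only one variable, while the scatter plot needs a pair of variables measured on the same cases.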





Different Plot in R Part 2


Box
- depicts median, IQR and range
- used to detect outliers

Bar
- shows the mean score
- the error bar displays one of the following:
  1. confidence interval (usually 95%)
  2. standard deviation
  3. standard error of the mean
- used to compare discrete categories, therefore especially suited to categorical
  (ordinal/nominal/binary) data
(e.g. visualize the relative frequency of various ethnicities represented at an IT company)
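
A box plot and two kinds of bar chart can be sketched in base R as follows. The data frame `scores` and its groups are made up for illustration, and the choice of the standard error for the error bars is an assumption (it is only one of the three options listed above).

```r
set.seed(2)

# Hypothetical scores for three groups, 30 observations each
scores <- data.frame(
  group = rep(c("A", "B", "C"), each = 30),
  score = c(rnorm(30, 50, 10), rnorm(30, 55, 10), rnorm(30, 60, 10))
)

# Box plot: median, IQR and range per group; points beyond
# 1.5 * IQR from the box are drawn individually as outliers
bp <- boxplot(score ~ group, data = scores, ylab = "Score")

# Bar chart of group means with error bars showing the
# standard error of the mean
means <- tapply(scores$score, scores$group, mean)
ses   <- tapply(scores$score, scores$group,
                function(x) sd(x) / sqrt(length(x)))
mids  <- barplot(means, ylim = c(0, max(means + ses) * 1.1),
                 ylab = "Mean score")
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.05)

# Bar chart of relative frequencies for a categorical variable
freq <- prop.table(table(scores$group))
barplot(freq, ylab = "Relative frequency")
```

`barplot()` returns the x positions of the bars, which is why `mids` can be reused to place the error bars.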





Measures of Data

mean: centre of gravity of the data
    (for interval and ratio scale data, but sensitive to extreme values)

median: middle value of the sorted data, at position $(n + 1)/2$
    (for ordinal scale data or interval and ratio scale data, less affected by outliers)

mode: most frequent value (for nominal scale data)
range: $max - min$
percentiles: Q2 = 50% = median (Q1 = 25% | Q3 = 75%)
Interquartile Range (IQR): $IQR = Q_3 - Q_1$
    => excludes extreme values/outliers (resistant to outliers)
    How to identify outliers? Values < Q1 - 1.5 * IQR or > Q3 + 1.5 * IQR
Skew (-1, 1):
    left/negative-skewed: mean < median
    right/positive-skewed: mean > median
    (the name gives the direction of the tail: negative numbers are located to
    the left of zero on the number line, and positive numbers are to the right)




Kurtosis (-2, 2):
    <0 too flat (platykurtic) => has fewer extreme values, lighter tails
    =0 normal distribution (mesokurtic)
    >0 too pointy (leptokurtic) => has more extreme values, heavier tails

Deviation: $deviation = X_i - \bar{X}$

Sum of squared errors (SS): $SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$

Variance ($s^2$): $s^2 = \frac{SS}{N - 1}$
    (dividing by $N$ instead gives a biased estimate that is on average smaller than the true variance)

Standard deviation ($s$ or sd): $s = \sqrt{s^2}$
    how well the mean represents the data => large sd: more spread out, small sd: closer to the mean

Standard error: $SE = \frac{s}{\sqrt{N}}$
    => quantifies how reliable the sample mean is as an estimate of the population mean


The purpose of descriptive statistics is to characterize the data we collected without attempting to understand a
population.


Report descriptives
mean (M), SD, sample size, and distribution characteristics (skewness, kurtosis and SE)
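
The measures in this section can be computed step by step in base R. This is an illustrative sketch on a made-up sample `x`; note that `quantile()` uses R's default interpolation rule, which may differ slightly from a hand calculation.

```r
# Hypothetical sample of 10 scores with one extreme value
x <- c(4, 5, 5, 6, 7, 7, 7, 8, 9, 30)
n <- length(x)

m   <- mean(x)      # sensitive to the extreme value 30
med <- median(x)    # middle of the sorted data, less affected

# Quartiles and interquartile range
q   <- quantile(x, c(0.25, 0.75))
iqr <- IQR(x)       # same as q[2] - q[1]

# Outlier rule: below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
outliers <- x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]

# Spread, built up step by step from the formulas in this section
dev <- x - m            # deviations
ss  <- sum(dev^2)       # sum of squared errors
s2  <- ss / (n - 1)     # variance
s   <- sqrt(s2)         # standard deviation
se  <- s / sqrt(n)      # standard error of the mean
```

Here the mean (8.8) is pulled above the median (7) by the value 30, which is the only value the 1.5 * IQR rule flags. The hand-built `s2` and `s` agree with R's `var(x)` and `sd(x)`.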

Population and Sample
Key Idea: there is always a discrepancy between the sample mean and the population mean
    => a test based on a sample is not always reliable and may lead to a wrong conclusion


Statistical Model: $outcome_i = (model) + error_i$, where the model can be as simple as the mean $\bar{X}$

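Population parameters (almost never known)
µ: true population mean
σ: population standard deviation
population variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
unbiased estimate of variance: $\hat{\sigma}^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$

Sample statistics
n: sample size
$\bar{x}$: mean of sample
s: standard deviation of sample
sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n}$
unbiased sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$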


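R's var() and sd() return estimates of the population values (dividing by n - 1), not the plain sample statistics
$\hat{\mu}$: estimate of the population mean = sample mean $\bar{x}$, which is compared with the hypothesized population mean $\mu_0$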
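
The difference between dividing by n and by n - 1 can be checked directly in R. The sample `x` below is made up, and the variable names are hypothetical.

```r
x <- c(2, 4, 4, 5, 7, 8)   # hypothetical sample
n <- length(x)

ss <- sum((x - mean(x))^2)   # sum of squared deviations

var_biased   <- ss / n         # divides by n
var_unbiased <- ss / (n - 1)   # divides by n - 1

# Computational form of the unbiased variance
var_shortcut <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

# R's built-in var() and sd() use the n - 1 version
var(x)
sd(x)
```

For this sample the n version gives 4 and the n - 1 version gives 4.8, and `var(x)` matches the latter.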

Central Limit Theorem
1. the mean of the sampling distribution of $\bar{x}$ equals the mean of the population (µ)
2. the standard error (variability, $SE_{\bar{x}}$) of the sampling distribution
   gets smaller as the sample size (N) increases
3. the shape of the sampling distribution
   becomes normal as the sample size increases
=> larger samples are more reliable
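
The three points of the Central Limit Theorem can be seen in a small simulation. This is a sketch under assumptions: the exponential population, its mean of 2, the sample sizes, and the helper `sample_means()` are all chosen for illustration only.

```r
set.seed(3)

# Hypothetical, clearly non-normal population: exponential with mean 2
pop_mean <- 2

# Draw `reps` samples of size n and return their means
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1 / pop_mean)))
}

means_n5   <- sample_means(5)
means_n100 <- sample_means(100)

# 1. The sampling distribution is centred on the population mean
mean(means_n5)
mean(means_n100)

# 2. Its spread (the standard error) shrinks as n grows
sd(means_n5)
sd(means_n100)

# 3. Its shape gets closer to normal as n grows
hist(means_n5,   main = "Sample means, n = 5")
hist(means_n100, main = "Sample means, n = 100")
```

The histogram for n = 5 should still look skewed like the population, while the one for n = 100 should look close to a bell curve.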





Hypothesis Testing

What is the goal of hypothesis testing? To rule out chance (sampling error) as a plausible explanation for the result.

What are the steps?
1. State the hypotheses (MUST be done before the experiment)
   Null hypothesis H0: a claim of no difference in the population (or that an effect is zero)
   Alternative hypothesis Ha: the actual research aim, H0 is false
1.1 Select an α level (normally 0.05)
   1. the "cut off" for the decision on the null
   2. the probability of a type I error that we accept
   3. how certain we want to be when rejecting a hypothesis
   also the threshold used for significance

2. Locate the critical region
   1. outcomes that are very unlikely to occur if the null hypothesis is true
   2. sample means that are not likely to occur if the variable actually has no effect

3. Compute the test statistic
   a ratio: compare the obtained difference between the sample mean and the
   hypothesized population mean with the amount of difference we would
   expect without any treatment effect (the standard error)

   $test\ statistic = \frac{estimate - value\ we\ hypothesize}{standard\ error}$

4. Decide whether to reject the null hypothesis
   if the test statistic is large => the obtained mean difference is more than expected
   if it is large enough to fall in the critical region => the difference is significant => reject the null
   if the test statistic is relatively small => the difference is not sufficient => fail to reject the null
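
The four steps above can be traced in R on a small example. This is a sketch under assumptions, not the course's own code: the sample `x` and the hypothesized mean `mu0` are made up, and a one-sample t statistic is used because the population standard deviation is treated as unknown.

```r
# Hypothetical sample; H0: the population mean is 100
x     <- c(102, 98, 110, 105, 99, 107, 111, 103, 96, 108)
mu0   <- 100
alpha <- 0.05          # step 1.1: chosen before looking at the data

n  <- length(x)
se <- sd(x) / sqrt(n)  # difference expected without any effect

# Step 2: critical region for a two-sided test with n - 1 df
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Step 3: test statistic = (estimate - hypothesized value) / SE
t_stat <- (mean(x) - mu0) / se

# Step 4: reject H0 only if the statistic falls in the critical region
reject <- abs(t_stat) > t_crit

# Cross-check against R's built-in one-sample t-test
res <- t.test(x, mu = mu0)
```

Comparing `abs(t_stat)` with `t_crit` leads to the same decision as comparing `res$p.value` with `alpha`; the two are different views of the same critical region.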



