Summary

Summary Statistics for CSAI I

Uploaded on: 21-05-2025

Written in: 2024/2025

This comprehensive summary covers everything from Statistics for CSAI I. All lectures (1 to 11) are summarized with clear explanations, relevant formulas, R code examples (including ggplot2 and aov()), and practical exam-oriented questions. You’ll find explanations on:
• Descriptive statistics (mean, median, dispersion)
• Hypothesis testing (z-test, t-test, one- and two-tailed)
• Central Limit Theorem and confidence intervals
• ANOVA & Factorial ANOVA with effect sizes (η², Cohen’s d)
• Chi-squared tests and alternatives (Fisher, McNemar)
• Graphs in base R and ggplot2
• APA-style reporting examples and decision rules
• Assumption checks like normality (Shapiro), Levene’s test
Perfectly suited for exam preparation or resits.

Statistics for CSAI I

Lecture 1:
Lecture 1 was the introduction to the course and information on the assignments and
exam. Not needed for the exam.

Lecture 2: R Programming
1. RStudio Environment
• Understand RStudio’s layout: Console, Script, Environment, Plots, Packages, etc.
• R runs in the console, but scripts should be used for analyses.
• Learn how to navigate, maximize/minimize, and switch between panels.

2. Basic Commands in R

🧮 Arithmetic Operators:
• +, -, *, /, ^
• Use parentheses for order of operations.

🔍 Logical Operators:
• Comparisons: ==, !=, <, <=, >, >=
• Logical: & (AND), | (OR), ! (NOT)
• These return TRUE or FALSE.
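
As a small illustration of the operators above, the following sketch uses made-up values (the variable names are placeholders, not from the lecture):

```r
# Arithmetic follows the usual order of operations; parentheses override it.
a <- 2 + 3 * 4      # multiplication first: 14
b <- (2 + 3) * 4    # parentheses first: 20
p <- 2^3            # exponentiation: 8

# Comparisons and logical operators return TRUE or FALSE.
is_equal <- (a == 14)               # TRUE
in_range <- (b > 10) & (b < 20)     # FALSE, because b < 20 fails
either   <- (a > 100) | (p == 8)    # TRUE, because the second part holds
negated  <- !is_equal               # FALSE
```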

3. Functions
• Most tasks in R are performed using functions.
• Structure: function_name(argument1, argument2, …)
• Examples:
– sqrt(), round(), log(), exp(), abs()
– Many functions have default arguments (e.g., round(x) rounds to 0 digits if
not specified).
• Use named arguments for clarity: round(x = 3.1415, digits = 2)
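
A short sketch of default versus named arguments, using only base R functions mentioned above (the variable names are illustrative):

```r
x <- 3.1415

r0 <- round(x)               # digits defaults to 0, so this gives 3
r2 <- round(x, digits = 2)   # named argument: 3.14

root   <- sqrt(16)           # 4
absval <- abs(-5)            # 5
one    <- log(exp(1))        # log() is the natural log, so this is 1
```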

4. Variables
• Use <- to assign values to variables.
• Variable types: numeric, character, logical
• Special values: NA, Inf, -Inf, NaN, NULL

5. Vectors
• Use c() to combine elements into a vector.
• You can name elements in a vector.
• Vectors can be indexed using [ ], e.g., x[2]
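
The vector operations above could look like this in practice; the scores and names are invented for the example:

```r
# Combine values into a numeric vector.
scores <- c(7.5, 6.0, 8.2)

# Give each element a name.
names(scores) <- c("anna", "ben", "chris")

second  <- scores[2]           # index by position (R starts counting at 1)
by_name <- scores["chris"]     # index by name
high    <- scores[scores > 7]  # logical indexing keeps elements where the test is TRUE
```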

6. Importing Data
• Use read.csv("file.csv") to import data.
• CSV = comma-separated values.
• Use View() to inspect data in a spreadsheet view.
• Use summary() or head() for previews of the data
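
To try the import steps without an existing file, this sketch first writes a tiny CSV to a temporary location and then reads it back. The file contents and column names are assumptions made up for the example:

```r
# Write a small example CSV to a temporary file (hypothetical data).
path <- tempfile(fileext = ".csv")
writeLines(c("id,age,gender",
             "1,21,female",
             "2,25,male",
             "3,30,female"), path)

# Import it; the first line is used as the header by default.
dat <- read.csv(path)

head(dat, 2)       # first two rows
summary(dat$age)   # min, quartiles, mean, max of one column
str(dat)           # column types at a glance
```

View(dat) opens the same data in RStudio's spreadsheet viewer; it is left out here because it needs an interactive session.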

7. Data Frames
• Most datasets are stored as data frames.
• Use $ to access columns: data$column
• Use indexing: data[1, "column"], data[row, col]
• Use subset() or logical indexing to filter rows.
• Create new columns, modify or delete columns (NULL removes a column).
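
A minimal sketch of these data frame operations on a small invented data set (all column names and values are placeholders):

```r
df <- data.frame(
  id     = 1:4,
  gender = c("f", "m", "f", "m"),
  age    = c(21, 25, 30, 19)
)

ages      <- df$age          # access a column with $
first_age <- df[1, "age"]    # row 1, column "age"

# Two ways to filter rows.
women_over_20 <- subset(df, gender == "f" & age > 20)
older         <- df[df$age >= 25, ]   # logical indexing; note the trailing comma

df$age_months <- df$age * 12   # create a new column
df$id <- NULL                  # delete a column
```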

8. Factors
• Factors are used to store categorical data.
• Use as.factor() to convert character vectors.
• Levels are important for statistical analysis and graphing.
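
The sketch below shows why levels matter: as.factor() orders them alphabetically, while factor() lets you set the order yourself, for example to put a control group first. The treatment labels are made up:

```r
treatment <- c("drug", "control", "drug", "placebo")

# Convert a character vector to a factor; levels are sorted alphabetically.
f <- as.factor(treatment)
levels(f)    # "control" "drug" "placebo"
table(f)     # counts per level

# Set the level order explicitly (this affects plot order and reference groups).
f2 <- factor(treatment, levels = c("control", "placebo", "drug"))
```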

9. Lists & Matrices
• Lists: combine different types of elements (list(name="Anna", age=21))
• Matrices: like data frames, but all elements must be the same type.

10. Packages
• Packages extend R’s functionality.
• Install: install.packages("packageName")
• Load: library(packageName)
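• A package must be loaded with library() before its functions can be called by name (alternatively, call a function as packageName::function()).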

11. Saving & Loading Workspace
• Save workspace: save.image("filename.Rdata")
• Load workspace: load("filename.Rdata")

Practical Exercises (you should practice):
• Create variables and vectors (numeric, character, logical).
• Import a .csv file and access columns and rows.
• Subset a data frame based on conditions.
• Create and manipulate factors.
• Install and load packages.
• Save and reload your workspace.

How this aligns with the study guide:
This presentation directly covers the R programming section of your study guide,
including:
• Operators, functions, variables
• Data structures: vectors, data frames, factors
• File handling: importing and saving data
• Using and managing packages

Lecture 3: Descriptive Statistics
1. What Are Descriptive Statistics?
Descriptive statistics describe and summarize features of a dataset without generalizing to
the population.
Main types:
• Central tendency: mean, median, mode
• Dispersion: standard deviation, variance, IQR, range
• Distribution shape: skewness, kurtosis

2. The Mean as a Model
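Mean formula:
\[ \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i \]

Deviation from the mean:
\[ \text{deviation}_i = X_i - \bar{X} \]
Sum of squared errors:
\[ SS = \sum_{i=1}^{N} (X_i - \bar{X})^2 \]

Variance:
\[ s^2 = \frac{SS}{N-1} \]
Standard deviation:
\[ s = \sqrt{\frac{SS}{N-1}} \]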
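
To connect these formulas to R, the following sketch computes each quantity by hand for a small invented vector and compares the results with the built-in var() and sd():

```r
x <- c(2, 4, 4, 5, 7, 8)    # made-up data
N <- length(x)

x_bar <- sum(x) / N         # mean: 30 / 6 = 5
dev   <- x - x_bar          # deviations from the mean (they sum to 0)
SS    <- sum(dev^2)         # sum of squared errors: 24
s2    <- SS / (N - 1)       # variance: 24 / 5 = 4.8
s     <- sqrt(s2)           # standard deviation

# The built-in functions use the same N - 1 definitions.
same_var <- isTRUE(all.equal(s2, var(x)))
same_sd  <- isTRUE(all.equal(s, sd(x)))
```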

3. Central Tendency in R
mean(x) # Mean
median(x) # Median
modeOf(x) # Mode (from lsr package)

Handling missing values:
mean(x) # Returns NA if NA is present
mean(x, na.rm = TRUE) # Ignores NA values

Trimmed mean (for reducing influence of outliers):
mean(x, trim = 0.1)
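
A small sketch of both options with invented data. With trim = 0.1 and ten values, one value is dropped from each end before averaging:

```r
# Missing values: mean() returns NA unless they are removed.
x <- c(2, 4, 6, 8, NA)
m_na    <- mean(x)                 # NA
m_clean <- mean(x, na.rm = TRUE)   # 5

# Outliers: one extreme value pulls the ordinary mean upward.
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
m_raw     <- mean(y)               # 14.5
m_trimmed <- mean(y, trim = 0.1)   # mean of 2..9 = 5.5
m_median  <- median(y)             # 5.5
```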

4. Measures of Dispersion in R
sd(x) # Standard deviation
range(x) # Min and max
IQR(x) # Interquartile range
quantile(x, probs = c(0.1, 0.25, 0.75, 0.9)) # Quantiles/percentiles

IQR definition:
\[ \text{IQR} = Q_3 - Q_1 \]
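
As a check on this definition, the sketch below computes the IQR from the quartiles returned by quantile() and compares it with IQR(). The data are invented and include one extreme value to show that the IQR is barely affected by it, unlike the range:

```r
x <- c(1, 3, 5, 7, 9, 11, 13, 15, 100)   # made-up data with one outlier

q <- quantile(x, probs = c(0.25, 0.75))  # first and third quartile
iqr_manual <- unname(q[2] - q[1])        # Q3 - Q1 = 13 - 5 = 8

iqr_builtin <- IQR(x)                    # same value
full_range  <- diff(range(x))            # 99, dominated by the outlier
```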

5. Skewness and Kurtosis
Use the psych package:
library(psych)
skew(x)
kurtosi(x)

Acceptable normality ranges:
−1 < skew < 1
−2 < kurtosis < 2

6. Summarizing a Data Frame
summary(df) # Basic summary
describe(df) # Detailed summary from psych package

Notes:
• Asterisk in the output indicates factor variables — avoid interpreting means/SDs for
these.

7. Group-Based Descriptives
describeBy(df, group = df$gender)
aggregate(x ~ group, data = df, FUN = mean)

Examples:
• Compare mean age by gender
• Compare RTs by distractor condition
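
For the first example (mean age by gender), a base R sketch with aggregate() could look like this. The data frame is invented; describeBy() from psych would give a fuller table but is not needed here:

```r
df <- data.frame(
  gender = c("f", "f", "f", "m", "m", "m"),
  age    = c(22, 26, 30, 20, 24, 31)
)

# One row per group, with the summary statistic in the age column.
group_means <- aggregate(age ~ gender, data = df, FUN = mean)
group_sds   <- aggregate(age ~ gender, data = df, FUN = sd)

group_means   # f: 26, m: 25
group_sds     # f: 4,  m: about 5.57
```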

8. APA-Style Reporting
Examples:
• The average age was 25.5 years (SD = 7.94).
• Age ranged from 18 to 70 (M = 25.5, SD = 7.94), with skewness of 1.87 (SE = 0.05)
and kurtosis of 3.93 (SE = 0.10).
• Males: M = 24.2, SD = 5.1; Females: M = 26.1, SD = 4.8
Formatting tips:
• Italicize M, SD
• Include group sizes if applicable

9. Practice Checklist
• Load driving.csv
• Use mean(), median(), sd(), IQR(), quantile() on numeric variables
• Handle missing data with na.rm = TRUE
• Use describe() and describeBy() to explore data
• Report findings in APA format including skew/kurtosis if relevant

10. Exam Relevance
This presentation supports:
• Descriptive statistics (core concepts and calculations)
• Interpreting output and checking assumptions
• Using R functions for statistical description
• Reporting in APA format

Lecture 4. Graphing and Exploring Data
1. Principles of Effective Graphs
Based on Tufte (2001), good graphs should:
• Show the data clearly
• Encourage critical thinking about the content (not the design)
• Avoid distortion or distraction
• Use minimal ink for maximum information
• Enable comparisons between groups or variables
• Reveal structure in the data

2. Scatterplots in Base R
Create a simple scatterplot:
plot(x = expt$age, y = expt$happy)

Add labels and title:
plot(x = expt$age, y = expt$happy,
xlab = "Age", ylab = "Happy", main = "A scatterplot")

Customize appearance:
plot(x = expt$age, y = expt$happy, pch = 4, col = "red")

pch sets the point symbol. Valid values range from 0 to 25.

3. Box Plots
Boxplot structure:
• Box: middle 50% of data, with the median and quartiles
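• Whiskers: extend to the smallest and largest values that are not outliers (by default in R, points more than 1.5 × IQR beyond the box are drawn separately as outliers)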
Create a boxplot by group:
boxplot(age ~ gender, data = expt,
xlab = "Gender", ylab = "Age", main = "Age by Gender")

Customize further with box colors and labels.
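
To see the numbers behind a boxplot without drawing it, base R offers boxplot.stats(). The sketch below uses invented ages with one extreme value; by default, points more than 1.5 times the box height beyond the box are flagged as outliers:

```r
age <- c(19, 20, 21, 22, 23, 24, 25, 26, 60)   # made-up data

st <- boxplot.stats(age)

# Five values: lower whisker, lower hinge, median, upper hinge, upper whisker.
st$stats   # 19 21 23 25 26

# Points drawn individually because they fall beyond the whiskers.
st$out     # 60
```

The upper whisker stops at 26, the largest value that is not an outlier, rather than at the maximum of 60.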

4. Histograms
Used to visualize distributions and check for:
• Skewness
• Kurtosis
• Spread and outliers
Create a basic histogram:
hist(expt$age)

Customize:
hist(expt$age, breaks = 10, col = "lightblue",
xlab = "Age", main = "Distribution of Age")

5. Bar Plots

Categorical data frequency:
counts <- table(expt$treatment)
barplot(counts, xlab = "Treatment", ylab = "Frequency", main = "Group Counts")

Customize labels and colors:
barplot(height = counts,
names.arg = c("Control", "Drug A", "Drug B"),
col = c("red", "green", "blue"))

Means with error bars (lsr package):
library(lsr)
bars(happy ~ treatment, data = expt)

Add grouping:
bars(happy ~ treatment + gender, data = expt)

Remove error bars:
bars(happy ~ treatment + gender, data = expt, errorFun = FALSE)

Interpretation Tip: Error bars may represent:
• Confidence intervals
• Standard deviation
• Standard error

6. Line Graphs
Points only:
plot(xval, yval, type = "p")

Line only:
plot(xval, yval, type = "l")

Points and line:
plot(xval, yval, type = "b")

7. Saving Graphs
Via R command:
dev.print(device = pdf, file = "scatterplot.pdf", width = 8, height = 8)

Or use the Plots tab in RStudio → “Export” → Save as image or PDF.

8. ggplot2 Overview
ggplot2 builds plots using layers.
Start with:
library(ggplot2)
ggplot(data, aes(x, y)) + geom_point()

Scatterplot with trend line:
ggplot(examData, aes(Anxiety, Exam)) +
geom_point() +
geom_smooth(method = "lm", color = "red") +
labs(x = "Exam Anxiety", y = "Exam Performance %")

Histogram:
ggplot(examData, aes(Anxiety)) +
geom_histogram(binwidth = 10) +
labs(x = "Anxiety", y = "Frequency")

Boxplot:
ggplot(examData, aes(Gender, Exam)) +
geom_boxplot()

Bar chart (grouped by gender):
ggplot(chickFlick, aes(film, arousal, fill = film)) +
stat_summary(fun = mean, geom = "bar") +
# mean_cl_normal needs the Hmisc package to be installed
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2) +
facet_wrap(~ gender) +
labs(x = "Film", y = "Mean Arousal") +
theme(legend.position = "none")

9. Graph Types: When to Use
Graph Type  | Use Case
Scatterplot | Continuous × continuous
Boxplot     | Distribution by category
Histogram   | Distribution of one continuous variable
Bar chart   | Frequencies or means by category
Line graph  | Trends or time series

10. TeX Connection (notation from previous weeks)
To plot variables relating to summary statistics (e.g. means or SD), remember:
\[ \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i \]

\[ s = \sqrt{\frac{SS}{N-1}} \]

These are the basis for understanding what’s behind bar chart heights and error bar sizes.

11. Exam Relevance
This presentation directly supports:
• Using R to visualize data
• Identifying patterns, skewness, or outliers
• Using both base R and ggplot2 for professional figures
• Understanding what graphs are appropriate for what types of data

Lecture 5. Hypothesis Testing
1. Parameters vs. Statistics
Measure                | Population (parameter) | Sample (statistic)
Mean                   | μ                      | x̄
Proportion             | π                      | P
Variance               | σ²                     | s²
Standard deviation     | σ                      | s
Correlation            | ρ                      | r
Regression coefficient | β                      | b

2. Sampling Error
Even if the population has known parameters (e.g., 𝜇 = 100, 𝜎 = 15), different samples will
yield slightly different means and SDs due to random variation.
Larger sample sizes → lower sampling error → more stable estimates.
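
A quick simulation can make this concrete. The sketch below draws many samples from a normal population with μ = 100 and σ = 15 and looks at how much the sample means vary for two sample sizes. The helper name sample_means, the number of repetitions, and the seed are choices made for this illustration:

```r
set.seed(1)   # fixed seed so the run is reproducible

# Hypothetical helper: draw `reps` samples of size n and return their means.
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(rnorm(n, mean = 100, sd = 15)))
}

# Spread of the sample means = sampling error of the mean.
sd_small <- sd(sample_means(10))    # close to 15 / sqrt(10), about 4.7
sd_large <- sd(sample_means(100))   # close to 15 / sqrt(100) = 1.5
```

The means from samples of 100 cluster much more tightly around 100 than the means from samples of 10.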

3. Estimating Parameters
The sample mean 𝑥‾ is the best point estimate of the population mean 𝜇:
𝜇 ≈ 𝑥‾
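But for the standard deviation:

\[ \sigma \approx s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2} \]

Dividing by N − 1 instead of N makes the sample variance s² an unbiased estimator of the population variance σ². Its square root s is the usual estimate of σ, although s itself remains slightly biased downward in small samples.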
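
The effect of dividing by N − 1 can be checked by simulation. This sketch assumes a normal population with σ = 15 (so the true variance is 225) and small samples of 5; the seed and number of repetitions are arbitrary choices:

```r
set.seed(42)
n    <- 5
reps <- 20000

# For each simulated sample, compute the variance both ways.
sims <- replicate(reps, {
  x  <- rnorm(n, mean = 100, sd = 15)
  ss <- sum((x - mean(x))^2)
  c(div_n = ss / n, div_n1 = ss / (n - 1))
})

# Average estimate across all samples (true variance is 15^2 = 225).
avg <- rowMeans(sims)
avg   # div_n is systematically too small (near 180); div_n1 is near 225
```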

Written for

Institution: Tilburg University (UVT)
Study: CSAI / PMDSS
Course: Statistics for CSAI I (822187B6)

Document information

Number of pages: 31
Type: Summary

Seller: aukehilbrands (Tilburg University)
