Summary

Summary Statistics for CSAI I

Uploaded on

21-05-2025

Written in

2024/2025

This comprehensive summary covers everything from Statistics for CSAI I. All lectures (1 to 11) are summarized with clear explanations, relevant formulas, R code examples (including ggplot2 and aov()), and practical exam-oriented questions. You'll find explanations on:
- Descriptive statistics (mean, median, dispersion)
- Hypothesis testing (z-test, t-test, one- and two-tailed)
- Central Limit Theorem and confidence intervals
- ANOVA & Factorial ANOVA with effect sizes (η², Cohen's d)
- Chi-squared tests and alternatives (Fisher, McNemar)
- Graphs in base R and ggplot2
- APA-style reporting examples and decision rules
- Assumption checks like normality (Shapiro), Levene's test
Perfectly suited for exam preparation or resits.

Statistics for CSAI I

Lecture 1:
Lecture 1 was the introduction to the course and information on the assignments and
exam. Not needed for the exam.

Lecture 2: R Programming
1. RStudio Environment
• Understand RStudio’s layout: Console, Script, Environment, Plots, Packages, etc.
• R runs in the console, but scripts should be used for analyses.
• Learn how to navigate, maximize/minimize, and switch between panels.

2. Basic Commands in R

🧮 Arithmetic Operators:
• +, -, *, /, ^
• Use parentheses for order of operations.

🔍 Logical Operators:
• Comparisons: ==, !=, <, <=, >, >=
• Logical: & (AND), | (OR), ! (NOT)
• These return TRUE or FALSE.
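
A minimal sketch of the operators listed above. The variable names (a, b, p and so on) are chosen here purely for illustration:

```r
# Arithmetic: parentheses change the order of operations
a <- 2 + 3 * 4      # multiplication is evaluated before addition
b <- (2 + 3) * 4    # parentheses are evaluated first
p <- 2 ^ 3          # exponentiation

# Comparisons and logical operators return TRUE or FALSE
is_equal <- (a == 14)
both     <- (a > 10) & (b > 10)   # AND: both sides must be TRUE
either   <- (a > 100) | (b > 10)  # OR: at least one side must be TRUE
negated  <- !(a == 14)            # NOT: flips TRUE to FALSE
```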

3. Functions
• Most tasks in R are performed using functions.
• Structure: function_name(argument1, argument2, …)
• Examples:
– sqrt(), round(), log(), exp(), abs()
– Many functions have default arguments (e.g., round(x) rounds to 0 digits if
not specified).
• Use named arguments for clarity: round(x = 3.1415, digits = 2)
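
As a small illustration of default versus named arguments, the following sketch calls a few of the functions mentioned above. The variable names are illustrative only:

```r
x <- 3.1415

r0 <- round(x)                   # digits defaults to 0
r2 <- round(x = x, digits = 2)   # named arguments make the call explicit
s  <- sqrt(16)
l  <- log(exp(1))                # log() is the natural logarithm by default
```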

4. Variables
• Use <- to assign values to variables.
• Variable types: numeric, character, logical
• Special values: NA, Inf, -Inf, NaN, NULL

5. Vectors
• Use c() to combine elements into a vector.
• You can name elements in a vector.
• Vectors can be indexed using [ ], e.g., x[2]
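
The points on variables and vectors can be sketched as follows. All names and values here are made up for illustration:

```r
# Variables of the three basic types
age    <- 21       # numeric
name   <- "Anna"   # character
passed <- TRUE     # logical

# A named numeric vector, indexed by position and by name
scores  <- c(math = 7.5, stats = 8, programming = 6.5)
second  <- scores[2]
by_name <- scores["stats"]

# Special values
missing_mean <- mean(c(1, NA, 3))  # NA propagates through mean()
neg_inf      <- -1 / 0             # -Inf
not_a_number <- 0 / 0              # NaN
```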

6. Importing Data
• Use read.csv("file.csv") to import data.
• CSV = comma-separated values.
• Use View() to inspect data in a spreadsheet view.
• Use summary() or head() for a quick preview of the data.
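
A minimal sketch of the import workflow. To keep it self-contained, it first writes a small made-up CSV file to a temporary location; in practice you would pass the path of your own file to read.csv():

```r
# Write a tiny hypothetical data set to a temporary CSV file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, age = c(19, 22, 25)), path, row.names = FALSE)

# Import it and take a first look
dat <- read.csv(path)
head(dat)      # first rows
summary(dat)   # per-column summary
```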

7. Data Frames
• Most datasets are stored as data frames.
• Use $ to access columns: data$column
• Use indexing: data[1, "column"], data[row, col]
• Use subset() or logical indexing to filter rows.
• Create new columns, modify or delete columns (NULL removes a column).
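
The data frame operations above might look like this in practice. The data frame and its column names are invented for the example:

```r
df <- data.frame(
  name  = c("Anna", "Ben", "Cleo", "Dev"),
  age   = c(21, 34, 19, 28),
  group = c("a", "b", "a", "b")
)

ages      <- df$age          # access a column with $
first_age <- df[1, "age"]    # row 1, column "age"

# Two ways to filter rows
adults_a <- subset(df, age > 20 & group == "a")
over_25  <- df[df$age > 25, ]

df$age_months <- df$age * 12  # create a new column
df$group      <- NULL         # remove a column
```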

8. Factors
• Factors are used to store categorical data.
• Use as.factor() to convert character vectors.
• Levels are important for statistical analysis and graphing.
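
A short sketch of why levels matter, using a made-up treatment variable. By default the levels are sorted alphabetically; factor() with an explicit levels argument lets you choose the order (for example, to put the control group first):

```r
treatment <- c("drug", "control", "drug", "placebo")

f <- as.factor(treatment)
levels(f)   # default order is alphabetical
table(f)    # counts per level

# Set the level order explicitly, control group first
f2 <- factor(treatment, levels = c("control", "placebo", "drug"))
```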

9. Lists & Matrices
• Lists: combine different types of elements (list(name="Anna", age=21))
• Matrices: like data frames, but all elements must be the same type.
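
The difference between lists and matrices can be sketched as follows. The values are illustrative:

```r
# A list can hold elements of different types
person <- list(name = "Anna", age = 21)
person$name
person[["age"]]

# A matrix holds one type only; it is filled column by column
m <- matrix(1:6, nrow = 2)
m[2, 3]   # row 2, column 3

# Mixing types in a matrix coerces everything to character
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
```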

10. Packages
• Packages extend R’s functionality.
• Install: install.packages("packageName")
• Load: library(packageName)
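• A package must be loaded with library() before its functions can be called by name; functions of an installed but unloaded package are still reachable as packageName::function().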

11. Saving & Loading Workspace
• Save workspace: save.image("filename.Rdata")
• Load workspace: load("filename.Rdata")

Practical Exercises (you should practice):
• Create variables and vectors (numeric, character, logical).
• Import a .csv file and access columns and rows.
• Subset a data frame based on conditions.
• Create and manipulate factors.
• Install and load packages.
• Save and reload your workspace.

How this aligns with the study guide:
This presentation directly covers the R programming section of your study guide,
including:
• Operators, functions, variables
• Data structures: vectors, data frames, factors
• File handling: importing and saving data
• Using and managing packages

Lecture 3: Descriptive Statistics
1. What Are Descriptive Statistics?
Descriptive statistics describe and summarize features of a dataset without generalizing to
the population.
Main types:
• Central tendency: mean, median, mode
• Dispersion: standard deviation, variance, IQR, range
• Distribution shape: skewness, kurtosis

2. The Mean as a Model
Mean formula:
$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

Deviation from the mean:
$$\text{deviation} = X_i - \bar{X}$$

Sum of squared errors:
$$SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$$

Variance:
$$s^2 = \frac{SS}{N-1}$$

Standard deviation:
$$s = \sqrt{\frac{SS}{N-1}}$$
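
To connect the formulas above to R, the following sketch computes each quantity by hand for a small made-up vector and compares the results with the built-in var() and sd():

```r
x <- c(2, 4, 4, 5, 7, 8)   # illustrative data
N <- length(x)

x_bar      <- sum(x) / N       # mean
deviations <- x - x_bar        # deviation of each score from the mean
SS         <- sum(deviations^2) # sum of squared errors
s2         <- SS / (N - 1)     # variance
s          <- sqrt(s2)         # standard deviation

# The built-in functions use the same N - 1 denominator
all.equal(s2, var(x))
all.equal(s, sd(x))
```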

3. Central Tendency in R
mean(x) # Mean
median(x) # Median
modeOf(x) # Mode (from lsr package)

Handling missing values:
mean(x) # Returns NA if NA is present
mean(x, na.rm = TRUE) # Ignores NA values

Trimmed mean (for reducing influence of outliers):
mean(x, trim = 0.1)

4. Measures of Dispersion in R
sd(x) # Standard deviation
range(x) # Min and max
IQR(x) # Interquartile range
quantile(x, probs = c(0.1, 0.25, 0.75, 0.9)) # Quantiles/percentiles

IQR definition:
$$\text{IQR} = Q_3 - Q_1$$
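
As a quick check of this definition, the sketch below computes the first and third quartiles with quantile() on made-up data and confirms that their difference matches IQR():

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)   # illustrative data

q <- quantile(x, probs = c(0.25, 0.75))  # Q1 and Q3
iqr_manual <- unname(q[2] - q[1])        # Q3 - Q1

all.equal(iqr_manual, IQR(x))
```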

5. Skewness and Kurtosis
Use the psych package:
library(psych)
skew(x)
kurtosi(x)

Acceptable normality ranges:
−1 < skew < 1
−2 < kurtosis < 2

6. Summarizing a Data Frame
summary(df) # Basic summary
describe(df) # Detailed summary from psych package

Notes:
• An asterisk after a variable name in the describe() output marks a factor (categorical) variable; do not interpret the mean or SD reported for it.

7. Group-Based Descriptives
describeBy(df, group = df$gender)
aggregate(x ~ group, data = df, FUN = mean)

Examples:
• Compare mean age by gender
• Compare RTs by distractor condition
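
A base-R sketch of the first example, comparing mean age by gender with aggregate(). The data frame is invented for illustration, and describeBy() is left out so that no extra package is needed:

```r
df <- data.frame(
  gender = c("f", "m", "f", "m", "f", "m"),
  age    = c(20, 22, 24, 26, 28, 30)
)

# One row per group, with the summary statistic in the age column
means <- aggregate(age ~ gender, data = df, FUN = mean)
sds   <- aggregate(age ~ gender, data = df, FUN = sd)
```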

8. APA-Style Reporting
Examples:
• The average age was 25.5 years (SD = 7.94).
• Age ranged from 18 to 70 (𝑀 = 25.5, 𝑆𝐷 = 7.94), with skewness of 1.87 (SE = 0.05)
and kurtosis of 3.93 (SE = 0.10).
• Males: 𝑀 = 24.2, 𝑆𝐷 = 5.1; Females: 𝑀 = 26.1, 𝑆𝐷 = 4.8
Formatting tips:
• Italicize M, SD
• Include group sizes if applicable

9. Practice Checklist
• Load driving.csv
• Use mean(), median(), sd(), IQR(), quantile() on numeric variables
• Handle missing data with na.rm = TRUE
• Use describe() and describeBy() to explore data
• Report findings in APA format including skew/kurtosis if relevant

10. Exam Relevance
This presentation supports:
• Descriptive statistics (core concepts and calculations)
• Interpreting output and checking assumptions
• Using R functions for statistical description
• Reporting in APA format

Lecture 4: Graphing and Exploring Data
1. Principles of Effective Graphs
Based on Tufte (2001), good graphs should:
• Show the data clearly
• Encourage critical thinking about the content (not the design)
• Avoid distortion or distraction
• Use minimal ink for maximum information
• Enable comparisons between groups or variables
• Reveal structure in the data

2. Scatterplots in Base R
Create a simple scatterplot:
plot(x = expt$age, y = expt$happy)

Add labels and title:
plot(x = expt$age, y = expt$happy,
xlab = "Age", ylab = "Happy", main = "A scatterplot")

Customize appearance:
plot(x = expt$age, y = expt$happy, pch = 4, col = "red")

pch sets the point symbol. Valid values range from 0 to 25.

3. Box Plots
Boxplot structure:
• Box: middle 50% of data, with the median and quartiles
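• Whiskers: extend to the most extreme values that are not outliers (by default in R, values within 1.5 × IQR of the box); outliers are drawn as separate points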
Create a boxplot by group:
boxplot(age ~ gender, data = expt,
xlab = "Gender", ylab = "Age", main = "Age by Gender")

Customize further with box colors and labels.

4. Histograms
Used to visualize distributions and check for:
• Skewness
• Kurtosis
• Spread and outliers
Create a basic histogram:
hist(expt$age)

Customize:
hist(expt$age, breaks = 10, col = "lightblue",
xlab = "Age", main = "Distribution of Age")

5. Bar Plots

Categorical data frequency:
counts <- table(expt$treatment)
barplot(counts, xlab = "Treatment", ylab = "Frequency", main = "Group Counts")

Customize labels and colors:
barplot(height = counts,
names.arg = c("Control", "Drug A", "Drug B"),
col = c("red", "green", "blue"))

Means with error bars (lsr package):
library(lsr)
bars(happy ~ treatment, data = expt)

Add grouping:
bars(happy ~ treatment + gender, data = expt)

Remove error bars:
bars(happy ~ treatment + gender, data = expt, errorFun = FALSE)

Interpretation Tip: Error bars may represent:
• Confidence intervals
• Standard deviation
• Standard error
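
Because these three quantities differ in size, it helps to see them side by side. The sketch below computes each for a small made-up sample, assuming a 95% confidence interval based on the t distribution, and checks the interval against t.test():

```r
x <- c(4, 6, 5, 7, 8, 6, 5, 7)   # illustrative data
n <- length(x)

m  <- mean(x)
s  <- sd(x)            # standard deviation: spread of the scores
se <- s / sqrt(n)      # standard error: precision of the mean

# 95% confidence interval around the mean (t distribution, n - 1 df)
t_crit <- qt(0.975, df = n - 1)
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)

# t.test() reports the same interval
t.test(x)$conf.int
```

For any sample with more than one observation the standard error is smaller than the standard deviation, so the same data give visibly shorter bars when SE is plotted.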

6. Line Graphs
Points only:
plot(xval, yval, type = "p")

Line only:
plot(xval, yval, type = "l")

Points and line:
plot(xval, yval, type = "b")

7. Saving Graphs
Via R command:
dev.print(device = pdf, file = "scatterplot.pdf", width = 8, height = 8)

Or use the Plots tab in RStudio → “Export” → Save as image or PDF.

8. ggplot2 Overview
ggplot2 builds plots using layers.
Start with:
library(ggplot2)
ggplot(data, aes(x, y)) + geom_point()

Scatterplot with trend line:
ggplot(examData, aes(Anxiety, Exam)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red") +
  labs(x = "Exam Anxiety", y = "Exam Performance %")

Histogram:
ggplot(examData, aes(Anxiety)) +
geom_histogram(binwidth = 10) +
labs(x = "Anxiety", y = "Frequency")

Boxplot:
ggplot(examData, aes(Gender, Exam)) +
geom_boxplot()

Bar chart of means with error bars (one panel per gender):
ggplot(chickFlick, aes(film, arousal, fill = film)) +
  stat_summary(fun = mean, geom = "bar") +
  # mean_cl_normal draws a 95% CI of the mean; it needs the Hmisc package installed
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2) +
  facet_wrap(~ gender) +
  labs(x = "Film", y = "Mean Arousal") +
  theme(legend.position = "none")

9. Graph Types: When to Use
| Graph Type | Use Case |
|---|---|
| Scatterplot | Continuous × continuous |
| Boxplot | Distribution by category |
| Histogram | Distribution of one continuous variable |
| Bar chart | Frequencies or means by category |
| Line graph | Trends or time series |

10. TeX Connection (notation from previous weeks)
To plot variables relating to summary statistics (e.g. means or SD), remember:
$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

$$s = \sqrt{\frac{SS}{N-1}}$$

These are the basis for understanding what’s behind bar chart heights and error bar sizes.

11. Exam Relevance
This presentation directly supports:
• Using R to visualize data
• Identifying patterns, skewness, or outliers
• Using both base R and ggplot2 for professional figures
• Understanding what graphs are appropriate for what types of data

Lecture 5: Hypothesis Testing
1. Parameters vs. Statistics
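| Measure | Population (parameter) | Sample (statistic) |
|---|---|---|
| Mean | $\mu$ | $\bar{x}$ |
| Proportion | $\pi$ | $P$ |
| Variance | $\sigma^2$ | $s^2$ |
| Standard deviation | $\sigma$ | $s$ |
| Correlation | $\rho$ | $r$ |
| Regression coefficient | $\beta$ | $b$ |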

2. Sampling Error
Even if the population has known parameters (e.g., 𝜇 = 100, 𝜎 = 15), different samples will
yield slightly different means and SDs due to random variation.
Larger sample sizes → lower sampling error → more stable estimates.
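
A small simulation can make this concrete. The sketch below repeatedly draws samples from a normal population with 𝜇 = 100 and 𝜎 = 15 and compares how much the sample means vary for two sample sizes. The helper name sample_means, the number of repetitions, and the seed are arbitrary choices made for this illustration:

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical helper: means of `reps` random samples of size n
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(rnorm(n, mean = 100, sd = 15)))
}

sd_small <- sd(sample_means(10))    # spread of means with n = 10
sd_large <- sd(sample_means(100))   # spread of means with n = 100

# Theory predicts sigma / sqrt(n) for each
c(simulated = sd_small, expected = 15 / sqrt(10))
c(simulated = sd_large, expected = 15 / sqrt(100))
```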

3. Estimating Parameters
The sample mean 𝑥‾ is the best point estimate of the population mean 𝜇:
𝜇 ≈ 𝑥‾
But for standard deviation:

$$\sigma \approx \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2}$$

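This adjustment (dividing by 𝑁 − 1 instead of 𝑁) makes the variance estimate unbiased; dividing by 𝑁 would systematically underestimate the population variance, and with it the standard deviation.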
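
The effect of the denominator can be checked with a simulation. This sketch, assuming a normal population with 𝜎 = 15 (variance 225) and small samples of size 5, averages both versions of the variance estimate over many samples. The seed and the number of repetitions are arbitrary:

```r
set.seed(42)  # arbitrary seed, for reproducibility only
N    <- 5
reps <- 10000

# Sum of squared deviations for each simulated sample
ss <- replicate(reps, {
  x <- rnorm(N, mean = 100, sd = 15)
  sum((x - mean(x))^2)
})

var_n  <- mean(ss / N)         # dividing by N: too small on average
var_n1 <- mean(ss / (N - 1))   # dividing by N - 1: close to 225 on average

c(divide_by_N = var_n, divide_by_N_minus_1 = var_n1, true_variance = 15^2)
```

With 𝑁 = 5 the version that divides by 𝑁 is expected to land near 225 × 4/5 = 180, which shows how large the bias can be in small samples.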

Written for

Institution: Tilburg University (UVT)
Study: CSAI / PMDSS
Course: Statistics for CSAI I (822187B6)

Document information

Uploaded on: May 21, 2025
Number of pages: 31
Written in: 2024/2025
Type: SUMMARY

Subjects

statistics
r programming
test statistics
t test
chi square
anova
apa
skewness
kurtosis
standard deviation
standard error
descriptive statistics
inferential statistics
mean
mode
sigma
summary

Get to know the seller

aukehilbrands
