Summary Statistics for CSAI I

Pages: 31
Uploaded on: 21-05-2025
Written in: 2024/2025

This comprehensive summary covers everything from Statistics for CSAI I. All lectures (1 to 11) are summarized with clear explanations, relevant formulas, R code examples (including ggplot2, aov(), ()), and practical exam-oriented questions. You’ll find explanations on:
• Descriptive statistics (mean, median, dispersion)
• Hypothesis testing (z-test, t-test, one- and two-tailed)
• Central Limit Theorem and confidence intervals
• ANOVA & Factorial ANOVA with effect sizes (η², Cohen’s d)
• Chi-squared tests and alternatives (Fisher, McNemar)
• Graphs in base R and ggplot2
• APA-style reporting examples and decision rules
• Assumption checks like normality (Shapiro), Levene’s test
Perfectly suited for exam preparation or resits.


Statistics for CSAI I

Lecture 1:
Lecture 1 was the introduction to the course and information on the assignments and
exam. Not needed for the exam.


Lecture 2: R Programming
1. RStudio Environment
• Understand RStudio’s layout: Console, Script, Environment, Plots, Packages, etc.
• R runs in the console, but scripts should be used for analyses.
• Learn how to navigate, maximize/minimize, and switch between panels.


2. Basic Commands in R

🧮 Arithmetic Operators:
• +, -, *, /, ^
• Use parentheses for order of operations.

🔍 Logical Operators:
• Comparisons: ==, !=, <, <=, >, >=
• Logical: & (AND), | (OR), ! (NOT)
• These return TRUE or FALSE.
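
The operators above can be tried directly in the console. The following is a minimal sketch; the variable names (a, b, p, x, in_range, outside, not_five) are made up for illustration.

```r
# Arithmetic follows the usual order of operations; parentheses override it.
a <- 2 + 3 * 4      # multiplication happens first
b <- (2 + 3) * 4    # parentheses happen first
p <- 2 ^ 3          # exponentiation

# Comparisons return TRUE or FALSE and can be combined with &, | and !.
x <- 7
in_range <- x > 5 & x <= 10   # both conditions must hold
outside  <- x < 0 | x > 100   # at least one condition must hold
not_five <- !(x == 5)         # negation

print(c(a, b, p))
print(c(in_range, outside, not_five))
```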


3. Functions
• Most tasks in R are performed using functions.
• Structure: function_name(argument1, argument2, …)
• Examples:
– sqrt(), round(), log(), exp(), abs()
– Many functions have default arguments (e.g., round(x) rounds to 0 digits if
not specified).
• Use named arguments for clarity: round(x = 3.1415, digits = 2)
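
A short sketch of default and named arguments, using only base R functions; the variable names are illustrative.

```r
x <- 3.14159

r0 <- round(x)                    # digits defaults to 0
r2 <- round(x, digits = 2)        # named argument for clarity
r2_swapped <- round(digits = 2, x = x)  # named arguments can be given in any order

root <- sqrt(16)                  # square root
magnitude <- abs(-2.5)            # absolute value
back <- exp(log(10))              # exp() undoes log() (natural logarithm)

print(c(r0, r2, r2_swapped, root, magnitude, back))
```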


4. Variables
• Use <- to assign values to variables.
• Variable types: numeric, character, logical
• Special values: NA, Inf, -Inf, NaN, NULL
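
The variable types and special values can be inspected as follows. This is a small illustrative sketch; the names and values are invented.

```r
# Assignment with <- and the three basic types
age <- 21
name <- "Anna"
is_student <- TRUE
types <- c(class(age), class(name), class(is_student))
print(types)

# Special values
pos_inf <- 1 / 0        # Inf
neg_inf <- -1 / 0       # -Inf
not_a_number <- 0 / 0   # NaN
missing <- NA           # a missing value; arithmetic with NA gives NA
nothing <- NULL         # the empty object, with length 0

print(c(pos_inf, neg_inf, not_a_number, missing + 1))
print(length(nothing))
```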


5. Vectors
• Use c() to combine elements into a vector.
• You can name elements in a vector.
• Vectors can be indexed using [ ], e.g., x[2]
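
A minimal sketch of creating, naming and indexing a vector; the vector scores and its values are hypothetical.

```r
# c() combines elements; names can be given while creating the vector
scores <- c(math = 7.5, stats = 8.0, coding = 6.5)

second <- scores[2]            # index by position (R starts counting at 1)
coding <- scores["coding"]     # index by name
high <- scores[scores > 7]     # index with a logical condition

print(second)
print(coding)
print(high)
```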


6. Importing Data
• Use read.csv("file.csv") to import data.
• CSV = comma-separated values.
• Use View() to inspect data in a spreadsheet view.
• Use summary() or head() for previews of the data
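
The import steps can be sketched without an existing file by first writing a tiny CSV to a temporary location. The file contents and the names csv_path and dat are assumptions made for this example only.

```r
# Write a small example file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,group",
             "1,23,control",
             "2,31,treatment",
             "3,27,control"), csv_path)

dat <- read.csv(csv_path)   # import: the first line becomes the column names

print(head(dat))            # first rows
print(summary(dat))         # per-column summary
# View(dat) would open the spreadsheet view in RStudio
```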


7. Data Frames
• Most datasets are stored as data frames.
• Use $ to access columns: data$column
• Use indexing: data[1, "column"], data[row, col]
• Use subset() or logical indexing to filter rows.
• Create new columns, modify or delete columns (NULL removes a column).
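
The data frame operations above, sketched on a small made-up data frame (df, its columns and values are hypothetical):

```r
df <- data.frame(
  id = 1:4,
  gender = c("f", "m", "f", "m"),
  rt = c(510, 640, 480, 700)
)

rt_column <- df$rt                    # access a column with $
one_cell <- df[2, "rt"]               # row 2, column "rt"
slow <- df[df$rt > 600, ]             # logical indexing: keep rows with rt > 600
females <- subset(df, gender == "f")  # the same idea with subset()

df$rt_sec <- df$rt / 1000             # create a new column
df$id <- NULL                         # assigning NULL removes a column

print(slow)
print(females)
print(names(df))
```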


8. Factors
• Factors are used to store categorical data.
• Use as.factor() to convert character vectors.
• Levels are important for statistical analysis and graphing.
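
Why levels matter can be seen in a short sketch: as.factor() orders levels alphabetically, while factor() lets you set the order yourself. The vector cond is invented for illustration.

```r
cond <- c("low", "high", "medium", "low", "high")

f <- as.factor(cond)
print(levels(f))   # alphabetical order
print(table(f))    # counts per level

# Setting the level order explicitly controls the order in tables and plots
f2 <- factor(cond, levels = c("low", "medium", "high"))
print(levels(f2))
codes <- as.integer(f2)   # underlying integer codes follow the level order
print(codes)
```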


9. Lists & Matrices
• Lists: combine different types of elements (list(name="Anna", age=21))
• Matrices: like data frames, but all elements must be the same type.


10. Packages
• Packages extend R’s functionality.
• Install: install.packages("packageName")
• Load: library(packageName)
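• A package’s functions can be called by name only after the package is loaded with library() (or one-off via packageName::function()).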

11. Saving & Loading Workspace
• Save workspace: save.image("filename.Rdata")
• Load workspace: load("filename.Rdata")


Practical Exercises (you should practice):
• Create variables and vectors (numeric, character, logical).
• Import a .csv file and access columns and rows.
• Subset a data frame based on conditions.
• Create and manipulate factors.
• Install and load packages.
• Save and reload your workspace.


How this aligns with the study guide:
This presentation directly covers the R programming section of your study guide,
including:
• Operators, functions, variables
• Data structures: vectors, data frames, factors
• File handling: importing and saving data
• Using and managing packages


Lecture 3: Descriptive Statistics
1. What Are Descriptive Statistics?
Descriptive statistics describe and summarize features of a dataset without generalizing to
the population.
Main types:
• Central tendency: mean, median, mode
• Dispersion: standard deviation, variance, IQR, range
• Distribution shape: skewness, kurtosis


2. The Mean as a Model
Mean formula:
$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

Deviation from the mean:
$$\text{deviation} = X_i - \bar{X}$$

Sum of squared errors:
$$SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$$

Variance:
$$s^2 = \frac{SS}{N-1}$$

Standard deviation:
$$s = \sqrt{\frac{SS}{N-1}}$$
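
The formulas can be checked step by step against R's built-in var() and sd(). This is a sketch on a small invented vector x; all variable names are illustrative.

```r
x <- c(2, 4, 4, 5, 7, 8)
N <- length(x)

x_bar <- sum(x) / N           # mean
deviations <- x - x_bar       # deviation of each score from the mean
SS <- sum(deviations^2)       # sum of squared errors
s2 <- SS / (N - 1)            # variance
s <- sqrt(s2)                 # standard deviation

print(c(mean = x_bar, SS = SS, variance = s2, sd = s))

# The hand calculations agree with the built-in functions
print(all.equal(s2, var(x)))
print(all.equal(s, sd(x)))
```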



3. Central Tendency in R
mean(x) # Mean
median(x) # Median
modeOf(x) # Mode (from lsr package)

Handling missing values:
mean(x) # Returns NA if NA is present
mean(x, na.rm = TRUE) # Ignores NA values

Trimmed mean (for reducing influence of outliers):
mean(x, trim = 0.1)
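
A small sketch of how a missing value and an outlier affect these functions. The vector x is made up, with one extreme value (60) and one NA.

```r
x <- c(4, 5, 6, 5, 4, 6, 5, 5, 4, 60, NA)

m_na <- mean(x)                              # NA, because a value is missing
m_all <- mean(x, na.rm = TRUE)               # pulled upward by the outlier
m_trim <- mean(x, trim = 0.1, na.rm = TRUE)  # drops the lowest and highest 10% first
med <- median(x, na.rm = TRUE)               # the median is robust to the outlier

print(c(mean = m_all, trimmed = m_trim, median = med))
```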



4. Measures of Dispersion in R
sd(x) # Standard deviation
range(x) # Min and max
IQR(x) # Interquartile range
quantile(x, probs = c(0.1, 0.25, 0.75, 0.9)) # Quantiles/percentiles

IQR definition:
$$IQR = Q_3 - Q_1$$
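
The definition can be verified by taking the first and third quartiles from quantile() and comparing their difference with IQR(). The data vector is invented for illustration.

```r
x <- c(1, 3, 5, 7, 9, 11, 13, 15, 17)

q <- quantile(x, probs = c(0.25, 0.75))   # first and third quartile
iqr_by_hand <- unname(q[2] - q[1])        # Q3 - Q1

print(q)
print(c(by_hand = iqr_by_hand, builtin = IQR(x)))
print(range(x))                           # minimum and maximum
```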

5. Skewness and Kurtosis
Use the psych package:
library(psych)
skew(x)
kurtosi(x)

Common rule-of-thumb ranges for approximate normality:
−1 < skew < 1
−2 < kurtosis < 2


6. Summarizing a Data Frame
summary(df) # Basic summary
describe(df) # Detailed summary from psych package

Notes:
• Asterisk in the output indicates factor variables — avoid interpreting means/SDs for
these.


7. Group-Based Descriptives
describeBy(df, group = df$gender)
aggregate(x ~ group, data = df, FUN = mean)

Examples:
• Compare mean age by gender
• Compare RTs by distractor condition
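
The first example (mean age by gender) can be sketched with base R only. The data frame and its values are hypothetical; describeBy() from psych would give a fuller table.

```r
df <- data.frame(
  gender = c("f", "f", "f", "m", "m", "m"),
  age = c(22, 25, 28, 21, 24, 33)
)

# One summary value per group, using the formula interface
group_means <- aggregate(age ~ gender, data = df, FUN = mean)
group_sds <- aggregate(age ~ gender, data = df, FUN = sd)

print(group_means)
print(group_sds)

# tapply() is an alternative that returns a named vector
print(tapply(df$age, df$gender, mean))
```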


8. APA-Style Reporting
Examples:
• The average age was 25.5 years (SD = 7.94).
• Age ranged from 18 to 70 (𝑀 = 25.5, 𝑆𝐷 = 7.94), with skewness of 1.87 (SE = 0.05)
and kurtosis of 3.93 (SE = 0.10).
• Males: 𝑀 = 24.2, 𝑆𝐷 = 5.1; Females: 𝑀 = 26.1, 𝑆𝐷 = 4.8
Formatting tips:
• Italicize M, SD
• Include group sizes if applicable

9. Practice Checklist
• Load driving.csv
• Use mean(), median(), sd(), IQR(), quantile() on numeric variables
• Handle missing data with na.rm = TRUE
• Use describe() and describeBy() to explore data
• Report findings in APA format including skew/kurtosis if relevant


10. Exam Relevance
This presentation supports:
• Descriptive statistics (core concepts and calculations)
• Interpreting output and checking assumptions
• Using R functions for statistical description
• Reporting in APA format


Lecture 4: Graphing and Exploring Data
1. Principles of Effective Graphs
Based on Tufte (2001), good graphs should:
• Show the data clearly
• Encourage critical thinking about the content (not the design)
• Avoid distortion or distraction
• Use minimal ink for maximum information
• Enable comparisons between groups or variables
• Reveal structure in the data


2. Scatterplots in Base R
Create a simple scatterplot:
plot(x = expt$age, y = expt$happy)

Add labels and title:
plot(x = expt$age, y = expt$happy,
xlab = "Age", ylab = "Happy", main = "A scatterplot")

Customize appearance:
plot(x = expt$age, y = expt$happy, pch = 4, col = "red")

pch sets the point symbol. Valid values range from 0 to 25.

3. Box Plots
Boxplot structure:
• Box: middle 50% of data, with the median and quartiles
• Whiskers: extend to the most extreme values within 1.5 × IQR of the box (R’s default); points beyond that are drawn individually as outliers
Create a boxplot by group:
boxplot(age ~ gender, data = expt,
xlab = "Gender", ylab = "Age", main = "Age by Gender")

Customize further with box colors and labels.


4. Histograms
Used to visualize distributions and check for:
• Skewness
• Kurtosis
• Spread and outliers
Create a basic histogram:
hist(expt$age)

Customize:
hist(expt$age, breaks = 10, col = "lightblue",
xlab = "Age", main = "Distribution of Age")



5. Bar Plots

Categorical data frequency:
counts <- table(expt$treatment)
barplot(counts, xlab = "Treatment", ylab = "Frequency", main = "Group Counts")

Customize labels and colors:
barplot(height = counts,
names.arg = c("Control", "Drug A", "Drug B"),
col = c("red", "green", "blue"))

Means with error bars (lsr package):
library(lsr)
bars(happy ~ treatment, data = expt)

Add grouping:
bars(happy ~ treatment + gender, data = expt)

Remove error bars:
bars(happy ~ treatment + gender, data = expt, errorFun = FALSE)

Interpretation Tip: Error bars may represent:
• Confidence intervals
• Standard deviation
• Standard error


6. Line Graphs
Points only:
plot(xval, yval, type = "p")

Line only:
plot(xval, yval, type = "l")

Points and line:
plot(xval, yval, type = "b")



7. Saving Graphs
Via R command:
dev.print(device = pdf, file = "scatterplot.pdf", width = 8, height = 8)

Or use the Plots tab in RStudio → “Export” → Save as image or PDF.


8. ggplot2 Overview
ggplot2 builds plots using layers.
Start with:
library(ggplot2)
ggplot(data, aes(x, y)) + geom_point()

Scatterplot with trend line:
ggplot(examData, aes(Anxiety, Exam)) +
geom_point() +
geom_smooth(method = "lm", color = "red") +
labs(x = "Exam Anxiety", y = "Exam Performance %")

Histogram:
ggplot(examData, aes(Anxiety)) +
geom_histogram(binwidth = 10) +
labs(x = "Anxiety", y = "Frequency")

Boxplot:
ggplot(examData, aes(Gender, Exam)) +
geom_boxplot()

Bar chart of group means with error bars (one panel per gender):
# mean_cl_normal computes normal-theory confidence limits and requires the Hmisc package to be installed
ggplot(chickFlick, aes(film, arousal, fill = film)) +
stat_summary(fun = mean, geom = "bar") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2) +
facet_wrap(~ gender) +
labs(x = "Film", y = "Mean Arousal") +
theme(legend.position = "none")



9. Graph Types: When to Use
Graph Type     Use Case
Scatterplot    Continuous × continuous
Boxplot        Distribution by category
Histogram      Distribution of one continuous variable
Bar chart      Frequencies or means by category
Line graph     Trends or time series


10. TeX Connection (notation from previous weeks)
When plotting summary statistics (e.g., means or SDs), remember the formulas behind them:
$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

$$s = \sqrt{\frac{SS}{N-1}}$$

These are the basis for understanding what’s behind bar chart heights and error bar sizes.

11. Exam Relevance
This presentation directly supports:
• Using R to visualize data
• Identifying patterns, skewness, or outliers
• Using both base R and ggplot2 for professional figures
• Understanding what graphs are appropriate for what types of data


Lecture 5: Hypothesis Testing
1. Parameters vs. Statistics
Measure                   Population (parameter)   Sample (statistic)
Mean                      μ                        x̄
Proportion                π                        P
Variance                  σ²                       s²
Standard deviation        σ                        s
Correlation               ρ                        r
Regression coefficient    β                        b


2. Sampling Error
Even if the population has known parameters (e.g., 𝜇 = 100, 𝜎 = 15), different samples will
yield slightly different means and SDs due to random variation.
Larger sample sizes → lower sampling error → more stable estimates.
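
Sampling error can be made visible with a small simulation: draw many samples from a population with μ = 100 and σ = 15 and look at how much the sample means vary. This is a sketch only; the helper sample_means(), the number of repetitions and the sample sizes 10 and 100 are assumptions chosen for illustration. The comparison value σ/√n is the standard error of the mean.

```r
set.seed(1)  # makes the random draws reproducible

# Hypothetical helper: means of `reps` random samples of size n
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(rnorm(n, mean = 100, sd = 15)))
}

sd_small <- sd(sample_means(10))    # spread of sample means for n = 10
sd_large <- sd(sample_means(100))   # spread of sample means for n = 100

# Compare the simulated spread with the theoretical value sigma / sqrt(n)
print(c(simulated = sd_small, theory = 15 / sqrt(10)))
print(c(simulated = sd_large, theory = 15 / sqrt(100)))
```

The larger samples should give sample means that cluster much more tightly around 100.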


3. Estimating Parameters
The sample mean 𝑥‾ is the best point estimate of the population mean 𝜇:
𝜇 ≈ 𝑥‾
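For the standard deviation, the estimate uses N − 1 in the denominator:
$$\sigma \approx s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2}$$

Dividing by N − 1 instead of N makes the variance estimate s² unbiased; its square root s is the standard estimate of σ.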
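
The effect of dividing by N − 1 rather than N can be checked by simulation. The sketch below assumes a normal population with σ = 15 (so a true variance of 225) and small samples of size 5; these settings and all names are illustrative.

```r
set.seed(42)  # reproducible random draws
n <- 5
reps <- 20000

# For each sample, compute the variance with both denominators
draws <- replicate(reps, {
  x <- rnorm(n, mean = 100, sd = 15)
  ss <- sum((x - mean(x))^2)
  c(div_n = ss / n, div_n1 = ss / (n - 1))
})

# Average of each estimator over all samples (true variance: 15^2 = 225)
avg <- rowMeans(draws)
print(avg)
```

Averaged over many samples, the N version should come out clearly below 225, while the N − 1 version should land close to it.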

Seller: aukehilbrands (Tilburg University)