
Summary Statistics for CSAI I

Pages: 31
Uploaded on: 21-05-2025
Written in: 2024/2025

This comprehensive summary covers everything from Statistics for CSAI I. All lectures (1 to 11) are summarized with clear explanations, relevant formulas, R code examples (including ggplot2, aov(), ()), and practical exam-oriented questions. You'll find explanations on:
• Descriptive statistics (mean, median, dispersion)
• Hypothesis testing (z-test, t-test, one- and two-tailed)
• Central Limit Theorem and confidence intervals
• ANOVA & Factorial ANOVA with effect sizes (η², Cohen's d)
• Chi-squared tests and alternatives (Fisher, McNemar)
• Graphs in base R and ggplot2
• APA-style reporting examples and decision rules
• Assumption checks such as normality (Shapiro-Wilk) and Levene's test
Perfectly suited for exam preparation or resits.


Preview of the contents

Statistics for CSAI I

Lecture 1:
Lecture 1 introduced the course and explained the assignments and the exam. Its content is not needed for the exam.


Lecture 2: R Programming
1. RStudio Environment
• Understand RStudio’s layout: Console, Script, Environment, Plots, Packages, etc.
• Commands can be run directly in the console, but analyses should be written in scripts so they can be saved and rerun.
• Learn how to navigate, maximize/minimize, and switch between panels.


2. Basic Commands in R

🧮 Arithmetic Operators:
• +, -, *, /, ^
• Use parentheses for order of operations.

🔍 Logical Operators:
• Comparisons: ==, !=, <, <=, >, >=
• Logical: & (AND), | (OR), ! (NOT)
• These return TRUE or FALSE.
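A quick console check of the operators above, using two illustrative values x and y (made-up numbers, not from the course materials):

```r
x <- 7
y <- 2

# Arithmetic
x + y          # 9
x - y          # 5
x * y          # 14
x / y          # 3.5
x ^ y          # 49
(x + y) * 2    # 18: parentheses are evaluated first
x + y * 2      # 11: * binds tighter than +

# Comparisons return TRUE or FALSE
x == 7         # TRUE
x != 7         # FALSE

# Logical operators combine TRUE/FALSE values
x > y & y > 5  # FALSE: both sides must be TRUE
x > y | y > 5  # TRUE: one TRUE side is enough
!(x > y)       # FALSE
```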


3. Functions
• Most tasks in R are performed using functions.
• Structure: function_name(argument1, argument2, …)
• Examples:
– sqrt(), round(), log(), exp(), abs()
– Many functions have default arguments (e.g., round(x) rounds to 0 digits if
not specified).
• Use named arguments for clarity: round(x = 3.1415, digits = 2)
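A few of the functions listed above, showing a default argument and named versus positional arguments:

```r
sqrt(16)                       # 4
abs(-3)                        # 3
exp(log(10))                   # 10 (log() is the natural log by default)
round(3.1415)                  # 3: digits defaults to 0
round(x = 3.1415, digits = 2)  # 3.14: named arguments
round(3.1415, 2)               # same result, arguments matched by position
```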


4. Variables
• Use <- to assign values to variables.
• Variable types: numeric, character, logical
• Special values: NA, Inf, -Inf, NaN, NULL
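A small sketch of the three basic types and the special values; the variable names are illustrative only:

```r
age    <- 21       # numeric
name   <- "Anna"   # character
passed <- TRUE     # logical
class(age); class(name); class(passed)

score <- NA        # missing value
1 / 0              # Inf
-1 / 0             # -Inf
0 / 0              # NaN (not a number)
empty <- NULL      # the empty object
is.na(score)       # TRUE
length(empty)      # 0
```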


5. Vectors
• Use c() to combine elements into a vector.
• You can name elements in a vector.
• Vectors can be indexed using [ ], e.g., x[2]
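Creating, naming and indexing a vector (made-up scores; note that R starts counting at 1):

```r
scores <- c(12, 15, 9)                   # combine with c()
names(scores) <- c("Anna", "Ben", "Cas") # name the elements

scores[2]            # second element: Ben 15
scores["Cas"]        # by name: Cas 9
scores[c(1, 3)]      # several elements at once
scores[scores > 10]  # logical indexing: Anna 12, Ben 15
```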


6. Importing Data
• Use read.csv("file.csv") to import data.
• CSV = comma-separated values.
• Use View() to inspect data in a spreadsheet view.
• Use summary() or head() for previews of the data
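A self-contained sketch of the import steps: it first writes a tiny hypothetical CSV to a temporary file so it runs without a course data file. With real data you would simply call read.csv("file.csv").

```r
# Hypothetical example data written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("id,age,gender",
             "1,21,female",
             "2,25,male",
             "3,19,female"), path)

dat <- read.csv(path)  # the first row becomes the column names
head(dat)              # first rows (up to 6)
summary(dat)           # per-column summary
str(dat)               # column types
# View(dat)            # spreadsheet view (interactive RStudio only)
```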


7. Data Frames
• Most datasets are stored as data frames.
• Use $ to access columns: data$column
• Use indexing: data[1, "column"], data[row, col]
• Use subset() or logical indexing to filter rows.
• Create new columns, modify or delete columns (NULL removes a column).
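The data frame operations above on a small made-up data frame:

```r
df <- data.frame(
  name  = c("Anna", "Ben", "Cas", "Dee"),
  age   = c(21, 25, 19, 30),
  group = c("A", "B", "A", "B")
)

df$age                    # a column by name
df[1, "age"]              # row 1, column "age": 21
df[2, ]                   # the whole second row
df[df$age > 20, ]         # logical indexing: rows with age > 20
subset(df, group == "A")  # the same idea with subset()

df$age_months <- df$age * 12  # create a new column
df$age_months <- NULL         # delete it again
```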


8. Factors
• Factors are used to store categorical data.
• Use as.factor() to convert character vectors.
• Levels are important for statistical analysis and graphing.
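Converting a character vector to a factor and controlling the level order (the treatment labels are hypothetical):

```r
treatment <- c("control", "drugA", "drugB", "control", "drugA")
f <- as.factor(treatment)
levels(f)   # "control" "drugA" "drugB" (alphabetical by default)
table(f)    # counts per level: 2, 2, 1

# Set the level order explicitly, e.g. to control the order of bars in a graph
f2 <- factor(treatment, levels = c("drugB", "drugA", "control"))
levels(f2)
```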


9. Lists & Matrices
• Lists: combine different types of elements (list(name="Anna", age=21))
• Matrices: like data frames, but all elements must be the same type.
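A list holding mixed types next to a matrix, which forces all elements to a single type:

```r
person <- list(name = "Anna", age = 21, passed = TRUE)  # mixed types are fine
person$name       # "Anna"
person[["age"]]   # 21

m <- matrix(1:6, nrow = 2)  # filled column by column
m
m[2, 3]                     # row 2, column 3: 6
m2 <- cbind(m, "x")         # adding one character value...
typeof(m2)                  # ...turns every element into "character"
```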


10. Packages
• Packages extend R’s functionality.
• Install: install.packages("packageName")
• Load: library(packageName)
• Only loaded packages can be used.

11. Saving & Loading Workspace
• Save workspace: save.image("filename.Rdata")
• Load workspace: load("filename.Rdata")
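A round trip through save.image() and load(). To keep it self-contained it saves to a temporary file instead of "filename.Rdata", and it assumes the code runs at the top level, where save.image() saves the global environment:

```r
x <- c(1, 2, 3)
ws_path <- tempfile(fileext = ".RData")  # in practice e.g. "filename.Rdata"
save.image(ws_path)  # save all objects in the workspace
rm(x)                # x is gone now
load(ws_path)        # restore the saved objects
x                    # back again: 1 2 3
```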


Practical Exercises (you should practice):
• Create variables and vectors (numeric, character, logical).
• Import a .csv file and access columns and rows.
• Subset a data frame based on conditions.
• Create and manipulate factors.
• Install and load packages.
• Save and reload your workspace.


How this aligns with the study guide:
This presentation directly covers the R programming section of your study guide,
including:
• Operators, functions, variables
• Data structures: vectors, data frames, factors
• File handling: importing and saving data
• Using and managing packages


Lecture 3: Descriptive Statistics
1. What Are Descriptive Statistics?
Descriptive statistics describe and summarize features of a dataset without generalizing to
the population.
Main types:
• Central tendency: mean, median, mode
• Dispersion: standard deviation, variance, IQR, range
• Distribution shape: skewness, kurtosis


2. The Mean as a Model
Mean formula:
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

Deviation from the mean:
$$\text{deviation} = X_i - \bar{X}$$
Sum of squared errors:
$$SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$$
Variance:
$$s^2 = \frac{SS}{N - 1}$$
Standard deviation:
$$s = \sqrt{\frac{SS}{N - 1}}$$
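The formulas above computed step by step on a small made-up sample, then checked against R's built-in mean(), var() and sd():

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
N <- length(x)

x_bar <- sum(x) / N    # mean: 40 / 8 = 5
dev   <- x - x_bar     # deviations from the mean (they sum to 0)
SS    <- sum(dev^2)    # sum of squared errors: 32
s2    <- SS / (N - 1)  # variance: 32 / 7
s     <- sqrt(s2)      # standard deviation

all.equal(x_bar, mean(x))  # TRUE
all.equal(s2, var(x))      # TRUE: var() also divides by N - 1
all.equal(s, sd(x))        # TRUE
```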



3. Central Tendency in R
mean(x) # Mean
median(x) # Median
modeOf(x) # Mode (from lsr package)

Handling missing values:
mean(x) # Returns NA if NA is present
mean(x, na.rm = TRUE) # Ignores NA values

Trimmed mean (for reducing influence of outliers):
mean(x, trim = 0.1)
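A made-up example showing why the median and the trimmed mean are more robust to an outlier than the plain mean, and what na.rm does:

```r
x <- c(1:9, 100)       # 100 is an outlier
mean(x)                # 14.5: pulled up by the outlier
median(x)              # 5.5
mean(x, trim = 0.1)    # 5.5: drops 10% of values (here 1) from each end

y <- c(4, NA, 6)
mean(y)                # NA
mean(y, na.rm = TRUE)  # 5
```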



4. Measures of Dispersion in R
sd(x) # Standard deviation
range(x) # Min and max
IQR(x) # Interquartile range
quantile(x, probs = c(0.1, 0.25, 0.75, 0.9)) # Quantiles/percentiles

IQR definition:
$$IQR = Q_3 - Q_1$$
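A check that IQR() matches Q3 − Q1 from quantile() (both use R's default quantile method), on the simple sample 1 to 9:

```r
x <- 1:9
range(x)                   # 1 9
diff(range(x))             # range as one number: 8
q <- quantile(x, probs = c(0.25, 0.75))
q                          # 25%: 3, 75%: 7
unname(q[2] - q[1])        # Q3 - Q1 = 4
IQR(x)                     # 4, the same value
sd(x)                      # sqrt(7.5), about 2.74
```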

5. Skewness and Kurtosis
Use the psych package:
library(psych)
skew(x)
kurtosi(x)

Rule-of-thumb ranges for approximately normal data:
−1 < skew < 1
−2 < kurtosis < 2
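To see what skewness and kurtosis measure, here is a base-R sketch of the moment-based definitions (skewness g1 and excess kurtosis g2). The helper names are hypothetical, and psych::skew() and psych::kurtosi() use a slightly different small-sample formula by default, so their values can differ a little:

```r
# Hypothetical helpers: moment-based skewness and excess kurtosis
skewness_g1 <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}
kurtosis_g2 <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3   # minus 3 so that a normal distribution scores 0
}

sym   <- 1:9                    # symmetric data
right <- c(1, 1, 2, 2, 3, 10)   # long right tail

skewness_g1(sym)    # 0
skewness_g1(right)  # about 1.6: outside the -1 to 1 range
kurtosis_g2(sym)    # about -1.23: flatter than a normal curve
```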


6. Summarizing a Data Frame
summary(df) # Basic summary
describe(df) # Detailed summary from psych package

Notes:
• In describe() output, an asterisk marks factor (categorical) variables; do not interpret their means/SDs.


7. Group-Based Descriptives
describeBy(df, group = df$gender)
aggregate(x ~ group, data = df, FUN = mean)

Examples:
• Compare mean age by gender
• Compare RTs by distractor condition
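A base-R sketch of group descriptives on a tiny made-up data frame; describeBy() from psych gives a fuller table, but aggregate() and tapply() need no extra package:

```r
df <- data.frame(
  gender = c("female", "male", "female", "male", "female"),
  age    = c(24, 22, 28, 26, 26)
)

agg <- aggregate(age ~ gender, data = df, FUN = mean)
agg                                   # female: 26, male: 24

tapply(df$age, df$gender, sd)         # SD per group: female 2, male about 2.83
```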


8. APA-Style Reporting
Examples:
• The average age was 25.5 years (SD = 7.94).
• Age ranged from 18 to 70 (𝑀 = 25.5, 𝑆𝐷 = 7.94), with skewness of 1.87 (SE = 0.05)
and kurtosis of 3.93 (SE = 0.10).
• Males: 𝑀 = 24.2, 𝑆𝐷 = 5.1; Females: 𝑀 = 26.1, 𝑆𝐷 = 4.8
Formatting tips:
• Italicize M, SD
• Include group sizes if applicable
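A sketch of producing APA-style numbers with sprintf() so rounding is consistent. apa_desc() is a hypothetical helper, the ages are made up, and italics for M and SD still have to be applied in the word processor:

```r
# Hypothetical helper: mean and SD rounded to two decimals
apa_desc <- function(x) {
  sprintf("M = %.2f, SD = %.2f", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

age <- c(20, 22, 24, 26, 28)
apa_desc(age)   # "M = 24.00, SD = 3.16"

sentence <- sprintf("The average age was %.1f years (SD = %.2f, n = %d).",
                    mean(age), sd(age), length(age))
sentence        # "The average age was 24.0 years (SD = 3.16, n = 5)."
```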

9. Practice Checklist
• Load driving.csv
• Use mean(), median(), sd(), IQR(), quantile() on numeric variables
• Handle missing data with na.rm = TRUE
• Use describe() and describeBy() to explore data
• Report findings in APA format including skew/kurtosis if relevant


10. Exam Relevance
This presentation supports:
• Descriptive statistics (core concepts and calculations)
• Interpreting output and checking assumptions
• Using R functions for statistical description
• Reporting in APA format


Lecture 4. Graphing and Exploring Data
1. Principles of Effective Graphs
Based on Tufte (2001), good graphs should:
• Show the data clearly
• Encourage critical thinking about the content (not the design)
• Avoid distortion or distraction
• Use minimal ink for maximum information
• Enable comparisons between groups or variables
• Reveal structure in the data


2. Scatterplots in Base R
Create a simple scatterplot:
plot(x = expt$age, y = expt$happy)

Add labels and title:
plot(x = expt$age, y = expt$happy,
xlab = "Age", ylab = "Happy", main = "A scatterplot")

Customize appearance:
plot(x = expt$age, y = expt$happy, pch = 4, col = "red")

pch sets the point symbol. Valid values range from 0 to 25.

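3. Box Plots
Boxplot structure:
• Box: the middle 50% of the data, from the first to the third quartile, with a line at the median
• Whiskers: extend to the smallest and largest values that are not outliers (in R, by default, points more than 1.5 × the box length beyond the box are drawn separately as outliers)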
Create a boxplot by group:
boxplot(age ~ gender, data = expt,
xlab = "Gender", ylab = "Age", main = "Age by Gender")

Customize further with box colors and labels.
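boxplot.stats() returns the numbers that boxplot() draws, which makes the whisker and outlier rule visible. Note that the box edges ("hinges") come from fivenum() and can differ slightly from quantile(). The data are made up:

```r
x <- c(1:9, 30)
b <- boxplot.stats(x)  # default coef = 1.5
b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker:
         # 1, 3, 5.5, 8, 9
b$out    # 30: more than 1.5 x the box length (8 - 3 = 5) above the upper hinge
```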


4. Histograms
Used to visualize distributions and check for:
• Skewness
• Kurtosis
• Spread and outliers
Create a basic histogram:
hist(expt$age)

Customize:
hist(expt$age, breaks = 10, col = "lightblue",
xlab = "Age", main = "Distribution of Age")



5. Bar Plots

Categorical data frequency:
counts <- table(expt$treatment)
barplot(counts, xlab = "Treatment", ylab = "Frequency", main = "Group Counts")

Customize labels and colors:
barplot(height = counts,
names.arg = c("Control", "Drug A", "Drug B"),
col = c("red", "green", "blue"))

Means with error bars (lsr package):
library(lsr)
bars(happy ~ treatment, data = expt)

Add grouping:
bars(happy ~ treatment + gender, data = expt)

Remove error bars:
bars(happy ~ treatment + gender, data = expt, errorFun = FALSE)

Interpretation tip: error bars can represent different quantities, so check (and report) which one a graph shows:
• Confidence intervals
• Standard deviation
• Standard error
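For the same made-up group, the three kinds of error bar have quite different lengths, which is why the caption must say which one is shown. The SE is SD / √n, and the 95% CI half-width is the t critical value times the SE:

```r
happy <- c(20, 22, 24, 26, 28)         # made-up scores for one group
n  <- length(happy)
m  <- mean(happy)                      # bar height: 24
s  <- sd(happy)                        # SD: about 3.16
se <- s / sqrt(n)                      # SE: about 1.41
ci <- qt(0.975, df = n - 1) * se       # 95% CI half-width: about 3.93
c(SD = s, SE = se, CI95 = ci)
```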


6. Line Graphs
Points only:
plot(xval, yval, type = "p")

Line only:
plot(xval, yval, type = "l")

Points and line:
plot(xval, yval, type = "b")



7. Saving Graphs
Via R command:
dev.print(device = pdf, file = "scatterplot.pdf", width = 8, height = 8)

Or use the Plots tab in RStudio → “Export” → Save as image or PDF.
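An alternative to dev.print() is to open a PDF device before plotting and close it afterwards. This sketch writes to a temporary file; in practice you would use a name such as "scatterplot.pdf":

```r
out <- tempfile(fileext = ".pdf")
pdf(file = out, width = 8, height = 8)   # open a PDF device (size in inches)
plot(x = 1:10, y = (1:10)^2, xlab = "x", ylab = "x squared")
dev.off()                                # close the device; the file is written now
file.exists(out)                         # TRUE
```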


8. ggplot2 Overview
ggplot2 builds plots using layers.
Start with:
library(ggplot2)
ggplot(data, aes(x, y)) + geom_point()

Scatterplot with trend line:
ggplot(examData, aes(Anxiety, Exam)) +
geom_point() +
geom_smooth(method = "lm", color = "red") +
labs(x = "Exam Anxiety", y = "Exam Performance %")

Histogram:
ggplot(examData, aes(Anxiety)) +
geom_histogram(binwidth = 10) +
labs(x = "Anxiety", y = "Frequency")

Boxplot:
ggplot(examData, aes(Gender, Exam)) +
geom_boxplot()

Bar chart of means with error bars, one panel per gender (mean_cl_normal requires the Hmisc package to be installed):
ggplot(chickFlick, aes(film, arousal, fill = film)) +
stat_summary(fun = mean, geom = "bar") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2) +
facet_wrap(~ gender) +
labs(x = "Film", y = "Mean Arousal") +
theme(legend.position = "none")



9. Graph Types: When to Use

| Graph type | Use case |
|---|---|
| Scatterplot | Continuous × continuous |
| Boxplot | Distribution by category |
| Histogram | Distribution of one continuous variable |
| Bar chart | Frequencies or means by category |
| Line graph | Trends or time series |


10. TeX Connection (notation from previous weeks)
To plot variables relating to summary statistics (e.g. means or SD), remember:
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

$$s = \sqrt{\frac{SS}{N - 1}}$$

These are the basis for understanding what’s behind bar chart heights and error bar sizes.

11. Exam Relevance
This presentation directly supports:
• Using R to visualize data
• Identifying patterns, skewness, or outliers
• Using both base R and ggplot2 for professional figures
• Understanding what graphs are appropriate for what types of data


Lecture 5. Hypothesis Testing
1. Parameters vs. Statistics
| Measure | Population (parameter) | Sample (statistic) |
|---|---|---|
| Mean | μ | x̄ |
| Proportion | π | P |
| Variance | σ² | s² |
| Standard deviation | σ | s |
| Correlation | ρ | r |
| Regression coefficient | β | b |


2. Sampling Error
Even if the population has known parameters (e.g., 𝜇 = 100, 𝜎 = 15), different samples will
yield slightly different means and SDs due to random variation.
Larger sample sizes → lower sampling error → more stable estimates.
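A small simulation of sampling error, assuming a normal population with μ = 100 and σ = 15 as in the example. Each sample gives a different mean, and the spread of those means shrinks roughly as σ / √n:

```r
set.seed(1)  # reproducible random numbers
mean_of_sample <- function(n) mean(rnorm(n, mean = 100, sd = 15))

means_n10  <- replicate(1000, mean_of_sample(10))
means_n100 <- replicate(1000, mean_of_sample(100))

head(means_n10)   # every sample gives a slightly different mean
sd(means_n10)     # roughly 15 / sqrt(10), about 4.7
sd(means_n100)    # roughly 15 / sqrt(100) = 1.5
```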


3. Estimating Parameters
The sample mean 𝑥‾ is the best point estimate of the population mean 𝜇:
𝜇 ≈ 𝑥‾
But for standard deviation:
$$\sigma \approx \sqrt{\frac{1}{N - 1}\sum_{i=1}^{N} (X_i - \bar{X})^2}$$

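This adjustment (dividing by 𝑁 − 1 instead of 𝑁) makes the variance estimate 𝑠² an unbiased estimator of 𝜎². The square root 𝑠 is still slightly biased, but much less than when dividing by 𝑁.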
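A simulation sketch of what the 𝑁 − 1 correction does for the variance, assuming small samples (N = 5) from a normal population with σ = 15, so the true variance is 225. Dividing by 𝑁 underestimates it by a factor (𝑁 − 1)/𝑁 on average:

```r
set.seed(2)
N    <- 5
reps <- 10000
var_n1 <- numeric(reps)  # divide by N - 1 (what var() does)
var_n  <- numeric(reps)  # divide by N

for (i in 1:reps) {
  x  <- rnorm(N, mean = 100, sd = 15)  # true variance: 15^2 = 225
  ss <- sum((x - mean(x))^2)
  var_n1[i] <- ss / (N - 1)
  var_n[i]  <- ss / N
}

mean(var_n1)  # close to 225
mean(var_n)   # close to 225 * (N - 1) / N = 180: systematically too small
```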

Seller: aukehilbrands (Tilburg University)