
Summary STA130 Midterm Aid Sheet

Pages
2
Uploaded on
17-01-2023
Written in
2022/2023

This is the study document I used to study for the midterm. We were able to use an aid sheet during the exam, and you can use this to inspire the content and layout of yours. I would add more information on confidence intervals, and more code examples.



Content preview

Modern Stats+DS software/programming/computational tools → mathematical+algorithmic data/statistical analysis methodologies →
explained+advocated w/ written+verbal communication → facilitate data-driven and evidence-based decision making
Learning When first learning, structured course material is good → later it's faster to learn and troubleshoot problems yourself
JupyterHub is a cloud-based service → run R/RStudio from any web browser. JupyterHub > RStudio (GUI IDE program that wraps… > R) > tidyverse
R Markdown Reproducibility (text+outputs+code)
R methods+algorithms usually built-in/loaded from packages → most R users don’t build algorithms/data types
tidyverse Key set of R packages that help facilitate modern stats+DS
bias survivorship bias → looks only at the data that survived and ignores the group with no data
alpha significance
Basic Functions glimpse() → summary printout shows variables vertically & shows no. of rows
head() → output is tibble, doesn't show total no. of rows, shows first n rows
c() → vector | all() → output is boolean | sum() → translates logical TRUE to numeric 1 and logical FALSE to numeric 0
help() | names() → column names
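The basic functions above can be tried on a small data frame. This is only a sketch: the data and column names are hypothetical, and it uses a base R data.frame so it runs without tidyverse loaded (with a tibble, head() returns a tibble; glimpse() needs dplyr and is left out here).

```r
# Hypothetical toy data; the column names are made up for illustration.
df <- data.frame(height = c(150, 162, 171, 180),
                 smoker = c(TRUE, FALSE, TRUE, TRUE))

first_two <- head(df, 2)       # first n = 2 rows only
col_names <- names(df)         # column names
all_smoke <- all(df$smoker)    # one logical value: are ALL entries TRUE?
n_smokers <- sum(df$smoker)    # TRUE counts as 1, FALSE as 0

print(first_two)
print(col_names)
print(all_smoke)
print(n_smokers)
```

Because sum() treats TRUE as 1, mean(df$smoker) would give the proportion of TRUE values in the same way.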
data/variable types numerical (cont, disc) | categorical (nom, ord, bin → binary categorical variables = logical T/F boolean variables)
123 & 1.23 same type for R (double)

Coercion
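A minimal sketch of how R stores and coerces types, assuming only base R. The variable names are illustrative and not from the aid sheet.

```r
# 123 and 1.23 are both stored as double unless the L suffix is used.
type_whole   <- typeof(123)    # "double"
type_decimal <- typeof(1.23)   # "double"
type_integer <- typeof(123L)   # "integer"

# c() coerces mixed elements to the most flexible type present.
lgl_with_num <- c(1, TRUE)        # logical -> numeric: 1 1
mixed_to_chr <- c(1, TRUE, "a")   # everything -> character

# Explicit coercion with as.*() functions.
chr_to_num <- as.numeric("12")
lgl_to_num <- as.numeric(c(TRUE, FALSE))

print(mixed_to_chr)
```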




Visualisation Func coord_flip(), order geom_bar, labs(x= , y= )
Distributional Characteristics 1st → centre/location: median, mean, mode
2nd → spread/scale statistics: IQR, variance, SD
3rd/higher-order characteristics → skewness + modality + outliers
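The centre and spread statistics listed above can be computed with base R. The values below are hypothetical, with one deliberate outlier. Base R has no built-in statistical mode function, so the table() approach shown is just one possible workaround.

```r
x <- c(2, 4, 4, 5, 7, 9, 30)   # hypothetical values; 30 is an outlier

# 1st: centre/location
x_mean   <- mean(x)
x_median <- median(x)
x_mode   <- as.numeric(names(which.max(table(x))))  # most frequent value

# 2nd: spread/scale
x_iqr <- IQR(x)
x_var <- var(x)
x_sd  <- sd(x)

print(c(mean = x_mean, median = x_median, mode = x_mode))
print(c(IQR = x_iqr, var = x_var, sd = x_sd))
```

The mean sits well above the median here because the outlier pulls it up, which is one sign of right skew.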
Truly tidy data Rows→ observations | columns→ variables | cell→ single measurement
Tidy data benefits Can use same tools in similar ways for diff datasets vs hard to reuse untidy data & one-time approaches
print vs head print → outputs n number of rows indicated.
Data Wrangling Functions (dplyr)
select() → extract subset of variables | remove a variable w/ '-' and rename w/ '=' (or use dplyr::rename())
filter() → extract rows based on conditions in one+ columns, e.g. filter(is.na(<col>))
arrange() → sort observations based on values in one or more variables; desc() for descending order
mutate() → make new column w/ interesting variables; case_when(<condition> ~ <value>, eg. b>=a ~ "Female")
'~' → response (L) DEPENDS ON explanatory variables (R)
Aggregation functions → summarise(n=n(), <obj>=sum(), median(), mean(), var(), sd(), IQR(), quantile(<obj>, 0.75), min(), max())
n() → sample size *counts rows, doesn't account for NA values
group_by() %>% → group rows by column values
is.na() | !is.na()
na.rm=TRUE → argument (not a function) that ignores/excludes NA
Other: n_distinct()
%in% → see if an element is in a vector/dataframe column | levels() and nlevels()
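The dplyr verbs above are usually chained with %>%. The following is a rough sketch on hypothetical data (all column names and values are made up), assuming the dplyr package is installed.

```r
library(dplyr)

# Hypothetical data; names and values are illustrative only.
people <- tibble(
  name   = c("Ana", "Ben", "Cai", "Dee", "Eli"),
  group  = c("a", "b", "a", "b", "a"),
  score  = c(10, NA, 30, 20, 50),
  cutoff = 25
)

by_group <- people %>%
  select(-name) %>%                              # drop a variable with '-'
  filter(!is.na(score)) %>%                      # keep rows with a known score
  mutate(level = case_when(score >= cutoff ~ "high",
                           TRUE ~ "low")) %>%    # new column from conditions
  group_by(group) %>%
  summarise(n = n(),                             # rows per group
            mean_score = mean(score),
            n_levels = n_distinct(level)) %>%
  arrange(desc(mean_score))                      # largest mean first

# na.rm = TRUE is an argument that drops NA before computing.
overall_mean <- mean(people$score, na.rm = TRUE)

# %in% checks membership in a vector.
has_group_a <- "a" %in% people$group

print(by_group)
```

Without the filter() step, mean(score) would return NA for group "b", because n() counts the row but mean() does not skip NA on its own.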
Inference Theoretical populations vs Actual samples → population-(sampling)->sample-(inference)->population
Sample statistic x̄ →
Hypothesis Testing Functions [i] → indexing into a vector, matrix, array, list or dataframe
Steps 1. Null Hypothesis → assumed value of parameter, H0 : p=0.5 (its sampling distribution is compared against the observed test stat) & Alternative Hypothesis → H1 : p≠0.5 (Null is FALSE)
2. Set α-significance level (the probability we make a wrong decision about a chosen assumption) → reject H0 for p-values less than α. α is also the probability of a Type I error (rejecting a true H0) … Type II error → failing to reject a false H0
3. Simulate sampling distribution assuming NULL is TRUE
4. Compute p-value → the probability [can be approximated] of observing a test statistic that is as or more extreme than the one we got if the NULL Hypothesis is actually TRUE
5. “Reject H0 at α-significance level” if p-value is less than α, OTHERWISE “fail to reject NULL at α-significance level”
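The five steps above can be sketched as a simulation in base R. The sample size, observed count, seed, and number of repetitions are all hypothetical values chosen for illustration. The sketch works with counts rather than proportions so that the "as or more extreme" comparison uses exact integers.

```r
set.seed(130)                 # arbitrary seed, for reproducibility only

# Steps 1 and 2: hypotheses and significance level
p_null <- 0.5                 # H0: p = 0.5, H1: p != 0.5
alpha  <- 0.05

# Hypothetical observed data: 32 successes out of 50 trials
n_obs          <- 50
observed_count <- 32
expected_count <- n_obs * p_null

# Step 3: simulate the sampling distribution assuming H0 is TRUE
repetitions <- 10000
simulated_counts <- replicate(
  repetitions,
  sum(sample(c(TRUE, FALSE), size = n_obs, replace = TRUE,
             prob = c(p_null, 1 - p_null)))
)

# Step 4: two-sided p-value = share of simulated counts at least as far
# from the expected count as the observed one
p_value <- mean(abs(simulated_counts - expected_count) >=
                abs(observed_count - expected_count))

# Step 5: decision
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

print(p_value)
print(decision)
```

The p-value here is an approximation that changes slightly with the seed and the number of repetitions.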
Example Two Sample Hypothesis Test
1. Pick α=0.05 & placebo: 0.58 & actual: 0.75
2. Test stat μ1=0.58 & μ2=0.75 → observed difference = 0.75-0.58 = 0.17
3. H0 : μ1=μ2 → μ1-μ2=0 & H1 : μ1≠μ2
4. Simulate sampling distribution assuming NULL is TRUE → set.seed() and n repetitions
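One common way to simulate the null distribution for a two-sample test is to shuffle the group labels. The sketch below assumes that approach. The aid sheet only gives the two proportions (0.58 and 0.75), so the group size of 100 per group, the seed, and the number of repetitions are hypothetical.

```r
set.seed(130)                       # arbitrary seed
alpha <- 0.05

# Hypothetical data matching the stated proportions:
# 58 of 100 successes for placebo, 75 of 100 for actual.
n_per_group <- 100
outcome <- c(rep(1, 58), rep(0, 42), rep(1, 75), rep(0, 25))
group   <- rep(c("placebo", "actual"), each = n_per_group)

# Test statistic: difference in group means (actual minus placebo)
diff_in_means <- function(y, g) {
  mean(y[g == "actual"]) - mean(y[g == "placebo"])
}
observed_diff <- diff_in_means(outcome, group)

# Under H0 the labels carry no information, so shuffle them
repetitions <- 2000
simulated_diffs <- replicate(
  repetitions,
  diff_in_means(outcome, sample(group))
)

# Two-sided p-value
p_value <- mean(abs(simulated_diffs) >= abs(observed_diff))
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

print(observed_diff)
print(p_value)
```

With different (real) group sizes the observed difference stays 0.17, but the p-value can change a lot, since smaller groups produce a wider null distribution.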

ralwab University of Toronto