
Summary Advanced Statistics

Pages
10
Uploaded on
21-03-2025
Written in
2024/2025

Summary of the advanced statistics course.

Summary Advanced Statistics
Some definitions
P-value => the probability, assuming H0 is true, of observing a difference at least as extreme as the one in your sample.
→ P-hacking: performing a large number of statistical tests and reporting only the ones that are statistically significant, which increases the risk of false positive results.
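
To see why running many tests inflates false positives, here is a minimal sketch in Python. The function name `familywise_error_rate` is made up for this illustration, and the calculation assumes the tests are independent and that every null hypothesis is actually true.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests independent tests."""
    # Each test avoids a false positive with probability (1 - alpha).
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 20, 100):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:>3} tests at alpha = 0.05: P(at least one false positive) = {rate:.3f}")
```

With 20 tests at alpha = 0.05 the chance of at least one "significant" result is already about 64%, even though no real effect exists.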

Standard Error (SE) => a measure of the uncertainty of an estimate: how much the estimate is expected to vary from sample to sample around the true population value.
→ It helps us understand how reliable or representative our sample estimate is of the population.
→ A smaller standard error suggests a more reliable estimate, while a larger one indicates more uncertainty.
Standard deviation (SD) => tells us how spread out or varied a set of data points is around the average (mean).
→ It helps understand the degree of variability or dispersion in a dataset.
→ A larger standard deviation means the data points are more spread out, while a smaller one indicates they are closer to the mean.
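
The difference between SD and SE can be shown with a short Python sketch. The data values and the helper name `standard_error` are hypothetical; the sketch uses the standard error of the mean, SE = SD / sqrt(n), as one common example of a standard error.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


sample = [4.1, 5.0, 5.6, 4.8, 6.2, 5.3, 4.7, 5.9]  # hypothetical measurements

sd = statistics.stdev(sample)  # spread of the data points (n - 1 denominator)
se = standard_error(sample)    # uncertainty of the estimated mean

print(f"mean = {statistics.mean(sample):.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

The SD describes the data themselves, while the SE shrinks as the sample size grows because the mean is estimated more precisely.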
Degrees of freedom (DF) => represent the number of values in the final calculation of a statistic that are free to vary.
→ It measures the flexibility or constraints in data.
→ It's the number of data points minus the number of parameters estimated or restrictions imposed in a statistical analysis.

Tidy data => every row is one measurement in space and time; columns are variables with meaning in the context of a hypothesis or model. Long format!
o Minimal number of columns = degrees of freedom of the model
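
As an illustration of the long format, the sketch below reshapes a small wide table into one row per measurement using plain Python. The column names (`subject`, `day1`, `day2`), the values, and the helper `to_long` are all hypothetical.

```python
# Wide format: one row per subject, one column per measurement occasion.
wide = [
    {"subject": "s1", "day1": 4.2, "day2": 4.8},
    {"subject": "s2", "day1": 3.9, "day2": 4.1},
]


def to_long(rows, id_col, var_name="variable", value_name="value"):
    """Turn every non-id column into its own row (one measurement per row)."""
    long_rows = []
    for row in rows:
        for key, val in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col], var_name: key, value_name: val})
    return long_rows


long = to_long(wide, "subject", var_name="day", value_name="measurement")
for row in long:
    print(row)
```

Each row of `long` is now a single measurement, with `day` available as an explanatory variable for a model.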
Power => the probability that a statistical test or analysis will correctly detect a true effect or
difference when it exists. It measures the ability of a test to avoid a "false negative" or Type II error,
indicating the test's sensitivity to finding real effects.
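
Power can be computed directly for simple tests. The sketch below is a minimal example for a two-sided one-sample z-test with known standard deviation; this test, the function name `z_test_power`, and the input values are assumptions chosen for illustration.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)    # critical value, about 1.96 for alpha = 0.05
    shift = effect / (sd / n ** 0.5)     # true effect expressed in units of the SE
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


for n in (10, 30, 100):
    print(f"n = {n:>3}: power = {z_test_power(effect=0.5, sd=1.0, n=n):.2f}")
```

Power equals 1 minus the Type II error rate, and it rises with sample size and effect size and falls with variance.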

Type I Error => incorrectly rejecting a true null hypothesis. In other words, it's a false positive: concluding that there is an effect or difference when there isn't one. Underestimating the SE increases this risk.
Type II Error => failing to reject a false null hypothesis. In other words, it's a false negative: concluding that there is no effect or difference when there actually is one. Overestimating the SE increases this risk.

Null deviance => the deviance of the null (intercept-only) model; the maximal deviance a model can explain.
Residual deviance => the deviance that remains after fitting the model (variation not explained by the model).
- Residual deviance should be the same as or close to the residual degrees of freedom = the model fits well
Deviance explained: (Null deviance - residual deviance)/Null deviance
- Overdispersion => having more variation or "spread" in the data than the model predicts, which can lead to inaccurate model results and conclusions.
o You can deal with this in different ways:
▪ The dispersion parameter can be used to correct for the underestimate/overestimate of SE.
▪ Quasipoisson (Poisson but with more variance)
▪ Negative binomial (Poisson but with more variance, more complex, separate parameters for mean and variance)
▪ Mixed models, but only if there is a random effect factor.
- Underdispersion => having less variation or "spread" than the model predicts, which can also affect the accuracy of model results and conclusions.
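
The deviance quantities above can be combined in a few lines of Python. All numbers below are hypothetical, and the function names are made up for this sketch. Note one assumption: the ratio of residual deviance to residual degrees of freedom is used here as a rough stand-in for the dispersion parameter, whereas statistical software may estimate it differently (for example from Pearson residuals).

```python
import math


def deviance_explained(null_deviance, residual_deviance):
    """Fraction of the null deviance accounted for by the model."""
    return (null_deviance - residual_deviance) / null_deviance


def dispersion_ratio(residual_deviance, residual_df):
    """Rough check: near 1 is fine, well above 1 suggests overdispersion."""
    return residual_deviance / residual_df


# Hypothetical values, as they might appear in a Poisson GLM summary.
null_dev, resid_dev, resid_df = 200.0, 150.0, 50

explained = deviance_explained(null_dev, resid_dev)
phi = dispersion_ratio(resid_dev, resid_df)

# Quasi-Poisson style correction: scale the Poisson SE by sqrt(dispersion).
poisson_se = 0.10  # hypothetical standard error of one coefficient
corrected_se = poisson_se * math.sqrt(phi)

print(f"deviance explained = {explained:.2f}")
print(f"dispersion ratio   = {phi:.1f}")
print(f"SE: {poisson_se:.3f} -> {corrected_se:.3f} after correction")
```

In this made-up case the ratio is 3, so the uncorrected Poisson SEs would be too small and the Type I error rate too high.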

Fisher scoring iterations => how many iterations it took to find the best fit (4-8 is good, above 15 is bad).

Studies where data are not independent:
- Longitudinal studies: a subject is measured repeatedly over time.
- Repeated measurements: a subject receives multiple treatments.
- Nested designs: each subject is nested in one treatment (not a factorial design).
- Split-plot design: a combination of factorial and nested design.

Statistical Considerations of Study Design
- Balance => equal sample size per category
o Not always possible, but it increases power and simplifies the analysis
- Replication => true replication is absolutely essential
o The required sample size depends on…
▪ Variance (the stochastic part of the process)
▪ Effect size (how large the true differences are)
▪ Model complexity (more parameters require more samples)
o Variance and effect size can be determined from a pilot study, previous research, or expert knowledge.
o Model complexity depends on what kind of comparison you want to make, what distribution you think the outcome has conditional on the explanatory variables, whether you believe there to be potential confounders that have to be included, etc.
o No pseudoreplication (repeated measurements on the same experimental unit treated as independent replicates, like leaves on one tree instead of multiple trees).
- Randomization => random allocation of treatments, locations, or even the order in which you process samples.
o Avoids confounding effects
o Without randomization, samples run first will have slightly different measurement error than samples run last, and that difference can be confounded with the treatment.
- Blocking => a way to group similar things or subjects together.
o For estimating, and thereby accounting for, confounding effects
o A block is a subset of the experimental material within which experimental units are expected to be homogeneous (e.g., a microarray is a block).
o Blocking can make it easier to detect the true effects of the factors you're studying by reducing the influence of other variables that could muddy the result.
o Nested mixed models can use blocking (blocks nested in blocks).
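
Randomization, balance, and blocking can be combined in practice by randomizing treatments separately within each block. The following is a minimal sketch; the block names, unit labels, treatments, and the function `randomize_within_blocks` are hypothetical.

```python
import random
from collections import Counter


def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomly assign treatments to units, balanced within every block."""
    rng = random.Random(seed)
    allocation = {}
    for block, units in blocks.items():
        # Cycle through the treatments so each one occurs (nearly) equally often.
        pool = [treatments[i % len(treatments)] for i in range(len(units))]
        rng.shuffle(pool)  # random order inside the block
        for unit, treatment in zip(units, pool):
            allocation[unit] = (block, treatment)
    return allocation


blocks = {
    "tree_1": ["t1_leafA", "t1_leafB", "t1_leafC", "t1_leafD"],
    "tree_2": ["t2_leafA", "t2_leafB", "t2_leafC", "t2_leafD"],
}
allocation = randomize_within_blocks(blocks, ["control", "treated"], seed=1)

# Count how often each treatment occurs in each block.
counts = Counter(allocation.values())
for (block, treatment), n in sorted(counts.items()):
    print(block, treatment, n)
```

Every block receives each treatment equally often, so differences between blocks cannot be mistaken for treatment effects. In an analysis of such data the block (here the tree) would still be the unit of replication, not the individual leaf.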




Required sample size (n) depends on the complexity of the study design:
- Groups
- Natural variability
- Experimental techniques

→ Small n => significance testing
→ Medium n => regularized linear models (also for small n)
→ Large n => predictive models
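
How variance and effect size drive the required sample size can be sketched with a standard normal-approximation formula for comparing two group means: n per group = 2 * ((z_alpha + z_beta) * sd / effect)^2. The choice of this formula, the function name `n_per_group`, and the input values are assumptions made for illustration; an exact t-test calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-group comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls the Type I error rate
    z_beta = z.inv_cdf(power)           # controls the Type II error rate
    n = 2 * ((z_alpha + z_beta) * sd / effect_size) ** 2
    return math.ceil(n)                 # round up to whole subjects


for effect in (1.0, 0.5, 0.25):
    print(f"effect = {effect:<4} sd = 1.0 -> n per group = {n_per_group(effect, 1.0)}")
```

Halving the effect size (or doubling the standard deviation) roughly quadruples the required sample size, which is why a pilot study or previous research is useful for planning.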
Seller: mayastelzer (Universiteit Leiden)