Summary: List of statistics terms

Pages: 7 · Uploaded on: 21-03-2025 · Written in: 2024/2025

List and explanation of important statistics terms.

Adjusted R2 – the modified version of R2 that has been adjusted for the number of predictors in the model. It increases only when a new term improves the model more than would be expected by chance. It represents the degree to which the input variables explain the variance of the output or predicted variable.
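The adjustment can be sketched with the standard formula (the numbers below are made up for illustration): adjusted R2 = 1 - (1 - R2)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors.

```python
def adjusted_r2(r2, n, p):
    """Standard adjusted R-squared: penalises R-squared for the
    number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Adding predictors without improving R-squared lowers the adjusted value.
print(adjusted_r2(0.80, n=50, p=3))   # ~0.787
print(adjusted_r2(0.80, n=50, p=10))  # ~0.749
```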
AIC – the Akaike information criterion, used to compare candidate models and determine which one fits the data best. A lower value is better. The absolute AIC values are arbitrary; only the differences between models are meaningful, so if two values are very close the difference might not be significant. If the differences in AIC are very small, the simpler model with the smallest deviance is the best choice.
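A minimal sketch of how AIC is computed and compared (the formula AIC = 2k - 2 ln L is standard; the log-likelihoods below are hypothetical):

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2*ln(L-hat), where k is the
    number of estimated parameters. Only differences matter."""
    return 2 * k - 2 * log_likelihood

# Hypothetical fits: model B fits slightly better but uses more parameters.
aic_a = aic(log_likelihood=-120.0, k=3)  # 246.0
aic_b = aic(log_likelihood=-119.5, k=5)  # 249.0
print(aic_a, aic_b)  # the simpler model A wins here
```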
ANCOVA – tests the main and interaction effects of categorical variables on a continuous dependent variable. It is a blend of ANOVA and regression. An ANOVA shows the group means, whereas an ANCOVA shows the group means across another variable. Parallel slopes point to no interaction, but different slopes might mean that there is an interaction.
ANOVA – analysis of variance. A model for comparing any number of group means (more than two; for two groups use a t-test). The observations are assumed to be independent and normally distributed, and the group variances are assumed to be equal. A factorial ANOVA takes interaction into account.
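A one-way ANOVA can be run in a few lines; this sketch assumes SciPy is available and uses made-up group data:

```python
from scipy.stats import f_oneway

# Three hypothetical groups of measurements (made-up data).
g1 = [4.1, 3.9, 4.3, 4.0, 4.2]
g2 = [4.8, 5.1, 4.9, 5.0, 5.2]
g3 = [4.0, 4.2, 4.1, 3.8, 4.1]

f_stat, p_value = f_oneway(g1, g2, g3)
print(f_stat, p_value)  # a small p-value suggests at least one group mean differs
```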
BIC – similar to the AIC, but it takes both the number of variables and the sample size into consideration.
Binomial distribution – used when the observations are discrete and bounded between 0 and n (the sample size). The distribution is skewed unless p = 0.5. Includes binary data and ratios.
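A minimal sketch of the binomial probability mass function (standard formula; the parameters are made up):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) variable: discrete outcomes between 0 and n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With p far from 0.5 the distribution is visibly skewed.
probs = [binom_pmf(k, n=10, p=0.2) for k in range(11)]
print(max(range(11), key=lambda k: probs[k]))  # mode near n*p = 2
print(sum(probs))                              # probabilities sum to 1
```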
Blocking – grouping experimental material into blocks within which the experimental units are expected to be homogeneous. For example, a microarray, a plot of land or a hospital can be a block. To improve the reliability of the experiment, treatments or conditions should be randomised within each block, not just over the entire experiment. This reduces bias, variance and possible confounding effects.
Bonferroni correction – a multiple-comparison correction used to counteract the problem of multiple testing. It controls the family-wise error rate by multiplying the p-values by the number of tests, which is equivalent to dividing the significance level by the number of tests. Works best when the number of comparisons is small. Can often lead to a p-value of 1, meaning that there is no evidence at all for rejecting the null hypothesis. It is quite strict and can result in many false negatives.
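The correction itself is a one-liner; a sketch with made-up p-values:

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1
    (equivalent to dividing the significance level by the number of tests)."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

raw = [0.001, 0.02, 0.04, 0.30]
print(bonferroni(raw))  # [0.004, 0.08, 0.16, 1.0]
```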
Bootstrapping – a resampling method to obtain standard errors. Resamples from the data with replacement (so values can be picked multiple times), producing a new dataset with a slightly different distribution. This is repeated many times. The standard deviation of the resulting distribution of estimates is the standard error, and its quantiles give a confidence interval. Assumes that the data set is representative of the population.
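The procedure described above can be sketched with the standard library alone (the data below is made up):

```python
import random
import statistics

random.seed(42)
data = [2.1, 2.5, 1.9, 2.8, 2.3, 2.0, 2.6, 2.4, 2.2, 2.7]  # made-up sample

boot_means = []
for _ in range(5000):
    resample = random.choices(data, k=len(data))  # sample WITH replacement
    boot_means.append(statistics.mean(resample))

se = statistics.stdev(boot_means)  # bootstrap standard error of the mean
boot_means.sort()
ci = (boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)])  # 95% percentile CI
print(se, ci)
```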
Bootstrap aggregation (bagging) – used in random forest models. The data is resampled, and for every new data set a new binary tree is built. All of these trees are then combined to make a prediction. This turns a high-variance estimator into a low-variance one.
Box-Cox – a plot that can help determine which transformation should be applied. If the peak is close to 1, you should not transform. Otherwise, use the power given by the position of the peak, except for 0, which corresponds to a log transformation.
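SciPy can estimate the Box-Cox λ directly; a sketch assuming SciPy and NumPy are available, on made-up right-skewed data:

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=0.7, size=200)  # made-up right-skewed data

transformed, lam = boxcox(skewed)  # lam is the estimated Box-Cox lambda
print(round(lam, 2))  # a lambda near 0 suggests a log transformation
```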
Chi-squared test – compares the observed to the expected frequencies. The H0 is that there is no difference between the two. Can also be used for contingency tables.
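A sketch assuming SciPy is available, comparing made-up die-roll counts to the expected uniform frequencies:

```python
from scipy.stats import chisquare

# Observed die rolls vs. the expected uniform counts (made-up data, 120 rolls).
observed = [18, 22, 16, 25, 20, 19]
expected = [20, 20, 20, 20, 20, 20]

stat, p = chisquare(observed, f_exp=expected)
print(stat, p)  # H0: no difference between observed and expected frequencies
```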

Collinearity – a correlation and a linear relationship between explanatory variables. This can lead to a very high variance and can create an unreliable model; the statistical significance of an independent variable is undermined. If there is perfect collinearity, the correlation is equal to 1 or -1. It is different from interaction, because collinearity describes the relationship between two explanatory variables; it does not involve an effect on the response variable, as would be the case with interaction.
Confidence interval – shows the uncertainty of an estimate, not a prediction or confidence in a specific estimate. Usually 95%, meaning that if the sampling were repeated, about 95% of the intervals constructed this way would contain the true value. Can also be computed for a regression model, with the confint function in R. In that case, if the interval for a parameter includes 0, the effect is not significant and the parameter might need to be excluded from the model. For a regression there is also a confidence band, which expresses the uncertainty of the slope and intercept.
Confounders – unknown underlying factors that could lead to misinterpretation of the interactions between variables. They do not necessarily influence the outcome. Randomisation and blinding are ways to average out any confounders. You can also add a known confounder to the model, so the statistics become more reliable.
Cook’s distance – measures the relative influence of each individual case in a sample of data on the results of a regression analysis. It shows whether one or more points have a disproportionately large influence and thus may point to outliers.
Covariates – continuous variables that you measure alongside the main
variable of interest.
Cross-validation – the data is randomly split into different groups. The model is fit to all but one group and then tested on the held-out group. This is repeated until every group has been used to test the model. This gives an R2 for every fold, which can be used to see which model is the best. Can be used to evaluate the performance of a model on unseen data, to prevent overfitting, and to find the best λ for regularisation.
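The k-fold procedure above can be sketched by hand with NumPy (made-up linear data, a simple least-squares fit per fold):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=60)  # made-up noisy linear data

k = 5
indices = rng.permutation(len(x))
folds = np.array_split(indices, k)  # random split into k groups

r2_scores = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # Fit on all folds except one, then evaluate on the held-out fold.
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)
    pred = slope * x[test_idx] + intercept
    ss_res = np.sum((y[test_idx] - pred) ** 2)
    ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
    r2_scores.append(1 - ss_res / ss_tot)

print(np.mean(r2_scores))  # average out-of-fold R-squared
```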
Degrees of freedom – the maximum number of logically independent values, which are values that have the freedom to vary. Gives an indication of the flexibility or constraint of the data. If the df is the same as the number of samples, there is a perfect fit. Sample size = df(model) + df(residuals).
Dummy variable – used when you add a categorical explanatory variable to a linear model, as in ANCOVA. For example: male becomes 0 and female becomes 1. The model then contains both categorical and continuous explanatory variables.
Elastic net – a regularisation method which combines the LASSO and ridge penalties: it penalises with a mixture of the sum of absolute estimates (LASSO) and the sum of squared estimates (ridge). Variables are shrunk towards zero but can also become exactly zero (so there is variable selection). Due to this, correlations between explanatory variables are not preserved, but the method still works well if there is correlation.
Factorial ANOVA – a test for comparing group means. A way of comparing
combinations of categorical independent variables while taking interactions
into account.
False discovery rate – the expected proportion of type I errors (rejecting H0 although it is true) among all rejections during multiple testing: FDR = FP / (FP + TP). Used as a correction for multiple testing; it is less strict than the Bonferroni correction.
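A direct translation of the formula, with hypothetical counts:

```python
def false_discovery_rate(fp, tp):
    """FDR = FP / (FP + TP): the fraction of rejected hypotheses
    that are false positives."""
    return fp / (fp + tp)

# Hypothetical screen: 200 rejections in total, of which 15 are false positives.
print(false_discovery_rate(fp=15, tp=185))  # 0.075
```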
Family-wise error rate – the probability of one or more false positives (type I errors) when performing multiple hypothesis tests. As a probability, it is a value between 0 and 1.