Summary: List of statistics terms

List and explanation of important statistics terms.

Pages: 7
Uploaded on: 21 March 2025
Written in: 2024/2025

Adjusted R2 – the modified version of R2 that has been adjusted for the number
of predictors in the model. It increases only when a new term improves the
model more than would be expected by chance, and decreases when it does not.
Like R2, it represents the degree to which the input variables explain the
variance of the response variable.
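The usual formula can be sketched in a few lines of Python (the function name is my own):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Adding a predictor that leaves R2 unchanged lowers the adjusted value, which is the penalty for chance improvements described above.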
AIC – Akaike information criterion; used to compare different candidate models
and determine which one is the best fit for the data. A lower value is better.
AIC values have no meaning on their own; only the differences between models
matter, so if two values are very close the difference may not be meaningful.
If the differences in AIC are very small, the simpler model with the
smallest deviance is the best.
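For a model with maximised log-likelihood ln L and k estimated parameters, AIC = 2k − 2 ln L. A minimal sketch:

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2*ln(L), k = number of estimated parameters."""
    return 2 * k - 2 * log_likelihood
```

With equal fit, the model with fewer parameters gets the lower (better) AIC.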
ANCOVA – analysis of covariance; tests the main and interaction effects of
categorical variables on a continuous dependent variable while controlling for
covariates. It is a blend of ANOVA and regression. An ANOVA compares group
means, whereas an ANCOVA compares group means across another (continuous)
variable. Parallel slopes point to no interaction; different slopes suggest
that there is an interaction.
ANOVA – analysis of variance. It is a model for comparing any number (more
than two, otherwise use a t-test) of group means. The observations are
assumed to be independent, normally distributed and the group variances
are assumed to be equal. A factorial ANOVA takes interaction into account.
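The F statistic behind a one-way ANOVA can be sketched in plain Python (illustrative only; it returns the statistic, not a p-value):

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group over within-group mean square."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Identical group means give F = 0; well-separated groups give a large F.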
BIC – Bayesian information criterion; similar to the AIC, but it takes both the
number of variables and the sample size into consideration.
Binomial distribution – used when the observations are discrete counts between
0 and n (the sample size). The distribution is skewed unless p is close to 0.5.
Includes binary data and proportions.
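The probability mass function can be written with Python's standard library:

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials, success prob p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)
```

The probabilities over k = 0..n sum to 1, as required for a distribution.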
Blocking – grouping experimental material within which experimental units are
expected to be homogeneous. For example, a microarray, a plot of land or a
hospital is a block. To improve the reliability of the experiment, treatments or
conditions should be randomised within a block, not just over the entire
experiment. This reduces the bias, variance and possible confounding effects.
Bonferroni correction – a multiple-comparison correction that is used to
counteract the problem of multiple testing. It controls the familywise error
rate. It multiplies the p-values with the number of tests. This is the same as
dividing the level of significance by the number of tests. Works better when the
number of comparisons is small. Can often lead to a p-value of 1, meaning
that there is no evidence at all for rejecting the null hypothesis. It is quite strict
and can result in many false negatives.
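The p-value-multiplying form of the correction described above, as a minimal sketch:

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Equivalently, each adjusted p-value can be compared against the original significance level.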
Bootstrapping – a resampling method to obtain standard errors. Resamples
from the data with replacement (so values can be picked multiple times). A new
dataset is then produced with a slightly different distribution. This is repeated
many times. The standard deviation of the resulting distribution of estimates
is then the standard error. The quantiles of this distribution give the
confidence interval. Assumes that the data set is representative of the
population.
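The resampling loop above, sketched with the standard library (the number of resamples and the seed are arbitrary choices of mine):

```python
import random
import statistics

def bootstrap_se(data, n_resamples=2000, seed=1):
    """Standard error of the mean, estimated by resampling with replacement."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(data, k=len(data)))
             for _ in range(n_resamples)]
    # Standard deviation of the bootstrap distribution = the standard error
    return statistics.stdev(means)
```

For the mean, the result should land close to the analytic standard error sd/√n.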
Bootstrap aggregation (bagging) – used in random forest models. The data is
resampled, and for every new data set a new decision tree is built. All of
these trees are then combined to make a prediction. This turns a high-variance
estimator into a low-variance one.
Box-Cox – a plot (of the profile likelihood of the transformation parameter λ)
that can help determine which power transformation should be applied. If the
peak is close to 1, you should not transform. Otherwise, raise the data to the
power at which the plot peaks; a peak at 0 corresponds to a log transformation.
Chi-squared test – compares the observed to the expected frequencies. The H0
is that there is no difference between the two. Can also be used for contingency
tables.
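The test statistic is Σ (O − E)² / E over the categories; a sketch (the expected counts must be supplied):

```python
def chi_square_stat(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

When observed and expected frequencies agree exactly, the statistic is 0.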

Collinearity – a correlation and a linear relationship between explanatory
variables. This can lead to a very high variance and can then create an
unreliable model. The statistical significance of an independent variable is
undermined. If there is perfect collinearity the correlation is equal to 1 or -1.
It is different from interaction, because it describes the relationship between two
explanatory variables. It does not include an effect on the response
variable, as would be the case with interaction.
Confidence interval – shows the uncertainty of an estimate, not a prediction or
confidence in a specific estimate. Usually 95%, meaning that if the sampling
were repeated, 95% of the resulting intervals would contain the true parameter
value. Can also be computed for a regression model, with the confint function
in R. In that case, if the interval for a coefficient includes 0, the effect is
not significant and the parameter might need to be excluded from the model.
For a regression there is also a confidence band, which expresses the
uncertainty of the slope and intercept.
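A large-sample 95% interval for a mean, using the normal critical value 1.96 (a sketch; for small samples a t critical value would be used instead):

```python
import math
import statistics

def mean_ci_95(data):
    """Approximate 95% confidence interval for the mean: mean +/- 1.96 * SE."""
    m = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return (m - 1.96 * se, m + 1.96 * se)
```

The interval is symmetric around the sample mean, with width driven by the standard error.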
Confounders – unknown underlying factors. Could lead to misinterpretation of
the interactions between variables. They do not necessarily influence the
outcome. Randomization and blinding are ways to average out any
confounders. You can also add the confounder to a model, so the statistics
become more reliable.
Cook’s distance – measures the relative influence of each individual case in a
sample of data on the results of a regression analysis. It shows if one or more
points might have a disproportionately large influence and thus may point to
outliers.
Covariates – continuous variables that you measure alongside the main
variable of interest.
Cross-validation – the data is randomly split into k groups (folds). The model
is fit to all but one group, then tested on the held-out group. This is
repeated until every group has been used to test the model. This gives a
performance score (such as R2) for every fold, which can be averaged to see
which model is the best. Can be used to evaluate the performance of a model on
unseen data and prevent overfitting, and to find the best λ for regularisation.
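The splitting step can be sketched as follows (model fitting omitted; the function name is my own):

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i, test in enumerate(folds):
        # Train on everything except the held-out fold
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each observation appears in exactly one test fold, so every data point is used for validation once.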
Degrees of freedom – the maximum number of logically independent values,
which are values that have the freedom to vary. Gives an indication of the
flexibility or constraint of the data. If the df is the same as the number of
samples, there is a perfect fit. Sample size = df_model + df_residuals
Dummy variable – used in ANCOVA when you add a categorical explanatory
variable to a linear model. So, for example: male becomes 0 and female
becomes 1. The model then contains both categorical and continuous
explanatory variables.
Elastic net – a regularisation method which combines LASSO and ridge
penalties. It penalises with a mixture of the sum of absolute estimates and
the sum of squared estimates. Variables are shrunk towards zero but can
also become zero (so there is variable selection). Due to this, correlations
between explanatory variables are not preserved, but this method still works
well if there is correlation.
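One common parameterisation of the penalty, mixing the L1 (LASSO) and L2 (ridge) terms with a weight α (a sketch; some libraries additionally scale the L2 term by ½):

```python
def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * sum|b| + (1 - alpha) * sum b^2); alpha=1 is LASSO, alpha=0 is ridge."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1 - alpha) * l2)
```

Setting α strictly between 0 and 1 gives the mixture of shrinkage and variable selection described above.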
Factorial ANOVA – a test for comparing group means. A way of comparing
combinations of categorical independent variables while taking interactions
into account.
False discovery rate – the expected proportion of type I errors (rejecting H0
although it is true) among all rejections during multiple testing.
FDR = FP / (FP + TP). Used as a correction for multiple testing; less strict
than the Bonferroni correction.
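The definition above as code (FP = false positives, TP = true positives among the rejected hypotheses):

```python
def false_discovery_rate(fp, tp):
    """Proportion of rejected null hypotheses that are false positives."""
    return fp / (fp + tp)
```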
Familywise error rate – the probability of one or more false positives (type I
errors) when performing multiple hypothesis tests. As a probability, it is a
value between 0 and 1.