ARMS summary (resit)
Week 1
LECTURE
Frequentist framework
- test how well the data fit H0
- p-values, confidence intervals, effect sizes, power analysis
Bayesian
- probability of hypothesis given the data, taking prior information into account
- bayes factors, priors, posteriors, credible intervals
Frequentist estimation
- all relevant information for inference is contained in the likelihood function
- our parameter of interest is a population mean (µ)
- we assume a normal likelihood function
- x-axis: values for µ
- y-axis: probability of the observed data for each value of µ: P(data|µ)
→ likelihood
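A minimal sketch of the likelihood function described above, assuming a normal model with a known standard deviation. The data values, `sigma`, and the grid of candidate values for µ are made up for illustration.

```python
from statistics import NormalDist, mean

# Hypothetical data and an assumed known population SD (illustration only).
data = [4.8, 5.1, 5.6, 4.9, 5.3]
sigma = 0.5

def likelihood(mu, data, sigma):
    """P(data | mu): product of the normal densities of the observations."""
    result = 1.0
    for x in data:
        result *= NormalDist(mu, sigma).pdf(x)
    return result

# x-axis: candidate values for mu; y-axis: likelihood of the data for each value.
grid = [4.0 + 0.01 * i for i in range(201)]   # 4.00, 4.01, ..., 6.00
values = [likelihood(mu, data, sigma) for mu in grid]

# The likelihood peaks at the grid point closest to the sample mean.
best_mu = grid[values.index(max(values))]
print(round(best_mu, 2), round(mean(data), 2))
```

Plotting `values` against `grid` would give the likelihood curve with µ on the x-axis.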
Bayesian estimation
- in addition to the data, we may also have prior information about µ
- central idea: prior knowledge is updated with the information in the data; together they
provide the posterior distribution for µ
- the dataset tells us which values of µ are reasonable (through the likelihood
function)
- advantage: accumulating knowledge (today’s posterior is tomorrow's prior)
- disadvantage: results depend on choice of prior
The prior influences the posterior
Bayesian estimates
- the posterior distribution of the parameters of interest provides all desired estimates
- posterior mean/mode
- combination of the prior and likelihood
- posterior SD (comparable to frequentist ‘standard error’)
- posterior 95% credible interval: providing the bounds of the part of the posterior
with 95% of the posterior mass
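As an illustration of how prior and likelihood combine into posterior estimates, here is a sketch for the simplest textbook case: a normal prior on µ and a normal likelihood with known SD. This conjugate setup is an assumption chosen because the posterior is then normal too; the prior values and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_posterior(prior_mean, prior_sd, data, sigma):
    """Posterior for mu under a normal prior and normal likelihood with known sigma."""
    n = len(data)
    prior_precision = 1 / prior_sd**2
    data_precision = n / sigma**2
    post_precision = prior_precision + data_precision
    # Posterior mean: precision-weighted average of the prior mean and sample mean.
    post_mean = (prior_precision * prior_mean + data_precision * mean(data)) / post_precision
    post_sd = sqrt(1 / post_precision)
    return NormalDist(post_mean, post_sd)

data = [4.8, 5.1, 5.6, 4.9, 5.3]          # hypothetical observations
posterior = normal_posterior(prior_mean=4.0, prior_sd=1.0, data=data, sigma=0.5)

# 95% credible interval: central bounds holding 95% of the posterior mass.
lower, upper = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
print(posterior.mean, posterior.stdev, (lower, upper))
```

The posterior mean lies between the prior mean and the sample mean, and the posterior SD is smaller than the standard error from the data alone, because the prior adds information.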
Frequentist probability
- p-value = probability of observing the same or more extreme data, given that the null hypothesis is true
- testing conditions on H0
- the probability of an event is assumed to be the frequency with which it occurs
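A small sketch of a p-value calculation, assuming a two-sided z-test with a known population SD (a simplification; in practice a t-test is more common). The data, `mu0`, and `sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_p_value(data, mu0, sigma):
    """Two-sided p-value for H0: mu = mu0, assuming a known population SD."""
    n = len(data)
    z = (mean(data) - mu0) / (sigma / sqrt(n))
    # Probability of a test statistic at least this extreme, computed under H0.
    return 2 * (1 - NormalDist().cdf(abs(z)))

data = [4.8, 5.1, 5.6, 4.9, 5.3]   # hypothetical observations
p = z_test_p_value(data, mu0=5.0, sigma=0.5)
print(round(p, 3))
```

Note that the whole calculation conditions on H0: the standard normal distribution is used only because H0 is assumed to be true.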
Bayesian probability
- when testing hypotheses, bayesians can calculate the probability of the hypothesis given the
data
- bayesian testing conditions on the observed data
- PMP = posterior model probability
- the (bayesian) probability of the hypothesis after observing the data
- the bayesian probability of a hypothesis being true depends on 2 criteria
1. how sensible it is, based on prior knowledge (the prior)
2. how well it fits the new evidence (the data)
- bayesian testing is comparative: hypotheses are tested against one another, not in isolation
- bayes factor:
- BF10 = 10: support in the data for H1 is 10 times stronger than for H0
- BF10 = 1: support for H1 is as strong as support for H0
- Posterior probabilities of hypotheses (PMPs) are also relative probabilities
- they are updates of the prior probabilities (for the hypotheses) with the BF
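The update from prior probabilities to PMPs can be sketched as follows, for the case of two hypotheses. The function name and the default of equal prior probabilities are assumptions for illustration.

```python
def posterior_model_probs(bf10, prior_h1=0.5):
    """Update prior probabilities of H0 and H1 with the Bayes factor BF10."""
    prior_h0 = 1 - prior_h1
    prior_odds = prior_h1 / prior_h0
    posterior_odds = bf10 * prior_odds       # posterior odds = BF10 * prior odds
    pmp_h1 = posterior_odds / (1 + posterior_odds)
    return {"H0": 1 - pmp_h1, "H1": pmp_h1}

print(posterior_model_probs(10))   # equal priors, BF10 = 10
print(posterior_model_probs(1))    # BF10 = 1 leaves the prior probabilities unchanged
```

With equal priors and BF10 = 10, the PMP of H1 is 10/11 (about 0.91). The PMPs sum to 1 over the hypotheses in the set, which is why they are relative probabilities.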
Definition of probability
- frequentist: probability is the relative frequency of events (more formal?)
- bayesian: probability is the degree of belief (more intuitive?)
Intervals
- frequentist 95% confidence interval
- if we were to repeat this experiment many times and calculate a CI each time, 95%
of the intervals will include the true parameter value, and 5% won’t
- bayesian 95% credible interval
- there is a 95% probability that the true value is in the credible interval (given the prior and the data)
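The repeated-sampling interpretation of the confidence interval can be checked with a small simulation. This is a sketch under simplifying assumptions: a normal population with known SD, and hypothetical values for the true mean, SD, and sample size.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)                      # fixed seed so the run is reproducible
true_mu, sigma, n = 50.0, 10.0, 25  # hypothetical population and sample size
z = NormalDist().inv_cdf(0.975)     # about 1.96
repeats = 5000

covered = 0
for _ in range(repeats):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    half_width = z * sigma / sqrt(n)          # known-sigma interval for simplicity
    m = mean(sample)
    if m - half_width <= true_mu <= m + half_width:
        covered += 1

coverage = covered / repeats
print(coverage)   # proportion of intervals that contain the true mean
```

The 95% refers to the procedure across repetitions; any single interval either contains the true value or it does not.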
Linear regression
- scatterplot for scores on the variables x and y and the linear positive association between
them
Multiple linear regression
- observed outcome is the prediction based on the model + prediction error
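The decomposition "observed outcome = prediction + prediction error" can be illustrated with a least-squares fit. To keep the sketch short it uses a single predictor (multiple regression adds more predictors to the same equation); the scores are hypothetical.

```python
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical predictor scores
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical outcome scores

# Ordinary least squares estimates for y = b0 + b1 * x + error.
mx, my = mean(x), mean(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Each observed outcome equals its prediction plus its prediction error.
for yi, pi, ei in zip(y, predicted, residuals):
    print(f"{yi:.2f} = {pi:.2f} + {ei:.2f}")
```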
Model assumptions
- serious violations → incorrect results
- sometimes easy solutions
- per model, know what the assumptions are
MLR assumptions
1. interval/ratio variables (outcome and predictors)
- MLR can handle dummy variables as predictors
- a dummy variable takes the values 0 and 1 (e.g., 1 = male, 0 = female)
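A quick sketch of what a dummy predictor does in a regression, using the coding from the notes and made-up scores: the intercept becomes the mean of the group coded 0, and the slope becomes the difference between the group means.

```python
from statistics import mean

# Hypothetical scores; dummy coding as in the notes: 1 = male, 0 = female.
male = [1, 1, 1, 0, 0, 0]
score = [12.0, 14.0, 13.0, 10.0, 9.0, 11.0]

# Least-squares fit of score = b0 + b1 * male + error.
mx, my = mean(male), mean(score)
b1 = sum((d - mx) * (s - my) for d, s in zip(male, score)) / sum((d - mx) ** 2 for d in male)
b0 = my - b1 * mx

print(b0)   # intercept: mean score of the group coded 0
print(b1)   # slope: difference between the two group means
```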
Evaluating the model
- frequentist
- estimate parameters of model
- test with NHST if parameters are significantly non-zero
- bayesian
- estimate parameters of model
- compare support in data for different models/hypotheses using bayes factors
Frequentist analyses
Bayesian analysis
- bayesian estimates are a summary of the posterior distribution of the parameters (B)
- differences with frequentist results can be explained by impact of prior
- BFinclusion shows if the model improves with this predictor (BF = 5.467, when adding age)
- last column provides 95% credible interval for each regression coefficient
Hierarchical linear regression analysis
- comparing 2 nested models
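In the frequentist approach, two nested models are commonly compared with an F-test on the change in R-squared. The notes do not spell this out, so the sketch below is an assumption about the intended comparison; the function name and all numbers are hypothetical.

```python
def r2_change_f(r2_small, r2_large, k_small, k_large, n):
    """F statistic for the R-squared change between two nested regression models."""
    df1 = k_large - k_small        # number of added predictors
    df2 = n - k_large - 1          # residual df of the larger model
    f = ((r2_large - r2_small) / df1) / ((1 - r2_large) / df2)
    return f, df1, df2

# Hypothetical values: model 1 has 2 predictors, model 2 adds 1 more, n = 100.
f, df1, df2 = r2_change_f(r2_small=0.20, r2_large=0.26, k_small=2, k_large=3, n=100)
print(round(f, 2), df1, df2)
```

The resulting F value is then compared with the F distribution with `df1` and `df2` degrees of freedom. The Bayesian counterpart compares the two models with a Bayes factor instead.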
Exploration vs theory evaluation
- frequentist
- method enter
- data analyst decides what goes in the model
- confirmatory
- stepwise method
- best prediction model is determined based on results in the sample
- capitalizes most on chance
- least chance to get replicated (the selected model is tuned to this particular sample)
- bayesian
- exploratory
- BAIN can evaluate informative hypotheses → confirmatory
SEMINAR
Point estimate: single value estimate of a parameter
Interval estimate: range of values believed to contain the parameter
- provide a measure of uncertainty
- more information than point estimate
- ‘range of plausible values’
- can be used for estimation and testing
Frequentist statistics: confidence interval
- constructing a confidence interval around a point estimate
- we need
- point estimate (the sample mean, x̄)
- SD of the sample, s (the standard error of the point estimate is s/√n)
- sample size, n
- calculate the CI
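The ingredients listed above combine as: point estimate ± critical value × s/√n. The sketch below uses the normal critical value (about 1.96 for 95%), which is a large-sample approximation; for small samples the critical value from the t distribution with n - 1 degrees of freedom should be used instead. The sample is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Large-sample CI for the mean: point estimate +/- z * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)                 # point estimate
    se = stdev(sample) / sqrt(n)         # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return x_bar - z * se, x_bar + z * se

sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.2, 4.7]   # hypothetical data
low, high = confidence_interval(sample)
print(round(low, 2), round(high, 2))
```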