1. Explain the difference between soft clustering in GMMs and hard clustering in k-means.
Hard clustering in k-means assigns each data point to exactly one cluster. A GMM uses soft clustering: instead of making a hard assignment, it gives each data point a probability of belonging to each mixture component.
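A minimal sketch of the two assignment rules for one-dimensional data. It assumes the cluster parameters have already been estimated: the weights, means and standard deviations below are hypothetical, and fitting (k-means or EM) is not shown. The hard rule returns a single cluster index, while the soft rule returns the GMM responsibilities, which sum to 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def hard_assign(x, means):
    """k-means style: index of the nearest cluster mean (exactly one cluster)."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))

def soft_assign(x, weights, means, sigmas):
    """GMM style: posterior probability (responsibility) of each component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical, already-fitted parameters for two 1-D clusters.
weights, means, sigmas = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

x = 1.8  # a point between the two clusters
print(hard_assign(x, means))                   # 0
print(soft_assign(x, weights, means, sigmas))  # about [0.69, 0.31]
```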
2. Why is a block bootstrap often used for time series (dependent) data?
It addresses the correlation that is often present in time series data. It resamples blocks of consecutive observations rather than individual observations, so the dependence within each block is preserved.
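A minimal sketch of one common variant, the moving (overlapping) block bootstrap. The block length is a tuning parameter chosen here arbitrarily, and the toy series is hypothetical; it uses consecutive integers so that preserved adjacency inside each block is easy to see.

```python
import random

def moving_block_bootstrap(series, block_len, rng):
    """Build one bootstrap series by concatenating randomly chosen blocks of
    `block_len` consecutive observations (overlapping blocks allowed)."""
    n = len(series)
    starts = range(n - block_len + 1)  # every valid block start position
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(series[s:s + block_len])
    return out[:n]  # trim the last block so the length matches the original

rng = random.Random(0)
series = list(range(20))  # toy series: neighbours differ by exactly 1
resample = moving_block_bootstrap(series, block_len=5, rng=rng)
print(resample)
```

Within each block of 5 the original order survives, which is what keeps short-range dependence intact.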
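3. T/F. In the bootstrap method, each bootstrap sample must include all unique observations from the original dataset.
False. Because sampling is done with replacement, some observations appear several times and others not at all; for large n a bootstrap sample contains on average only about 63.2% (1 - 1/e) of the distinct original observations.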
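A quick simulation of that fraction, resampling indices 0..n-1 with replacement. The expected share of distinct observations is 1 - (1 - 1/n)^n, which tends to 1 - 1/e as n grows; n and B below are arbitrary choices.

```python
import random

rng = random.Random(42)
n, B = 1000, 200
fractions = []
for _ in range(B):
    sample = rng.choices(range(n), k=n)  # draw n indices with replacement
    fractions.append(len(set(sample)) / n)

avg = sum(fractions) / B
print(round(avg, 3))  # close to 1 - 1/e ≈ 0.632
```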
4. In the percentile bootstrap method, how do you construct a 95% confidence interval?
Use the 2.5th and 97.5th percentiles of the bootstrap distribution of the statistic as the lower and upper limits.
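A minimal sketch of the percentile interval for a generic statistic, assuming a hypothetical normal sample and a simple order-statistic quantile rule (libraries use slightly different quantile conventions).

```python
import random
import statistics

def percentile_ci(data, stat, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: the alpha/2 and 1 - alpha/2 quantiles of the
    sorted bootstrap replicates of `stat`."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(B))
    # Simple order-statistic quantiles; other quantile conventions differ slightly.
    lo = reps[round(alpha / 2 * B)]
    hi = reps[round((1 - alpha / 2) * B) - 1]
    return lo, hi

gen = random.Random(1)
data = [gen.gauss(10.0, 2.0) for _ in range(50)]  # hypothetical sample
lo, hi = percentile_ci(data, statistics.fmean)
print(f"95% percentile CI for the mean: ({lo:.2f}, {hi:.2f})")
```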
5. In the studentized bootstrap method, what is typically used to standardize the bootstrap replicates?
The bootstrap standard error of each replicate, estimated from that same bootstrap sample (often with a second, nested bootstrap).
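In symbols (the notation is assumed here, not taken from the source): theta-hat is the estimate on the original data, theta-hat*_b the estimate on bootstrap sample b, se*_b the standard error estimated from that bootstrap sample, se-hat the standard error of the original estimate, and t*_(p) the p-quantile of the standardized replicates. The standardized replicates and the resulting interval are:

```latex
t^*_b = \frac{\hat\theta^*_b - \hat\theta}{\widehat{\mathrm{se}}^*_b}, \quad b = 1, \dots, B,
\qquad
\left[\, \hat\theta - t^*_{(1-\alpha/2)}\, \widehat{\mathrm{se}},\;
        \hat\theta - t^*_{(\alpha/2)}\, \widehat{\mathrm{se}} \,\right]
```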
6. The difference between Bootstrap and the Monte Carlo method is
The bootstrap simulates from the data (resampling the observed sample), whereas Monte Carlo simulates from a model based on the data.
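A minimal sketch contrasting the two for the standard error of a mean. The data are hypothetical, and the "model based on the data" is assumed to be a normal distribution with the sample mean and standard deviation; simulating from such a fitted model is also what is called a parametric bootstrap.

```python
import random
import statistics

rng = random.Random(7)
data = [rng.gauss(5.0, 1.5) for _ in range(40)]  # hypothetical observed sample
n, B = len(data), 2000

# Bootstrap: resample the observed data themselves (the empirical distribution).
boot_means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(B)]

# Monte Carlo: draw fresh samples from a model fitted to the data
# (assumed here to be a normal with the sample mean and standard deviation).
mu_hat, sd_hat = statistics.fmean(data), statistics.stdev(data)
mc_means = [statistics.fmean([rng.gauss(mu_hat, sd_hat) for _ in range(n)])
            for _ in range(B)]

boot_se, mc_se = statistics.stdev(boot_means), statistics.stdev(mc_means)
print(f"SE of the mean: bootstrap {boot_se:.3f}, Monte Carlo {mc_se:.3f}")
```

Here the two agree closely because the assumed model happens to match how the data were generated; with a misspecified model the Monte Carlo answer inherits the model's error, while the bootstrap does not depend on it.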
, AML exam questions
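7. What is the primary purpose of using the bootstrap method in linear regression?
To assess the stability and uncertainty of the model estimates, for example standard errors and confidence intervals for the coefficients, without relying on the usual normal-error assumptions.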
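A minimal sketch of the pairs (case) bootstrap for the slope of a simple linear regression: whole (x, y) rows are resampled and the model is refit each time. The data are hypothetical with a true slope of 0.5; resampling residuals is another common scheme not shown here.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on x for simple linear regression."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]  # hypothetical data

pairs = list(zip(xs, ys))
slopes = []
for _ in range(1000):
    sample = rng.choices(pairs, k=len(pairs))  # resample (x, y) rows together
    bx, by = zip(*sample)
    slopes.append(ols_slope(bx, by))

slope_hat, slope_se = ols_slope(xs, ys), statistics.stdev(slopes)
print(f"slope {slope_hat:.3f}, bootstrap SE {slope_se:.3f}")
```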
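8. If small perturbations to the data-generating process produce huge swings in the sampling distribution,
bootstrapping will not work well and may fail, because the bootstrap relies on the sampling distribution changing smoothly with the data-generating process.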
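A textbook failure case is the sample maximum. The true sampling distribution of the maximum of a continuous sample is continuous, but the bootstrap puts a probability of 1 - (1 - 1/n)^n (about 0.64 for n = 50) on the single value equal to the observed maximum. The Uniform(0, 1) data below are hypothetical.

```python
import random

rng = random.Random(11)
n, B = 50, 5000
data = [rng.uniform(0, 1) for _ in range(n)]  # hypothetical sample
sample_max = max(data)

boot_max = [max(rng.choices(data, k=n)) for _ in range(B)]
share_equal = sum(m == sample_max for m in boot_max) / B
print(share_equal)  # about 1 - (1 - 1/n)**n ≈ 0.64 for n = 50
```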
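9. T/F. The bootstrap method requires assumptions about the underlying population distribution.
False. The (nonparametric) bootstrap resamples the empirical distribution and makes no assumption about the form of the population distribution.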
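10. What is the bootstrap method, and how does it differ from traditional statistical methods?
The bootstrap creates B datasets, each the same size as the original, by randomly resampling the original data with replacement, and computes the quantity of interest on each one to see how it behaves across these datasets. Unlike traditional methods, it approximates the sampling distribution by simulation instead of deriving it from closed-form formulas and distributional assumptions.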
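A minimal, generic sketch of the resampling loop described above; the function name `bootstrap` and the sample values are hypothetical. The spread of the B replicates estimates the standard error of the statistic.

```python
import random
import statistics
from typing import Callable, List, Sequence

def bootstrap(data: Sequence[float],
              stat: Callable[[Sequence[float]], float],
              B: int = 1000,
              seed: int = 0) -> List[float]:
    """Return B bootstrap replicates of `stat`. Each replicate is computed on a
    dataset of the same size as `data`, drawn from it with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(B)]

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]  # hypothetical sample
reps = bootstrap(data, statistics.fmean)
print(f"bootstrap SE of the mean: {statistics.stdev(reps):.3f}")
```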
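11. Why is resampling with replacement essential in the bootstrap process?
Without replacement, drawing n observations from a dataset of size n just reorders the original data, so every replicate would give the same summary statistics. With replacement, each resampled dataset differs from the original while remaining a valid sample from the empirical distribution of the original dataset.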
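A short check of that argument on a hypothetical sample: sampling n of n without replacement (`random.Random.sample`) only permutes the data, so the mean never changes, while sampling with replacement (`random.Random.choices`) produces varying means. Means are rounded only to hide floating-point differences from summing in a different order.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(5)
data = [3.1, 4.7, 2.2, 5.9, 4.0, 3.6]  # hypothetical sample
n = len(data)

without = {round(mean(rng.sample(data, k=n)), 10) for _ in range(100)}
with_repl = {round(mean(rng.choices(data, k=n)), 10) for _ in range(100)}
print(len(without), len(with_repl))  # 1 distinct mean vs. many
```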
12. In what situations is the bootstrap particularly useful compared to parametric methods?
When the dataset is small, so large-sample approximations are unreliable, and you want to estimate the sampling distribution of a statistic and attach confidence limits for testing.
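13. How do you interpret a bootstrap confidence interval?
A 95% confidence interval comes from a procedure that, over repeated samples, produces intervals containing the true population parameter about 95% of the time. It does not mean that the parameter has a 95% probability of lying within this particular interval.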
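The repeated-sampling interpretation can be checked by simulation, assuming a hypothetical normal population with a known mean: draw many fresh samples, build a percentile bootstrap interval from each, and count how often the intervals contain the true mean. The coverage should come out close to 0.95; the trial counts below are small to keep the run short.

```python
import random

def percentile_ci(data, rng, B=500, alpha=0.05):
    """Percentile bootstrap CI for the mean, using simple order statistics."""
    n = len(data)
    reps = sorted(sum(rng.choices(data, k=n)) / n for _ in range(B))
    return reps[round(alpha / 2 * B)], reps[round((1 - alpha / 2) * B) - 1]

rng = random.Random(2024)
true_mean, trials, n = 10.0, 300, 40
hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, 3.0) for _ in range(n)]  # fresh sample each time
    lo, hi = percentile_ci(sample, rng)
    hits += lo <= true_mean <= hi
coverage = hits / trials
print(coverage)
```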
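14. What are the assumptions of the bootstrap method, and how might violations of these assumptions impact results?
We assume that the data are i.i.d. (independent and identically distributed) and that the sample is representative of the population. This applies only to the nonparametric bootstrap. If the observations are dependent, as in time series, resampling individual points breaks the dependence structure and the resulting uncertainty estimates can be badly off; this is why the block bootstrap (question 2) is used.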
15. How can bootstrap methods be used to assess the stability of a model or estimate?
Refit the model on each bootstrap dataset, then analyse the distribution of parameter estimates or predictions across samples, which shows their variability. This helps assess model stability and the sensitivity of the predictions to variations in the data.
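16. What is the difference between the basic bootstrap confidence interval and the studentized bootstrap confidence interval?
The basic interval is built from quantiles of the bootstrap replicates of the statistic itself, reflected around the original estimate. The studentized interval uses quantiles of replicates standardized by their own standard errors, which requires an inner bootstrap loop (to estimate each replicate's standard error) nested inside the outer loop; it costs more computation but is often more accurate.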
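A minimal sketch of both intervals for the mean of a small skewed sample. The data, the B values and the simple quantile rule are assumptions for illustration. The outer loop produces the replicates used by both intervals; the inner loop estimates each replicate's standard error, which only the studentized interval needs.

```python
import random
import statistics

def mean(xs):
    return sum(xs) / len(xs)

def boot_se(data, rng, B):
    """Bootstrap standard error of the mean of `data`, from B resamples."""
    n = len(data)
    return statistics.stdev([mean(rng.choices(data, k=n)) for _ in range(B)])

def quantile(xs, p):
    """Simple order-statistic quantile (other conventions differ slightly)."""
    s = sorted(xs)
    return s[round(p * (len(s) - 1))]

rng = random.Random(99)
data = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical skewed sample
n, B_outer, B_inner, alpha = len(data), 1000, 50, 0.05
theta_hat = mean(data)
se_hat = boot_se(data, rng, 200)  # SE of the original estimate

theta_star, t_star = [], []
for _ in range(B_outer):                     # outer loop
    resample = rng.choices(data, k=n)
    th = mean(resample)
    theta_star.append(th)
    se_b = boot_se(resample, rng, B_inner)   # inner loop: SE of this replicate
    t_star.append((th - theta_hat) / se_b)

# Basic: reflect the quantiles of theta* around theta_hat.
basic = (2 * theta_hat - quantile(theta_star, 1 - alpha / 2),
         2 * theta_hat - quantile(theta_star, alpha / 2))
# Studentized: quantiles of t*, scaled by the SE of theta_hat.
student = (theta_hat - quantile(t_star, 1 - alpha / 2) * se_hat,
           theta_hat - quantile(t_star, alpha / 2) * se_hat)
print("basic:", basic, "studentized:", student)
```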
17. How does the choice of statistic affect the bootstrap process?
It affects both the method and the interpretation. For example, bootstrapping the mean is typically straightforward, whereas the median is less smooth and might require a larger sample size for a reliable bootstrap distribution.
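A small illustration of why the median behaves differently, assuming a hypothetical sample of odd size n = 15: every bootstrap median is one of the original observations, so its bootstrap distribution has at most n distinct values, while bootstrap means vary almost continuously.

```python
import random
import statistics

rng = random.Random(8)
data = [rng.gauss(0, 1) for _ in range(15)]  # hypothetical sample, odd n
n, B = len(data), 2000

means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(B)]
medians = [statistics.median(rng.choices(data, k=n)) for _ in range(B)]

# With odd n, each bootstrap median is an original observation,
# so the bootstrap distribution of the median is lumpy.
print(len(set(means)), len(set(medians)))
```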
18. What are some potential limitations or
It is computationally demanding; it reproduces any bias in the original sample, since it can only resample what was observed; and its results can be highly variable for small sample sizes.