For each of the following five questions, select the probability distribution
that could best be used to model the described scenario. Each distribution
might be used zero, one, or more than one time in the five questions.
These scenarios are meant to be simple and straightforward; if you're an
expert in the field the question asks about, please do not rely on your
expertise to fill in all the extra complexity (you'll end up making the
questions below more difficult than I intended).
Question 1
1.4 pts
Number of people clicking an online banner ad each hour
Binomial
Exponential
Geometric
Correct!
Poisson
Weibull
Question 2
1.4 pts
Time from when a generator is turned on until it fails
Binomial
Exponential
Geometric
Poisson
Correct!
Weibull
Question 3
1.4 pts
Number of hits to a real estate web site each minute
Binomial
Exponential
Geometric
Correct!
Poisson
Weibull
Question 4
1.4 pts
Number of people entering a grocery store each minute
Binomial
Exponential
Geometric
Correct!
Poisson
Weibull
Question 5
1.4 pts
Time between hits on a real estate web site
Binomial
Correct!
Exponential
Geometric
Poisson
Weibull
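The "number of events per interval" and "time between events" scenarios above describe the same kind of arrival process from two angles: when the gaps between events are exponential with rate lambda, the number of events in each unit of time is Poisson with mean lambda. The Python sketch below illustrates this by simulation. The rate of 3 hits per minute, the seed, the horizon, and the function names are assumptions chosen for illustration; none of them come from the quiz.

```python
import random
from statistics import mean, pvariance


def simulate_arrivals(rate, horizon, seed=0):
    """Return arrival times in [0, horizon) whose gaps are exponential(rate)."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def counts_per_interval(times, horizon):
    """Count how many arrivals fall in each unit interval [k, k + 1)."""
    counts = [0] * horizon
    for t in times:
        counts[int(t)] += 1
    return counts


RATE = 3.0        # hypothetical: 3 hits per minute
HORIZON = 10_000  # hypothetical: simulate 10,000 minutes

arrivals = simulate_arrivals(RATE, HORIZON)
gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
counts = counts_per_interval(arrivals, HORIZON)

mean_gap = mean(gaps)          # should be close to 1 / RATE
mean_count = mean(counts)      # should be close to RATE
var_count = pvariance(counts)  # for a Poisson count, variance is close to the mean

print(f"mean gap between hits: {mean_gap:.3f} minutes")
print(f"mean hits per minute:  {mean_count:.3f}")
print(f"variance of hits:      {var_count:.3f}")
```

A per-minute count with mean and variance both near the rate is the signature of a Poisson model, while the gaps between hits follow the matching exponential model. A time-to-failure scenario such as the generator differs in that the failure rate may change with age, which is why Weibull, not exponential, is marked correct there.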
INFORMATION FOR QUESTIONS 6-7
Five classification models were built for predicting whether a neighborhood
will soon see a large rise in home prices, based on public elementary school
ratings and other factors. The training data set was missing the school rating
variable for every new school (3% of the data points).
Because ratings are unavailable for newly-opened schools, it is believed
that locations that have recently experienced high population growth are
more likely to have missing school rating data.
Model 1 used imputation, filling in the missing data with the
average school rating from the rest of the data.
Model 2 used imputation, building a regression model to fill in the
missing school rating data based on other variables.
Model 3 used imputation, first building a classification model to
estimate (based on other variables) whether a new school is likely
to have been built as a result of recent population growth (or
whether it has been built for another purpose, e.g. to replace a very
old school), and then using that classification to select one of two
regression models to fill in an estimate of the school rating; there
are two different regression models (based on other variables), one
for neighborhoods with new schools built due to population growth,
and one for neighborhoods with new schools built for other reasons.
Model 4 used a binary variable to identify locations with missing
information.
Model 5 used a categorical variable: first, a classification model
was used to estimate whether a new school is likely to have
been built as a result of recent population growth; and then
each neighborhood was categorized as "data available",
"missing, population growth", or "missing, other reason".