1 REVIEW 2025 |250 VERIFIED CONCEPTS & EXAM QUESTIONS
Struggling to make sense of classifiers, decision boundaries, or the difference
between predictive and prescriptive analytics? This ISYE 6501 Midterm 1 cheat
sheet breaks it all down into simple, clear explanations that actually stick.
Whether you're figuring out which direction a boundary leans or when to use a
soft classifier, this guide gives you exactly what you need—no fluff, just focused
answers to help you ace the exam. Built for speed, clarity, and confidence, it's
your shortcut to mastering the math behind real-world modeling.
How do we find the best value of k in k-means? - CORRECT ANSWER-Elbow method: for each candidate k, we
calculate the total distance of each data point to its cluster center and plot that total against k. We
look for the kink (the "elbow") in the graph.
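A minimal sketch of the elbow method, under stated assumptions: the toy data set, the helper names (mean_point, farthest_point_init, kmeans_total_distance), and the deterministic farthest-point starting centers are all illustrative choices, not part of the course material. It runs a simple k-means loop for several values of k and records the total distance for each.

```python
import math

def mean_point(pts):
    """Coordinate-wise mean of a list of 2-D points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def farthest_point_init(points, k):
    """Assumed starting rule: first point, then repeatedly the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans_total_distance(points, k, iters=20):
    """Run the k-means heuristic and return the total point-to-center distance."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sum(min(math.dist(p, c) for c in centers) for p in points)

# Toy data: three tight groups of four points each.
blobs = [(0, 0), (10, 0), (0, 10)]
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

totals = {k: kmeans_total_distance(points, k) for k in range(1, 6)}
for k, total in totals.items():
    print(k, round(total, 2))
```

On this toy data the total distance falls sharply up to k = 3 and only slightly afterwards, so the kink sits at k = 3.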
When clustering for prediction, how do we choose the prediction? - CORRECT ANSWER-When we
see a new point, we assign it to whichever cluster center is closest.
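The nearest-center rule can be sketched in a few lines. The function name predict_cluster and the example centers are hypothetical; in practice the centers would come from a fitted clustering.

```python
import math

def predict_cluster(point, centers):
    """Return the index of the cluster center closest to the new point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Hypothetical centers from an earlier clustering run.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
label = predict_cluster((8.5, 1.0), centers)
print(label)
```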
What is the difference between classification and clustering? - CORRECT ANSWER-With
classification models, we know each data point's attributes and we already know the right
classification for the data points (supervised). In clustering (unsupervised) we know the attributes
but we don't know what group any of these data points are in.
What is the difference between supervised learning and unsupervised learning? - CORRECT
ANSWER-Supervised - the response is known
Unsupervised - response is not known.
The k-means algorithm for clustering is a "heuristic" because... - CORRECT ANSWER-...it isn't
guaranteed to get the best answer but it will get to a solution quickly.
A group of astronomers has a set of long-exposure CCD images of various distant objects. They do
not know yet which types of object each one is, and would like your help using analytics to
determine which ones look similar. Which is more appropriate: classification or clustering? -
CORRECT ANSWER-clustering
Suppose one astronomer has categorized hundreds of the images by hand, and now wants your help
using analytics to automatically determine which category each new image belongs to. Which is
more appropriate: classification or clustering? - CORRECT ANSWER-classification
Which of these is generally a good reason to remove an outlier from your data set?
A. The outlier is incorrectly entered data, not real data.
B. Outliers like this only happen occasionally. - CORRECT ANSWER-A.
If the data point isn't a true one, you should remove it from your data set.
What is an outlier? - CORRECT ANSWER-A data point that is very different from the rest
What graph or plot can we use to find outliers? - CORRECT ANSWER-box-and-whisker plot
What are the parts of a box-and-whisker plot? - CORRECT ANSWER-The bottom and top of the box
are the 25th and 75th percentiles. The middle value is the median. The whiskers stretch up and down
to a reasonable range of values (e.g., the 10th and 90th or the 5th and 95th percentiles).
Where would outliers appear in a box-and-whisker plot? - CORRECT ANSWER-Outside of the whiskers.
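A rough sketch of the numbers behind such a plot, assuming whiskers at the 5th and 95th percentiles and using the standard library's statistics.quantiles with the "inclusive" method. The data and the function name box_plot_summary are made up for illustration.

```python
import statistics

def box_plot_summary(data, n=20):
    """Box edges, median, and 5th/95th-percentile whiskers (one assumed whisker choice)."""
    cuts = statistics.quantiles(data, n=n, method="inclusive")  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "whisker_low": cuts[0],    # 5th percentile
        "q1": cuts[4],             # 25th percentile (bottom of the box)
        "median": cuts[9],         # 50th percentile
        "q3": cuts[14],            # 75th percentile (top of the box)
        "whisker_high": cuts[18],  # 95th percentile
    }

data = list(range(10, 30)) + [95]   # twenty ordinary values plus one extreme value
summary = box_plot_summary(data)
outside = [x for x in data if x < summary["whisker_low"] or x > summary["whisker_high"]]
print(summary)
print(outside)
```

Percentile-based whiskers always leave a few points outside, so a point beyond the whiskers is a candidate for inspection rather than automatically bad data.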
What are some ways to deal with outliers that are bad data? - CORRECT ANSWER-Omit them or
use imputation
What can change detection be used for? - CORRECT ANSWER-Determining whether action might
be needed, determining impact of past action, determining changes to help plan.
What is cumulative sum (CUSUM) used for? - CORRECT ANSWER-Detecting an increase, a decrease, or both.
What is C used for in the CUSUM formula? - CORRECT ANSWER-Since we expect there to be some
randomness, we include a value C to pull the running total down.
If we have a larger C ... - CORRECT ANSWER-the harder it is for S_t to get large and the less sensitive
the method will be
If we have a smaller C ... - CORRECT ANSWER-the more sensitive the method is because S_t can get
larger faster
What factors go into finding the right values of C and T? - CORRECT ANSWER-How costly it is if the
model takes a long time to notice a change, and how costly it is if the model thinks it has found a
change that really isn't there.
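To see how C and T trade off, here is a small sketch of CUSUM for detecting an increase. It assumes the commonly used one-sided form S_t = max(0, S_{t-1} + (x_t - mu - C)) with an alarm when S_t >= T; the cards above do not spell the formula out, and mu (the expected value with no change), the function name, and the data are illustrative assumptions.

```python
def cusum_increase(xs, mu, C, T):
    """Return the running totals S_t and the index of the first alarm (or None)."""
    s = 0.0
    history = []
    first_alarm = None
    for t, x in enumerate(xs):
        # C pulls the running total down; the total is never allowed below zero.
        s = max(0.0, s + (x - mu - C))
        history.append(s)
        if first_alarm is None and s >= T:
            first_alarm = t
    return history, first_alarm

# Made-up series: values hover around 10, then shift upward from index 5 on.
xs = [10, 11, 9, 10, 10, 13, 14, 13, 15, 14]

_, alarm_base = cusum_increase(xs, mu=10, C=1, T=5)
_, alarm_larger_c = cusum_increase(xs, mu=10, C=2, T=5)
_, alarm_larger_t = cusum_increase(xs, mu=10, C=1, T=10)
print(alarm_base, alarm_larger_c, alarm_larger_t)
```

With this series, raising either C or T delays the first alarm, which is the sensitivity trade-off the answers describe.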
Why are hypothesis tests often not sufficient for change detection? - CORRECT ANSWER-They
often are slow to detect changes.
Hypothesis tests generally have high threshold levels, which makes them slow to detect changes.
In the CUSUM model, having a higher threshold T makes it... - CORRECT ANSWER-slower to detect
changes, and less likely to falsely detect changes.
In the exponential smoothing equation S_t = \alpha \times x_t + (1-\alpha) \times S_{t-1} a value of
\alpha closer to 1 is chosen if... - CORRECT ANSWER-There's less randomness, so we're more willing to
trust the observation.
We put more weight on the observation x_t than the previous estimate S_{t-1}
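A short sketch of the smoothing equation, assuming the series is started with S_1 = x_1 (the cards do not state an initialization). The function name and data are illustrative.

```python
def exponential_smoothing(xs, alpha):
    """Apply S_t = alpha * x_t + (1 - alpha) * S_{t-1}, starting from S_1 = x_1 (assumed)."""
    s = xs[0]
    out = [s]
    for x in xs[1:]:
        s = alpha * x + (1 - alpha) * s
        out.append(s)
    return out

xs = [10, 20, 10, 20]
high = exponential_smoothing(xs, alpha=0.9)  # trusts each new observation
low = exponential_smoothing(xs, alpha=0.1)   # leans on the previous estimate
print(high)
print(low)
```

With alpha near 1 the estimate follows each observation closely; with alpha near 0 it barely moves from the previous estimate.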
A multiplicative seasonality, like in the Holt-Winters method, means that the seasonal effect is... -
CORRECT ANSWER-Proportional to the baseline value.
A multiplicative seasonality is larger when the baseline value is larger, because its effect is a multiple
of the baseline
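A tiny arithmetic illustration, with a made-up seasonal factor of 1.25: the same multiplicative factor produces a larger absolute effect on a larger baseline.

```python
def seasonal_value(baseline, factor):
    """Multiplicative seasonality: the seasonal value is the baseline times a factor."""
    return baseline * factor

factor = 1.25  # hypothetical seasonal factor for one period
effect_small = seasonal_value(100, factor) - 100
effect_large = seasonal_value(200, factor) - 200
print(effect_small, effect_large)
```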
In the exponential smoothing equation S_t = \alpha \times x_t + (1-\alpha) \times S_{t-1} only the
current observation x_t is considered in calculating the estimate S_t. - CORRECT ANSWER-False. S_{t-1}
was itself computed from earlier observations, so all previous observations are considered.
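Unrolling the recursion makes this explicit. The derivation below assumes the smoothing is started with S_1 = x_1, an initialization the cards do not state:

```latex
\begin{aligned}
S_t &= \alpha x_t + (1-\alpha) S_{t-1} \\
    &= \alpha x_t + \alpha (1-\alpha) x_{t-1} + (1-\alpha)^2 S_{t-2} \\
    &= \alpha \sum_{i=0}^{t-2} (1-\alpha)^i x_{t-i} + (1-\alpha)^{t-1} S_1
\end{aligned}
```

Every earlier observation x_{t-i} appears with weight \alpha (1-\alpha)^i, which shrinks as i grows, so older observations count for less but never drop out entirely.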