How should one generally split their data set? - CORRECT ANSWER-Training (building models) /
Validation (picking a model) / Test (estimating performance of the chosen model)
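A minimal sketch of the three-way split, for illustration only. The 60/20/20 proportions, the fixed seed, and the name `split_dataset` are assumptions, not something the card specifies.

```python
import random

def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                    # used to build the models
    valid = shuffled[n_train:n_train + n_valid]   # used to pick a model
    test = shuffled[n_train + n_valid:]           # used to estimate performance
    return train, valid, test

train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```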
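Rotating versus randomness when validating data? - CORRECT ANSWER-Rotation: assigning points to
sets in turn makes sure each part of the data is equally represented in every set, but it can
introduce bias if the rotation lines up with a pattern in the data
Randomness: avoids that systematic bias, but a set can end up with an uneven share of some part
of the data by chance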
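A small sketch contrasting the two assignment schemes. The function names are hypothetical and the set count of 5 is an arbitrary choice for illustration.

```python
import random

def assign_by_rotation(n_points, n_sets):
    """Point i goes to set i mod n_sets, so every stretch of data is spread evenly."""
    return [i % n_sets for i in range(n_points)]

def assign_at_random(n_points, n_sets, seed=0):
    """Each point goes to a randomly chosen set; set sizes can be uneven by chance."""
    rng = random.Random(seed)
    return [rng.randrange(n_sets) for _ in range(n_points)]

rotation = assign_by_rotation(10, 5)
randomized = assign_at_random(10, 5)
print(rotation)    # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(randomized)
```

With 5 sets and data recorded on the 5 working days of a week, for example, the rotation would put every Monday in set 0, which is the kind of pattern-aligned bias rotation can create.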
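K-fold Cross-Validation - CORRECT ANSWER-splits the data into k sections and rotates through
them: each section is held out once for validation while the model is trained on the other k - 1,
so you don't have to worry about what is being left out. Averaging the k results gives a better
estimate of model quality.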
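The rotation through the k sections can be sketched as follows. The "model" here is a deliberately trivial assumption (predict the mean of the training values, score with mean squared error) so the example stays self-contained; any real model would replace it.

```python
def k_fold_indices(n_points, k):
    """Yield (train_idx, valid_idx) pairs; each point is held out exactly once."""
    folds = [list(range(i, n_points, k)) for i in range(k)]
    for held_out in range(k):
        valid_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, valid_idx

def cross_validate(values, k):
    """Average validation error over the k folds (toy mean-predictor model)."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(values), k):
        prediction = sum(values[i] for i in train_idx) / len(train_idx)
        mse = sum((values[i] - prediction) ** 2 for i in valid_idx) / len(valid_idx)
        scores.append(mse)
    return sum(scores) / k

values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print(cross_validate(values, k=3))
```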
Clustering - CORRECT ANSWER-takes a set of data points and divides them into groups so that each
group contains points that are close to each other or similar.
Distance Norms - CORRECT ANSWER-Given 2 points x and y with coordinates (x1, x2) and (y1, y2) --
the straight-line (Euclidean) distance between them is sqrt((x1 - y1)^2 + (x2 - y2)^2).
Rectilinear distance norms - CORRECT ANSWER-Sum of the absolute values of the coordinate
differences: |x1 - y1| + |x2 - y2|
P-norm distance - CORRECT ANSWER-generalized version of both distance equations:
(|x1 - y1|^p + |x2 - y2|^p)^(1/p), where p is 2 for a straight-line distance and 1 for a
rectilinear distance
The third most common value for p is infinity
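A short sketch of the p-norm distance for the three common values of p. The function name `p_norm_distance` is made up for this example; the p = infinity case is handled as the largest absolute coordinate difference.

```python
import math

def p_norm_distance(x, y, p):
    """Distance between points x and y under the p-norm (p >= 1, or math.inf)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

x, y = (1, 2), (4, 6)
print(p_norm_distance(x, y, 1))         # rectilinear: 3 + 4 = 7.0
print(p_norm_distance(x, y, 2))         # straight-line: sqrt(9 + 16) = 5.0
print(p_norm_distance(x, y, math.inf))  # largest difference: 4
```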
Infinity Norm - CORRECT ANSWER-Largest of a set of numbers in absolute value -- infinity norm
of a square matrix is the maximum of the absolute row sums
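Both readings of the infinity norm, sketched with plain lists. The helper names are assumptions for illustration.

```python
def vector_infinity_norm(v):
    """Largest entry of v in absolute value."""
    return max(abs(a) for a in v)

def matrix_infinity_norm(m):
    """Maximum over the rows of the sum of absolute values in that row."""
    return max(sum(abs(a) for a in row) for row in m)

print(vector_infinity_norm([3, -7, 2]))          # 7
print(matrix_infinity_norm([[1, -2], [-3, 4]]))  # row sums are 3 and 7, so 7
```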
k-means clustering algorithm formula meaning - CORRECT ANSWER-X denotes data
n data points and m attributes
Xij is the value of a data point i's attribute j
Y denotes cluster membership
Yik is 1 if data point i is in cluster k and 0 if not
Zkj denotes the coordinate of cluster center k in dimension j
k-means clustering - CORRECT ANSWER-find a set of k cluster centers and assignments of each
data point to a cluster center to minimize the total distance from each data point to its cluster
center
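Using the X, Y, Z notation from the formula card, the quantity being minimized can be sketched like this. The tiny data set is invented, and straight-line distance is assumed as the distance measure.

```python
import math

def total_distance(X, Y, Z):
    """Sum over points i and clusters k of Y[i][k] * distance(X[i], Z[k])."""
    total = 0.0
    for i, point in enumerate(X):
        for k, center in enumerate(Z):
            if Y[i][k] == 1:
                total += math.dist(point, center)
    return total

X = [(0, 0), (0, 2), (10, 0)]   # n = 3 data points, m = 2 attributes
Z = [(0, 1), (10, 0)]           # coordinates of 2 cluster centers
Y = [[1, 0], [1, 0], [0, 1]]    # Y[i][k] is 1 when point i is in cluster k
print(total_distance(X, Y, Z))  # 1 + 1 + 0 = 2.0
```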
How to decide how many clusters to include in k-means clustering - CORRECT ANSWER-begin by
picking k points within the range of our data
k is the number of clusters we want
The points we pick are the initial cluster centers
Process of k-means clustering - CORRECT ANSWER-1) Choose the number of clusters k and pick k
initial cluster centers
2) Temporarily assign each data point to the cluster center it is closest to
3) Recalculate the cluster centers (centroids)
4) Go back to step 2 and reassign each data point to its closest cluster center
5) Continue repeating this loop until no data point changes clusters
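The loop above can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions: initial centers are sampled from the data points, straight-line distance is used, and `max_iter` is a safety cap that the card does not mention.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Return (centers, labels) once no data point changes cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # step 1: k initial cluster centers
    labels = None
    for _ in range(max_iter):
        # steps 2 and 4: assign each point to its closest cluster center
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:      # step 5: no point changed cluster, so stop
            break
        labels = new_labels
        # step 3: move each center to the mean (centroid) of its points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # an empty cluster keeps its old center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2)
print(labels)
print(centers)
```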
What kinds of models is k-means clustering an example of? - CORRECT ANSWER-Machine learning
Heuristic model: algorithm that is not guaranteed to find the absolute best solution, but in
many cases gets pretty close to the best solution.
Expectation-maximization algorithm: recalculating each cluster center as the mean of its points
is the expectation step; assigning each point to its nearest center is the maximization step
(minimizing the distance to a cluster center is the same as maximizing the negative of that
distance)
Should you remove outliers from k-means clustering? - CORRECT ANSWER-Only if removing them does
not introduce bias into the data
Should you run k-means clustering once or several times? - CORRECT ANSWER-Several times --
using different initial cluster centers each time and keeping the best solution (also test
different values of k)
How to spot the optimal number of clusters? - CORRECT ANSWER-Look for a kink (elbow) in the
curve of total distance (y-axis) versus number of clusters (x-axis) -- the kink is where the
marginal benefit of adding another cluster starts to be small.
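A hedged sketch of how the curve could be produced: for each k, run k-means from several random starts and keep the smallest total distance. The compact k-means helper, the 10 restarts, and the toy data (three well-separated groups) are all assumptions made for this illustration.

```python
import math
import random

def k_means_total_distance(points, k, rng, max_iter=100):
    """One k-means run from random initial centers; returns its total distance."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))

def elbow_curve(points, k_values, restarts=10, seed=0):
    """For each k, keep the best (smallest) total distance over several restarts."""
    rng = random.Random(seed)
    return {k: min(k_means_total_distance(points, k, rng) for _ in range(restarts))
            for k in k_values}

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
curve = elbow_curve(points, range(1, 6))
for k, total in curve.items():
    print(k, round(total, 2))
```

Plotting these totals against k and looking for the point where the drop flattens out gives the kink; with three well-separated groups like these it would be expected near k = 3.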
Supervised learning - CORRECT ANSWER-Classification -- we know each data point's attributes and
already know the right classification for the data points, i.e. the response is known. (more
common in analytics)
Unsupervised learning - CORRECT ANSWER-Clustering -- we don't know the right grouping of our
data points up front. We know their attributes but don't know what group any of these points is
in. The model must decide how to cluster based only on the attributes of the data.
Box and Whisker Plots - CORRECT ANSWER-bottom and top of the box are the 25th and 75th
percentiles of the values