already passed 2025/2026
cluster analysis - correct answer ✔a descriptive analytics technique used to discover natural groupings
of objects (answers "what has happened?" questions)
what are the problem characteristics of cluster analysis? - correct answer ✔1. have information on data
that describes the objects (ex: customers)
2. no prior knowledge of how the objects are related to each other (ex: purchasing behavior)
3. the objective is to organize objects into groups (ex: ex: market segment)
what are the two similarity measures and what do they measure? - correct answer ✔1. Euclidean
Distance for numerical data (ex: height & weight)
2. Matching Coefficient for categorical data (ex: levels of income)
- both gauge whether a group of objects are similar or dissimilar to one another
Euclidean Distance - correct answer ✔- for numerical data
- the distance between two objects is the length of a straight like between them
- standardize the numerical data to make it unit-free before calculating the distance measure
Matching Coefficient - correct answer ✔- for categorical data
- number of columns with matching categorical values/total number of columns of categorical data
z-score - correct answer ✔(raw value - mean)/standard deviation
- used to standardize data for Euclidean Distance
,what are some business application of cluster analysis? - correct answer ✔1. marketing: divide
customers into homogeneous groups for target marketing
2. finance: divide clients into homogeneous groups for personalized finance advice
3. operations: identify outliers for quality control
which of the following is true about cluster analysis? (check all that apply) - correct answer ✔- it is to
answer what has happened questions
- it is used to discover natural groupings of objects
- it is a descriptive analytics technique
which of the following is a characteristic of a cluster analysis problem? (check all that apply) - correct
answer ✔- it is about how to organize objects into groups
- the data that describes the object must be given
- its objective is to maximize similarities of objects within groups
you have data on the weight and height of patients. which similarity measure should be used to
calculate how similar a group of patients is to one another? - correct answer ✔Euclidean distance
you have data on the gender and income levels of customers. which similarity measure should be used
to calculate how similar a group of customers is to one another? - correct answer ✔matching
coefficient
match the description on the left with the measure to use on the right:
1. it is the length of a straight line between two objects
2. it requires replacing the raw value of data with its z-score
3. the lower the measure the better
4. it is for categorical data
, 5. it is a ratio of number of columns with matching categorical values to the total number of categorical
columsn - correct answer ✔1. Euclidean
2. Euclidean
3. Euclidean
4. matching coefficient
5. matching coefficient
which of the following is a business application of cluster analysis? - correct answer ✔outlier detection
what are the two cluster analysis methods? - correct answer ✔1. hierarchical clustering
2. k-means clustering
Hierarchical Clustering - correct answer ✔- useful for small data sets (less than 500 rows of data)
- supports numeric and categorical/binary data
- sensitive to outliers
- experiment with different methods to calculate the distance between clusters
K-Means Clustering - correct answer ✔- useful for large data sets (over 500 rows of data)
- supports only numeric data
- less sensitive to outliers
- experiment with different numbers of clusters
how do we estimate the number of clusters? - correct answer ✔cubic clustering criterion (CCC)
cubic clustering criterion (CCC) - correct answer ✔- a metric related to R2, the proportion of variance in
the data accounted for by the clusters