K-Means Clustering
The number of clusters K controls how many groups the data are partitioned into. A small value of K forces many observations into the same cluster, producing broad, less detailed groupings, while a larger value of K creates more refined clusters but may lead to over-segmentation.
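A minimal sketch of this trade-off, assuming scikit-learn and synthetic data from make_blobs (both illustrative choices, not part of the notes above):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 true groups (an illustrative assumption).
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Inertia (within-cluster sum of squares) drops as K grows;
# the "elbow" in this curve suggests a reasonable K.
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"K={k}: inertia={km.inertia_:.1f}")
```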
The initialization of cluster centroids affects the solution found by the algorithm. Because k-means converges to a local optimum, different initializations can lead to different final cluster assignments, so the algorithm is typically run several times and the solution with the lowest within-cluster sum of squares (inertia) is kept.
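A minimal sketch of initialization sensitivity, again assuming scikit-learn; init="random" with n_init=1 forces a single random start per seed:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=1)

# With a single random initialization, different seeds can land in
# different local optima, visible as different final inertia values.
for seed in range(5):
    km = KMeans(n_clusters=4, init="random", n_init=1,
                random_state=seed).fit(X)
    print(f"seed={seed}: inertia={km.inertia_:.1f}")

# In practice, n_init > 1 repeats the run and keeps the best result.
```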
Agglomerative Hierarchical Clustering
The distance metric controls how similarity between observations is measured. Different metrics (e.g., Euclidean versus Manhattan distance) define closeness in different ways and can lead to very different cluster structures.
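An illustrative sketch, assuming scikit-learn 1.2 or later (where the distance parameter is named metric; older releases call it affinity):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=7)

# Same linkage, different distance metrics; the assignments may differ.
for metric in ("euclidean", "manhattan", "cosine"):
    model = AgglomerativeClustering(n_clusters=3, metric=metric,
                                    linkage="average")
    labels = model.fit_predict(X)
    print(metric, labels[:10])
```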
The linkage method controls how distances between clusters are calculated when merging them. Different linkage choices, such as single linkage (minimum pairwise distance) or complete linkage (maximum pairwise distance), lead to different dendrogram shapes and therefore different final cluster assignments.
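A minimal sketch comparing the two linkages with SciPy's hierarchy module and matplotlib (both assumed dependencies):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

# Single linkage merges on the minimum pairwise distance,
# complete linkage on the maximum; the resulting trees differ.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, method in zip(axes, ("single", "complete")):
    dendrogram(linkage(X, method=method), ax=ax)
    ax.set_title(f"{method} linkage")
plt.show()
```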
Lasso Regression
The regularization parameter controls the strength of the L1 penalty applied to the regression coefficients. A small penalty produces estimates similar to ordinary least squares, while a large penalty forces many coefficients exactly to zero, resulting in a sparse model (see the sketch after the bullets).
• Controls the trade-off between model complexity and generalization
• Stronger regularization reduces overfitting but increases bias
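A minimal sketch of the sparsity effect, assuming scikit-learn and synthetic data from make_regression:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# As alpha grows, more coefficients are driven exactly to zero.
for alpha in (0.01, 1.0, 10.0):
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of 20 coefficients are zero")
```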
Ridge Regression
The regularization parameter controls the strength of the L2 penalty applied to the regression coefficients. A small penalty yields coefficients close to the ordinary least squares estimates, while a large penalty shrinks coefficients toward zero (see the sketch after the bullets).
• Ridge regression does not set coefficients exactly to zero
• Reduces variance and improves stability when predictors are highly correlated
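A corresponding sketch for ridge on the same assumed synthetic setup; the coefficient norm shrinks as alpha grows, but no coefficient is set exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# The L2 penalty shrinks the overall coefficient norm but keeps
# all coefficients nonzero, unlike the lasso.
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: ||coef||={np.linalg.norm(model.coef_):.1f}, "
          f"zeros={int(np.sum(model.coef_ == 0))}")
```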