QUESTIONS AND 100% VERIFIED ANSWERS |
LATEST 2025/ 2026 UPDATE | GRADED A+ | 100%
SUCCESS
The sample of data must be large enough to contain significant
information, yet small enough to be manipulated quickly.
Data Preparation - correct answer -The data in a data set are often said
to be "dirty" and "raw" before they have been preprocessed.
We need to put them into a form that is best suited for a data-mining
algorithm.
Data preparation makes heavy use of descriptive statistics and data
visualization methods.
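A minimal sketch of one possible preparation step, assuming a single
continuous variable with a missing value. The data, the choice to drop
missing values, and the use of z-scores are illustrative assumptions, not
steps prescribed by the answer above.

```python
import statistics

# Hypothetical raw values for one variable; None marks a missing entry.
raw_incomes = [52.0, 48.5, None, 61.0, 50.5, 49.0]

# Assumption: missing values are simply dropped (imputation is another option).
clean = [x for x in raw_incomes if x is not None]

# Descriptive statistics summarize the cleaned variable.
mean = statistics.mean(clean)
stdev = statistics.stdev(clean)  # sample standard deviation

# Assumption: standardize to z-scores so variables share a common scale.
z_scores = [(x - mean) / stdev for x in clean]
```

Standardizing matters for distance-based methods because a variable
measured in large units would otherwise dominate the distance.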
Unsupervised learning application - correct answer -The goal is to use
the variable values to identify relationships between observations.
Qualitative assessments, such as how well the results match expert
judgment, are used to assess unsupervised learning methods.
Cluster Analysis - correct answer -The goal of this unsupervised learning
method is to segment observations into similar groups based on the
observed variables.
Can be employed during the data preparation step to identify variables
or observations that can be aggregated or removed from consideration.
Types of Clustering Methods - correct answer -Hierarchical and K-Means
Euclidean distance - correct answer -Most common method for measuring
dissimilarity between observations when the observations consist of
continuous variables.
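A short sketch of the calculation: the Euclidean distance is the square
root of the sum of squared differences across the variables. The function
name and the two observations are hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two observations of continuous variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical observations with two continuous variables each.
obs_a = (3.0, 4.0)
obs_b = (0.0, 0.0)
d = euclidean_distance(obs_a, obs_b)  # 5.0
```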
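Hierarchical clustering - correct answer -Bottom-up approach: each
observation starts in its own cluster, and the two most similar clusters
are merged at each step.
Determines the similarity of two clusters by considering the similarity
between the observations that make up each cluster.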
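The bottom-up idea can be sketched as a merge loop. This is an
illustration only: the function names are hypothetical, and the rule used
to compare two clusters (distance between their closest pair of
observations) is an assumption chosen to keep the example short.

```python
import math
from itertools import combinations

def cluster_distance(a, b):
    # Assumption: compare clusters by their closest pair of observations.
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(observations, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    # Start with every observation in its own cluster.
    clusters = [[obs] for obs in observations]
    while len(clusters) > target_clusters:
        # Find the pair of clusters (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical observations forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
groups = agglomerate(points, target_clusters=2)
```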
Single linkage - correct answer -The similarity between two clusters is
defined by the similarity of the pair of observations (one from each
cluster) that are the most similar
Complete linkage - correct answer -This clustering method defines the
similarity between two clusters as the similarity of the pair of
observations (one from each cluster) that are the most different
Average linkage - correct answer -Defines the similarity between two
clusters to be the average similarity computed over all pairs of
observations between the two clusters
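The three linkage definitions above can be compared side by side. This
sketch assumes dissimilarity is measured with Euclidean distance, so the
"most similar" pair is the one with the smallest distance and the "most
different" pair is the one with the largest. Function names and data are
hypothetical.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Distances for every pair with one observation from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters on a line; pairwise distances are 3, 5, 2, and 4.
cluster_1 = [(0.0, 0.0), (0.0, 1.0)]
cluster_2 = [(0.0, 3.0), (0.0, 5.0)]

single = single_linkage(cluster_1, cluster_2)      # 2.0
complete = complete_linkage(cluster_1, cluster_2)  # 5.0
average = average_linkage(cluster_1, cluster_2)    # 3.5
```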
Ward's method - correct answer -Computes the dissimilarity between two
clusters as the sum of the squared distances between each individual
observation in the union of the two clusters and the centroid of the
resulting merged cluster.
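A small sketch of this calculation, reading the definition above as the
sum of squared Euclidean distances to the merged centroid. Function names
and data are hypothetical. Many implementations instead use the increase
in this quantity caused by the merge; the sketch follows the wording
above.

```python
def centroid(points):
    """Mean of each variable across the given observations."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def ward_dissimilarity(cluster_a, cluster_b):
    """Sum of squared distances from each merged observation to the centroid."""
    merged = list(cluster_a) + list(cluster_b)
    center = centroid(merged)
    return sum(
        sum((x - m) ** 2 for x, m in zip(point, center))
        for point in merged
    )

# Hypothetical clusters; the merged centroid is (2.0, 0.0).
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(4.0, 0.0)]
w = ward_dissimilarity(cluster_a, cluster_b)  # 4 + 0 + 4 = 8.0
```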
k-Means clustering - correct answer -Given a value of k, the k-means
algorithm randomly partitions the observations into k clusters.
After all observations have been assigned to a cluster, the resulting
cluster centroids are calculated.
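Using the updated cluster centroids, all observations are reassigned to
the cluster with the closest centroid.
These two steps repeat until the cluster assignments stop changing or a
maximum number of iterations is reached.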
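The steps above can be sketched as a loop. This is an illustrative
sketch, not a reference implementation: the function names, the seed, the
iteration cap, and the handling of an emptied cluster are all
assumptions.

```python
import math
import random

def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(observations, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly partition the observations into k clusters.
    # Shuffling and dealing round-robin gives every cluster a member.
    order = list(range(len(observations)))
    rng.shuffle(order)
    labels = [0] * len(observations)
    for position, index in enumerate(order):
        labels[index] = position % k

    centroids = []
    for _ in range(max_iter):
        # Step 2: calculate the centroid of each cluster.
        centroids = []
        for c in range(k):
            members = [o for o, lab in zip(observations, labels) if lab == c]
            # Assumption: an emptied cluster is reseeded with a random observation.
            centroids.append(centroid(members) if members else rng.choice(observations))
        # Step 3: reassign every observation to the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(obs, centroids[c]))
            for obs in observations
        ]
        if new_labels == labels:  # no change, so the clusters are stable
            break
        labels = new_labels
    return labels, centroids

# Hypothetical data with two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(data, k=2)
```

Because the starting partition is random, different seeds can produce
different final clusters on less clearly separated data.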