2025/2026
Why Data Mining? - Answers Explosive data growth (measured in KB, MB, GB, TB, PB, EB, and ZB)
What is data mining? - Answers Knowledge discovery from data (extraction of interesting patterns or knowledge from huge amounts of data)
Benefits of data mining - Answers Scalability and efficiency
The four views of data mining - Answers Data, Application, Knowledge, Technique
What are the 5Vs of Data Mining? - Answers Volume, Variety, Velocity, Veracity, Value
Relational, transactional data (Data View) - Answers E.g., student records, bank accounts, store purchases
Sequential, temporal, streaming data (Data View) - Answers E.g., gene sequences, stock prices, sensor readings
Spatial, spatial-temporal data (Data View) - Answers E.g., land use, bird migration, traffic conditions
Text, multimedia, Web data (Data View) - Answers E.g., news articles, audio/video/image data, hypertext
Graph, network data (Data View) - Answers E.g., social network, power grid, co-authorship
Market analysis, targeted advertising (Application View) - Answers E.g., customer profiling, product recommendation
Healthcare, medical research (Application View) - Answers E.g., disease diagnosis, patient care, drug discovery
Science and engineering (Application View) - Answers E.g., air pollution, marine life, electric vehicles
Security (Application View) - Answers E.g., surveillance, intrusion/crime, fraud, cyberattack
Government, nonprofit (Application View) - Answers E.g., urban planning, traffic control, education
Frequent patterns, correlation (Knowledge View) - Answers E.g., songs listened to together or in a certain sequence
Categorization (Knowledge View) - Answers E.g., similarity among users with certain purchases, differences between two patient groups
Anomalies, outliers (Knowledge View) - Answers E.g., sensor errors, fraud activities, extreme events
Changes over time (Knowledge View) - Answers E.g., emerging new patterns, shifts in user interest
What are the five different techniques for data mining? - Answers Frequent pattern analysis, classification/prediction, clustering, anomaly detection, trend and evolution analysis
Frequent Pattern Analysis - Answers Includes frequent itemsets, frequent sequences, frequent structures, association rules, correlation analysis
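To make "frequent itemset" and "association rule" concrete, here is a minimal brute-force sketch (not Apriori or FP-growth) that counts the support of item combinations and the confidence of a rule. The playlist data and the function names `frequent_itemsets` and `confidence` are hypothetical, chosen to echo the "songs listened to together" example above.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute force: count every itemset of size 1..max_size and keep those whose
    support (fraction of transactions containing them) is >= min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sorted so each itemset has one canonical tuple form
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Confidence of rule A -> B = support(A and B together) / support(A)."""
    a = set(antecedent)
    ab = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_ab = sum(1 for t in transactions if ab <= set(t))
    return n_ab / n_a if n_a else 0.0


# Hypothetical listening sessions.
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b"],
    ["song_a", "song_c"],
    ["song_b", "song_d"],
]
print(frequent_itemsets(playlists, min_support=0.5))
print(confidence(playlists, ["song_a"], ["song_b"]))
```

Here {song_a, song_b} appears in 2 of 4 sessions (support 0.5), and of the 3 sessions with song_a, 2 also contain song_b, so the rule song_a -> song_b has confidence 2/3. Real miners such as Apriori prune candidates instead of enumerating every combination.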
Classification - Answers Includes pre-defined classes, training data, and distinguishable classes
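As one illustration of learning from training data with pre-defined classes, the sketch below uses a k-nearest-neighbor vote. The card does not name an algorithm; k-NN is swapped in here only because it is short, and the feature values and the `low_risk`/`high_risk` labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training_data, query, k=3):
    """Assign `query` the majority class among its k nearest training points.

    training_data: list of (feature_tuple, class_label) pairs.
    """
    nearest = sorted(training_data, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data with two pre-defined classes.
training = [
    ((1.0, 1.0), "low_risk"),
    ((1.5, 2.0), "low_risk"),
    ((2.0, 1.5), "low_risk"),
    ((6.0, 7.0), "high_risk"),
    ((7.0, 6.5), "high_risk"),
    ((6.5, 8.0), "high_risk"),
]
print(knn_classify(training, (1.8, 1.2)))  # low_risk
print(knn_classify(training, (6.8, 7.1)))  # high_risk
```

Classification only works well when the classes are distinguishable in feature space, as they are in this toy data.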
Prediction - Answers Includes numerical prediction of continuous values (e.g., weather, stock price, traffic)
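Numerical prediction outputs a continuous value rather than a class label. A minimal sketch, assuming a simple ordinary least-squares line is an acceptable stand-in for a prediction model; the daily temperatures and the helpers `fit_line`/`predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a continuous value from the fitted line."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical daily temperatures (day number -> degrees).
days = [1, 2, 3, 4, 5]
temps = [20.0, 21.0, 22.5, 23.0, 24.5]
model = fit_line(days, temps)
print(model)               # slope about 1.1, intercept about 18.9
print(predict(model, 6))   # about 25.5
```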
Clustering - Answers Includes no pre-defined classes, intra-cluster similarity, inter-cluster dissimilarity
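With no pre-defined classes, clustering groups points so that members of a cluster are close to each other and far from other clusters. The card names no algorithm; this is a minimal sketch of Lloyd's k-means, one common choice, on hypothetical 2-D points.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points; stop when centroids stop moving
    or after `iterations` rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k random data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two well-separated groups.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

On this data the sketch should recover the two groups, with centroids near (1.5, 1.5) and (8.5, 8.5): small intra-cluster distances, large inter-cluster distance.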
Anomaly Detection - Answers Includes anomalies or outliers (e.g., errors, noise, fraud, extreme events)
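A simple way to flag outliers such as sensor errors is the z-score: how many standard deviations a value lies from the mean. This is a minimal sketch under that assumption, not the only detection method; the readings are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values whose |value - mean| / (population std dev) exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings with one faulty spike.
readings = [20.1, 20.3, 19.8, 20.0, 20.2, 19.9, 20.1, 55.0, 20.0, 20.2]
print(zscore_outliers(readings, threshold=2.0))  # [55.0]
```

With only ten points, a single extreme value also inflates the standard deviation, which caps how large its z-score can get, so the example uses a threshold of 2 rather than the common 3.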
Trend and Evolution Analysis - Answers Includes changes over time, overall trends, periodic patterns, anomalies (e.g., Google Trends)
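One basic tool for exposing an overall trend is a moving average, which smooths short-term fluctuations. A minimal sketch; the weekly search counts are hypothetical and do not come from Google Trends.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


# Hypothetical weekly search counts for a topic.
weekly_searches = [10, 14, 9, 15, 13, 18, 16, 21, 19, 24]
smoothed = moving_average(weekly_searches, window=3)
print(smoothed)
print("upward trend" if smoothed[-1] > smoothed[0] else "no upward trend")
```

The raw series zigzags week to week, while the 3-week average rises from 11.0 to about 21.3, making the upward trend visible.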
What steps form the data mining pipeline? - Answers Data Understanding, Data Preprocessing, Data Warehousing, Data Modeling, Pattern Evaluation
Data Understanding - Answers Answering questions like: What types of data are there? What do they look like? Includes statistics and visualization, and examines similarity vs. dissimilarity between data objects
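The statistics and similarity measures mentioned above can be sketched with Python's standard library. This is a minimal illustration assuming Euclidean distance as the dissimilarity measure and cosine similarity as the similarity measure; the helper names and the age data are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Basic descriptive statistics for a numeric attribute."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def euclidean_distance(a, b):
    """Dissimilarity: 0 for identical objects, larger means less alike."""
    return math.dist(a, b)


def cosine_similarity(a, b):
    """Similarity: 1 for vectors pointing the same way, 0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


ages = [23, 25, 31, 35, 35, 42, 58]  # hypothetical attribute values
print(summarize(ages))
print(euclidean_distance((0, 0), (3, 4)))   # 5.0
print(cosine_similarity((1, 2), (2, 4)))    # approximately 1.0
```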
Data Preprocessing - Answers Preparing the data for the mining process; includes the following operations: data integration, data transformation, data reduction, data cleaning
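Each preprocessing operation named above can be illustrated with one small function. This is a minimal sketch under simplifying assumptions: cleaning fills missing values with the mean (one option among many), transformation is min-max normalization, integration is a join on a shared key, and reduction keeps a subset of attributes. All function names and data are hypothetical.

```python
import statistics


def clean_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def integrate(left, right, key):
    """Data integration: merge dict records from two sources that share `key`."""
    index = {rec[key]: rec for rec in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]


def reduce_attributes(records, keep):
    """Data reduction: keep only the listed attributes of each record."""
    return [{k: rec[k] for k in keep} for rec in records]


incomes = [30000, None, 50000, 70000]
print(min_max_normalize(clean_missing(incomes)))  # [0.0, 0.5, 0.5, 1.0]

customers = [{"id": 1, "name": "Ann", "city": "Boston"}, {"id": 2, "name": "Bo", "city": "Denver"}]
purchases = [{"id": 1, "total": 12.5}]
merged = integrate(customers, purchases, key="id")
print(reduce_attributes(merged, keep=["id", "total"]))  # [{'id': 1, 'total': 12.5}]
```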
What potential issues are there with data? - Answers Missing data, errors, inconsistency
Data Warehousing - Answers The collection, storage, and retrieval of data in electronic files. Includes operational data. Can involve a data cube and OLAP (online analytical processing).
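A data cube precomputes an aggregate for every combination of dimensions, so an OLAP roll-up (for example, from city-and-quarter totals up to per-city totals) becomes a lookup. The sketch below is a minimal in-memory illustration, not how a production warehouse stores cubes; the sales records, field names, and the `"*"` marker for an aggregated-away dimension are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(facts, dimensions, measure):
    """Compute SUM(measure) for every subset of `dimensions` (every cuboid).

    Each cell key is a tuple aligned with `dimensions`; "*" marks a dimension
    that has been aggregated away (rolled up).
    """
    cube = {}
    for r in range(len(dimensions) + 1):
        for group in combinations(dimensions, r):
            totals = defaultdict(float)
            for row in facts:
                cell = tuple(row[d] if d in group else "*" for d in dimensions)
                totals[cell] += row[measure]
            cube.update(totals)
    return cube


# Hypothetical sales fact table.
sales = [
    {"city": "Boston", "quarter": "Q1", "amount": 100.0},
    {"city": "Boston", "quarter": "Q2", "amount": 150.0},
    {"city": "Denver", "quarter": "Q1", "amount": 80.0},
    {"city": "Denver", "quarter": "Q2", "amount": 120.0},
]
cube = data_cube(sales, ["city", "quarter"], "amount")
print(cube[("Boston", "Q1")])  # 100.0 (base cell)
print(cube[("Boston", "*")])   # 250.0 (roll-up over quarter)
print(cube[("*", "Q1")])       # 180.0 (roll-up over city)
print(cube[("*", "*")])        # 450.0 (grand total)
```

With 2 dimensions there are 2^2 = 4 cuboids; the number grows exponentially with dimensions, which is why real systems often materialize only some of them.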