Data Mining: Concepts and Techniques
2nd Edition
Solution Manual
Jiawei Han and Micheline Kamber
The University of Illinois at Urbana-Champaign
© Morgan Kaufmann, 2006
Note: For Instructors’ reference only. Do not copy! Do not distribute!
Contents
1 Introduction
1.11 Exercises
2 Data Preprocessing
2.8 Exercises
3 Data Warehouse and OLAP Technology: An Overview
3.7 Exercises
4 Data Cube Computation and Data Generalization
4.5 Exercises
5 Mining Frequent Patterns, Associations, and Correlations
5.7 Exercises
6 Classification and Prediction
6.17 Exercises
7 Cluster Analysis
7.13 Exercises
8 Mining Stream, Time-Series, and Sequence Data
8.6 Exercises
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
9.5 Exercises
10 Mining Object, Spatial, Multimedia, Text, and Web Data
10.7 Exercises
11 Applications and Trends in Data Mining
11.7 Exercises
Chapter 1
Introduction
1.11 Exercises
1.1. What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
Answer:
Data mining refers to the process or method that extracts or “mines” interesting knowledge or patterns
from large amounts of data.
(a) Is it another hype?
Data mining is not just another hype. Rather, the need for data mining has arisen from the wide
availability of huge amounts of data and the imminent need to turn such data into useful information
and knowledge. Data mining can thus be viewed as a result of the natural evolution of information
technology.
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
No. Data mining is more than a simple transformation of technology developed from databases,
statistics, and machine learning. It involves an integration of techniques from multiple
disciplines, such as database technology, statistics, machine learning, high-performance
computing, pattern recognition, neural networks, data visualization, information retrieval,
image and signal processing, and spatial data analysis.
(c) Explain how the evolution of database technology led to data mining.
Database technology began with the development of data collection and database creation mechanisms.
These led to effective mechanisms for data management, including data storage and retrieval, and
query and transaction processing. The large number of database systems offering query and
transaction processing eventually and naturally led to the need for data analysis and understanding.
Data mining began its development out of this necessity.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
• Data cleaning, a process that removes or transforms noise and inconsistent data
• Data integration, where multiple data sources may be combined