SOLUTION MANUAL
Contents
1 Introduction
1.11 Exercises
2 Data Preprocessing
2.8 Exercises
3 Data Warehouse and OLAP Technology: An Overview
3.7 Exercises
4 Data Cube Computation and Data Generalization
4.5 Exercises
5 Mining Frequent Patterns, Associations, and Correlations
5.7 Exercises
6 Classification and Prediction
6.17 Exercises
7 Cluster Analysis
7.13 Exercises
8 Mining Stream, Time-Series, and Sequence Data
8.6 Exercises
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
9.5 Exercises
10 Mining Object, Spatial, Multimedia, Text, and Web Data
10.7 Exercises
11 Applications and Trends in Data Mining
11.7 Exercises
Chapter 1
Introduction
1.11 Exercises
1.1. What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
Answer:
Data mining refers to the process or method that extracts or “mines” interesting knowledge or
patterns from large amounts of data.
(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen due to the wide
availability of huge amounts of data and the imminent need for turning such data into useful
information and knowledge. Thus, data mining can be viewed as the result of the natural
evolution of information technology.
(b) Is it a simple transformation of technology developed from databases, statistics, and machine
learning?
No. Data mining involves an integration, rather than a simple transformation, of techniques
from multiple disciplines such as database technology, statistics, machine learning,
high-performance computing, pattern recognition, neural networks, data visualization,
information retrieval, image and signal processing, and spatial data analysis.
(c) Explain how the evolution of database technology led to data mining.
Database technology began with the development of data collection and database creation
mechanisms that led to the development of effective mechanisms for data management
including data storage and retrieval, and query and transaction processing. The large number
of database systems offering query and transaction processing eventually and naturally led to
the need for data analysis and understanding. Hence, data mining began its development out of
this necessity.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
• Data cleaning, a process that removes or transforms noise and inconsistent data
• Data integration, where multiple data sources may be combined
• Data selection, where data relevant to the analysis task are retrieved from the database
• Data transformation, where data are transformed or consolidated into forms
appropriate for mining
• Data mining, an essential process where intelligent and efficient methods are applied in
order to extract patterns
• Pattern evaluation, a process that identifies the truly interesting patterns representing
knowledge based on some interestingness measures
• Knowledge presentation, where visualization and knowledge representation techniques
are used to present the mined knowledge to the user
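The steps listed above can be sketched as a small pipeline. The following is only an illustration under stated assumptions: the toy records, the function names, and the choice of item-pair counting as the mining step are all hypothetical and do not come from the textbook.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records from two sources; None marks a missing value.
source_a = [
    {"id": 1, "items": "Milk, Bread"},
    {"id": 2, "items": "milk,bread,eggs"},
    {"id": 3, "items": None},
]
source_b = [
    {"id": 4, "items": "bread, eggs"},
    {"id": 5, "items": "milk, bread"},
]

def clean(records):
    """Data cleaning: drop records with missing values, normalize case and spacing."""
    return [
        {"id": r["id"], "items": r["items"].lower().replace(" ", "")}
        for r in records
        if r["items"] is not None
    ]

def integrate(*sources):
    """Data integration: combine multiple sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the attribute relevant to the task."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Data transformation: turn each comma-separated string into a set of items."""
    return [frozenset(s.split(",")) for s in item_strings]

def mine(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return counts

def evaluate(counts, n, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

def present(patterns):
    """Knowledge presentation: format the patterns as readable lines."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

def run_pipeline(min_support=0.5):
    records = integrate(clean(source_a), clean(source_b))
    transactions = transform(select(records))
    counts = mine(transactions)
    return present(evaluate(counts, len(transactions), min_support))

for line in run_pipeline():
    print(line)
```

Each function corresponds to one step of the list, so a step can be replaced (for example, a different mining method) without touching the others.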
1.2. Present an example where data mining is crucial to the success of a business. What data mining
functions does this business need? Can they be performed alternatively by data query processing or
simple statistical analysis?
Answer:
A department store, for example, can use data mining to assist with its target marketing mail
campaign. Using data mining functions such as association, the store can use the mined strong
association rules to determine which products bought by one group of customers are likely to lead
to the buying of certain other products. With this information, the store can then mail marketing
materials only to those kinds of customers who exhibit a high likelihood of purchasing additional
products. Data query processing is used for data or information retrieval and does not have the
means for finding association rules. Similarly, simple statistical analysis cannot handle large
amounts of data such as those of customer records in a department store.
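As a rough illustration of how the strength of an association rule might be measured, the sketch below computes the support and confidence of a single rule over a handful of made-up purchase records, then lists the customers who match the rule's left-hand side but have not yet bought the right-hand side. The customer IDs, the products, and the function names rule_metrics and mailing_targets are assumptions for this example only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0
    # Transactions containing the antecedent, and those containing both sides.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def mailing_targets(purchases_by_customer, antecedent, consequent):
    """Customers who bought the antecedent items but not yet the consequent items."""
    antecedent, consequent = set(antecedent), set(consequent)
    return sorted(
        customer
        for customer, items in purchases_by_customer.items()
        if antecedent <= items and not consequent <= items
    )

# Hypothetical purchase history, one item set per customer.
purchases = {
    "c1": {"printer", "ink"},
    "c2": {"printer", "ink", "paper"},
    "c3": {"printer", "paper"},
    "c4": {"ink"},
    "c5": {"laptop", "mouse"},
}

support, confidence = rule_metrics(list(purchases.values()), {"printer"}, {"ink"})
targets = mailing_targets(purchases, {"printer"}, {"ink"})
print(f"support={support:.2f}, confidence={confidence:.2f}, targets={targets}")
```

A rule would normally be treated as strong only if both values exceed thresholds chosen by the analyst; no thresholds are fixed here.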
1.3. Suppose your task as a software engineer at Big-University is to design a data mining system to
examine their university course database, which contains the following information: the name,
address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their
cumulative grade point average (GPA). Describe the architecture you would choose. What is the
purpose of each component of this architecture?
Answer:
A data mining architecture that can be used for this application would consist of the following major
components:
• A database, data warehouse, or other information repository, which consists of the set of
databases, data warehouses, spreadsheets, or other kinds of information repositories
containing the student and course information.
• A database or data warehouse server, which fetches the relevant data based on the users’
data mining requests.
• A knowledge base that contains the domain knowledge used to guide the search or to evaluate
the interestingness of resulting patterns. For example, the knowledge base may contain
concept hierarchies and metadata (e.g., describing data from multiple heterogeneous sources).
• A data mining engine, which consists of a set of functional modules for tasks such as
characterization, association, classification, cluster analysis, and evolution and deviation analysis.
• A pattern evaluation module that works in tandem with the data mining modules by employing
interestingness measures to help focus the search towards interesting patterns.
• A graphical user interface that provides the user with an interactive approach to the data
mining system.
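To make the division of responsibilities concrete, the components above can be sketched as a few small classes. This is a minimal sketch, not a prescribed design: the class names, the three sample students, the single "minimum group size" interestingness measure, and the text-based stand-in for the graphical user interface are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Student:
    name: str
    status: str   # "undergraduate" or "graduate"
    courses: list
    gpa: float

# Information repository: hypothetical student and course records.
REPOSITORY = [
    Student("Ann", "undergraduate", ["CS101", "MATH120"], 3.5),
    Student("Bob", "undergraduate", ["CS101"], 3.0),
    Student("Cho", "graduate", ["CS601"], 3.9),
]

class DatabaseServer:
    """Fetches the data relevant to a mining request."""
    def __init__(self, repository):
        self.repository = repository

    def fetch(self, status=None):
        return [s for s in self.repository if status is None or s.status == status]

class KnowledgeBase:
    """Domain knowledge; here only one interestingness threshold (an assumption)."""
    min_group_size = 2

class MiningEngine:
    """One functional module: characterize students by status."""
    def characterize(self, students):
        gpas_by_status = {}
        for s in students:
            gpas_by_status.setdefault(s.status, []).append(s.gpa)
        return {status: (len(g), mean(g)) for status, g in gpas_by_status.items()}

class PatternEvaluator:
    """Keeps only patterns that satisfy the knowledge base's measure."""
    def __init__(self, knowledge_base):
        self.kb = knowledge_base

    def filter(self, patterns):
        return {k: v for k, v in patterns.items() if v[0] >= self.kb.min_group_size}

def user_interface(status=None):
    """Text stand-in for the user interface: runs a request and formats the result."""
    students = DatabaseServer(REPOSITORY).fetch(status)
    patterns = MiningEngine().characterize(students)
    kept = PatternEvaluator(KnowledgeBase()).filter(patterns)
    return [
        f"{group}: {count} students, mean GPA {gpa:.2f}"
        for group, (count, gpa) in sorted(kept.items())
    ]

for line in user_interface():
    print(line)
```

The graduate group is dropped by the pattern evaluator because it contains a single student, which shows how the knowledge base steers what reaches the user.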