Data Mining and Analysis:
Fundamental Concepts and Algorithms
dataminingbook.info
Mohammed J. Zaki (1), Wagner Meira Jr. (2)
(1) Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
(2) Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Chapter 1: Data Mining and Analysis
Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis
Data Matrix
Data can often be represented or abstracted as an n × d data matrix, with n
rows and d columns, given as
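\[
D = \begin{array}{c|cccc}
       & X_1    & X_2    & \cdots & X_d    \\
\hline
x_1    & x_{11} & x_{12} & \cdots & x_{1d} \\
x_2    & x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_n    & x_{n1} & x_{n2} & \cdots & x_{nd}
\end{array}
\]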
Rows: Also called instances, examples, records, transactions, objects,
points, feature-vectors, etc. Given as a d-tuple
\[ x_i = (x_{i1}, x_{i2}, \ldots, x_{id}) \]
Columns: Also called attributes, properties, features, dimensions,
variables, fields, etc. Given as an n-tuple
\[ X_j = (x_{1j}, x_{2j}, \ldots, x_{nj}) \]
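The row and column views above can be sketched in a few lines of Python. This is only a minimal illustration: the matrix values are placeholders, and the helper names `row` and `column` are hypothetical, not part of the original slides.

```python
# Illustrative n x d data matrix stored as a list of rows.
# The numbers are placeholders chosen only to make the indexing visible.
D = [
    (1, 2, 3),
    (4, 5, 6),
    (7, 8, 9),
    (10, 11, 12),
]

n = len(D)      # number of rows (points)
d = len(D[0])   # number of columns (attributes)


def row(D, i):
    """Return the point x_i (1-based index i) as a d-tuple."""
    return tuple(D[i - 1])


def column(D, j):
    """Return the attribute X_j (1-based index j) as an n-tuple."""
    return tuple(r[j - 1] for r in D)


x2 = row(D, 2)      # d-tuple: one entry per attribute
X3 = column(D, 3)   # n-tuple: one entry per point

print(n, d)
print(x2)
print(X3)
```

The 1-based indices mirror the notation on the slide; Python itself indexes from 0, hence the `- 1` inside both helpers.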
Iris Dataset Extract

        Sepal    Sepal    Petal    Petal
        length   width    length   width    Class
        X1       X2       X3       X4       X5
x1      5.9      3.0      4.2      1.5      Iris-versicolor
x2      6.9      3.1      4.9      1.5      Iris-versicolor
x3      6.6      2.9      4.6      1.3      Iris-versicolor
x4      4.6      3.2      1.4      0.2      Iris-setosa
x5      6.0      2.2      4.0      1.0      Iris-versicolor
x6      4.7      3.2      1.3      0.2      Iris-setosa
x7      6.5      3.0      5.8      2.2      Iris-virginica
x8      5.8      2.7      5.1      1.9      Iris-virginica
...     ...      ...      ...      ...      ...
x149    7.7      3.8      6.7      2.2      Iris-virginica
x150    5.1      3.4      1.5      0.2      Iris-setosa
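As a rough sketch, the rows shown in the extract can be loaded into the same row/column structure. Only the ten rows printed above are included (the elided rows x9 to x148 are left out), and the attribute names in `ATTRIBUTES` and the helper `column` are assumptions made for illustration.

```python
from collections import Counter

# Attribute names X1..X5; the snake_case spellings are an assumption.
ATTRIBUTES = ("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

# Only the rows visible in the extract; x9..x148 are elided in the source.
iris_extract = {
    "x1": (5.9, 3.0, 4.2, 1.5, "Iris-versicolor"),
    "x2": (6.9, 3.1, 4.9, 1.5, "Iris-versicolor"),
    "x3": (6.6, 2.9, 4.6, 1.3, "Iris-versicolor"),
    "x4": (4.6, 3.2, 1.4, 0.2, "Iris-setosa"),
    "x5": (6.0, 2.2, 4.0, 1.0, "Iris-versicolor"),
    "x6": (4.7, 3.2, 1.3, 0.2, "Iris-setosa"),
    "x7": (6.5, 3.0, 5.8, 2.2, "Iris-virginica"),
    "x8": (5.8, 2.7, 5.1, 1.9, "Iris-virginica"),
    "x149": (7.7, 3.8, 6.7, 2.2, "Iris-virginica"),
    "x150": (5.1, 3.4, 1.5, 0.2, "Iris-setosa"),
}


def column(data, name):
    """Return one attribute as a tuple, in the order the rows are stored."""
    j = ATTRIBUTES.index(name)
    return tuple(point[j] for point in data.values())


sepal_length = column(iris_extract, "sepal_length")    # X1 over the shown rows
class_counts = Counter(column(iris_extract, "class"))  # X5 tallied per label

print(len(iris_extract), len(ATTRIBUTES))
print(sepal_length)
print(dict(class_counts))
```

Each dictionary value is a 5-tuple (a row x_i), while `column` recovers an attribute X_j across the stored rows; with the full dataset the same code would operate on 150 rows instead of ten.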