Introduction
Big data
1. Data volume: the size of the data sets that an organization has
collected for analysis and processing, i.e. the quantity of generated
and stored data.
2. Data velocity: data is collected at enormous speed. Compared
with small data, big data is produced more continuously. Two kinds of
velocity matter for big data: the frequency of generation and the
frequency of handling, recording and publishing.
3. Data variety: the type and nature of the data. Big data is
often unstructured and heterogeneous.
4. Data veracity: the reliability of the data (its quality and value).
The data must not only be large; its analysis must also yield
value.
What is data
-> a collection of data objects and their attributes
-> an attribute is a property or characteristic of an object. The more
attributes, the more information about an object. The attributes describe
an object (also called a record, point, case, sample, instance or entity)
Attribute values: numbers or symbols assigned to an attribute
The same attribute can be mapped to different attribute values (height ->
meters or feet)
Different attributes can be mapped to the same set of values (ID, age ->
integers)
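The two remarks about attribute values can be illustrated with a minimal sketch. The names and numbers below (height_m, person_a, person_b) are hypothetical and only serve the example.

```python
# One attribute (height), two possible value encodings: meters or feet.
METERS_PER_FOOT = 0.3048  # exact, by definition of the international foot


def meters_to_feet(meters: float) -> float:
    """Convert a height in meters to the same height in feet."""
    return meters / METERS_PER_FOOT


height_m = 1.80                        # hypothetical height in meters
height_ft = meters_to_feet(height_m)   # same height, different value

# Two different attributes (ID, age) mapped to the same set of values:
# both are integers, but they do not support the same operations.
person_a = {"id": 1042, "age": 37}     # hypothetical records
person_b = {"id": 2084, "age": 74}

# Averaging ages is meaningful; averaging IDs would compute fine
# but describe nothing about the objects.
mean_age = (person_a["age"] + person_b["age"]) / 2
```

The point of the second half is that the properties of an attribute are not the same as the properties of the values used to store it.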
Types of attributes
Nominal: category or state (categorical attribute) -> ID, eye color,
ZIP code, sex
Ordinal: values with a meaningful order -> ranking, grade, height in
{tall, medium, short}
Interval: numeric values on a scale of equal units, so order and
differences are meaningful but there is no true zero -> calendar dates,
temperature in °C
Ratio: a numeric attribute with an inherent zero-point, so ratios are
meaningful -> temperature in K, length, time, counts
Discrete attribute -> has a finite or countably infinite set of values
(counts, ZIP codes)
Continuous attribute -> takes real numbers as values, in practice stored
as floating-point variables (temperature, height, weight)
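One way to read the four attribute types is by the operations that make sense on each. The sketch below uses hypothetical sample values; it assumes the usual convention that nominal data supports equality (mode), ordinal data adds order (median), interval data adds differences, and ratio data adds ratios.

```python
from statistics import mode

# Nominal: only equality is meaningful, so the mode is a valid summary.
eye_color = ["brown", "blue", "brown", "green"]
most_common_color = mode(eye_color)

# Ordinal: order is meaningful, so a median can be taken via ranks.
GRADE_ORDER = ["poor", "fair", "good", "excellent"]  # hypothetical scale
grades = ["good", "poor", "excellent", "good", "fair"]
ranks = sorted(GRADE_ORDER.index(g) for g in grades)
median_grade = GRADE_ORDER[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not.
temps_c = [10.0, 20.0]
diff_c = temps_c[1] - temps_c[0]      # a 10-degree difference
naive_ratio = temps_c[1] / temps_c[0]  # 2.0, but NOT "twice as hot"

# Ratio: with a true zero (Kelvin), ratios become meaningful.
temps_k = [t + 273.15 for t in temps_c]
ratio_k = temps_k[1] / temps_k[0]      # only about 3.5% hotter
```

The Celsius and Kelvin lines describe the same two temperatures; only the Kelvin ratio says something physical.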
Dataset types
Record
   -> data matrix
   -> document data
   -> transaction data
Graph
   -> WWW
   -> protein interactions
Ordered
   -> spatial data
   -> temporal data
   -> sequential data
   -> molecular sequences
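As a rough illustration of how some of these dataset types might look in memory, here is a minimal sketch. All values (baskets, page names, readings, the sequence) are made up for the example, and the chosen containers are one possible representation, not the only one.

```python
# Transaction data: each record is a set of items (hypothetical baskets).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers"},
]
bread_count = sum("bread" in basket for basket in transactions)

# Graph data: objects are nodes, relationships are edges.
# Adjacency list of hypothetical web pages and their outgoing links.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
in_degree_c = sum("C" in targets for targets in links.values())

# Ordered (temporal) data: the order of the records carries meaning.
# Hypothetical (time step, temperature) readings, sorted by time.
readings = [(0, 20.1), (1, 20.4), (2, 19.8)]
is_time_ordered = all(a[0] < b[0] for a, b in zip(readings, readings[1:]))

# Ordered (sequence) data: a molecular sequence as a string of bases.
dna = "GGTTCCGC"  # hypothetical fragment
g_count = dna.count("G")
```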
1. Record data: data that consists of a collection of records, each of
which consists of a fixed set of attributes
a. Data matrix (not the 2D barcode of the same name)
-> if every record has the same fixed set of numeric attributes, the
records can be arranged in an m by n matrix: one row per object and
one column per attribute
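Record data with a fixed set of numeric attributes can be laid out as an m by n table, with one row per record and one column per attribute. The sketch below assumes three hypothetical records and attribute names (height_m, weight_kg, age) and uses plain lists rather than any particular library.

```python
# Hypothetical records, each with the same fixed set of attributes.
records = [
    {"height_m": 1.80, "weight_kg": 75.0, "age": 37},
    {"height_m": 1.65, "weight_kg": 60.0, "age": 29},
    {"height_m": 1.72, "weight_kg": 68.5, "age": 45},
]

# Fix the column order once so every row lines up the same way.
attributes = ["height_m", "weight_kg", "age"]

# m by n layout: m rows (objects), n columns (attributes).
matrix = [[float(rec[attr]) for attr in attributes] for rec in records]
m, n = len(matrix), len(matrix[0])

# Column-wise operations become simple once the layout is rectangular.
column_means = [sum(row[j] for row in matrix) / m for j in range(n)]
```

Fixing the attribute order up front is the design choice that matters here: it is what lets a column index stand for an attribute.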