Data Analytics (2IAB0)
Semester 2, 2020-2020
Data Analytics for Engineers
Exploratory Data Analysis (EDA)
Types of data
Elementary statistical plots
Summary statistics
Advanced statistical plots
Data Visualization (VIS)
Visualization
Colors and color deficiency
Idioms
Checklist for effective visualization
Data Mining Methods (DMM)
Data mining
Four methods
Linear model quality
K-Means clustering
Distances
Decision tree quality
Support of an item set (association rules)
Data Organization and Queries (ORG)
Data organization
SQL
Data Aggregation and Sampling (DAS)
Important concepts
Measurements and sampling
Data cleaning and filtering
Hypothesis Formulation and Testing (HYP)
Distributions
Metrics
Confidence intervals and hypothesis testing
Exploratory Data Analysis (EDA)
In short:
• Types of data
• Elementary statistical plots
• Summary statistics
• Advanced statistical plots
Types of data
Important concepts
Data = raw, unorganized numbers, facts, etc.
Information = structured, meaningful and useful numbers and facts
Data types
• numerical = data that has intrinsic numerical value
‣ continuous = data that can attain any value on a given measurement scale
- interval = no fixed zero point; only differences have meaning
- ratio = fixed zero point; ratios have meaning
‣ discrete = can only attain a finite number of values
• categorical = no intrinsic numerical value
‣ nominal = two or more outcomes that have no natural order
‣ ordinal = outcomes that have a natural order; sequential, diverging or cyclic
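The difference between nominal and ordinal data can be sketched in a few lines of Python. This is only an illustration: the size scale and the observations are made-up assumptions, not course data. For ordinal data the natural order has to be stated explicitly, because sorting the labels as plain text gives an unrelated (alphabetical) order.

```python
from collections import Counter

# Hypothetical ordinal scale: the list position encodes the natural order.
SIZE_ORDER = ["small", "medium", "large"]
rank = {label: position for position, label in enumerate(SIZE_ORDER)}

# Made-up observations, for illustration only.
observations = ["large", "small", "medium", "small"]

# Treating the labels as plain text sorts them alphabetically,
# which ignores the natural order of the scale.
alphabetical = sorted(observations)

# Sorting by rank respects the natural order (ordinal data).
by_natural_order = sorted(observations, key=lambda label: rank[label])

# Counting works for nominal and ordinal data alike:
# it needs no order at all.
most_common_label = Counter(observations).most_common(1)[0][0]

print(alphabetical)
print(by_natural_order)
print(most_common_label)
```

Counting outcomes is meaningful for any categorical variable, while ordering them is meaningful only when the variable is ordinal.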
Examples:
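• temperature in °C is interval data: 0°C is an arbitrary zero point, so 20°C ≠ 2 × 10°C (20°C is not twice as warm as 10°C), although the difference of 10°C is meaningful
• length is ratio data: 0 m means no length at all, so 20 m = 2 × 10 m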
• categorical data (e.g. ratings) are sometimes labeled with numbers, but these numbers are only labels (at most they encode an order) and arithmetic on them is not meaningful,
so they are not numerical data
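One way to check the interval/ratio distinction is to change the unit of measurement. On a ratio scale the ratio of two values stays the same in every unit; on an interval scale it does not, although ratios of differences do. A minimal sketch, in which the helper names are assumptions made for this illustration:

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature; the +32 offset shifts the zero point."""
    return celsius * 9.0 / 5.0 + 32.0


def metres_to_feet(metres: float) -> float:
    """Convert a length; a pure rescaling, so zero stays zero."""
    return metres / 0.3048


def ratio_survives_conversion(a, b, convert, tolerance=1e-9):
    """Return True if a / b is the same before and after the unit change."""
    return abs(a / b - convert(a) / convert(b)) < tolerance


# Interval scale: 20 / 10 = 2 in Celsius, but 68 / 50 = 1.36 in Fahrenheit.
temperature_ratio_ok = ratio_survives_conversion(20.0, 10.0, celsius_to_fahrenheit)

# Ratio scale: 20 m is twice 10 m in metres and in feet.
length_ratio_ok = ratio_survives_conversion(20.0, 10.0, metres_to_feet)

# Differences on an interval scale are still comparable:
# (30 - 20) / (20 - 10) equals 1 in both Celsius and Fahrenheit.
diff_ratio_celsius = (30.0 - 20.0) / (20.0 - 10.0)
diff_ratio_fahrenheit = (
    (celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0))
    / (celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0))
)

print(temperature_ratio_ok, length_ratio_ok)
print(diff_ratio_celsius, diff_ratio_fahrenheit)
```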
Tables
Tables are good for reading off values and for drawing attention to actual values. There are two kinds of tables:
• reference table = stores all data in a table so that it can be looked up easily;
• demonstration table = a table to illustrate a point.
Key features of EDA
• getting to know the data before further analysis;
• extensively using plots;
• generating questions;
• detecting errors (what are reasonable values? given one value, what could the others be?).
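The two error-detection questions above can be sketched as simple checks: a range check per variable (what are reasonable values?) and a cross-field check (given one value, what could the others be?). The records, the plausible ranges and the age/height rule below are all hypothetical assumptions chosen for illustration; real limits depend on the data set at hand.

```python
# Made-up records, for illustration only.
records = [
    {"id": 1, "age": 34, "height_cm": 178.0},
    {"id": 2, "age": 29, "height_cm": 17.8},   # possibly a misplaced decimal
    {"id": 3, "age": -5, "height_cm": 165.0},  # impossible age
    {"id": 4, "age": 3, "height_cm": 182.0},   # each value plausible alone
]

# Hypothetical plausible ranges per variable (inclusive bounds).
PLAUSIBLE = {"age": (0, 120), "height_cm": (40.0, 250.0)}


def range_violations(record):
    """Names of the fields whose value lies outside its plausible range."""
    return [
        field
        for field, (low, high) in PLAUSIBLE.items()
        if not low <= record[field] <= high
    ]


def is_inconsistent(record):
    """Hypothetical cross-field rule: a child under 5 taller than 130 cm."""
    return record["age"] < 5 and record["height_cm"] > 130.0


def check(record):
    """Run the range checks first; test consistency only if they pass."""
    problems = range_violations(record)
    if not problems and is_inconsistent(record):
        problems.append("age/height_cm combination")
    return problems


# Keep only the records that raised at least one flag.
suspects = {r["id"]: check(r) for r in records if check(r)}
print(suspects)
```

Flagged records are candidates for inspection, not automatically errors: the checks only generate questions to ask of the data.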