
Summary Introduction to Analytics D0H61a

Pages
82
Uploaded on
15-06-2025
Written in
2024/2025

Course by prof. Jochen de Weerdt. Very comprehensive summary covering all of the theory required for the exam. If something is not included in the document (which will be rare), I always refer to the slides. Many subjects that may be confusing at first when going through the slides are described intuitively in this summary. Almost every link (URL) that was used in the slides as an example or clarification is included as well. 82 pages may look like a lot, but the document includes many tables and figures, which make the summary very easy and intuitive to go through.


Content preview

Introduction to Analytics

2024-2025 Finn Germe



Contents

1 The Data Analytics Process 4
1.1 What is it all about? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Supervised Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.2 Unsupervised algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 The Data Analytics Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5.1 MLOps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.2 Involved Parties and Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Data Preprocessing 12
2.1 Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Data Leakage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Data Cleaning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Data Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Standardization, Normalization & Categorization . . . . . . . . . . . . . . . . . . . . 14
2.3.2 Dummy Variables and Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.3 Feature Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.4 Feature Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.5 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Exploratory Data Analysis (EDA) 18
3.1 Data Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.1 The Essence: Human Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.2 Human limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.3 What makes a good visualization? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 An example EDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Predictive Analytics – Decision Trees 23
4.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 The ID3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3.1 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3.2 Information Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3.3 Impurity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 C4.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 Countering Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5.1 Possible Fixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5.2 Non-Linearity and Decision Boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.6 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5 Predictive Analytics – Model Evaluation 30
5.1 Classification Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.1.1 Threshold Dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Cost-Sensitive Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.1 Inverse Class Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2.2 Classifying Using Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.3 Threshold-Independent Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.3.1 Receiver Operating Characteristic (ROC) Curve . . . . . . . . . . . . . . . . . . . . . 34
5.3.2 Precision-Recall Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.4 Cross-Validation and Tuning Analytical Models . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.5 Model Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6 Predictive Analytics – Regression 39
6.1 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.2 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.3 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.3.1 Stepwise Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.3.2 Elastic Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.4 Regression Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.4.1 Confidence intervals for β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.5 Regression Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.5.1 Splitting Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

7 Predictive Analytics – Other 46
7.1 k-Nearest Neighbours – kNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.1.1 Weighted Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.1.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.2 Support Vector Machines – SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.2.1 The Dual Problem and Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
7.2.2 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
7.2.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7.3 Naïve Bayes and Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7.3.1 Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.4 Others: RF, XGB, Deep Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

8 Descriptive Analytics I – Clustering 51
8.1 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
8.2 Partitional Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.2.1 K-means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
8.2.2 Choosing the number of clusters K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
8.2.3 K-means++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
8.3 Other Clustering Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
8.4 Cluster Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.4.1 Internal Cluster Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.4.2 External Cluster Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.4.3 Domain Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59





Finn Germe 2/82

9 Descriptive Analytics II – Association Rules 59
9.1 Association Rule Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
9.1.1 Support and Confidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
9.1.2 Mining Association Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
9.1.3 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.2 Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.2.1 Temporal Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.2.2 Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
9.2.3 Algorithms for sequential pattern mining . . . . . . . . . . . . . . . . . . . . . . . . . 63
9.2.4 Timing Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.3 Conclusion – ARM and Seq. Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

10 Fraud Analytics 67
10.1 Predictive Analytics for Fraud Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
10.1.1 Synthetic Minority Oversampling Technique (SMOTE) . . . . . . . . . . . . . . . . 69
10.1.2 Descriptive Analytics for Fraud Detection . . . . . . . . . . . . . . . . . . . . . . . . 70
10.1.3 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.1.4 Local Outlier Factor (LOF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.1.5 Isolation Forests (IF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
10.2 Benford’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
10.2.1 First-two digits? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
10.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

11 Business Applications 75
11.1 Credit Risk Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
11.1.1 Retail Credit Risk Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
11.1.2 Corporate Credit Risk Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
11.2 Marketing Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
11.2.1 Customer churn prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
11.2.2 Customer lifetime value modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
11.2.3 Response modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
11.2.4 Uplift modeling – prescriptive analytics . . . . . . . . . . . . . . . . . . . . . . . . . . 81
11.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82





1 The Data Analytics Process




Figure 1: Using data analytics to create real business value

Data contains value and knowledge. Some claim that data is the new oil; the professor does not agree. To extract this knowledge, you have to be able to:
∗ Store it
∗ Manage it
∗ Analyze it

1.1 What is it all about?
What is AI?

AI is a field of computer science dedicated to solving problems that would otherwise require human intelligence, for example pattern recognition, learning, and generalization.
Machine Learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead.

∗ ML is seen as a subset of AI
∗ ML algorithms build a mathematical model based on training data, in order to make predictions or decisions without being explicitly programmed to perform the task
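As a minimal sketch of this idea (the data, function names, and model here are invented purely for illustration and are not from the course), even fitting a one-variable least-squares line is "learning": the model's parameters come from training examples rather than from hand-coded rules, and the fitted model then makes predictions for unseen inputs.

```python
# Minimal sketch of the ML idea: instead of hand-coding a rule,
# fit a simple model (one-variable least squares) to training data
# and use the learned parameters to predict unseen inputs.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training data: the examples follow the (hidden) pattern y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Apply the learned model to a new, unseen input."""
    return slope * x + intercept

print(predict(6))  # 13.0: the pattern learned from the data generalizes
```

Nothing in the code states the rule "multiply by 2 and add 1"; the parameters are inferred from the examples, which is exactly the "patterns and inference instead of explicit instructions" contrast drawn above.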




