Summary Introduction to Analytics D0H61a

Uploaded on: 15-06-2025
Written in: 2024/2025

Course taught by Prof. Jochen de Weerdt. A comprehensive summary covering all of the theory required for the exam. Where something is not included in the document (which is rare), I refer to the slides. Many subjects that may be confusing at first when going through the slides are described intuitively in this summary. Almost every link (URL) used in the slides as an example or clarification is included as well. 82 pages may seem like a lot, but they include many tables and figures, which make the summary easy and intuitive to go through.


Written for

Institution: Katholieke Universiteit Leuven (KU Leuven)
Programme: Bachelor Handelsingenieur
Course: Introduction to Analytics (D0H61A)

Document information

Uploaded on: 15 June 2025
Number of pages: 82
Written in: 2024/2025
Type: Summary

Topics

data analytics
algorithms
data
supervised algorithms
unsupervised algorithms
big data
data preprocessing
exploratory data analysis
eda
predictive analytics
decision trees
model evaluation
regression
k

Content preview

Introduction to Analytics

2024-2025 Finn Germe

Contents

1 The Data Analytics Process
1.1 What is it all about?
1.2 Data
1.3 Algorithms
1.3.1 Supervised Algorithms
1.3.2 Unsupervised algorithms
1.4 Purpose
1.5 The Data Analytics Process
1.5.1 MLOps
1.5.2 Involved Parties and Roles
1.6 Challenges
1.7 Big Data
1.8 Conclusion

2 Data Preprocessing
2.1 Data Selection
2.1.1 Data Leakage
2.2 Data Cleaning
2.3 Data Transformation
2.3.1 Standardization, Normalization & Categorization
2.3.2 Dummy Variables and Encodings
2.3.3 Feature Engineering
2.3.4 Feature Reduction
2.3.5 Feature Selection
2.4 Conclusion

3 Exploratory Data Analysis (EDA)
3.1 Data Visualization
3.1.1 The Essence: Human Perception
3.1.2 Human limitations
3.1.3 What makes a good visualization?
3.2 An example EDA

4 Predictive Analytics – Decision Trees
4.1 Supervised Learning
4.2 Decision Trees
4.3 The ID3 Algorithm
4.3.1 Entropy
4.3.2 Information Gain
4.3.3 Impurity Measures
4.4 C4.5 Algorithm
4.5 Countering Overfitting
4.5.1 Possible Fixes
4.5.2 Non-Linearity and Decision Boundaries
4.6 Tools
4.7 Conclusion

5 Predictive Analytics – Model Evaluation
5.1 Classification Performance
5.1.1 Threshold Dependency
5.2 Cost-Sensitive Evaluation
5.2.1 Inverse Class Distribution
5.2.2 Classifying Using Costs
5.3 Threshold-Independent Evaluation
5.3.1 Receiver Operating Characteristic (ROC) Curve
5.3.2 Precision-Recall Curve
5.4 Cross-Validation and Tuning Analytical Models
5.5 Model Monitoring
5.6 Conclusion

6 Predictive Analytics – Regression
6.1 Linear Regression
6.2 Logistic Regression
6.3 Regularization
6.3.1 Stepwise Regression
6.3.2 Elastic Net
6.4 Regression Techniques
6.4.1 Confidence intervals for β
6.5 Regression Trees
6.5.1 Splitting Criterion
6.6 Conclusion

7 Predictive Analytics – Other
7.1 k-Nearest Neighbours – kNN
7.1.1 Weighted Voting
7.1.2 Conclusion
7.2 Support Vector Machines – SVM
7.2.1 The Dual Problem and Kernels
7.2.2 Prediction
7.2.3 Conclusion
7.3 Naïve Bayes and Bayesian Networks
7.3.1 Bayesian Networks
7.4 Others: RF, XGB, Deep Learning?
7.5 Conclusion

8 Descriptive Analytics I – Clustering
8.1 Hierarchical Clustering
8.2 Partitional Clustering
8.2.1 K-means
8.2.2 Choosing the number of clusters K
8.2.3 K-means++
8.3 Other Clustering Methods
8.4 Cluster Validation
8.4.1 Internal Cluster Validation
8.4.2 External Cluster Validation
8.4.3 Domain Knowledge
8.5 Conclusion

9 Descriptive Analytics II – Association Rules
9.1 Association Rule Mining
9.1.1 Support and Confidence
9.1.2 Mining Association Rules
9.1.3 Extensions
9.2 Sequential Pattern Mining
9.2.1 Temporal Data Mining
9.2.2 Sequential Pattern Mining
9.2.3 Algorithms for sequential pattern mining
9.2.4 Timing Constraints
9.3 Conclusion – ARM and Seq. Mining

10 Fraud Analytics
10.1 Predictive Analytics for Fraud Detection
10.1.1 Synthetic Minority Oversampling TEchnique (SMOTE)
10.1.2 Descriptive Analytics for Fraud Detection
10.1.3 Outlier Detection
10.1.4 Local Outlier Factor (LOF)
10.1.5 Isolation Forests (IF)
10.2 Benford’s Law
10.2.1 First-two digits?
10.3 Conclusion

11 Business Applications
11.1 Credit Risk Modelling
11.1.1 Retail Credit Risk Modelling
11.1.2 Corporate Credit Risk Modelling
11.2 Marketing Analytics
11.2.1 Customer churn prediction
11.2.2 Customer lifetime value modeling
11.2.3 Response modelling
11.2.4 Uplift modeling – prescriptive analytics
11.3 Conclusion

1 The Data Analytics Process

Figure 1: Using data analytics to create real business value

Data contains value and knowledge. Some claim that data is the new oil; the professor does not agree. To
extract this knowledge you have to be able to:
∗ Store it
∗ Manage it
∗ Analyze it

1.1 What is it all about?
What is AI?

AI is a field of computer science dedicated to solving problems which otherwise require human intelligence, for example pattern recognition, learning, and generalization.

Machine Learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead.

∗ ML is seen as a subset of AI
∗ ML algorithms build a mathematical model based on training data, to make predictions or decisions without being explicitly programmed to perform the task
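
The idea of building a model from training data instead of hand-coding a rule can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the course material: the toy data (hours studied versus pass/fail) and the function names fit_threshold and predict are hypothetical. The cut-off is never written into the program; it is chosen as the value that makes the fewest mistakes on the training examples.

```python
from typing import List


def fit_threshold(xs: List[float], ys: List[int]) -> float:
    """Learn the cut-off that misclassifies the fewest training examples.

    Assumes xs contains at least two distinct values and ys holds 0/1 labels.
    """
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(t: float) -> int:
        # Count examples where the rule "x > t means class 1" disagrees with the label.
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def predict(threshold: float, x: float) -> int:
    """Apply the learned rule to a new observation."""
    return int(x > threshold)


# Hypothetical training data: hours studied and whether the student passed.
hours = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 1, 1, 1]

threshold = fit_threshold(hours, passed)
print(threshold, predict(threshold, 2.5), predict(threshold, 6.5))
```

Changing the training data changes the learned threshold without any change to the code, which is the sense in which the task is not explicitly programmed.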
