Summary: Advanced Analytics in a Big Data World (D0S06B)

Pages: 91
Uploaded on: March 12, 2025
Written in: 2023/2024

Summary of the full course Advanced Analytics in a Big Data World (D0S06B), HIR (B), 2nd Master, based on lecture notes and slides. Passed in the first exam session.

Content preview

ADVANCED ANALYTICS
Prof. Seppe vanden Broucke
KU Leuven

TABLE OF CONTENTS
1 Introduction
  1.1 Setting the Scene
  1.2 Components of Data Science
  1.3 Process, People, and Problems
2 Preprocessing and Feature Engineering
  2.1 Preprocessing Steps
  2.2 Feature Engineering
  2.3 Conclusion
3 Supervised Learning
  3.1 (Logistic) Regression
  3.2 Decision and Regression Trees
  3.3 K-NN
4 Model Evaluation
  4.1 Introduction
  4.2 Classification Performance
  4.3 Regression Performance
  4.4 Cross-Validation and Tuning
  4.5 Additional Notes
  4.6 Monitoring and Maintenance
5 Ensemble Modelling: Bagging and Boosting
  5.1 Introduction
  5.2 Bagging
  5.3 Boosting
  5.4 Comparing Bagging and Boosting
6 Interpretability
  6.1 Introduction
  6.2 Feature importance
  6.3 Partial Dependence Plots
  6.4 Individual Conditional Expectation plots
  6.5 LIME
  6.6 Shapley values
  6.7 Conclusion
7 Deep Learning Part 1: Foundations and Images
  7.1 Introduction
  7.2 Foundations of artificial neural networks
  7.3 Delving deeper into Artificial Neural Networks
  7.4 The convolutional architecture
  7.5 Interpretation of convolutional neural networks
  7.6 Generative models for images
8 Unsupervised Learning
  8.1 Frequent itemset and association rule mining
  8.2 Clustering
  8.3 Dimensionality reduction
  8.4 Anomaly detection
9 Data Science Tools
  9.1 In-memory analytics
  9.2 Python and R
  9.3 Visualization
  9.4 The road to big data
  9.5 Notebooks and development environments
  9.6 Labeling
  9.7 File formats
  9.8 Packaging and versioning systems
  9.9 Model deployment
10 Hadoop, Spark, and Streaming Analytics
  10.1 Introduction
  10.2 Hadoop: HDFS and MapReduce
  10.3 Spark: SparkSQL and MLlib
  10.4 Streaming analytics and other trends
11 Deep Learning Part 2: Text, Representation Learning and Recurrence
  11.1 Traditional approaches
  11.2 Word embeddings and representational learning
  11.3 Recurrent neural networks (RNN)
  11.4 From RNNs to Transformers
  11.5 Conclusion
12 Graph Analytics
  12.1 Graph construction
  12.2 Graph metrics
  12.3 Community mining
  12.4 Making predictions: Relational learners
  12.5 Making predictions: Featurization
  12.6 Example
  12.7 A word on validation
  12.8 Node2vec and deep learning
  12.9 Tooling
  12.10 NoSQL
  12.11 Graph databases
13 Wrap Up
  13.1 Key pitfalls
  13.2 Closing

Seller: rikteugels (Katholieke Universiteit Leuven)