Summary: Advanced Analytics in a Big Data World (D0S06B)

Pages: 91
Uploaded on: 12 March 2025
Written in: 2023/2024

Summary of the full course, based on the notes and slides, for the course Advanced Analytics in a Big Data World (D0S06B), HIR(B), 2nd master's year. Passed in the first exam session.


Content preview

ADVANCED ANALYTICS
Prof. Seppe vanden Broucke




KU Leuven

TABLE OF CONTENTS
1 Introduction
1.1 Setting the Scene
1.2 Components of Data Science
1.3 Process, People, and Problems
2 Preprocessing and Feature Engineering
2.1 Preprocessing Steps
2.2 Feature Engineering
2.3 Conclusion
3 Supervised Learning
3.1 (Logistic) Regression
3.2 Decision and Regression Trees
3.3 K-NN
4 Model Evaluation
4.1 Introduction
4.2 Classification Performance
4.3 Regression Performance
4.4 Cross-Validation and Tuning
4.5 Additional Notes
4.6 Monitoring and Maintenance
5 Ensemble Modelling: Bagging and Boosting
5.1 Introduction
5.2 Bagging
5.3 Boosting
5.4 Comparing Bagging and Boosting
6 Interpretability
6.1 Introduction
6.2 Feature importance
6.3 Partial Dependence Plots
6.4 Individual Conditional Expectation plots
6.5 LIME
6.6 Shapley values
6.7 Conclusion


7 Deep Learning Part 1: Foundations and Images
7.1 Introduction
7.2 Foundations of artificial neural networks
7.3 Delving deeper into Artificial Neural Networks
7.4 The convolutional architecture
7.5 Interpretation of convolutional neural networks
7.6 Generative models for images
8 Unsupervised Learning
8.1 Frequent itemset and association rule mining
8.2 Clustering
8.3 Dimensionality reduction
8.4 Anomaly detection
9 Data Science Tools
9.1 In-memory analytics
9.2 Python and R
9.3 Visualization
9.4 The road to big data
9.5 Notebooks and development environments
9.6 Labeling
9.7 File formats
9.8 Packaging and versioning systems
9.9 Model deployment
10 Hadoop, Spark, and Streaming Analytics
10.1 Introduction
10.2 Hadoop: HDFS and MapReduce
10.3 Spark: SparkSQL and MLlib
10.4 Streaming analytics and other trends
11 Deep Learning Part 2: Text, Representation Learning and Recurrence
11.1 Traditional approaches
11.2 Word embeddings and representational learning
11.3 Recurrent neural networks (RNN)
11.4 From RNNs to Transformers
11.5 Conclusion
12 Graph Analytics
12.1 Graph construction
12.2 Graph metrics

12.3 Community mining
12.4 Making predictions: Relational learners
12.5 Making predictions: Featurization
12.6 Example
12.7 A word on validation
12.8 Node2vec and deep learning
12.9 Tooling
12.10 NoSQL
12.11 Graph databases
13 Wrap Up
13.1 Key pitfalls
13.2 Closing





Seller: rikteugels (Katholieke Universiteit Leuven)