Machine Learning (Data Mining) - Summary (slides and textbook)

Rating: -
Sold: 14
Pages: 129
Uploaded on: 2 October 2023
Written in: 2022/2023

Score obtained: 17/20. A high-quality, extensive, clear and comprehensive summary (128 pages, written in English, covering all chapters treated) of the Data Mining course, based on the textbook, personal notes and the slides. Recently written and used (2022).

Document information

Whole book summarized? No
What part of the book is summarized? Unknown
Type: Summary

Content preview

Data Mining

Table of contents:
0. General Introduction ..... 7
1. Introduction: Data-Analytic Thinking ..... 14
1.1 The Ubiquity of Data Opportunities ..... 14
1.2 Example: Hurricane Frances ..... 15
1.3 Example: Predicting Customer Churn ..... 15
1.4 Data Science, Engineering and Data-Driven Decision Making ..... 16
1.5 Data Processing and ‘Big Data’ ..... 17
1.6 From Big Data 1.0 to Big Data 2.0 ..... 17
1.7 Data and Data Science Capability as a Strategic Asset ..... 18
1.8 Data-Analytic Thinking ..... 19
1.9 This Book ..... 19
1.10 Data Mining and Data Science, Revisited (fundamental concepts) ..... 20
1.11 Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist ..... 20
1.12 Summary ..... 20
2. Business Problems and Data Science Solutions ..... 21
2.1 From Business Problems to Data Mining Tasks ..... 21
2.2 Supervised Versus Unsupervised Methods ..... 23
2.3 Data Mining and Its Results ..... 24
2.4 The Data Mining Process ..... 25
2.4.1 Business Understanding ..... 25
2.4.2 Data Understanding ..... 26
2.4.3 Data Preparation ..... 26
2.4.4 Modeling ..... 26
2.4.5 Evaluation ..... 27
2.4.6 Deployment ..... 27
2.5 Implications for Managing the Data Science Team ..... 28
2.6 Other Analytics Techniques and Technologies ..... 28
2.6.1 Statistics ..... 28
2.6.2 Database Querying ..... 28
2.6.3 Data Warehousing ..... 29
2.6.4 Regression Analysis ..... 29
2.6.5 Machine Learning and Data Mining ..... 29
2.6.6 Answering Business Questions with These Techniques ..... 30
2.7 Summary ..... 30
3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation ..... 31
3.1 Models, Induction, Deduction ..... 31
3.2 Supervised Segmentation ..... 33
3.2.1 Selecting Informative Attributes ..... 34
3.2.2 Example: Attribute Selection with Information Gain (read) ..... 37
3.2.3 Supervised Segmentation with Tree-Structured Models ..... 38
3.3 Visualizing Segmentations ..... 39
3.4 Trees as Sets of Rules ..... 40
3.5 Probability Estimation ..... 41
3.6 Example: Addressing the Churn Problem with Tree Induction (read) ..... 41
3.7 Summary ..... 41
4. Fitting a Model to Data ..... 42
4.1 Classification via Mathematical Functions ..... 43
4.1.1 Linear Discriminant Functions ..... 45
4.1.2 Optimizing an Objective Function ..... 47
4.1.3 An Example of Mining a Linear Discriminant from Data (read) ..... 47
4.1.4 Linear Discriminant Functions for Scoring and Ranking Instances ..... 48
4.1.5 Support Vector Machines, Briefly ..... 48
4.2 Regression via Mathematical Functions ..... 49
4.3 Class Probability Estimation and Logistic “Regression” ..... 49
4.3.1 Logistic Regression: Some Technical Details (read) ..... 50
4.4 Example: Logistic Regression versus Tree Induction (read) ..... 50
4.5 Nonlinear Functions, Support Vector Machines, and Neural Networks ..... 51
4.6 Summary ..... 52
5. Overfitting and Its Avoidance ..... 53
5.1 Generalization ..... 53
5.2 Overfitting ..... 53
5.3 Overfitting Examined ..... 54
5.3.1 Holdout Data and Fitting Graphs ..... 54
5.3.2 Overfitting in Tree Induction ..... 56
5.3.3 Overfitting in Mathematical Functions ..... 57
5.4 Example: Overfitting Linear Functions (read) ..... 57
5.5 Example: Why Is Overfitting Bad? (read) ..... 58
5.6 From Holdout Evaluation to Cross-Validation ..... 59
5.7 Example: The Churn Dataset Revisited (read) ..... 60
5.8 Learning Curves ..... 61
5.9 Overfitting Avoidance and Complexity Control ..... 62
5.9.1 Avoiding Overfitting with Tree Induction ..... 62
5.9.2 A General Method for Avoiding Overfitting ..... 62
5.9.3 Avoiding Overfitting for Parameter Optimization (read) ..... 63
5.10 Summary ..... 63
6. Similarity, Neighbors, and Clusters ..... 64
6.1 Similarity and Distance ..... 64
6.2 Nearest-Neighbor Reasoning ..... 65
6.2.1 Example: Whiskey Analytics (read) ..... 65
6.3 Nearest Neighbors for Predictive Modeling ..... 66
6.3.1 How Many Neighbors and How Much Influence? ..... 67
6.3.2 Geometric Interpretation, Overfitting, and Complexity Control ..... 68
6.3.3 Issues with Nearest-Neighbor Methods ..... 69
6.4 Some Important Technical Details Relating to Similarities and Neighbors ..... 70
6.4.1 Heterogeneous Attributes ..... 70
6.4.2 Other Distance Functions (read) ..... 70
6.4.3 Combining Functions: Calculating Scores from Neighbors (read) ..... 70
6.5 Clustering ..... 71
6.5.1 Example: Whiskey Analytics Revisited (read) ..... 71
6.5.2 Hierarchical Clustering ..... 71
6.5.3 Nearest Neighbors Revisited: Clustering Around Centroids ..... 73
6.5.4 Example: Clustering Business News Stories (read) ..... 75
6.5.5 Understanding the Results of Clustering ..... 75
6.5.6 Using Supervised Learning to Generate Cluster Descriptions (read) ..... 76
6.6 Stepping Back: Solving a Business Problem Versus Data Exploration ..... 77
6.7 Summary ..... 77
7. Decision Analytic Thinking I: What Is a Good Model? ..... 78
7.1 Evaluating Classifiers ..... 78
7.1.1 Plain Accuracy and Its Problems ..... 78
7.1.2 The Confusion Matrix ..... 79
7.1.3 Problems with Unbalanced Classes ..... 80
7.1.4 Problems with Unequal Costs and Benefits ..... 81
7.1.5 Generalizing Beyond Classification ..... 81
7.2 A Key Analytical Framework: Expected Value ..... 81
7.2.1 Using Expected Value to Frame Classifier Use ..... 82
7.2.2 Using Expected Value to Frame Classifier Evaluation ..... 82
7.3 Evaluation, Baseline Performance, and Implications for Investments in Data ..... 84
7.4 Summary ..... 85
8. Visualizing Model Performance ..... 86
8.1 Ranking Instead of Classifying ..... 86
8.2 Profit Curves ..... 87
8.3 ROC Graphs and Curves ..... 88
Seller: studentua2001, Universiteit Antwerpen