Summary

Machine Learning (Data Mining) - Summary (slides and textbook)

Rating: -
Sold: 14
Pages: 129
Uploaded on: 02-10-2023
Written in: 2022/2023

Summary of the study book Data Science for Business by Foster Provost and Tom Fawcett (ISBN: 9781449361327, Edition: 1)

Document information

Summarized whole book? No
Which chapters are summarized? Unknown
Uploaded on: October 2, 2023
Number of pages: 129
Written in: 2022/2023
Type: Summary

Content preview

Data Mining




Table of contents:
0. General Introduction
1. Introduction: Data-Analytic Thinking
1.1 The Ubiquity of Data Opportunities
1.2 Example: Hurricane Frances
1.3 Example: Predicting Customer Churn
1.4 Data Science, Engineering and Data-Driven Decision Making
1.5 Data Processing and ‘Big Data’
1.6 From Big Data 1.0 to Big Data 2.0
1.7 Data and Data Science Capability as a Strategic Asset
1.8 Data-Analytic Thinking
1.9 This Book
1.10 Data Mining and Data Science, Revisited (fundamental concepts)
1.11 Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
1.12 Summary
2. Business Problems and Data Science Solutions
2.1 From Business Problems to Data Mining Tasks
2.2 Supervised Versus Unsupervised Methods
2.3 Data Mining and Its Results
2.4 The Data Mining Process
2.4.1 Business Understanding
2.4.2 Data Understanding
2.4.3 Data Preparation
2.4.4 Modeling
2.4.5 Evaluation
2.4.6 Deployment
2.5 Implications for Managing the Data Science Team
2.6 Other Analytics Techniques and Technologies
2.6.1 Statistics
2.6.2 Database Querying
2.6.3 Data Warehousing
2.6.4 Regression Analysis
2.6.5 Machine Learning and Data Mining
2.6.6 Answering Business Questions with These Techniques
2.7 Summary
3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
3.1 Models, Induction, Deduction
3.2 Supervised Segmentation
3.2.1 Selecting Informative Attributes
3.2.2 Example: Attribute Selection with Information Gain (read)
3.2.3 Supervised Segmentation with Tree-Structured Models
3.3 Visualizing Segmentations
3.4 Trees as Sets of Rules
3.5 Probability Estimation
3.6 Example: Addressing the Churn Problem with Tree Induction (read)
3.7 Summary
4. Fitting a Model to Data
4.1 Classification via Mathematical Functions
4.1.1 Linear Discriminant Functions
4.1.2 Optimizing an Objective Function
4.1.3 An Example of Mining a Linear Discriminant from Data (read)
4.1.4 Linear Discriminant Functions for Scoring and Ranking Instances
4.1.5 Support Vector Machines, Briefly
4.2 Regression via Mathematical Functions
4.3 Class Probability Estimation and Logistic “Regression”
4.3.1 Logistic Regression: Some Technical Details (read)
4.4 Example: Logistic Regression versus Tree Induction (read)
4.5 Nonlinear Functions, Support Vector Machines, and Neural Networks
4.6 Summary
5. Overfitting and Its Avoidance
5.1 Generalization
5.2 Overfitting
5.3 Overfitting Examined
5.3.1 Holdout Data and Fitting Graphs
5.3.2 Overfitting in Tree Induction
5.3.3 Overfitting in Mathematical Functions
5.4 Example: Overfitting Linear Functions (read)
5.5 Example: Why Is Overfitting Bad? (read)
5.6 From Holdout Evaluation to Cross-Validation
5.7 Example: The Churn Dataset Revisited (read)
5.8 Learning Curves
5.9 Overfitting Avoidance and Complexity Control
5.9.1 Avoiding Overfitting with Tree Induction
5.9.2 A General Method for Avoiding Overfitting
5.9.3 Avoiding Overfitting for Parameter Optimization (read)
5.10 Summary
6. Similarity, Neighbors, and Clusters
6.1 Similarity and Distance
6.2 Nearest-Neighbor Reasoning
6.2.1 Example: Whiskey Analytics (read)
6.3 Nearest Neighbors for Predictive Modeling
6.3.1 How Many Neighbors and How Much Influence?
6.3.2 Geometric Interpretation, Overfitting, and Complexity Control
6.3.3 Issues with Nearest-Neighbor Methods
6.4 Some Important Technical Details Relating to Similarities and Neighbors
6.4.1 Heterogeneous Attributes
6.4.2 Other Distance Functions (read)
6.4.3 Combining Functions: Calculating Scores from Neighbors (read)
6.5 Clustering
6.5.1 Example: Whiskey Analytics Revisited (read)
6.5.2 Hierarchical Clustering
6.5.3 Nearest Neighbors Revisited: Clustering Around Centroids
6.5.4 Example: Clustering Business News Stories (read)
6.5.5 Understanding the Results of Clustering
6.5.6 Using Supervised Learning to Generate Cluster Descriptions (read)
6.6 Stepping Back: Solving a Business Problem Versus Data Exploration
6.7 Summary
7. Decision Analytic Thinking I: What Is a Good Model?
7.1 Evaluating Classifiers
7.1.1 Plain Accuracy and Its Problems
7.1.2 The Confusion Matrix
7.1.3 Problems with Unbalanced Classes
7.1.4 Problems with Unequal Costs and Benefits
7.1.5 Generalizing Beyond Classification
7.2 A Key Analytical Framework: Expected Value
7.2.1 Using Expected Value to Frame Classifier Use
7.2.2 Using Expected Value to Frame Classifier Evaluation
7.3 Evaluation, Baseline Performance, and Implications for Investments in Data
7.4 Summary
8. Visualizing Model Performance
8.1 Ranking Instead of Classifying
8.2 Profit Curves
8.3 ROC Graphs and Curves

Seller: studentua2001 (Universiteit Antwerpen)