
Data Science for Business what you need to know about data mining and data-analytic thinking - Foster Provost Tom Fawcett - Summary

Rating: 3.7 (16 reviews)
Sold: 95
Pages: 36
Uploaded: 01-11-2017
Written in: 2017/2018

A clear and elaborate summary of "Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking" by Foster Provost & Tom Fawcett. Extra chapters include Neural Networks, a formula sheet and example exam questions. Chapters included are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 11 and 13. This summary is 36 pages long and includes all the information needed to pass an exam; it is based on several online sources in addition to the book itself. This summary helped students achieve a grade of at least 7/10 at Maastricht University.


Document information

Whole book summarized: Yes
Uploaded: 1 November 2017
File last updated: 16 December 2017
Pages: 36
Written in: 2017/2018
Type: Summary

Content preview

Data Science for Business Summary
What You Need to Know About Data Mining and Data-Analytic Thinking
Foster Provost & Tom Fawcett




Author: Martijn C. Paulussen
University: Maastricht University School of Business and Economics
Master: MSc Business Intelligence & Smart Services
Course: [EBC4220] Business Analytics






© 2017 Martijn Paulussen - Maastricht University School of Business and Economics

Nothing in this publication may be reproduced and/or made public by means of printing, offset, photocopy or
microfilm or in any digital, electronic, optical or any other form without the prior written permission of the
owner of the copyright.

Table of Contents
Chapter 0 - Introduction and general insights
Chapter 1 - Data Analytic Thinking
1.1. Data Driven Decision making (DDD)
Chapter 2 - Data Mining Tasks and Business Problems
2.1. Supervised vs Unsupervised
2.2. Data mining and KDD
Chapter 3 - Predictive Modeling: Correlation to Supervised Segmentation
3.1. Entropy
3.2. Information Gain
3.3. Entropy Chart
3.4. Supervised Segmentation with Tree-Structured Models
Chapter 4 - Fitting a Model to Data
4.1. Support Vector Machines
4.2. Logistic Regression
Chapter 5 - Overfitting and its Avoidance
5.1. Holdout Data
5.2. Cross-validation
5.3. Learning Curve
Chapter 6 - Similarity, Neighbors, and Clusters
6.1. General Euclidean Distance
6.2. Nearest Neighbor
6.3. Clustering
6.4. Centroids Clustering
Chapter 7 - Decision Analytic Thinking 1: What is a Good Model?
7.1. Confusion Matrix
7.2. Expected Value (Profit)
7.3. Sensitivity, Specificity and Accuracy
7.4. Baseline Methods
Chapter 8 - Visualizing Model Performance
8.1. Profit Curve
8.2. ROC graphs and curves
8.2.1. Area Under the ROC Curve (AUC)
8.3. Cumulative response curves
8.4. Lift curves
8.5. Example: Performance Analytics for Churn Modeling
8.5.1. Fitting Curve
Chapter 9 - Evidence and Probabilities
9.1. Joint Probabilities and Independence
9.2. Naïve Bayes
9.3. Evidence Lift
Chapter 11 - Decision Analytics Thinking 2: Toward Analytical Engineering
11.1. Expected Value
Chapter 13 - Data Science and Business Strategy
Chapter Extra: Fuzzy Systems
Crisp sets
Fuzzy sets
Chapter Extra: Neural Networks
Chapter Extra: Formula Sheet
Chapter Exam Question

Chapter 0 - Introduction and general insights
Big Data: Datasets that are too large for traditional data processing systems.
Web 2.0: The phase of the Web in which new systems and companies began taking advantage of its
interactive nature.



Chapter 1 - Data Analytic Thinking
1.1. Data Driven Decision making (DDD)
Data science involves principles, processes, and techniques for
understanding phenomena via the (automated) analysis of data in
order to improve decision making.
Data-driven decision-making [DDD] refers to the practice of basing
decisions on the analysis of data, rather than purely on intuition.
Two decision types: (1) decisions for which “discoveries” need to be
made within data, and (2) decisions that repeat, especially at massive
scale, and so decision-making can benefit from even small increases in
decision-making accuracy based on data analysis.
Fundamental concept 1: Extracting useful knowledge from data to
solve business problems can be treated systematically by following a
process with reasonably well-defined stages. (e.g. CRISP-DM)
Fundamental concept 2: From a large mass of data, information
technology can be used to find informative descriptive attributes of
entities of interest.
[Figure 1-1. Data science in the context of various data-related processes in the organization.]
Fundamental concept 3: If you look too hard at a set of data, you
will find something—but it might not generalize beyond the data you’re looking at.
Fundamental concept 4: Formulating data mining solutions and evaluating the results involves thinking
carefully about the context in which they will be used.
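
Fundamental concept 3 can be illustrated with a small simulation. The following is a minimal sketch, not taken from the book: the sample sizes, the seed and the helper names (`random_bits`, `agreement`) are assumptions chosen for illustration. Both the target and every candidate feature are pure noise, yet searching through enough features on a small sample turns up one that appears predictive. That apparent pattern should fade on data that was not used for the search.

```python
import random

random.seed(0)  # arbitrary seed (assumption) so the run is repeatable

# Hypothetical sizes: a small sample to search in, a larger one to check on.
N_TRAIN, N_HOLDOUT, N_FEATURES = 20, 1000, 500


def random_bits(n):
    """Return n random 0/1 values (pure noise)."""
    return [random.randint(0, 1) for _ in range(n)]


def agreement(feature, target):
    """Fraction of positions where the feature equals the target."""
    return sum(f == t for f, t in zip(feature, target)) / len(target)


# The target and all candidate features are random and unrelated.
target = random_bits(N_TRAIN + N_HOLDOUT)
features = [random_bits(N_TRAIN + N_HOLDOUT) for _ in range(N_FEATURES)]

# "Looking too hard": keep whichever feature matches best on the small sample.
best = max(features, key=lambda f: agreement(f[:N_TRAIN], target[:N_TRAIN]))

train_score = agreement(best[:N_TRAIN], target[:N_TRAIN])
holdout_score = agreement(best[N_TRAIN:], target[N_TRAIN:])

print(f"agreement on the searched sample: {train_score:.2f}")
print(f"agreement on unseen data:         {holdout_score:.2f}")
```

The first score is typically well above 0.5 and the second close to 0.5, which is the gap between finding something in the data and finding something that generalizes.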




Chapter 2 - Data Mining Tasks and Business Problems
Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised
versus unsupervised data mining.
1. Classification & Probability estimation: Predict, for each individual in a population, which of a
(small) set of classes this individual belongs to. (Will it happen?)
Q: Among all the customers of TelCo, which are likely to respond to a given offer?
A: e.g. two classes: will respond and will not respond. Yes or No.
Class probability estimation: a score representing the probability (or some other quantification of
likelihood) that that individual belongs to each class.
2. Regression (value estimation): Estimate or predict, for each individual, the numerical value of some
variable for that individual. (How much?)
Q: How much will a given customer use the service?
A: Predict service usage (the target). The model can be generated by looking at other, similar
individuals and their historical usage.
3. Similarity matching: Identify similar individuals based on data known about them.
Similarity matching can be used directly to find similar entities.
4. Clustering: Group individuals in a population together by their similarity, but not driven by any
specific purpose. (Chapter 6)
Q: Do our customers form natural groups or segments? What products should we offer or
develop? How should our customer care teams (or sales teams) be structured?
Clustering is useful in preliminary domain exploration to see which natural groups exist.
5. Co-occurrence grouping: Find associations between entities based on transactions involving them.
Q: What items are commonly purchased together?
While clustering looks at similarity between objects based on the objects’ attributes, co-occurrence
grouping considers similarity of objects based on their appearing together in transactions, as in
“People who bought X also bought Y.”
6. Profiling: Characterize the typical behavior of an individual, group, or population.
Q: What is the typical cell phone usage (day, night, international) of this customer segment?
Profiling is often used to establish behavioral norms for anomaly detection applications such as
fraud detection and monitoring for intrusions to computer systems.
7. Link prediction: Predict connections between data items, usually by suggesting that a link should
exist, and possibly also estimating the strength of the link.
Q: Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?
Link prediction can also estimate the strength of a link. For example, when recommending movies,
we search for links that do not yet exist between customers and movies, but that we predict should
exist and should be strong.
8. Data reduction: Take a large set of data and replace it with a smaller set of data that contains much of
the important information in the larger set.
Data reduction usually involves loss of information. What is important is the trade-off for
improved insight.
9. Causal modeling: Helps us understand what events or actions actually influence others.
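
Tasks 1 to 3 in the list above are closely related, and a tiny nearest-neighbor sketch shows how. This is an illustrative assumption, not the book's method: the customer records, the feature choice (age, monthly bill) and all function names are hypothetical. Similarity matching finds the closest customers. Class probability estimation, classification and regression then reuse those neighbors.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical toy data: ((age, monthly_bill), responded_to_offer, usage_minutes)
customers = [
    ((25, 30.0), True, 420.0),
    ((30, 35.0), True, 380.0),
    ((45, 80.0), False, 150.0),
    ((50, 90.0), False, 120.0),
    ((28, 32.0), False, 400.0),
]


def nearest(query, k=3):
    """Similarity matching: the k customers closest to `query`."""
    return sorted(customers, key=lambda c: dist(c[0], query))[:k]


def response_probability(query, k=3):
    """Class probability estimation: share of neighbors who responded."""
    neighbors = nearest(query, k)
    return sum(1 for _, responded, _ in neighbors if responded) / len(neighbors)


def will_respond(query, k=3, threshold=0.5):
    """Classification: turn the score into a yes/no answer."""
    return response_probability(query, k) > threshold


def expected_usage(query, k=3):
    """Regression: average the numeric target over the neighbors."""
    neighbors = nearest(query, k)
    return sum(usage for _, _, usage in neighbors) / len(neighbors)


new_customer = (27, 31.0)
print(response_probability(new_customer))  # score between 0 and 1
print(will_respond(new_customer))          # "Will it happen?"
print(expected_usage(new_customer))        # "How much?"
```

The features are left unscaled to keep the sketch short. In practice, attributes measured on different scales would usually be normalized before computing distances.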

