Summary

Data Science for Business what you need to know about data mining and data-analytic thinking - Foster Provost Tom Fawcett - Summary

Name: Data Science for Business what you need to know about data mining and data-analytic thinking - Foster Provost Tom Fawcett - Summary
SKU: doc_366396
Rating: 3.69 (16 reviews)
Author: martijnpaulussen

A clear and elaborate summary of Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking by Foster Provost & Tom Fawcett. Extra chapters cover neural networks, a formula sheet and example exam questions. Chapters included are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 11 and 13. The summary is 36 pages long and includes all the information needed to pass an exam; it is based on several online sources in addition to the book itself. This summary helped students achieve a grade of at least 7/10 at Maastricht University.

Linked book

Foster Provost, Tom Fawcett Data Science for Business

Published: August 2013
ISBN: 9781449361327
Edition: 1

Written for

Institution: Maastricht University (UM)
Programme: MSc Business Intelligence and Smart Services
Course: Business Analytics

Document information

Whole book summarized?: Yes
Uploaded on: 1 November 2017
File last updated on: 16 December 2017
Number of pages: 36
Written in: 2017/2018
Type: Summary

Topics

data science for business
business analytics
neural network
data science for business foster provost tom fawcett
summary

Content preview

Data Science for Business Summary
What You Need to Know About Data Mining and Data-Analytic Thinking
Foster Provost & Tom Fawcett

Author: Martijn C. Paulussen
University: Maastricht University School of Business and Economics
Master: MSc Business Intelligence & Smart Services
Course: [EBC4220] Business Analytics

© 2017 Martijn Paulussen - Maastricht University School of Business and Economics

Nothing in this publication may be reproduced and/or made public by means of printing, offset, photocopy or
microfilm or in any digital, electronic, optical or any other form without the prior written permission of the
owner of the copyright.

Table of Contents
Chapter 0 - Introduction and general insights
Chapter 1 - Data Analytic Thinking
1.1. Data Driven Decision making (DDD)
Chapter 2 - Data Mining Tasks and Business Problems
2.1. Supervised vs Unsupervised
2.2. Data mining and KDD
Chapter 3 - Predictive Modeling: Correlation to Supervised Segmentation
3.1. Entropy
3.2. Information Gain
3.3. Entropy Chart
3.4. Supervised Segmentation with Tree-Structured Models
Chapter 4 - Fitting a Model to Data
4.1. Support Vector Machines
4.2. Logistic Regression
Chapter 5 - Overfitting and its Avoidance
5.1. Holdout Data
5.2. Cross-validation
5.3. Learning Curve
Chapter 6 - Similarity, Neighbors, and Clusters
6.1. General Euclidean Distance
6.2. Nearest Neighbor
6.3. Clustering
6.4. Centroids Clustering
Chapter 7 - Decision Analytic Thinking 1: What is a Good Model?
7.1. Confusion Matrix
7.2. Expected Value (Profit)
7.3. Sensitivity, Specificity and Accuracy
7.4. Baseline Methods
Chapter 8 - Visualizing Model Performance
8.1. Profit Curve
8.2. ROC graphs and curves
8.2.1. Area Under the ROC Curve (AUC)
8.3. Cumulative response curves
8.4. Lift curves
8.5. Example: Performance Analytics for Churn Modeling
8.5.1. Fitting Curve
Chapter 9 - Evidence and Probabilities
9.1. Joint Probabilities and Independence
9.2. Naïve Bayes
9.3. Evidence Lift
Chapter 11 - Decision Analytics Thinking 2: Toward Analytical Engineering
11.1. Expected Value
Chapter 13 - Data Science and Business Strategy
Chapter Extra: Fuzzy Systems
Crisp sets
Fuzzy sets
Chapter Extra: Neural Networks
Chapter Extra: Formula Sheet
Chapter Exam Question

Chapter 0 - Introduction and general insights
Big Data: Datasets that are too large for traditional data processing systems.
Web 2.0: The phase of the Web in which new systems and companies began taking advantage of its interactive nature.

Chapter 1 - Data Analytic Thinking
1.1. Data Driven Decision making (DDD)
Data science involves principles, processes, and techniques for
understanding phenomena via the (automated) analysis of data in
order to improve decision making.
Data-driven decision-making [DDD] refers to the practice of basing
decisions on the analysis of data, rather than purely on intuition.
Two decision types: (1) decisions for which “discoveries” need to be
made within data, and (2) decisions that repeat, especially at massive
scale, and so decision-making can benefit from even small increases in
decision-making accuracy based on data analysis.
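The second decision type can be illustrated with a small calculation. The sketch below is only an illustration: the decision volume, the value per correct decision and both accuracy levels are hypothetical assumptions, not figures from the book.

```python
def expected_gain(n_decisions, value_per_correct, old_accuracy, new_accuracy):
    """Extra value from raising accuracy on a decision that repeats n_decisions times."""
    extra_correct = n_decisions * (new_accuracy - old_accuracy)
    return extra_correct * value_per_correct

# Hypothetical scenario: one million repeated decisions, each correct one worth
# 2.0 units, and data analysis lifts accuracy by one percentage point.
gain = expected_gain(1_000_000, 2.0, 0.70, 0.71)
print(round(gain))
```

Under these assumed numbers, a one-point accuracy gain is worth roughly 20,000 units, because the small improvement is multiplied by the number of times the decision repeats.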
Fundamental concept 1: Extracting useful knowledge from data to
solve business problems can be treated systematically by following a
process with reasonably well-defined stages. (e.g. CRISP-DM)
Fundamental concept 2: From a large mass of data, information
technology can be used to find informative descriptive attributes of
entities of interest.
[Figure 1-1. Data science in the context of various data-related processes in the organization.]
Fundamental concept 3: If you look too hard at a set of data, you
will find something—but it might not generalize beyond the data you’re looking at.
Fundamental concept 4: Formulating data mining solutions and evaluating the results involves thinking
carefully about the context in which they will be used.
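Fundamental concept 3 can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the function names, the number of rows and features, and the seed are all hypothetical, and the data are pure coin flips, so any "pattern" found is spurious by construction.

```python
import random

def agreement(feature, target):
    """Fraction of rows where a binary feature equals the binary target."""
    return sum(f == t for f, t in zip(feature, target)) / len(target)

def best_random_feature(n_rows, n_features, seed=0):
    """Pick the random feature that best matches a random target, then re-check it."""
    rng = random.Random(seed)
    # Target and features are independent coin flips: there is nothing real to find.
    target = [rng.randint(0, 1) for _ in range(2 * n_rows)]
    features = [[rng.randint(0, 1) for _ in range(2 * n_rows)]
                for _ in range(n_features)]
    train, holdout = slice(0, n_rows), slice(n_rows, 2 * n_rows)
    # "Looking too hard": keep whichever feature agrees best on the training rows.
    best = max(features, key=lambda f: agreement(f[train], target[train]))
    return (agreement(best[train], target[train]),
            agreement(best[holdout], target[holdout]))

train_acc, holdout_acc = best_random_feature(n_rows=20, n_features=500)
print(f"training agreement: {train_acc:.2f}, holdout agreement: {holdout_acc:.2f}")
```

With many candidate features and few rows, the best training agreement is typically well above 0.5 even though every feature is noise, while the agreement on the unseen rows tends to stay near 0.5. That gap is what "might not generalize" means in practice.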

Chapter 2 - Data Mining Tasks and Business Problems
Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised
versus unsupervised data mining.
1. Classification & Probability estimation: Predict, for each individual in a population, which of a
(small) set of classes this individual belongs to. (Will it happen?)
Q: Among all the customers of TelCo, which are likely to respond to a given offer?
A: e.g. two classes: will respond and will not respond. Yes or No.
Class probability estimation: Instead of a single class label, produce for each individual a score representing
the probability (or some other quantification of likelihood) that the individual belongs to each class.
2. Regression (value estimation): Estimate or predict, for each individual, the numerical value of some
variable for that individual. (How much?)
Q: How much will a given customer use the service?
A: predict service usage (target). Model can be based on other similar individuals (variables).
3. Similarity matching: Identify similar individuals based on data known about them.
Similarity matching can be used directly to find similar entities.
4. Clustering: Group individuals in a population together by their similarity, but not driven by any
specific purpose. (Chapter 6)
Q: Do our customers form natural groups or segments? What products should we offer or
develop? How should our customer care teams (or sales teams) be structured?
Clustering is useful in preliminary domain exploration to see which natural groups exist.
5. Co-occurrence grouping: Find associations between entities based on transactions involving them.
Q: What items are commonly purchased together?
While clustering looks at similarity between objects based on the objects’ attributes, co-occurrence
grouping considers similarity of objects based on their appearing together in transactions: “People
who bought X also bought Y.”
6. Profiling: Characterize the typical behavior of an individual, group, or population.
Q: What is the typical cell phone usage (day, night, international) of this customer segment?
Profiling is often used to establish behavioral norms for anomaly detection applications such as
fraud detection and monitoring for intrusions to computer systems.
7. Link prediction: Predict connections between data items, usually by suggesting that a link should
exist, and possibly also estimating the strength of the link.
Q: Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?
Link prediction is also used for recommendation. When recommending movies, for example, we search
for links that do not yet exist between customers and movies, but that we predict should exist and should be strong.
8. Data reduction: Take a large set of data and replace it with a smaller set of data that contains much of
the important information in the larger set.
Data reduction usually involves loss of information. What is important is the trade-off for
improved insight.
9. Causal modeling: Helps us understand what events or actions actually influence others.
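
Two of the tasks above, similarity matching (3) and co-occurrence grouping (5), are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the customers, their attributes, the baskets and the function names are all made up, and plain Euclidean distance on unscaled attributes is just one possible similarity measure.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical customers described by (age, monthly spend); values are made up.
customers = {"ann": (25, 40.0), "bob": (27, 42.0), "cat": (60, 15.0)}

def most_similar(name, table):
    """Similarity matching: the other entity with the smallest Euclidean distance."""
    return min((other for other in table if other != name),
               key=lambda other: dist(table[name], table[other]))

# Hypothetical transactions, each a set of purchased items.
transactions = [{"bread", "butter"}, {"bread", "butter", "jam"},
                {"bread", "jam"}, {"milk"}]

def pair_counts(baskets):
    """Co-occurrence grouping: count how often two items share a basket."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        counts.update(combinations(sorted(basket), 2))
    return counts

print(most_similar("ann", customers))
print(pair_counts(transactions).most_common(2))
```

The first function uses only what is known about each customer (attributes), while the second ignores attributes entirely and looks at which items appear together in transactions. That is the distinction the list draws between similarity-based tasks and co-occurrence grouping.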
