Summary

Summary Data Science for Business Book Chapters

Pages: 33
Uploaded on: 03-12-2019
Written in: 2019/2020

A summary of the book "Data Science for Business" by Foster Provost and Tom Fawcett. These per-chapter notes also highlight in yellow the information that was presented in the class slides, showing students what to focus on based on the lectures for Data Science for Business taught by Chintan Amrit.

Document information

Whole book summarized? No
Which parts of the book are summarized? Chapter 1-10
Type: Summary

Preview of the content

Chapter 1. Introduction: Data Analytic Thinking
"The widest applications of data-mining techniques are in marketing for tasks such as targeted
marketing, online advertising, and recommendations for cross-selling"

"A data perspective will provide you with structure and principles, and this will give you a
framework to systematically analyze such problems"

Data science: "Set of fundamental principles that guide the extraction of knowledge from data"
● Data mining: "The extraction of knowledge from data, via technologies that incorporate
these principles"

Churn: "Customers switching from one company to another"
● "Customer retention has been a major use of data mining technologies–especially in
telecommunications and finance business"

Data Science, Engineering, and Data-Driven Decision Making
Ultimate goal: Improve decision making

"Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of
data, rather than purely on intuition"
● Benefits:
○ "The more data-driven a firm is, the more productive it is"
■ "One standard deviation higher on the DDD scale is associated with a
4%-6% increase in productivity"
○ "Correlated with higher return on assets, return on equity, asset utilization, and
market value"

2 types of decisions to consider:
1. Decisions for which "discoveries" need to be made within data
a. E.g. Walmart and Target
2. Decisions that repeat, especially at massive scale, and so decision-making can benefit
from even small increases in decision-making accuracy based on data analytics
a. MegaTelCo churn example

"A predictive model abstracts away most of the complexity of the world, focusing in on a particular set
of indicators that correlate in some way with a quantity of interest"

Increasing use in business of automated decision making by computer systems

Data Processing and "Big Data"
Data engineering and processing ≠ data science
● "Data science needs access to data and it often benefits from sophisticated data
engineering that data processing technologies may facilitate, but these technologies
are not data science technologies per se"; they do support data science
● Data processing technologies also matter for data-oriented business tasks that do not
involve the extraction of knowledge or data-driven decision making

Big data: "Datasets that are too large for traditional data processing systems, and therefore
require new processing technologies"
● "Using big data technologies is associated with significant additional productivity growth"
○ One standard deviation higher utilization is associated with 1%-3% higher productivity

From Big Data 1.0 to Big Data 2.0
● "We should expect a Big Data 2.0 phase to follow Big Data 1.0. Once firms have become
capable of processing massive data in a flexible fashion, they should begin asking:
“What can I now do that I couldn’t do before, or do better than I could do before?”"

Data and Data Science Capability as a Strategic Asset
Fundamental principle: "Data, and the capability to extract useful knowledge from data, should
be regarded as key strategic assets"
● Viewing data and data science capability as assets helps in deciding how much to invest in them
● Data and data science capability are complementary strategic assets

"Predictive performance continues to improve as more data are used"
● Companies with more data may have an important strategic advantage

Data-Analytic Thinking
Data-analytic thinking may help "to envision opportunities for improving data-driven
decision-making, or to see data-oriented competitive threats"

Fundamental concept:
● "Extracting useful knowledge from data to solve business problems can be treated
systematically by following a process with reasonably well-defined stages"
● "From a large mass of data, information technology can be used to find informative
descriptive attributes of entities of interest"
● "If you look too hard at a set of data, you will find something—but it might not generalize
beyond the data you’re looking at"
● "Formulating data mining solutions and evaluating the results involves thinking carefully
about the context in which they will be used"

Chapter 2. Business Problems and Data Science Solutions
Fundamental concepts: A set of canonical data mining tasks; The data mining process;
Supervised versus unsupervised data mining.

Data science as a process with stages

From Business Problems to Data Mining Tasks
"A critical skill in data science is the ability to decompose a data-analytics problem into pieces
such that each piece matches a known task for which tools are available"
● Recognizing familiar problems and their known solutions avoids wasting time and resources
○ Focus instead on the parts that require human involvement

Types of tasks:
1. Classification and class probability estimation: Predict for each individual to which of a
(small) set of classes it belongs
a. Classes are normally mutually exclusive
b. Closely related tasks: Scoring, class probability estimation
c. Outcome: Model
2. Regression ("value estimation"): "Estimate or predict… the numerical value of some
variable" per individual
a. "Informally, classification predicts whether something will happen, whereas
regression predicts how much something will happen"
3. Similarity matching: "Identify similar individuals based on data known about them"
a. E.g. product recommendations
4. Clustering: "Group individuals in a population together by their similarity, but not driven
by any specific purpose"
a. Useful in preliminary domain exploration for natural groups
5. (Association Rule Mining) Co-occurrence grouping (aka frequent itemset mining,
association rule discovery, and market-basket analysis): "Find associations between
entities based on transactions involving them"
a. "While clustering looks at similarit[ies] between objects based on the objects’
attributes, co-occurrence grouping considers similarity of objects based on
their appearing together in transactions"
b. Examples:
i. Put items next to each other for ease of finding
ii. Promote the items as a package
iii. Place items far apart from each other so that the customer has to walk
the aisles to search for it, and by doing so potentially see and buy other
items
c. Generic rule: X ⇒ Y [S%, C%]
i. X, Y: Products and/or services
ii. X: Left-hand side (LHS) and Y: Right-hand side (RHS)
iii. S: Support: How often X and Y occur together across all transactions
iv. C: Confidence: How often Y occurs in the transactions that contain X
v. E.g. {Laptop Computer, Antivirus Software} ⇒ {Extended Service Plan} [30%, 70%]
6. Profiling (aka behavior description): "Attempts to characterize the typical behavior of an
individual, group, or population"
a. Establish behavioral norms for anomaly detection applications
b. E.g. fraud detection, monitoring for intrusions to computer systems
7. Link prediction: "Attempts to predict connections between data items, usually by
suggesting that a link should exist, and possibly also estimating the strength of the link"
8. Data reduction: "Attempts to take a large set of data and replace it with a smaller set of
data that contains much of the important information in the larger set"
9. Causal modeling: "Attempts to help us understand what events or actions actually
influence others"
a. Take into account the assumptions being made
i. "When undertaking causal modeling, a business needs to weigh the
trade-off of increasing investment to reduce the assumptions made,
versus deciding that the conclusions are good enough given the
assumptions"
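
The support and confidence of a rule X ⇒ Y from the co-occurrence grouping task above can be computed directly from a list of transactions. The following is a minimal sketch, not a method taken from the book: the function name `support_confidence` and the toy baskets are hypothetical and chosen only for illustration. Support is taken as the share of all transactions containing both X and Y, and confidence as the share of transactions containing X that also contain Y.

```python
from typing import FrozenSet, Iterable, List, Tuple


def support_confidence(
    transactions: List[FrozenSet[str]],
    lhs: Iterable[str],
    rhs: Iterable[str],
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule lhs => rhs."""
    lhs_set, rhs_set = frozenset(lhs), frozenset(rhs)
    n_total = len(transactions)
    # Transactions that contain every item of the left-hand side.
    n_lhs = sum(1 for t in transactions if lhs_set <= t)
    # Transactions that contain every item of both sides.
    n_both = sum(1 for t in transactions if (lhs_set | rhs_set) <= t)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical toy data, invented for this example only.
baskets = [
    frozenset({"laptop", "antivirus", "service plan"}),
    frozenset({"laptop", "antivirus"}),
    frozenset({"laptop", "mouse"}),
    frozenset({"laptop", "antivirus", "service plan"}),
    frozenset({"mouse"}),
]

s, c = support_confidence(baskets, {"laptop", "antivirus"}, {"service plan"})
print(f"support={s:.0%}, confidence={c:.0%}")
```

In this toy data, two of the five baskets contain all three items and three contain the left-hand side, so the rule has a support of 2/5 and a confidence of 2/3.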

Supervised Versus Unsupervised Methods
Unsupervised data mining problems have no specific target defined; the analyst draws their own
conclusions about what the examples have in common
● "Supervised tasks require different techniques than unsupervised tasks do, and the
results often are much more useful"
● E.g. for unsupervised: Clustering, co-occurrence grouping and profiling
● E.g. for supervised: Classification (categorical, often binary, target), regression (numeric
target) and causal modeling
● Both: Matching, link prediction and data reduction

Important condition for supervised: There must be data on the target; acquiring this data is
often a key data science investment
● Individual's label: "The value for the target variable for an individual"

In the early stages it is important:
1. To decide whether the line of attack will be supervised or unsupervised
2. If supervised, to produce a precise definition of a target variable
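
One way to make a target definition precise is to write it as a labeling rule that can be applied to every individual. The sketch below does this for a churn target. Everything in it is an assumption made for illustration: the `Customer` fields, the `churn_label` function and the 90-day window are hypothetical and do not come from the book or the lectures.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    contract_end: date
    left_on: Optional[date]  # None if the customer has not left


def churn_label(customer: Customer, window_days: int = 90) -> int:
    """Label 1 if the customer left no later than `window_days` after
    contract expiry, otherwise 0. The window length is an assumption."""
    if customer.left_on is None:
        return 0
    cutoff = customer.contract_end + timedelta(days=window_days)
    return int(customer.left_on <= cutoff)


# Hypothetical customers, invented for this example only.
customers = [
    Customer("c1", date(2020, 1, 1), date(2020, 2, 1)),  # left within the window
    Customer("c2", date(2020, 1, 1), None),              # still a customer
    Customer("c3", date(2020, 1, 1), date(2020, 6, 1)),  # left after the window
]
labels = [churn_label(c) for c in customers]
print(labels)
```

Changing the window or the treatment of late leavers changes which individuals count as positive examples, which is why the definition needs to be fixed early.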

Data Mining and Its Results
"Difference between (1) mining the data to find patterns and build models, and (2) using the
results of data mining"

Iteration is the rule rather than the exception
● Importance of "reasonable consistency, repeatability, and objectiveness"

The CRISP data mining process
1. Business understanding
a. Understand the problem to be solved and its use scenario: "The initial
formulation may not be complete or optimal so multiple iterations may be
necessary for an acceptable solution formulation to appear"
b. Analysts' creativity plays a large role
c. Problems can be structured/engineered into one or more subproblems that
involve building models
2. Data understanding

Seller: patycyl, Universiteit van Amsterdam