
Summary course Strategy Analytics (Grade Assignments 9)

Pages: 39
Published: 07-04-2021
Written in: 2021/2022

Complete summary of:
- Book: Data Science for Business (Provost & Fawcett)
- Case studies summary and answers (P. Snoeren)
All exam materials needed next to the lecture slides!


Strategy Analytics Summary
Data Science for Business Book, Provost & Fawcett
Case Studies, P. Snoeren

Assignment 1 grade: 8.5
Assignment 2 grade: 10




Sophie van Sonsbeek
MSc Business Administration
University of Amsterdam
6314M0380Y
March 23, 2021




Sophie van Sonsbeek - 12799955

Table of contents
Chapter 1. Introduction: Data-Analytic Thinking
Chapter 2. Business Problems and Data Science Solutions
Chapter 3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
Chapter 4. Fitting a Model to Data
Chapter 5. Overfitting and its avoidance
Chapter 6. Similarity, neighbors, and clusters
Chapter 7: Decision Analytic Thinking I: What is a Good Model?
Chapter 8: Visualizing model performance
Chapter 9: Evidence and Probabilities
Chapter 10: Representing and Mining Text
Chapter 11: Decision Analytic Thinking II: Toward Analytical Engineering
Chapter 12: Other Data Science Tasks and Techniques
Chapter 13: Data Science and Business Strategy
Chapter 14: Conclusion
Cases
1. Capital One
2. Gaming industry
3. Easyjet + Fifa
   Easyjet
   Fifa
4. Google Healthcare
5. Twitter and stock returns
6. Privacy





Chapter 1. Introduction: Data-Analytic Thinking

Introduction
Data collection is done in every aspect of business:
- Operations, manufacturing, supply chain, customer behavior, marketing campaign performance, workflow procedures, and so on.

Data science = principles, processes, and techniques for extracting knowledge and information from data. The growing availability of data has increased interest in these methods.

The ubiquity of data opportunities
Data mining techniques are applied in:
- Marketing: targeted marketing, online advertising, recommendations for cross-selling
- Finance: credit scoring, trading, fraud detection
- Retail: Walmart and Amazon apply data mining throughout their entire business

Data-analytic thinking enables you to evaluate proposals for data
mining projects.
This book
Goal of this book:
- Translate business problems into data problems.
- Provide data mining/data science techniques.

Example used in the book: Predicting customer churn.
Customers switching from one company to another is called churn,
and it is expensive all around: one company must spend on incentives
to attract a customer while another company loses revenue when the
customer departs.
Data science principles
Data science, engineering, and data-driven decision making
Data-driven decision making (DDD) refers to the practice of basing decisions on the analysis of data, rather than purely on intuition.
Two types of decisions focused on in this book:
1. Decisions for which “discoveries” need to be made within data.
2. Decisions that repeat, especially at massive scale, so that even a small increase in decision-making accuracy can have a big impact.

Example:
Target wanted to get a jump on its competition: Amazon. They were
interested in whether they could predict that people are expecting a
baby. If they could, they would gain an advantage by making offers
before their competitors.
→ Pregnant mothers often change their diets, wardrobes, vitamin
regimens, and so on, and these changes show up in their purchases.
Big data
Data processing and “Big Data”
Difference between data science and data processing:



• Data science needs access to data and benefits from data engineering, both of which are facilitated by data processing technologies. But these technologies are not used only for data science.
   o Data processing technologies are also important for data-oriented business tasks that do not involve extracting knowledge or data-driven decision making.
   o E.g. online advertising campaign management, modern web system processing
• Big data technologies:
   o Big data = datasets that are too large for traditional data processing systems and therefore require new processing technologies.
   o Big data technologies are sometimes used for implementing data mining techniques, but more often they support the data processing that data mining relies on.
Strategic asset
Data and data science capability as a strategic asset
Fundamental principle of data science: data, and the capability to
extract useful knowledge from data, should be regarded as key
strategic assets.





Chapter 2. Business Problems and Data Science Solutions

Summary
Fundamental concepts: a set of canonical data mining tasks; the data mining process; supervised versus unsupervised data mining.

Understanding the whole data mining process helps to structure data mining projects into systematic analyses.

Data mining techniques
From business problems to data mining tasks
Data scientists decompose a business problem into subtasks. The solutions to the subtasks can then be composed to solve the overall problem.

Data mining tasks:
1. Classification and class probability estimation attempt to
predict, for each individual in a population, which of a small set of
classes this individual belongs to.
Classification and scoring are very closely related; as we shall
see, a model that can do one can usually be modified to do
the other.
2. Regression (“value estimation”) attempts to estimate or
predict, for each individual, the numerical value of some
variable for that individual.
“How much will a given customer use the service?”
3. Similarity matching attempts to identify similar individuals
based on data known about them.
4. Clustering attempts to group individuals in a population
together by their similarity.
“Do our customers form natural groups or segments?”
5. Co-occurrence grouping attempts to find associations
between entities based on transactions involving them.
“What items are commonly purchased together?”
6. Profiling attempts to characterize the typical behavior of an
individual, group or population.
“What is the typical cell phone usage of this customer
segment?”
7. Link prediction attempts to predict connections between data
items.
“Since you and Karen share 10 friends, maybe you’d like to be
Karen’s friend?”
8. Data reduction attempts to take a large set of data and
replace it with a smaller set of data that contains much of the
important information in the larger set.
“GPA instead of list of grades per student”
9. Causal modeling attempts to help us understand what events
or actions actually influence others.
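
As a rough illustration of task 5 (co-occurrence grouping), the sketch below counts how often pairs of items appear in the same transaction. The baskets and the helper name `pair_counts` are made up for this example and do not come from the book; real market-basket analysis would typically also look at measures such as support and lift.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]


def pair_counts(baskets):
    """Count how often each unordered pair of items occurs in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes ("bread", "butter") and ("butter", "bread") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions)
most_common_pair, times_together = counts.most_common(1)[0]
print(most_common_pair, times_together)  # ('bread', 'butter') 3
```

The pair that is bought together most often answers the question “What items are commonly purchased together?” for this toy data.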




Supervised versus unsupervised methods
Supervised learning = the training data has a dependent variable (target variable).
- Purpose: predicting the target
- Problem: “Will a customer leave when her contract expires?”
- Data mining techniques:
  o Classification
    ▪ Categorical (often binary) target
    ▪ “Which service package will a customer likely purchase if given incentive I?”
  o Regression
    ▪ Numeric target
    ▪ “How much will this customer use the service?”
  o Causal modeling
The data mining process
1. Business understanding
   a. Recasting the problem and designing a solution is an iterative process of discovery.
2. Data understanding
   a. It is important to understand the strengths and limitations of the data, because there is rarely an exact match with the problem.
3. Data preparation
   a. The phase in which data are manipulated and converted into forms that yield better results.
4. Modeling
   a. Output of modeling: some sort of model or pattern capturing regularities in the data.
5. Evaluation
   a. Are the data mining results valid and reliable?
6. Deployment
   a. Getting a return on investment by implementing the results.





Chapter 3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation

Summary
Fundamental concepts: identifying informative attributes; segmenting data by progressive attribute selection.
Exemplary techniques: finding correlations; attribute/variable selection; tree induction.

Predictive modeling as supervised segmentation: how can we segment the population into groups that differ from each other with respect to some quantity of interest?

Models, induction and prediction
Predictive model = a formula for estimating the unknown value of interest (the target).
- Classification
- Regression
Descriptive model = a model whose purpose is to gain insight into the underlying phenomenon or process.

Supervised learning = model describes a relationship between
independent variables and target variable.

Deductive vs inductive
Induction = generalizing from specific cases to general rules.
Inductive models:
- Classification and regression
Input data used for inducing the model → training data
Training data are called labeled data because the value of the
target variable is known.

Supervised segmentation
Selecting informative attributes
Classification
The groups need to be pure → homogeneous with respect to the
target variable.

The most common splitting criterion is called information gain, and it
is based on a purity measure called entropy.

Entropy = a measure of disorder (how mixed the segment is with
respect to the target variable).

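entropy = -p_1 \log_2(p_1) - p_2 \log_2(p_2) - \dots

p_i = the proportion (probability) of members of the set that have property i (p_i = 1: all members of the set have property i; p_i = 0: no members of the set have property i).
Entropy is a measure of group impurity:
0 = pure
1 = maximum impurity (for two classes)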
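
A minimal sketch of the entropy calculation, assuming the standard definition in which each class proportion is multiplied by its base-2 logarithm, negated, and summed. The function name `entropy` and the example proportions are chosen for illustration only.

```python
import math


def entropy(proportions):
    """Entropy in bits of a segment, given the proportion of each class.

    Terms with p == 0 are skipped, following the convention 0 * log2(0) = 0.
    """
    return sum(-p * math.log2(p) for p in proportions if p > 0)


print(entropy([1.0, 0.0]))            # 0.0  (pure segment)
print(entropy([0.5, 0.5]))            # 1.0  (maximally mixed two-class segment)
print(round(entropy([0.7, 0.3]), 2))  # 0.88 (mostly one class, still impure)
```

The closer the class proportions are to an even split, the closer the two-class entropy gets to 1.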





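Information gain = the improvement in purity (decrease in entropy) created by
segmentation. It combines segment size and segment purity.

IG(parent, children) = entropy(parent) - [p(c_1) \cdot entropy(c_1) + p(c_2) \cdot entropy(c_2) + \dots]

where p(c_i) is the proportion of instances that fall into child segment c_i.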
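
The sketch below shows one way to compute information gain for a single split: the entropy of the parent segment minus the size-weighted average entropy of the child segments. The churn/stay labels and the split itself are hypothetical data invented for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted_child_entropy = sum(len(c) / total * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy


# Hypothetical parent segment: 5 churners and 5 stayers (entropy 1.0).
parent = ["churn"] * 5 + ["stay"] * 5
# Hypothetical split on some attribute into two children of 5 customers each.
children = [
    ["churn"] * 4 + ["stay"] * 1,
    ["churn"] * 1 + ["stay"] * 4,
]

gain = information_gain(parent, children)
print(round(gain, 2))  # 0.28
```

A split that produced two perfectly pure children would reach the maximum gain of 1.0 for this parent; a split whose children are as mixed as the parent would give a gain of 0.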



Numeric variables
Numeric variables can be ‘discretized’ by choosing a split point (or
many split points) and then treating the result as a categorical
attribute.
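
One possible way to choose a split point, sketched under the assumption that candidate thresholds are the midpoints between consecutive distinct values and that the best one is the threshold with the highest information gain. The ages, labels, and the helper name `best_split_point` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split_point(values, labels):
    """Return (threshold, gain) for the midpoint threshold with the highest gain."""
    distinct = sorted(set(values))
    parent_entropy = entropy(labels)
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        # Discretize: the numeric attribute becomes "<= threshold" vs "> threshold".
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent_entropy - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical data: customer age and whether the customer churned.
ages = [22, 25, 31, 40, 47, 52]
churned = ["yes", "yes", "yes", "no", "no", "no"]

threshold, gain = best_split_point(ages, churned)
print(threshold, gain)  # 35.5 1.0
```

After the split, the numeric attribute can be treated as a categorical one with the two values "age <= 35.5" and "age > 35.5".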
Visualizing segmentations
Classification tree




Decision lines and hyperplanes
The lines separating the regions are known as decision lines.
Hyperplane is used in data mining literature to refer to the general
separating surface, whatever it may be.
Laplace correction
Overfitting
The Laplace correction moderates the influence of leaves with only a few
instances when estimating class probabilities.

p(c) = \frac{n + 1}{n + m + 2}

n = number of examples in the leaf belonging to class c
m = number of examples in the leaf not belonging to class c
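
A small sketch of the binary Laplace correction, assuming the usual form (n + 1) / (n + m + 2), with n the count of examples in the leaf that belong to the class and m the count that do not. The function name `laplace_probability` is chosen for illustration.

```python
def laplace_probability(n, m):
    """Laplace-corrected estimate of the class probability in a leaf.

    n: number of examples in the leaf belonging to the class
    m: number of examples in the leaf not belonging to the class
    """
    return (n + 1) / (n + m + 2)


# A leaf with 2 positives and 0 negatives: the raw frequency would be 1.0.
print(laplace_probability(2, 0))             # 0.75
# A leaf with 20 positives and 0 negatives: more evidence, estimate closer to 1.
print(round(laplace_probability(20, 0), 2))  # 0.95
```

Both leaves are 100% positive by raw frequency, but the corrected estimate stays cautious for the leaf with only two instances and approaches the raw frequency as the number of instances grows.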




Trees and sets of rules
Before starting to build a classification tree with variables, it is worth
asking: how good is each of these variables individually?

For this we measure the information gain of each attribute, as
discussed earlier.
In the churn example, the first three variables (the house value, the
number of leftover minutes, and the number of long calls per month)
have a higher information gain than the rest.



