Module 1
Big data: a stored record of some behavior, made up of digital traces.
It is not just about size, but also about the tools/models we use to extract insights from the data.
Data can be big in different ways:
- Tall data (n >> p): many observations, relatively few variables.
- Wide data (n << p): few observations, many variables.
→ you need lots of observations to learn about many variables.
Primary data (custom-made): data collected to answer a specific research question.
Secondary data (ready-made): data collected for a non-research purpose, which has to be repurposed to
answer the question.
→ most big data applications use secondary data.
Types of business data:
- Structured (ready for analysis, measured on a scale):
  - Internally generated, human-generated: survey ratings, aptitude testing.
  - Internally generated, machine-generated: web metrics from web logs, product purchases from sales records, process control measures.
  - Externally generated, human-generated: number of retweets, Facebook likes, patient ratings.
  - Externally generated, machine-generated: GPS for tweets, time of tweet/updates/postings.
- Unstructured:
  - Internally generated, human-generated: emails, letters, text messages, audio transcripts, customer comments, voicemails, corporate video/communications, pictures, illustrations, employee reviews.
  - Externally generated, human-generated: content of social media updates, comments in online forums, video reviews, Pinterest images, surveillance video.
Language processing is the process of extracting something useful (e.g. a topic) from something
that is not directly usable in its raw form (e.g. text).
Big data for personalization: using big data in order to personalize recommendations.
→ recommender systems.
Big data for boosting engagement: everything is tested before it is implemented for everyone.
→ A/B testing: show different versions of something to different groups of users and test which version performs better.
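As a minimal sketch of how an A/B test might be evaluated, the snippet below compares the conversion rates of two versions with a two-proportion z-test. The test statistic is a common textbook choice and not necessarily the method used in the course; the function name `ab_test` and all visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test.

    Returns (difference in conversion rate B - A, z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical experiment: 10,000 users see version A, 10,000 see version B.
lift, z, p_value = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value suggests the difference between versions is unlikely to be due to chance alone; whether the lift is large enough to matter is a separate business decision.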
Big data for new product development: seeing what is currently trending and using that to
decide what to do in the future.
Big data for reducing customer churn (customers quitting a service):
1. Use past data to estimate a model that predicts churn.
2. Use that model to predict the probability of churn for current customers.
3. Intervene with those most likely to churn.
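The three steps above can be sketched in code. This is an illustration under stated assumptions, not the course's method: it assumes a logistic regression fitted by plain gradient descent, two made-up features scaled to the 0-1 range (inactivity and complaint level), and invented customer IDs.

```python
from math import exp


def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression with batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(model, x):
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Step 1: estimate a model on past customers (hypothetical data).
# Features: (inactivity, complaint level), both scaled to 0-1; label 1 = churned.
past_x = [(0.1, 0.0), (0.2, 0.1), (0.3, 0.2), (0.4, 0.1),
          (0.8, 0.7), (0.9, 0.6), (0.7, 0.9), (0.6, 0.8)]
past_y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_logistic(past_x, past_y)

# Step 2: predict the churn probability of current customers.
current = {"c1": (0.15, 0.1), "c2": (0.85, 0.8), "c3": (0.5, 0.4)}
scores = {cid: churn_probability(model, x) for cid, x in current.items()}

# Step 3: intervene with the customers most likely to churn (here: the top 1).
at_risk = sorted(scores, key=scores.get, reverse=True)[:1]
print(at_risk)
```

In practice the intervention threshold would depend on the cost of the intervention relative to the value of keeping the customer.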
Big data for public policy & economy: big data helps answer important public policy and
economic questions. Now that nearly everyone has a smartphone, we can track what is happening.
Advantages of using big data to learn about things (characteristics):
- Big: useful when the event is rare/small, when there is “heterogeneity” in responses
(customers respond differently to the same thing), when the relationship is complex.
  - Reducing uncertainty (narrow the confidence interval).
  - Observe more different types of customers and see how their responses vary.
- Always on (always collecting information): important when we need to know and
respond to the answers quickly.
- Nonreactive: users are typically not aware they are being recorded or they don’t care,
so people don’t change their behavior.
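The "reducing uncertainty" point in the list above can be made concrete with a small sketch. It assumes the standard normal-approximation confidence interval for a proportion; the 2% event rate and the sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p_hat, n, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical rare event: 2% of customers respond to a promotion.
widths = {n: ci_half_width(0.02, n) for n in (1_000, 100_000, 10_000_000)}
for n, w in widths.items():
    print(f"n={n:>10,}: 2% ± {100 * w:.3f} percentage points")
```

Because the half-width shrinks with the square root of n, 100 times more observations give an interval that is 10 times narrower, which is why size matters most for rare events and small effects.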
Disadvantages of using big data to learn about things (characteristics):
- Incomplete: big data records what happened, but not why.
- Inaccessible:
  - From outside the organization: legal/business/ethical barriers.
  - From inside the organization: databases not integrated, different coding schemes,
    lacking variables to match, customers interacting through different touchpoints.
- Non-representative: the expressed opinion may not be representative of the
underlying opinion → can’t make inferences about the population based on your sample.
- Drifting: if you want to measure change, don’t change the measure. But with big data
the users can change, how they use it can change, the platform itself changes, etc.
- Algorithmically confounded: how the platform is designed can influence behavior,
introducing bias or noise into what you’re trying to study.
- Dirty: big data sources can be loaded with junk or spam.
- Sensitive: some of the information that companies have is sensitive (privacy).
Big data in marketing:
- Personalization: give recommendations based on user behavior.
- Turning users into marketers: for example, Spotify uses listening data to make Spotify
Wrapped, which people can view and share.
  - Advantage: retention and loyalty → it reminds users how much value they got from the service.
  - Disadvantage: being reminded that the company tracks your data can feel invasive.
- Purchase data: e.g. the Albert Heijn (AH) loyalty card (Bonuskaart). What you can do with this:
  - Personalize promotions.
  - Cross-selling: recommend other products you might also like.
  - Assortment planning: predict what products are bought together.
  - Price segmentation: determine which segments are price-sensitive to private
    label vs. A-brand.
  - Predicting churn & lifetime value: intervene before a customer leaves, and
    estimate how much this customer is worth to AH.
- Partnerships & data monetization: advertising, but in the grocery store → brands pay AH
to target their shoppers with personalized ads / targeted promotions. The effectiveness of
these campaigns is measured directly against loyalty card transaction data.
- Targeting ads to segments: use big data to create narrow segments that provide a
better match.
- Predicting consumer behavior:
  - When will the customer quit or churn? Should we intervene to keep them or let
    them go?
  - What service or product will the customer use next? Should we nudge them in a
    certain direction?
  - If we spend a certain amount of money in certain channels, how many and
    what types of customers are we likely to acquire (join or subscribe)?
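The cross-selling and assortment-planning uses of purchase data listed above both rest on knowing which products are bought together. A minimal sketch, assuming simple co-occurrence counts over shopping baskets (real systems typically use more refined measures); the baskets and the helper names `pair_counts` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card baskets (one set of products per shopping trip).
baskets = [
    {"pasta", "tomato sauce", "cheese"},
    {"pasta", "tomato sauce"},
    {"bread", "cheese"},
    {"pasta", "cheese", "bread"},
]


def pair_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def recommend(product, baskets, k=2):
    """Return up to k products most often bought together with `product`."""
    scores = Counter()
    for (a, b), c in pair_counts(baskets).items():
        if a == product:
            scores[b] += c
        elif b == product:
            scores[a] += c
    return [item for item, _ in scores.most_common(k)]


print(recommend("pasta", baskets))
```

Raw counts favor products that are popular overall, so a real recommender would usually normalize by how often each product is bought on its own.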
How do we judge the value of the estimate?
- Bias = how far the estimate is off from the truth on average.
- Variance = how much “noise” or uncertainty there is in the estimates.
(In the figure: bottom left is big data (best), top right is small data (worst).)
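The two definitions above can be illustrated with a small simulation. This is a sketch under stated assumptions: the true rate of 30%, the sample sizes, and the number of repetitions are hypothetical, and both samples are drawn at random, so only the variance differs here (a non-representative sample would add bias as well).

```python
import random
from statistics import mean, pvariance


def bias_and_variance(estimates, truth):
    """Bias = average estimate minus the truth; variance = spread of the estimates."""
    return mean(estimates) - truth, pvariance(estimates)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
truth = 0.30            # hypothetical true share of customers with some behavior


def simulate(n, reps=200):
    """Estimate the share from a random sample of size n, repeated `reps` times."""
    return [sum(rng.random() < truth for _ in range(n)) / n for _ in range(reps)]


small = simulate(50)     # small data
large = simulate(2_000)  # bigger data

bias_small, var_small = bias_and_variance(small, truth)
bias_large, var_large = bias_and_variance(large, truth)

# Mean squared error combines both: MSE = bias^2 + variance.
mse_large = mean((e - truth) ** 2 for e in large)
print(f"small: bias={bias_small:+.4f}, variance={var_small:.6f}")
print(f"large: bias={bias_large:+.4f}, variance={var_large:.6f}")
```

More observations shrink the variance, but they do not remove bias: a large sample drawn from the wrong population stays off target no matter how big it gets.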