Module 1: Intro, Characteristics of Big Data
I. Big data: Definition
II. What is a marketing strategy?
III. Using big data to learn about things
IV. Mean squared error (MSE)
Module 2: Research Strategies
● What can you use big data for?
● Combined model
● RFM analysis
Example 1:
● RFM for segment-level prediction
Example 2:
● Expected profits and ROI
● Assignments 2
Module 3: A/B Testing
● What is A/B testing
● Mechanics of random assignment
● What’s the unit of randomization?
● What do you want to measure?
● Compare outcomes using tests and confidence intervals
● Average treatment effect
● Confidence interval and tests
● Power
● Sample question
○ “Natural” experiments
○ Difference-in-difference estimator
Module 4: Customer Journey and Attribution Analytics
● Do channels really add incremental sales?
● Multi-touch attribution: which channel gets the credit?
● Rule-based: Last touch or last click
● An example: Markov-based approach
● Counterfactual
● Average causal effect and assignment (lol)
Module 5: Dynamic Targeting
● Recommender systems: help consumers keep track of/be aware of products in markets
with a very high variety; match preferences
● Key issues:
● 1. What type of data do you use to build the RS?
● 2. How do you make predictions?
○ Collaborative filtering: based on similarities between users & products
● 3. How do you measure the success or performance of the RS?
● Problems with recommender systems:
● Assignments
Module 6: User Generated Content
● User-generated content (UGC): any form of digital content produced by volunteer users of
an online service/website; publicly available to other users
● Assuming the average rating (R) reveals the true value of a product:
● Positive pre-purchase information: if R>Ȓ (better than expected)
● Negative pre-purchase information: if R<Ȓ (worse than expected)
● Prospect theory: we’re more sensitive to losses than gains
● Fake reviews: increase reviews & ratings and improve sales rank
● J-shaped distribution of ratings:
● Polarity: proportion of reviews at extremes (1 and 5 stars)
● (Positive) Imbalance: proportion of positive (vs negative) reviews
● Assignments
Module 7: Network Analytics, Social Contagion
● Networks: economic agents don’t act independently
● Bass Model: predicts how fast users will adopt a product
● S-shaped curve (P<Q): people adopt due to social influence; has to take off first
● Inverse J-curve (P>Q): people adopt spontaneously; takes off quickly
● 80-20 rule: top 20% of products account for 80% of sales (long-tail effect)
● Assumptions:
○ 1. Social contagion is actually among consumers (not an alternative reason)
○ 2. Some consumers have more influence than others
○ 3. Firms are able to identify & target influentials
● Degree: the number of contacts an individual has in a network
● Degree variance:
● Homophily: the degree to which individuals in a network are similar, in terms of
(observable?) characteristics
● Network components: a set of nodes with at least one connection
● Distance between nodes: number of links along the shortest path
● Betweenness centrality: the extent to which an individual is on the shortest path between
customers in a network
● Closeness centrality: a measure of closeness in a network; how easy is it to reach others
in the network? (reciprocal of “farness”)
● Tie strength:
● Hinz, Skiera, Barrot & Becker (2011):
● Kim et al. (2015): looking at 32 separated rural villages in Honduras
● Assignments
Wrap up and Exam review
Module 1: Intro, Characteristics of Big Data
I. Big data: Definition
● A shorthand term that refers to several things (there is no precise definition):
○ Size of data (observations and variables)
○ Nature of the collection process (continuous, always on)
○ Type of data (structured vs. unstructured)
○ Purpose-built or data exhaust (primary vs. secondary)
○ Passively vs. actively collected
II. What is a marketing strategy?
● Target market
○ A group of customers with similar needs that the company wants to appeal to
● Marketing mix
○ All controllable variables the company puts together to satisfy the target group
○ The marketing mix is the 4 P’s: product, place, promotion, price
● Choosing the target market: Segmentation
○ Through segmentation, a firm can increase sales, prices, and marketing
efficiency, and make the marketing mix more personalized and efficient: better
products, place, promotion, and price
○ Customers prefer things that exactly meet their needs, are willing to
pay more for them, and respond better to customized communications
○ Big data = more variables to use for segmentation (purchase history, usage
history, which may reveal interests and benefits sought)
● Acquire new customers
○ Google: they know what you search for and browse
■ Example: interested in running, based on online searches and browsing
○ Facebook: they know what you like, your interests, who you know, demographics
■ Example: part of a running group, interested in running
○ Amazon: they know what you buy, where you live (delivery), what you consume
on Amazon media
■ Example: bought running gear in the past
● Predicting customer behavior
○ When will the customer quit (or churn)? (Should we intervene to keep them or
let them go?)
○ What service or product will the customer use next? (Should we nudge them in
a certain direction?)
○ If we spend a certain amount of money in certain channels, how many and
what types of customers are we likely to acquire (join or subscribe)?
● Reducing churn