Module 1: Intro
1. Big
When is big an advantage?
• When the event is rare or small
• When there is “heterogeneity” in responses
• When the relationship is complex
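A rough sketch of why size helps with rare events: the expected number of events you observe grows in proportion to the number of users you record. The rate of one event per 1,000 users and the function name `expected_events` are illustrative assumptions, not figures from the course.

```python
def expected_events(n_users: int, users_per_event: int) -> float:
    """Expected number of rare events among n_users when, on average,
    one event occurs per users_per_event users."""
    return n_users / users_per_event

# Hypothetical rate: one event per 1,000 users.
survey_sized = expected_events(1_000, 1_000)          # a typical survey sample
platform_sized = expected_events(10_000_000, 1_000)   # a large platform log

print(survey_sized)    # 1.0
print(platform_sized)  # 10000.0
```

With a single expected event there is almost nothing to analyze; with thousands, the rare event can be studied in its own right.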
2. Always-on
Collecting data in real time is important when we need to know the answers and respond to
them quickly:
• Economic activity
• Public health
• Monitoring competition
• Trend spotting
• Product harm crises
• Marketing response
• Digital marketing dashboard
3. Nonreactive
People usually change their behavior when they know they are being observed. But with big
data, users are typically not aware that they are being recorded.
4. Incomplete
“Big Data record what happened, but not why”
Example
• Predicting churn = predicting which customers will quit
• Attracting new customers costs 5-6x more than retaining existing ones
• Solution: use big data to predict churn, then intervene with those likely to churn before
they do
• Big data record usage, charges, channel of acquisition
• These may predict churn: customers who use the service a lot are less likely to churn
• But this isn’t a cause: the data show that heavy users churn less, not why they stay
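A minimal sketch of the churn example, using made-up records: it compares churn rates for heavy and light users. The data, the tuple layout, and the usage threshold are all hypothetical. The gap it prints is a predictive association only; nothing in the log says why customers leave.

```python
# Hypothetical customer log: (monthly_usage_hours, churned)
customers = [
    (40, False), (35, False), (50, False), (45, True),
    (5, True), (8, True), (3, False), (6, True),
]

HEAVY_USE_THRESHOLD = 20  # assumed cutoff, for illustration only

def churn_rate(records):
    """Share of customers in records who churned."""
    return sum(churned for _, churned in records) / len(records)

heavy = [c for c in customers if c[0] >= HEAVY_USE_THRESHOLD]
light = [c for c in customers if c[0] < HEAVY_USE_THRESHOLD]

print(churn_rate(heavy))  # 0.25
print(churn_rate(light))  # 0.75
```

Usage separates likely churners from likely stayers here, which is enough to target an intervention, but it would not justify the claim that raising usage lowers churn.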
5. Inaccessible
• From outside the organization: legal, business or ethical barriers to giving outside
researchers access to data
• From inside: databases are not integrated, lack variables to match on, or use different
coding schemes
6. Nonrepresentative
If your sample is representative, you can make inferences about the population based on your
sample. Big data sources are often not representative, so such inferences may not hold.
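A small sketch of why representativeness matters, with invented numbers: the population is split evenly between two groups, but the sample over-represents one of them. Group labels, shares, and spend values are assumptions for illustration. Reweighting each group to its known population share (post-stratification) is one common correction, shown here as an option rather than as a method from the course.

```python
# Hypothetical population shares (assumed known, e.g. from a census).
population_share = {"young": 0.5, "old": 0.5}

# Hypothetical platform sample: 90 young users, 10 old users, with spend per user.
sample = [("young", 100.0)] * 90 + [("old", 20.0)] * 10

# Naive mean: inherits the sample's skewed composition.
naive_mean = sum(spend for _, spend in sample) / len(sample)

def group_mean(group):
    """Average spend among sampled users in one group."""
    values = [spend for g, spend in sample if g == group]
    return sum(values) / len(values)

# Post-stratified mean: weight each group's mean by its population share.
weighted_mean = sum(share * group_mean(g) for g, share in population_share.items())

print(naive_mean)     # 92.0
print(weighted_mean)  # 60.0
```

The correction only works when the grouping variable and the population shares are known, and when sampled users resemble unsampled users within each group.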
7. Drifting
“If you want to measure change, don’t change the measure”
Often, we’re interested in analyzing how something changes over time
Big data sources can make this difficult because:
• The users can change
• How users use the platform can change
• The platform itself can change
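A sketch of drift in who uses a platform, with invented numbers: each group's behavior stays constant across two years, yet the platform-wide average moves because the mix of users changes. Group names, user counts, and posting rates are hypothetical.

```python
# Posts per user per day for each group; assumed constant across both years.
posts_per_user = {"early_adopters": 10, "casual_users": 2}

# Hypothetical user counts: the user base shifts toward casual users.
users_year1 = {"early_adopters": 800, "casual_users": 200}
users_year2 = {"early_adopters": 800, "casual_users": 3200}

def average_posts(users):
    """Platform-wide posts per user, given user counts by group."""
    total_posts = sum(count * posts_per_user[g] for g, count in users.items())
    return total_posts / sum(users.values())

print(average_posts(users_year1))  # 8.4
print(average_posts(users_year2))  # 3.6
```

An analyst looking only at the aggregate would see activity falling by more than half, although no individual group behaves differently.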
8. Algorithmically confounded
• How the platform is designed can influence behavior, introducing bias or noise into what
you’re trying to study
• Example: Facebook encourages users to have at least 20 friends
• This makes it difficult to study questions such as: on average, how many friends do people
have?
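A sketch of how a platform nudge could distort a measurement, using made-up friend counts. It assumes, purely for illustration, that every user below 20 friends is pushed up to exactly 20; real behavior would be messier.

```python
# Hypothetical "natural" friend counts, absent any platform nudge.
natural_counts = [3, 8, 12, 15, 25, 40, 60, 100]

NUDGE_TARGET = 20  # the platform encourages at least this many friends

# Assumed effect of the nudge: users below the target end up exactly at it.
observed_counts = [max(count, NUDGE_TARGET) for count in natural_counts]

natural_mean = sum(natural_counts) / len(natural_counts)
observed_mean = sum(observed_counts) / len(observed_counts)

print(natural_mean)                         # 32.875
print(observed_mean)                        # 38.125
print(observed_counts.count(NUDGE_TARGET))  # 4
```

The cluster of users at exactly 20 reflects the platform's design rather than how people form friendships, and the observed average is inflated with it.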
9. Dirty
• Big data sources can be loaded with junk or spam
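A minimal cleaning sketch for a hypothetical stream of posts: it drops exact duplicates and messages that match a crude spam rule. The records and the phrase list are invented, and real spam filtering needs far more than phrase matching.

```python
# Hypothetical raw records: (user_id, text)
raw_posts = [
    ("u1", "Loving the new phone"),
    ("bot7", "WIN A FREE PRIZE click here"),
    ("u2", "Battery life could be better"),
    ("u1", "Loving the new phone"),  # exact duplicate
    ("bot9", "free prize inside!!!"),
]

SPAM_PHRASES = ("free prize", "click here")  # assumed, illustrative only

def is_spam(text: str) -> bool:
    """Flag text containing any of the assumed spam phrases (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SPAM_PHRASES)

seen = set()
clean_posts = []
for post in raw_posts:
    # Skip records already kept once, and records flagged as spam.
    if post in seen or is_spam(post[1]):
        continue
    seen.add(post)
    clean_posts.append(post)

print(len(raw_posts), len(clean_posts))  # 5 2
```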
10. Sensitive
Some of the information that companies hold is sensitive
• Example: the Strava fitness app revealed information about military sites