Lecture 1
Big data
Three characteristics:
- Volume
- Velocity
- Variety
Administrative data: Provided by persons or organizations for regulatory or other
government activities
Transaction data: Generated as an automatic byproduct of transactions and activities (e.g.,
credit card data, traffic flow data)
Social media data: Created by people with the express purpose of sharing with (some)
others
Sensor data: Generated by apps on, for example, smartphones (GPS, accelerometers,
biomarkers)
Limitations of big data
Couper (2013):
- Single variable, few covariates
- Bias through self-selection and self-presentation
- Volatility or lack of stability
- Privacy issues
- Access issues
- Opportunity for mischief
- Bigger does not mean better
Single variable, limited covariates:
- Surveys are much more than a single variable
- Few demographic variables are available, and those provided or imputed may be wrong
- Knowing changing fuel prices is not the same as knowing what people do in response
to such changing prices
Bias:
- Two sources of bias:
Selection bias:
- Not everyone uses social media
- Need to distinguish between producers and users of social media
- Not everyone uses loyalty cards or credit cards, or makes purchases online
Measurement bias:
- Impression management is a key element of social media
- The average Facebook user has MANY “friends”
Technological speed (Lack of stability):
Social media may be good for measuring short-term trends, but surveys may be better for
longer-run measurement
Access Issues:
- Social media and transaction data are usually proprietary
- A key strength of surveys is public access to data, permitting replication and
reanalysis
Opportunity for mischief:
Example of Justin Bieber fans: some of them are fake accounts, but they are still counted as fans.
Bigger does not mean better
Example: election polls in the USA.
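The point that a bigger sample is not automatically a better one can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the true level of support (45%), and the opt-in rates (supporters three times as likely to end up in the big data set) are hypothetical numbers chosen for illustration, not figures from any real poll. The function name simulate is also made up for this example.

```python
import random


def simulate(seed=0, population_size=200_000, true_support=0.45):
    """Compare a large self-selected sample with a small random sample."""
    rng = random.Random(seed)

    # Hypothetical population: 1 = supports candidate A, 0 = does not.
    population = [1 if rng.random() < true_support else 0
                  for _ in range(population_size)]
    true_share = sum(population) / population_size

    # Large self-selected sample (assumption: supporters opt in with
    # probability 0.30, non-supporters with probability 0.10).
    big_sample = [v for v in population
                  if rng.random() < (0.30 if v == 1 else 0.10)]

    # Small simple random sample: every person has the same chance.
    small_sample = rng.sample(population, 1_000)

    return {
        "true_share": true_share,
        "big_n": len(big_sample),
        "big_estimate": sum(big_sample) / len(big_sample),
        "small_n": len(small_sample),
        "small_estimate": sum(small_sample) / len(small_sample),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value}")
```

Under these assumed opt-in rates, the large sample contains tens of thousands of cases yet overstates support considerably, while the random sample of 1,000 lands close to the true share. Selection bias does not shrink as the sample grows.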
Research process
Introduction to survey methodology
- Developing and conducting a survey is an iterative process. Use the feedback given
by instructors and peers
Survey research:
- A method for gathering information about a specified group of people (the population)
by asking them questions
- For fact finding and theory testing
- Questions as well as answers are highly structured
- Ask yourself: Is a survey the best instrument to answer my research question?
When is a survey an appropriate tool?
- It is about populations: What unit of analysis? (Only when the unit is ‘individual’)
- How much do members of a population know? (Questions should not be too difficult)
- How much are respondents willing to reveal? (Questions should not be too sensitive)
- It is about non-observable behavior
- It is only reliable when the respondent is aware of the behavior
- A survey is an unreliable tool to discover attitudes and behavior of the distant past
GIGO
- In order to prevent errors, scientists use a methodological framework
- In order to reduce the negative consequences of unavoidable errors that enter during
the research process, scientists use statistical tools
- Remember: you can rarely put right by statistics what you have messed up by design
- GIGO: Garbage In, Garbage Out