Lecture 2:
1. Big data models that work well at one point in time may not work in other situations. This
is best described as an example of… Lack of generalizability.
a. Lack of representativeness → has to do with the type of data you get to build your
model on (e.g. only looking at people who write reviews of your product might
not cover all your consumers → more input from really negative/positive people)
b. Lack of causal evidence → models tell us that things are correlated, but don't tell
us about a causal relationship
c. The presence of omitted variables
2. According to Cathy O’Neil, you need two things to build an algorithm: (i) data and (ii) a
model for success → the data you choose to use will shape your model, and what you
define as success for the model is just as important as the calculations that take place
3. Youyou paper: When predicting life outcomes, it is generally best to rely on… self-
reported personality assessments.
4. Computer models that use Facebook data for personality assessment are most accurate
when predicting the trait of… openness to experience.
5. What is a LASSO regression, and how is it different from a regular linear regression?
A normal regression model has 1 goal: fit the data using the predictors in the best
way that it can = give me the coefficients that make the model fit the data as well
as possible. LASSO has 2 goals: (i) fit the model to the data and (ii) use as few
variables/coefficients as possible → making sure your model is not overfit to
the data you're training it on.
6. Which of the following types of data can be used to produce the most accurate personality
assessments? Facebook likes.
7. According to Gladstone et al. (2019), consumer spending data is best described as…
behavioral residue.
8. Individuals who spend more money on dining and drinking are likely to score higher in
the trait of… Extraversion.
9. Predictive models based on spending are significantly less accurate for… individuals
living in high deprivation: the model does a worse job of predicting
personality from their spending. Why? Are there other biases that might exist in
these papers?
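Sketch for question 5 (regular regression vs. LASSO): a minimal toy example, not taken from the lecture. It assumes the usual LASSO objective 0.5 × (sum of squared errors) + lam × (sum of absolute coefficients), solved here with a simple coordinate descent. The names lasso_fit, soft_threshold and lam, the value lam = 1.0, and the data are all made up for illustration; there is no intercept because the toy data is already centered.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for 0.5 * sum((y - X b)^2) + lam * sum(|b_j|).

    With lam = 0 there is no penalty, so this is ordinary least squares.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the residual that ignores j
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            col_ss = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / col_ss
    return beta


# Hypothetical toy data: y = 2 * x1 + 0.1 * x2 (x2 barely matters)
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]

ols = lasso_fit(X, y, lam=0.0)    # goal 1 only: best possible fit
lasso = lasso_fit(X, y, lam=1.0)  # goals 1 and 2: fit + few coefficients

print("regular regression:", ols)
print("LASSO:", lasso)
```

The regular regression keeps a small coefficient for the weak predictor x2, while LASSO sets it to exactly zero and shrinks the coefficient of x1 a bit. That is the "use as few variables as possible" goal in action.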
Lecture 3:
1. In one study, an ad for a product was shown to 3 million users. Of those users, 3000
users clicked on the ad and 600 users purchased the product.
a. Clickthrough rate: (number of clicks / number of users shown the ad × 100), so it
is 3,000/3,000,000 × 100 = 0.1%
b. Conversion rate: (conversions/reach × 100), where the conversions here are
purchases, so it is 600/3,000,000 × 100 = 0.02%
2. Rob is someone who really enjoys computers, Stargate and Stargate SG1. Which of the
following statements is likely true about Rob? Rob is low in extraversion.
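Sketch for question 1 of Lecture 3: the two rates computed from the numbers in the question. The helper name rate_percent is made up for illustration; both rates are taken relative to the number of users shown the ad (reach), as in the notes above.

```python
def rate_percent(events, reach):
    """Share of reached users who performed the event, in percent."""
    return events / reach * 100


users_shown = 3_000_000  # users who saw the ad (reach)
clicks = 3_000           # users who clicked on the ad
purchases = 600          # users who bought the product

clickthrough_rate = rate_percent(clicks, users_shown)
conversion_rate = rate_percent(purchases, users_shown)

print(f"Clickthrough rate: {clickthrough_rate:.2f}%")  # 0.10%
print(f"Conversion rate:   {conversion_rate:.2f}%")    # 0.02%
```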