
Summary Social Behaviour Dynamics

Pages: 58
Uploaded on: 09-02-2022
Written in: 2021/2022

Applied Data Science Utrecht University (UU): dynamics of social and behavioural processes, and longitudinal research.












Hernán et al. (2019) – A Second Chance to Get Causal
Inference Right: A Classification of Data Science Tasks
Statement: statistics may be applied to make causal inferences when using data from
randomized experiments, but not when using nonexperimental (observational) data

Simpson’s paradox: arises from a failure to recognize that the choice of data analysis
depends on the causal structure of the problem
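
A minimal sketch of Simpson’s paradox, assuming two treatments and one stratifying variable. The counts, the treatment labels "A"/"B" and the strata "mild"/"severe" are illustrative assumptions, not taken from the summarised paper. The sketch shows that treatment A looks better within each stratum while B looks better in the pooled table.

```python
# Illustrative counts only (an assumption, not data from the paper):
# (recoveries, patients) per treatment and per stratum of a third variable.
COUNTS = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def stratum_rate(treatment, stratum):
    """Recovery rate within one stratum."""
    recovered, total = COUNTS[treatment][stratum]
    return recovered / total


def pooled_rate(treatment):
    """Recovery rate after collapsing over the strata."""
    recovered = sum(r for r, _ in COUNTS[treatment].values())
    total = sum(n for _, n in COUNTS[treatment].values())
    return recovered / total


if __name__ == "__main__":
    for stratum in ("mild", "severe"):
        print(stratum, stratum_rate("A", stratum), stratum_rate("B", stratum))
    print("pooled", pooled_rate("A"), pooled_rate("B"))
```

Which of the two comparisons answers the causal question cannot be read off the table: it depends on whether the stratifying variable is, for example, a common cause of treatment and outcome or a consequence of treatment.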

1. A Classification of Data Science Tasks
The scientific contributions of data science can be organized into three classes of tasks:

- Description: using data to provide a quantitative summary of certain features of the world
o Elementary calculations (mean / proportion)
o Unsupervised learning algorithms (cluster analysis)
o “Clever” data visualisations (storytelling)
- Prediction: using data to map some features of the world (the inputs) to other
features of the world (the outputs); ranging from two to hundreds of variables
o Elementary calculations: e.g., correlation coefficient, risk difference
o Supervised learning algorithms: random forests, neural networks
- Counterfactual prediction: using data to predict certain features of the world as if
the world had been different, as is required in causal inference applications
o Elementary calculations for randomised experiments with perfect adherence
o Complex implementations such as g-methods

Statistical inference (explanation; confirmatory) is often required for all three tasks

Sciences are primarily defined by their questions rather than by their tools




2. Prediction vs. Causal Inference
Predictive (non-causal) applications of data science: map inputs to outputs

- but do not consider what the world would look like under different courses of action

Mapping observed inputs to outputs lends itself to automated data analysis because it only requires:

- Large data set with inputs and outputs
- Algorithm that establishes a mapping between inputs and outputs
- Metric to assess the performance of the mapping, often based on a gold standard
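
The three ingredients listed above can be sketched as follows. This is a minimal illustration, assuming a made-up data set, a least-squares line as the mapping algorithm, and mean squared error on held-out pairs as the metric; the names `train`, `test`, `fit_line` and `mean_squared_error` are hypothetical.

```python
# Ingredient 1: a data set with inputs (x) and outputs (y). Values are made up.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
test = [(5.0, 10.1), (6.0, 11.9)]


def fit_line(pairs):
    """Ingredient 2: an algorithm mapping inputs to outputs (least-squares line)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, pairs):
    """Ingredient 3: a metric comparing predictions with the known outputs."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    model = fit_line(train)
    print("slope, intercept:", model)
    print("held-out MSE:", mean_squared_error(model, test))
```

Nothing in this loop asks what would happen to the output if an input were changed by intervention; it only scores how well observed inputs map to observed outputs.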

Prediction tasks require expert knowledge to specify the scientific question:

- What inputs and what outputs
- Identify / generate relevant data sources

However, no expert knowledge is required for prediction after the inputs and outputs are
specified and measured in a particular dataset (machine learning can take over here)

Causal inference relies on expert knowledge to give meaning to predictions (by specifying the causal structure)

Confounding factor: an underlying variable that influences both the features and the outcome, producing a non-causal association

Model paradox: a variable takes over the apparent effect of another variable, depending on whether that variable is left out of the model

Counterfactual: a potential outcome that is not observed

Naïve conclusion: inferring causal relations from predictions
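
A small simulation can make the confounding and naïve-conclusion definitions concrete. The sketch below assumes a binary confounder `z` that raises both the chance of exposure `a` and the outcome `y`, while `a` itself has no effect on `y`; all probabilities, effect sizes and function names are hypothetical choices for illustration.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate a confounder z that drives both a and y; a has no effect on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.8 if z else 0.2) else 0
        y = 2.0 * z + rng.gauss(0.0, 1.0)  # y ignores a: the true effect is 0
        rows.append((z, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean outcome between exposed and unexposed units."""
    exposed = mean(y for _, a, y in rows if a == 1)
    unexposed = mean(y for _, a, y in rows if a == 0)
    return exposed - unexposed


def adjusted_difference(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for z_value in (0, 1):
        stratum = [r for r in rows if r[0] == z_value]
        total += naive_difference(stratum) * len(stratum) / len(rows)
    return total


if __name__ == "__main__":
    rows = simulate()
    print("naive:", naive_difference(rows))
    print("adjusted for z:", adjusted_difference(rows))
```

Under these assumptions the naive contrast is clearly positive although the exposure does nothing, and stratifying on `z` brings it close to zero. Knowing that `z` is the variable to adjust for is the expert knowledge the data alone do not supply.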

3. Implications for Decision-Making
Predictive algorithms inform us that decisions have to be made, but they cannot help us make
those decisions; nor do predictive algorithms depict actual causality

- Causal analysis is needed to answer “what if” questions and to avoid relying on causally agnostic features

The distinction between prediction and causal inference (counterfactual prediction) becomes negligible
for decision-making when the relevant expert knowledge can be codified into algorithms

- Complex systems (too chaotic for long-term prediction)
o Unknown and nondeterministic governing laws (“rules of the game”)
o Uncertainty about whether the necessary data are available
o Learning by trial and error (experimenting) is impossible

Causal relations in a complex system must be understood through qualitative (model) knowledge

- Extremely complex systems require narrow research questions and modest analysis
o Not aiming to explain the causal structure of the entire system or to find globally optimal decisions




4. Process and Implications for Teaching
Accuracy of causal answers cannot be quantified using observational data

Data scientists without subject-matter knowledge cannot conduct causal analyses in isolation:

- They don’t know how to articulate the questions (what the target experiment is)
- They don’t know how to answer them (how to emulate the target experiment)

5. Conclusion
Data science that embraces causal inference must

- Develop methods for integration of sophisticated analytics with expert causal expertise
- Acknowledge that, unlike for prediction, the assessment of the validity of causal inferences
cannot be exclusively data-driven, because the validity of causal inferences also depends on
the adequacy of expert causal knowledge

Causal directed acyclic graphs: represent different sets of causal structures compatible
with existing causal knowledge, and explore the impact of causal uncertainty on effect estimates

Intelligence is the ability to predict counterfactually how the world would change under
different actions by integrating expert knowledge and mapping algorithms

- No AI will be worthy of the name without causal inference




Holland (1986) – Statistics and Causal Inference
1. Introduction
Randomised experiments: a statistical procedure with the ability to identify causation

Purpose of the paper is to show, firstly, how statistics is useful for causal inference, and
secondly, the difference between causal and associational inference

2. Model for Associational Inference
The joint distribution of 𝑌 and 𝐴 over 𝑈 is specified by 𝑃𝑟( 𝑌 = 𝑦, 𝐴 = 𝑎) = proportion of 𝑢
in 𝑈 for which 𝑌(𝑢) = 𝑦 and 𝐴(𝑢) = 𝑎

- Associational parameters are determined by this joint distribution

The conditional distribution of 𝑌 given 𝐴 is specified by 𝑃𝑟(𝑌 = 𝑦 | 𝐴 = 𝑎) = 𝑃𝑟(𝑌 = 𝑦, 𝐴 = 𝑎) / 𝑃𝑟(𝐴 = 𝑎)

- The conditional distribution describes how the distribution of 𝑌 values changes over 𝑈 as 𝐴 varies
- Associational parameter: the regression of 𝑌 on 𝐴, i.e., the conditional expectation 𝐸(𝑌|𝐴 = 𝑎)
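
The associational quantities in this section can be computed directly from a finite population. The sketch below assumes a small hypothetical population in which each unit carries an attribute A(u) and a response Y(u); the values and the function names are made up for illustration. Probabilities are proportions of units, as in the definitions above.

```python
from fractions import Fraction

# Hypothetical population U: each unit u has an attribute A(u) and a response Y(u).
POPULATION = [
    {"A": "a1", "Y": 0},
    {"A": "a1", "Y": 1},
    {"A": "a1", "Y": 1},
    {"A": "a2", "Y": 0},
    {"A": "a2", "Y": 0},
    {"A": "a2", "Y": 1},
    {"A": "a2", "Y": 1},
    {"A": "a2", "Y": 1},
]


def pr_joint(y, a):
    """Pr(Y = y, A = a): proportion of units with Y(u) = y and A(u) = a."""
    hits = sum(1 for u in POPULATION if u["Y"] == y and u["A"] == a)
    return Fraction(hits, len(POPULATION))


def pr_a(a):
    """Pr(A = a): proportion of units with A(u) = a."""
    hits = sum(1 for u in POPULATION if u["A"] == a)
    return Fraction(hits, len(POPULATION))


def pr_conditional(y, a):
    """Pr(Y = y | A = a) = Pr(Y = y, A = a) / Pr(A = a)."""
    return pr_joint(y, a) / pr_a(a)


def expected_y_given(a):
    """E(Y | A = a): the regression of Y on A evaluated at a."""
    values = sorted({u["Y"] for u in POPULATION})
    return sum(y * pr_conditional(y, a) for y in values)


if __name__ == "__main__":
    for a in ("a1", "a2"):
        print(a, pr_conditional(1, a), expected_y_given(a))
```

These numbers only describe how Y is distributed across units with different values of A; they say nothing about what a unit's Y would have been under a different A.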

3. Rubin’s Model for Causal Inference
- "A causes B" almost always means that A causes B relative to some other cause
that includes the condition "not A"
- Treatment 𝑆 = 𝑡 → 𝑌_𝑡(𝑢) (one cause) versus control 𝑆 = 𝑐 → 𝑌_𝑐(𝑢) (another cause)
o Controlled study: 𝑆 is constructed by the experimenter
o Uncontrolled study: 𝑆 is determined by factors beyond experimenter control
- In either case, the critical feature of the notion of cause in this model is that the value of
𝑆(𝑢) for each unit could have been different

The role of time now becomes important because a unit is exposed to a cause at some specific time
or within a specific time period; variables now divide into two classes:

- Pre-exposure: those whose values are determined prior to exposure to the cause
- Post-exposure: those whose values are determined after exposure to the cause

Treatment 𝑡 causes the effect 𝑌_𝑡(𝑢) − 𝑌_𝑐(𝑢) on unit 𝑢, relative to treatment 𝑐

Fundamental problem of causal inference: it is impossible to observe the values of 𝑌_𝑡(𝑢)
and 𝑌_𝑐(𝑢) on the same unit and, therefore, it is impossible to observe the effect of 𝑡 on 𝑢

- Scientific solution: exploit various homogeneity or invariance assumptions
- Statistical solution: the average causal effect 𝑇 of 𝑡 (relative to 𝑐) over 𝑈 is the expected
value of the difference 𝑌_𝑡(𝑢) − 𝑌_𝑐(𝑢) over the 𝑢’s in 𝑈: 𝑇 = 𝐸(𝑌_𝑡 − 𝑌_𝑐) = 𝐸(𝑌_𝑡) − 𝐸(𝑌_𝑐)
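
The statistical solution can be illustrated with a toy table of potential outcomes. The sketch assumes a hypothetical population in which both outcomes are known for every unit, which is exactly what the fundamental problem rules out in practice. It then shows one possible way to approximate 𝑇 from observable data: random assignment of half the units to each treatment. All values and function names are assumptions for illustration.

```python
import random

# Hypothetical potential outcomes (Y_t(u), Y_c(u)) for each unit u in U.
# In real data only one of the two is ever observed per unit.
POTENTIAL = [(5, 3), (4, 4), (7, 2), (6, 5), (3, 1), (8, 6)]


def average_causal_effect(units):
    """T = E(Y_t - Y_c) = E(Y_t) - E(Y_c); needs both outcomes per unit."""
    n = len(units)
    return sum(yt for yt, _ in units) / n - sum(yc for _, yc in units) / n


def randomised_estimate(units, seed):
    """Assign half the units to t at random and compare observed group means."""
    rng = random.Random(seed)
    indices = list(range(len(units)))
    rng.shuffle(indices)
    treated = set(indices[: len(units) // 2])
    observed_t = [units[i][0] for i in indices if i in treated]
    observed_c = [units[i][1] for i in indices if i not in treated]
    return sum(observed_t) / len(observed_t) - sum(observed_c) / len(observed_c)


def mean_estimate(units, repeats=5000):
    """Average the randomised estimate over many independent assignments."""
    return sum(randomised_estimate(units, s) for s in range(repeats)) / repeats


if __name__ == "__main__":
    print("true T:", average_causal_effect(POTENTIAL))
    print("one randomised study:", randomised_estimate(POTENTIAL, seed=1))
    print("average over many studies:", mean_estimate(POTENTIAL))
```

A single randomised study can land some distance from 𝑇, but averaged over repeated random assignments the difference in observed group means approaches 𝐸(𝑌_𝑡) − 𝐸(𝑌_𝑐), without ever observing both outcomes on one unit.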



