Data science and ethics
Contents
Introduction .................................................................................................................................. 6
Course and Evaluation........................................................................................................ 6
Why care? ........................................................................................................................... 6
1. Expected from society ............................................................................................................. 6
2. Huge potential risks ................................................................................................................. 6
3. Potential benefits .................................................................................................................... 7
4. Future ...................................................................................................................................... 7
5. SciFi becomes Sci ..................................................................................................................... 7
Goal of the course .................................................................................................................. 8
Ethics in the News................................................................................................................... 8
Data science ethics ................................................................................................................. 8
Trolley Problem .................................................................................................................. 9
Ethics of self-driving cars .................................................................................................... 9
Data, Algorithms and Models........................................................................................... 10
Different Roles.................................................................................................................. 11
FAT ........................................................................................................................................ 11
FAT Flow: a Data Science Ethics Framework .................................................................... 12
FAT Flow: Concepts and Techniques ................................................................................ 13
FAT Flow: Cautionary Tales .............................................................................................. 13
Subjectivity of ethics ........................................................................................................ 13
Discussion Case 1....................................................................................................................... 14
Fair Data Gathering .......................................................................................................... 14
Transparent Data Gathering............................................................................................. 14
Discussion Case 2....................................................................................................................... 14
Fair Data Preparation ....................................................................................................... 15
Transparent Data Preparation ......................................................................................... 15
Fair Data Modelling .......................................................................................................... 15
Transparent Data Modelling ............................................................................................. 15
Fair Model Evaluation ...................................................................................................... 15
Transparent Model Evaluation ......................................................................................... 16
Fair Model Deployment ................................................................................................... 16
Transparent Model Deployment ...................................................................................... 16
Beyond Data Science Ethics .................................................................................................. 16
Ethical AI Frameworks .......................................................................................................... 16
IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems (2018) ............ 16
Ethics guidelines for trustworthy AI (2019) ..................................................................... 17
White House Executive Order on Maintaining American Leadership in Artificial Intelligence, Feb. 2019 ..................................................................................................... 17
ISO .................................................................................................................................... 17
Discussion Case 3....................................................................................................................... 17
Ethical Data Gathering ............................................................................................................. 18
Privacy and GDPR ................................................................................................................. 18
Privacy .............................................................................................................................. 18
GDPR ................................................................................................................................. 20
GDPR key concepts .................................................................................................................... 20
Discussion Case 1....................................................................................................................... 24
CIA ......................................................................................................................................... 24
Privacy Mechanisms: Encryption and hashing ..................................................................... 24
Symmetric encryption ...................................................................................................... 26
Asymmetric encryption .................................................................................................... 26
Encryption for data protection......................................................................................... 28
Hashing ............................................................................................................................. 29
Quantum Computing ........................................................................................................ 32
Obfuscation ...................................................................................................................... 33
Government Backdoor ......................................................................................................... 33
Public data ............................................................................................................................ 35
Clearview.AI...................................................................................................................... 36
Bias ........................................................................................................................................ 36
Sample Bias ...................................................................................................................... 37
Experimentation ................................................................................................................... 39
Summary data gathering ...................................................................................................... 41
Ethical Data Preprocessing ....................................................................................................... 41
Input Selection ................................................................................................................. 41
Discrimination against sensitive groups: Data Preprocessing for non-discrimination ........ 42
Measuring ......................................................................................................................... 42
Proxies for discrimination.......................................................................................................... 42
Methods ........................................................................................................................... 43
1. Massaging: Relabeling ........................................................................................................... 43
2. Reweighing ............................................................................................................................ 45
3. Sampling ................................................................................................................................ 47
Experiments ............................................................................................................................... 47
Conclusions................................................................................................................................ 48
Privacy ................................................................................................................................... 49
Defining Target Variable................................................................................................... 49
Measuring Fairness (Revisited) ........................................................................................ 49
COMPAS case............................................................................................................................. 50
Methods to include privacy .............................................................................................. 50
Anonymizing Data ..................................................................................................................... 50
Online Re-identification ................................................................................................... 53
Conclusion ....................................................................................................................... 55
Data Preprocessing and Modelling: Privacy ............................................................................. 55
Data preprocessing ............................................................................................................... 55
K-anonymity ..................................................................................................................... 55
Recap k-anonymity .................................................................................................................... 55
L-diversity ......................................................................................................................... 56
T-closeness ....................................................................................................................... 58
Differential privacy ........................................................................................................... 59
Privacy loss parameter ε............................................................................................................ 62
How do we add this noise? ....................................................................................................... 63
Assumption 1: Single Count Query. Needed? ........................................................................... 64
Assumption 2: trusted data curator .......................................................................................... 66
Conclusion ........................................................................................................................ 68
Ethical Modelling: Including Privacy and Preferences ............................................................. 69
Including Privacy ................................................................................................................... 69
Differential Privacy ........................................................................................................... 69
Zero Knowledge Proofs .................................................................................................... 69
Homomorphic Encryption ................................................................................................ 70
Secure Multi-Party Computation ................................................................................. 72
Applications ............................................................................................................................... 74
Federated Learning .......................................................................................................... 75
Federated Averaging ................................................................................................................. 76
Applications ............................................................................................................................... 77
Overview........................................................................................................................... 77
Including Preferences ........................................................................................................... 78
Including domain knowledge: monotonicity constraints................................................. 78
Trolley problem ................................................................................................................ 79
Including Ethical Preferences .................................................................................................... 79
Ethical Modelling: Including fairness and Explainable AI ......................................................... 81
Fairness in modeling stage: measures and methods ........................................................... 81
Measures .......................................................................................................................... 81
Measuring fairness of Y’ ............................................................................................................ 81
Methods ........................................................................................................................... 83
COMPAS ........................................................................................................................... 83
Including Fairness in Modeling ......................................................................................... 84
Explainable AI ....................................................................................................................... 85
Why the need for explanations .............................................................................................. 85
Trust........................................................................................................................................... 85
Compliance ................................................................................................................................ 87
Insight ........................................................................................................................................ 87
Improve ..................................................................................................................................... 87
Comprehensible and Explaining ....................................................................................... 88
Global and instance-based explanation methods............................................................ 89
Explanations .............................................................................................................................. 89
ANN/SVM Rule Extraction ......................................................................................................... 90
SVM Rule Extraction .................................................................................................................. 91
Linear Models ............................................................................................................................ 93
Instance-based explanations ..................................................................................................... 93
Advantages ................................................................................................................................ 97
Challenges ................................................................................................................................. 98
Conclusion ................................................................................................................................. 98
Ethical Reporting ...................................................................................................................... 98
Ethical Reporting .............................................................................................................. 98
p-Hacking ................................................................................................................................... 99
Multiple comparisons .............................................................................................................. 100
Case 1: Twitter to predict stock market .................................................................................. 101
Case 2: Reporting in credit scoring .......................................................................................... 103
Introduction to validation ....................................................................................................... 103
Quantitative validation ............................................................................................................ 104
Qualitative validation .............................................................................................................. 108
The advertising technology industry .............................................................................. 108
4
Inhoud
Inleiding .................................................................................................................................. 6
Course and Evaluation........................................................................................................ 6
Why care? ........................................................................................................................... 6
1. Expected from society ............................................................................................................. 6
2. Huge potential risks ................................................................................................................. 6
3. Potential benefits .................................................................................................................... 7
4. Future ...................................................................................................................................... 7
5. SciFi becomes Sci ..................................................................................................................... 7
Goal of the course .................................................................................................................. 8
Ethics in the News................................................................................................................... 8
Data science ethics ................................................................................................................. 8
Trolley Problem .................................................................................................................. 9
Ethics of self-driving cars .................................................................................................... 9
Data, Algorithms and Models........................................................................................... 10
Different Roles.................................................................................................................. 11
FAT ........................................................................................................................................ 11
FAT Flow: a Data Science Ethics Framework .................................................................... 12
FAT Flow: Concepts and Techniques ................................................................................ 13
FAT Flow: Cautionary Tales .............................................................................................. 13
Subjectivity of ethics ........................................................................................................ 13
Discussion Case 1....................................................................................................................... 14
Fair Data Gathering .......................................................................................................... 14
Transparent Data Gathering............................................................................................. 14
Discussion Case 2....................................................................................................................... 14
Fair Data Preparation ....................................................................................................... 15
Transparent Data Preparation ......................................................................................... 15
Fair Data Modelling .......................................................................................................... 15
Transparant Data Modeling ............................................................................................. 15
Fair Model Evaluation ...................................................................................................... 15
Transparent Model Evaluation ......................................................................................... 16
Fair Model Deployment ................................................................................................... 16
Transparent Model Deployment ...................................................................................... 16
Beyond Data Science Ethics .................................................................................................. 16
1
, Ethical AI Frameworks .......................................................................................................... 16
IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems (2018) ............ 16
Ethics guidelines for trustworthy AI (2019) ..................................................................... 17
White House Executive Order on Maintaining American Leadership in Artificial
Intelligence, Feb. 2019 ..................................................................................................... 17
ISO .................................................................................................................................... 17
Discussion Case 3....................................................................................................................... 17
Ethical Data Gathering ............................................................................................................. 18
Privacy and GDPR ................................................................................................................. 18
Privacy .............................................................................................................................. 18
GDPR ................................................................................................................................. 20
GDPR key concepts .................................................................................................................... 20
Discussion Case 1....................................................................................................................... 24
CIA ......................................................................................................................................... 24
Privacy Mechanisms: Encryption and hashing ..................................................................... 24
Symmetric encryption ...................................................................................................... 26
Asymmetric encryption .................................................................................................... 26
Encryption for data protection......................................................................................... 28
Hashing ............................................................................................................................. 29
Quantum Computing ........................................................................................................ 32
Obfuscation ...................................................................................................................... 33
Government Backdoor ......................................................................................................... 33
Public data ............................................................................................................................ 35
Clearview.AI...................................................................................................................... 36
Bias ........................................................................................................................................ 36
Sample Bias ...................................................................................................................... 37
Experimentation ................................................................................................................... 39
Summary data gathering ...................................................................................................... 41
Ethical Data Preprocessing ....................................................................................................... 41
Input Selection ................................................................................................................. 41
Discrimination against sensitive groups: Data Preprocessing for non-discrimination ........ 42
Measuring ......................................................................................................................... 42
Proxies for discrimination.......................................................................................................... 42
Methods ........................................................................................................................... 43
1. Massaging: Relabeling ........................................................................................................... 43
2. Reweighing ............................................................................................................................ 45
2
, 3. Sampling ................................................................................................................................ 47
Experiments ............................................................................................................................... 47
Conclusions................................................................................................................................ 48
Privacy ................................................................................................................................... 49
Defining Target Variable................................................................................................... 49
Measuring Fairness (Revisited) ........................................................................................ 49
COMPAS case............................................................................................................................. 50
Methods to include privacy .............................................................................................. 50
Anonymizing Data ..................................................................................................................... 50
Online Re-identificaiton ................................................................................................... 53
Conclusion: ....................................................................................................................... 55
Data Preprocessing and Modelling: Privacy ............................................................................. 55
Data preprocessing ............................................................................................................... 55
K-anonymity ..................................................................................................................... 55
Recap k-anonymity .................................................................................................................... 55
L-diversity ......................................................................................................................... 56
T-closeness ....................................................................................................................... 58
Differential privacy ........................................................................................................... 59
Privacy loss parameter ε............................................................................................................ 62
How do we add this noise? ....................................................................................................... 63
Assumption 1: Single Count Query. Needed? ........................................................................... 64
Assumption 2: trusted data curator .......................................................................................... 66
Conclusion ........................................................................................................................ 68
Ethical Modelling: Including Privacy and Preferences ............................................................. 69
Including Privacy ................................................................................................................... 69
Differential Privacy ........................................................................................................... 69
Zero Knowledge Proofs .................................................................................................... 69
Homomorphic Encryption ................................................................................................ 70
Secure Multi Party Communication ................................................................................. 72
Applications ............................................................................................................................... 74
Federated Learning .......................................................................................................... 75
Federated Averaging ................................................................................................................. 76
Applications ............................................................................................................................... 77
Overview........................................................................................................................... 77
Including Preferences ........................................................................................................... 78
Including domain knowledge: monotonicity constraints................................................. 78
Trolley problem ................................................................................................................ 79
Including Ethical Preferences .................................................................................................... 79
Ethical Modelling: Including fairness and Explainable AI ......................................................... 81
Fairness in modeling stage: measures and methods ........................................................... 81
Measures .......................................................................................................................... 81
Measuring fairness of Y’ ............................................................................................................ 81
Methods ........................................................................................................................... 83
COMPAS ........................................................................................................................... 83
Including Fairness in Modeling ......................................................................................... 84
Explainable AI ....................................................................................................................... 85
Why need for explanations .............................................................................................. 85
Trust........................................................................................................................................... 85
Compliance ................................................................................................................................ 87
Insight ........................................................................................................................................ 87
Improve ..................................................................................................................................... 87
Comprehensible and Explaining ....................................................................................... 88
Global and instance-based explanation methods............................................................ 89
Explanations .............................................................................................................................. 89
ANN/SVM Rule Extraction ......................................................................................................... 90
SVM Rule Extraction .................................................................................................................. 91
Linear Models ............................................................................................................................ 93
Instance-based explanations ..................................................................................................... 93
Advantages ................................................................................................................................ 97
Challenges ................................................................................................................................. 98
Conclusion ................................................................................................................................. 98
Ethical Reporting ...................................................................................................................... 98
Ethical Reporting .............................................................................................................. 98
p-Hacking ................................................................................................................................... 99
Multiple comparisons .............................................................................................................. 100
Case 1: Twitter to predict stock market .................................................................................. 101
Case 2: Reporting in credit scoring .......................................................................................... 103
Introduction to validation ....................................................................................................... 103
Quantitative validation ............................................................................................................ 104
Qualitative validation .............................................................................................................. 108
The advertising technology industry .............................................................................. 108