DATA SCIENCE
P3
KDG | 2021-22
Table of Contents
1. Introduction to Data Science
1.1 What is data?
1.1.2 Classic classification of data
1.1.3 Other, newer classification of data
1.1.4 Big data
1.1.5 Smart data
1.2 Measurement scales or levels of measurement
1.2.1 The different measurement scales
1.2.2 Determining the measurement scale
1.2.3 What is allowed and what is not?
1.3 Discrete and continuous variables
1.4 Reliability and validity of data
1.5 How to extract information from data?
1.5.1 Business Intelligence (BI)
1.5.2 Data Analytics
2. Data Science Processes
2.1 What is Data Science?
2.1.1 Data Science vs. Statistics vs. AI vs. ...
2.1.2 What does a data scientist do (and not do)?
2.2 The life cycle of a 'Data Science' project
2.3 Data Science Pipeline
2.3.1 Collecting Data
2.3.2 Data Engineering
2.3.3 Data Modeling
2.3.4 Information Distillation
3. Programming with Python
3.1 General characteristics
3.2 Libraries in Python
3.3 Python as a calculator
3.4 Variables
3.4.1 IEEE 754
3.5 Lists of data
3.5.1 Python list
3.5.2 NumPy array
3.5.3 pandas Series
3.6 Qualitative data
3.7 Data frames
3.7.1 Creating
3.7.2 Reading and manipulating
3.7.3 Loading and saving
3.7.4 CSV files
3.8 Functions
3.8.1 Declaring functions
3.8.2 Parameters
3.8.3 Control structures
3.8.4 Functions as parameters
4. Data management
4.1 Libraries
4.2 Example
4.3 Reading files
4.4 Missing values
4.4.1 When reading
4.4.2 When converting
4.4.3 Handling missing values
4.5 Changing incorrect values
4.5.1 Replacing
4.5.2 Replacing parts
4.6 Data types
4.7 Dates and times
4.8 Grouping data
4.9 Merging data
4.10 Summary
5. Frequencies
5.1 Libraries
5.2 The raw data
5.3 Absolute frequencies
5.4 Classes
5.5 Relative frequencies
5.6 Cumulative frequencies
5.7 Cumulative percentages
5.8 Charts
5.8.1 Drawing with matplotlib
5.8.2 Pie charts
5.8.3 Bar charts and histograms
5.8.4 Other representations
6. Measures of central tendency and dispersion
6.1 Libraries
6.2 The raw data
6.3 Measures of central tendency
6.3.1 Mode
6.3.2 Median
6.3.3 Arithmetic mean
6.3.4
6.4 Measures of dispersion
6.4.1 Range
6.4.2 Interquartile range
6.4.3 Standard deviation
6.5 Outliers
6.6 Properties
6.6.1 Transformations
6.6.2 Z-scores
6.6.3 Left and right skewness