Summary

Summary Data science 1 - theory P4

Pages: 41
Uploaded on: 17-01-2023
Written in: 2021/2022

A summary of the theory of Data Science 1 from period 4 of Applied Computer Science (Toegepaste Informatica) at KdG.


DATA SCIENCE
P4




KDG | 2021-22

Table of contents

1. Association

1.1 Causality

1.2 The example

1.3 Pearson's correlation coefficient

1.4 Rank correlation
1.4.1 Spearman
1.4.2 Kendall

1.5 Linear regression
1.5.1 Determining the line
1.5.2 Regression in Python
1.5.3 Explained variance

1.6 Non-linear regression

2. Forecasting

2.1 The example

2.2 Forecasting based on the past
2.2.1 Naive forecasting
2.2.2 Mean of all previous values
2.2.3 Moving average
2.2.4 Linear combination

2.3 Reliability of a model

2.4 Building a model for the data
2.4.1 Trend forecasting
2.4.2 Seasonal forecasting

3. Decision trees

3.1 Examples
3.1.1 Ad eater
3.1.2 The Simpsons

3.2 ID3
3.2.1 The basic algorithm
3.2.2 Information gain
3.2.3 Limitations

3.3 Other algorithms
3.3.1 The CART algorithm
3.3.2 ID3 improved

4. Clustering

4.1 Multidimensional spaces

4.2 Distances
4.2.1 Euclidean distance
4.2.2 Manhattan distance
4.2.3 Normalized distance
4.2.4 Other distance measures

4.3 Levels of measurement
4.3.1 Ordinal data
4.3.2 Nominal data

4.4 Finding clusters
4.4.1 K-means clustering
4.4.2 Hierarchical clustering

4.5 Clusters and decision trees

5. Principal component analysis

5.1 Examples
5.1.1 Simpsons
5.1.2 Recognizing digits

5.2 Procedure

1. Association

1.1 Causality

A relationship between two variables is called a correlation.

There are 2 kinds:

• Positive correlation: one variable rises when the other rises
• Negative correlation: one variable falls when the other rises

→ We often assume that a causal relationship exists: we assume that one variable
depends on the other and that we can therefore predict it from the other.

!! There is not always a causal relationship: a correlation can be coincidence, or the two
variables can be connected in some other way.

E.g. the correlation between pickpocketing and the number of ice creams sold: ice cream sales do not
cause pickpocketing, but both share a common factor: good weather and large crowds.
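
The ice cream example can be imitated with a small simulation. The sketch below is an illustration only and not part of the original notes: the variable names, the coefficients 2.0 and 1.5, and the random data are all assumptions. A hidden common factor (here called crowd) drives both quantities, so they correlate even though neither influences the other. The helper pearson is a plain implementation of the correlation coefficient treated in section 1.3.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical common factor: how sunny and busy each of 500 days is.
crowd = [random.gauss(0, 1) for _ in range(500)]

# Both quantities depend on the common factor plus their own noise.
# Neither one appears in the formula of the other.
ice_creams = [2.0 * c + random.gauss(0, 1) for c in crowd]
pickpocketings = [1.5 * c + random.gauss(0, 1) for c in crowd]

r_all = pearson(ice_creams, pickpocketings)

# In a simulation the contribution of the common factor is known exactly,
# so it can be subtracted. What remains is only the independent noise.
ice_rest = [i - 2.0 * c for i, c in zip(ice_creams, crowd)]
pick_rest = [p - 1.5 * c for p, c in zip(pickpocketings, crowd)]
r_rest = pearson(ice_rest, pick_rest)

print(f"correlation including the common factor: {r_all:.2f}")
print(f"correlation after removing it:           {r_rest:.2f}")
```

The first value should be clearly positive and the second close to zero, which is the pattern expected when a correlation comes from a shared cause rather than from a causal link between the two variables.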

1.2 The example

We work with the example of the number of LinkedIn connections and salary. Is there a relationship
between these 2?

A scatterplot quickly shows whether there might be a correlation: for
each row in the table, the number of connections is used as the x-coordinate and the salary as the
y-coordinate. Each row then corresponds to a point in the plane.




(Python function for drawing the scatterplot)
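
The original function is not preserved in this text, so the following is only a minimal sketch of how such a scatterplot could be drawn, assuming matplotlib is installed. The rows are made-up values and the names rows, to_points and plot_scatter are hypothetical.

```python
# Hypothetical rows: (number of LinkedIn connections, salary).
rows = [
    (50, 2100),
    (120, 2600),
    (200, 2500),
    (350, 3400),
    (500, 3900),
]


def to_points(rows):
    """Split the rows into x-coordinates (connections) and y-coordinates (salary)."""
    xs = [connections for connections, _ in rows]
    ys = [salary for _, salary in rows]
    return xs, ys


def plot_scatter(rows):
    """Draw one point per row: connections on the x-axis, salary on the y-axis."""
    # Imported here so that to_points can be used without matplotlib.
    import matplotlib.pyplot as plt

    xs, ys = to_points(rows)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("LinkedIn connections")
    ax.set_ylabel("Salary")
    plt.show()
```

Calling plot_scatter(rows) opens a window with one point for each row of the table.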

The scatterplot of the example data shows that there is very likely a relationship: the higher the
salary, the more connections → although with some variation.

If the relationship were perfect, the points would form a straight line.
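
One way to see the difference between a perfect and an imperfect relationship is to compare the slopes between neighbouring points. The sketch below uses made-up numbers and a hypothetical helper named slopes. For points that lie exactly on a line every slope is the same, while points with variation give different slopes.

```python
def slopes(points):
    """Slope between each pair of neighbouring points, ordered by x."""
    ordered = sorted(points)
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])
    ]


# Hypothetical perfect relationship: salary = 2000 + 5 * connections.
perfect = [(x, 2000 + 5 * x) for x in (50, 120, 200, 350, 500)]

# The same x-values with made-up deviations from the line.
with_variation = [(50, 2400), (120, 2500), (200, 3300), (350, 3500), (500, 4700)]

print(slopes(perfect))         # every slope is 5.0
print(slopes(with_variation))  # the slopes differ from pair to pair
```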




