Summary: complete summary Data Science & AI
Uploaded on: 21-10-2022
Written in: 2021/2022

Summary DSAI theory
Chapter 1: Basic concepts, sample surveys
Chapter 1: Python
Chapter 2: Analysis of 1 variable
Chapter 2: Python
Chapter 3.1: Probability, the central limit theorem, statistical tests
Chapter 3.1: Python
Chapter 3.2: Hypothesis tests
Chapter 3.2: Python
Chapter 4: Analysis of 2 qualitative variables
Chapter 4: Python
Chapter 5: Analysis of 2 variables: qualitative vs quantitative
Chapter 5: Python
Chapter 6: Analysis of 2 quantitative variables
Chapter 6: Python
Chapter 7: Time series analysis
Chapter 7: Python
Random

Chapter 1: Basic concepts, sample surveys
The scientific method
What are the goals of the scientific method?
• Based on empirical validation, we are interested in:
o Exploration
o Description
o Prediction
o Verification

The Research Process: steps
1) Problem statement: what is the research question?
2) Exact information need: what exactly has to be measured?
3) Research: carry out the research using simulations, experiments, surveys, …
4) Process the data
5) Analyse the data
6) Conclusions

Basic concepts in Research
Variables & values
• Variable: a general property of the object
• Value: a specific property of the object

Measurement levels / types of variables (give the characteristics & an example of each)
• Qualitative measurement levels (not necessarily numeric, limited number of values)
o Nominal: categories without an order
• E.g. gender, race, country
o Ordinal: categories with an order / ranking
• E.g. age category, military rank, level of education, score out of 10, …
• Quantitative measurement levels (numeric, many different values)
o Interval: no absolute zero point → ratios are not meaningful
• E.g. °C, °F, … → can be negative
o Ratio: an absolute zero point → ratios are meaningful
• E.g. Kelvin, distance in metres, weight in kg, … → cannot be negative
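
The measurement levels above can be made explicit in pandas. What follows is a minimal sketch on a small made-up table (the column names and values are hypothetical and not taken from the course data): nominal data is stored as an unordered category and ordinal data as an ordered one through CategoricalDtype.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical toy data, invented for illustration only.
people = pd.DataFrame({
    "country": ["BE", "NL", "BE", "FR"],                           # nominal
    "education": ["bachelor", "secondary", "master", "bachelor"],  # ordinal
    "temp_celsius": [21.5, -3.0, 18.0, 25.0],                      # interval
    "weight_kg": [70.2, 82.5, 64.0, 91.3],                         # ratio
})

# Nominal: categories without an order.
people["country"] = people["country"].astype("category")

# Ordinal: categories with an explicit order.
education_type = CategoricalDtype(
    categories=["secondary", "bachelor", "master"], ordered=True
)
people["education"] = people["education"].astype(education_type)

# Order comparisons are only defined for the ordered (ordinal) column.
highest_education = people["education"].max()
at_least_bachelor = (people["education"] >= "bachelor").sum()

# Ratios are meaningful on the ratio scale (weight), not for degrees Celsius.
weight_ratio = people["weight_kg"].max() / people["weight_kg"].min()

print(highest_education, at_least_bachelor, round(weight_ratio, 2))
```

Declaring the order once in the data type means sorting and comparisons follow the intended ranking instead of alphabetical order.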

Relationships between variables
• Causal relationships:
o One variable has an effect on the other
o The cause is the independent variable
o The consequence is the dependent variable
o Not every relationship is causal!

Sample Testing
Sample & population
• Population: the set of all objects you want to draw conclusions about
• Sample: a subset of the population
o Under certain conditions, the results from the sample can be generalised to the population

Sampling method
How do you select the elements?

• Random sample: every element of the population has the same chance of being selected
o (= probability sample)
• Non-random sample: certain elements have a greater chance of being selected
o (= non-probability sample)
o Usually not representative, because you more or less decide yourself what ends up in your sample
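
The difference between the two sampling methods can be sketched in pandas. This is an illustration under stated assumptions: the population is simulated, and the column names, sizes and seed are hypothetical. DataFrame.sample draws rows with equal probability, while the non-random sample is picked by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical population of 1000 numbered objects with one measured value.
rng = np.random.default_rng(seed=42)
population = pd.DataFrame({
    "id": np.arange(1000),
    "value": rng.normal(loc=50, scale=10, size=1000),
})

# Random sample: every row of the population is equally likely to be drawn.
random_sample = population.sample(n=100, random_state=42)

# Non-random sample: we pick the rows ourselves (here: the 100 largest values).
non_random_sample = population.nlargest(100, "value")

print("population mean       :", population["value"].mean())
print("random sample mean    :", random_sample["value"].mean())
print("non-random sample mean:", non_random_sample["value"].mean())
```

The random sample's mean should land close to the population mean, while the hand-picked sample overestimates it by construction.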

Sampling errors
• Accidental sampling errors:
o Pure chance
• Systematic sampling errors:
o E.g. online survey: people without internet access are excluded
o E.g. voluntary survey: only people who are interested take part
o …
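
A small simulation can illustrate the two kinds of sampling error. It is only a sketch: the population is invented, and the assumption that people without internet are older on average is made up to produce a visible effect.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Hypothetical population: 20% has no internet and, by assumption, is older.
n = 10_000
has_internet = rng.random(n) < 0.8
age = np.where(has_internet, rng.normal(40, 12, n), rng.normal(65, 10, n))
population = pd.DataFrame({"has_internet": has_internet, "age": age})
true_mean = population["age"].mean()

# Accidental sampling error: random samples deviate from the true mean by
# chance, sometimes too high and sometimes too low, so it averages out.
random_means = [
    population.sample(n=200, random_state=i)["age"].mean() for i in range(200)
]
accidental_error = np.mean(random_means) - true_mean

# Systematic sampling error: an online survey only reaches people with
# internet, so every sample is off in the same direction.
online = population[population["has_internet"]]
online_means = [
    online.sample(n=200, random_state=i)["age"].mean() for i in range(200)
]
systematic_error = np.mean(online_means) - true_mean

print("average accidental error:", accidental_error)
print("average systematic error:", systematic_error)
```

Taking more or larger samples reduces the accidental error, but it does not remove the systematic one.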

Non-sampling errors
• Accidental non-sampling errors:
o E.g. accidentally ticking the wrong answer
• Systematic non-sampling errors:
o E.g. measuring equipment that is not calibrated properly
o E.g. respondents lie
o E.g. the researcher influences respondents, so they do not answer honestly

Chapter 1: Python
Packages
import numpy as np # "Scientific computing"
import scipy.stats as stats # Statistical tests

import pandas as pd # Data Frame
from pandas.api.types import CategoricalDtype

import matplotlib.pyplot as plt # Basic visualisation
from statsmodels.graphics.mosaicplot import mosaic # Mosaic diagram
import seaborn as sns # Advanced data visualisation
import altair as alt # Alternative visualisation system

Datasets
Reading a dataset
• titanic = pd.read_csv('https://raw.githubusercontent.com/DataRepo2019/Data-files/master/titanic.csv')
• sometimes you have to specify the separator yourself:
o android = pd.read_csv('linkblabla.csv', sep=';')
• or via Google Drive in Google Colab (to see what is in a folder: !ls)
o from google.colab import drive
# mount the drive so its files can be accessed
drive.mount('/content/drive/')
ais = pd.read_csv('/content/drive/MyDrive/DSAI/data/ais.csv')
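
Why the separator matters can be shown without any file or network access. In this sketch the CSV text is an invented stand-in (hypothetical column names and values) that is read from memory through io.StringIO.

```python
import io
import pandas as pd

# Hypothetical file contents, standing in for a CSV that uses ';' as separator.
raw = "name;rating;installs\nApp A;4.5;1000\nApp B;3.9;250\n"

# Without sep=';' pandas assumes ',' and reads each line as one single column.
wrong = pd.read_csv(io.StringIO(raw))

# With the right separator the three columns are parsed as intended.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)   # one column: the separator was not recognised
print(right.shape)   # three columns
```

A DataFrame with a single column whose name contains semicolons is a typical sign that sep has to be set.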
Showing the first / last entries
• titanic.head(10)
• titanic.tail(10)
Dataset properties
• General properties of the dataset: titanic.info() OR titanic.describe()
• Number of rows: len(titanic)
• Number of columns: len(titanic.columns)
• Number of rows & columns: titanic.shape
• Data type of each column: titanic.dtypes
• Number of columns of each data type: titanic.dtypes.value_counts()
• Properties of a single column: titanic.Survived.describe()
o Or, equivalently, titanic['Survived'].describe()
• Unique/distinct values of a qualitative variable: titanic.Embarked.unique()
o Shows the possible values of Embarked: ['S' 'C' 'Q' nan]
Indices
• If there is an 'ID column', you should mark it as the index
o titanic = titanic.set_index('PassengerId')  # set_index returns a new DataFrame, so assign the result
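
The inspection calls above can be tried on a miniature table without downloading anything. This is a sketch on invented values: the DataFrame below is a hypothetical stand-in that only borrows a few Titanic column names.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the Titanic data (values invented).
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Embarked": ["S", "C", "Q", np.nan],
})

n_rows, n_cols = toy.shape                 # rows and columns in one call
type_counts = toy.dtypes.value_counts()    # how many columns per data type
ports = toy["Embarked"].unique()           # distinct values, including NaN
missing = toy.isna().sum()                 # missing values per column

# set_index returns a new DataFrame, so the result has to be assigned.
toy = toy.set_index("PassengerId")

print(n_rows, n_cols)
print(type_counts)
print(ports)
print(toy.head())
```

After set_index the ID no longer counts as a data column, so toy.shape reports one column fewer than before.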

Institution: Hogeschool Gent (HoGent)
Programme: Toegepaste Informatica
Course: Data Science & AI
Number of pages: 87
Type: summary