Lecture notes

Data Science lecture notes

Uploaded on: 09-12-2025
Written in: 2025/2026

Data Science lecture notes. Notes from all lectures of Data Science at the UvA for the Informatiekunde programme.

School, study and subject

Institution: Universiteit van Amsterdam (UvA)
Study: Informatiekunde
Course: Data Science (5072DASC6Y)

Document information

Uploaded on: 9 December 2025
Number of pages: 47
Written in: 2025/2026
Type: Lecture notes
Professor(s): Y. Hsu
Contains: All lectures

Content preview

Week 1

Lecture 1
FN = False Negative: you predict nothing, but there is something
FP = False Positive: you predict something, but there is nothing
TP = True Positive: you predict something and there is something
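
The definitions above can be checked by counting outcomes over a list of predictions. The sketch below is only an illustration: the function name `confusion_counts`, the 0/1 label encoding (1 = "there is something") and the toy data are assumptions, and it also counts TN (True Negative), the fourth case that completes the set.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for binary labels (1 = something is there)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1      # predicted something, and there is something
        elif pred == 1 and truth == 0:
            fp += 1      # predicted something, but there is nothing
        elif pred == 0 and truth == 1:
            fn += 1      # predicted nothing, but there is something
        else:
            tn += 1      # predicted nothing, and there is nothing
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical toy data, invented for this example
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```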

Lecture 2
Process data
- Make the data ready for use (missing data, incorrect data)

Inner join = Merge tables on matching variables; only rows that match in both tables are kept

Left join = Everything from table A + the matching rows from table B
Right join = Everything from table B + the matching rows from table A
Outer join = Everything from both tables
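
The four join types can be tried out with `DataFrame.merge` in pandas. This is a minimal sketch: the tables `A` and `B`, the key column `id` and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical tables sharing the key column "id"
A = pd.DataFrame({"id": [1, 2, 3], "name": ["ann", "bob", "cas"]})
B = pd.DataFrame({"id": [2, 3, 4], "score": [7.5, 8.0, 6.0]})

inner = A.merge(B, on="id", how="inner")  # only ids present in both tables
left = A.merge(B, on="id", how="left")    # all of A, score is NaN where B has no match
right = A.merge(B, on="id", how="right")  # all of B, name is NaN where A has no match
outer = A.merge(B, on="id", how="outer")  # all ids from both tables

print(outer)
```

Rows without a partner in the other table get missing values (NaN) in the columns that come from that table.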

Scaling = transforms variables to have another distribution, which puts variables at the
same scale and makes the data work better on many models.
- Z-score scaling
  - (representing how many standard deviations from the mean)
  - (D-D.mean()) / D.std()
- Min-max scaling
  - (making the value range between 0 and 1)
  - Remove outliers first (they determine the min and max)
  - (D-D.min()) / (D.max()-D.min())
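
Both formulas can be applied directly to a pandas Series. A small sketch, with invented values in `D`; note that `Series.std()` in pandas uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical values, invented for this example
D = pd.Series([2.0, 4.0, 6.0, 8.0])

# Z-score scaling: result has mean 0 and standard deviation 1
z = (D - D.mean()) / D.std()

# Min-max scaling: smallest value becomes 0, largest becomes 1
mm = (D - D.min()) / (D.max() - D.min())

print(z.tolist())
print(mm.tolist())
```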

Quantization = transforms a continuous or fine-grained set of values (e.g., real numbers)
into a discrete set (e.g., categories). For example, age is quantized to an age range
- bin = [0,20,50,200]
- L = ["1-20","21-50","51+"]
- pandas.cut(D["age"], bin, labels=L)
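
Put together as a runnable sketch, the quantization above looks as follows. The ages are invented, and the result column name `age_range` is an assumption. By default `pandas.cut` uses intervals that are open on the left and closed on the right, so the bins are (0, 20], (20, 50] and (50, 200].

```python
import pandas as pd

# Hypothetical ages, chosen to sit on both sides of each bin edge
D = pd.DataFrame({"age": [5, 20, 21, 50, 51, 80]})

bins = [0, 20, 50, 200]
labels = ["1-20", "21-50", "51+"]

# Each age is replaced by the label of the bin it falls into
D["age_range"] = pd.cut(D["age"], bins, labels=labels)
print(D)
```

With these default settings an age of exactly 0 falls outside the first bin and becomes NaN.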

resample = You can resample time series data (i.e., the data with time stamps) to a different
frequency (e.g., hourly) using different aggregation methods (e.g., mean).
- D.resample("60Min", label="right").mean()
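
A small sketch of resampling half-hourly data to hourly means. The timestamps and the column `v1` are invented; `resample` needs a datetime index (or a datetime column passed via `on=`). The frequency is written as `"60min"` here.

```python
import pandas as pd

# Hypothetical half-hourly measurements
idx = pd.date_range("2025-01-01 00:00", periods=6, freq="30min")
D = pd.DataFrame({"v1": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}, index=idx)

# Average per hour; label="right" names each bin after its end time
hourly = D.resample("60min", label="right").mean()
print(hourly)
```

With `label="right"` the values recorded between 00:00 and 01:00 are reported under the timestamp 01:00.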

rolling window operation = You can use the rolling window operation to transform time
series data using different aggregation methods (e.g., sum).
- D["v2"]=D["v1"].rolling(window=3).sum()
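
The rolling sum above, as a runnable sketch with invented values. The first `window - 1` rows have no complete window, so they become NaN.

```python
import pandas as pd

# Hypothetical time series values
D = pd.DataFrame({"v1": [1, 2, 3, 4, 5]})

# Each v2 value is the sum of the current row and the two rows before it
D["v2"] = D["v1"].rolling(window=3).sum()
print(D)
```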

Transformation =

Regular expression
- To extract data from text or match text patterns
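
Both uses can be sketched with Python's built-in `re` module. The example text and the patterns are assumptions made up for illustration; the e-mail pattern is deliberately simple and not a full validator.

```python
import re

# Hypothetical text, invented for this example
text = "Contact: alice@example.com, bob@example.org (sent 2025-12-09)"

# Extract data: find all e-mail-like substrings
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)

# Extract data with groups: year, month and day of the first date
date_parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()

# Match a pattern: does the whole string look like a date?
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2025-12-09") is not None

print(emails, date_parts, is_date)
```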

Drop data
- Remove the rows or columns you don't need
- D.drop(columns=["year"]) (drops the column "year")
- D.drop([5, 6]) (drops the rows with index labels 5 and 6)
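
In pandas, `drop` is called on a DataFrame and returns a new DataFrame; the original stays unchanged unless the result is assigned back. A minimal sketch with an invented DataFrame `D`:

```python
import pandas as pd

# Hypothetical data with the default integer index 0..7
D = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025],
    "v1": [1, 2, 3, 4, 5, 6, 7, 8],
})

without_year = D.drop(columns=["year"])  # remove a column
without_rows = D.drop([5, 6])            # remove the rows labelled 5 and 6

print(without_year.columns.tolist())
print(without_rows.index.tolist())
```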

replace the missing values
- With mean, median or constant
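
The three replacement strategies can be sketched with `Series.fillna`. The column `v1` and its values are invented; `mean()` and `median()` skip missing values by default.

```python
import pandas as pd

# Hypothetical column with one missing value
D = pd.DataFrame({"v1": [1.0, None, 3.0, 8.0]})

mean_filled = D["v1"].fillna(D["v1"].mean())      # mean of 1, 3, 8
median_filled = D["v1"].fillna(D["v1"].median())  # median of 1, 3, 8
const_filled = D["v1"].fillna(0)                  # a fixed constant

print(mean_filled.tolist())
print(median_filled.tolist())
print(const_filled.tolist())
```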

model missing values
- 𝑦 is the variable/column that has the missing values, 𝑋 means other variables, and 𝐹
is a regression function.
- 𝑦 = 𝐹(𝑋)
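
One way to sketch this idea: fit 𝐹 on the rows where 𝑦 is known, then predict 𝑦 for the rows where it is missing. The example below assumes a single predictor column `X` and uses a straight line fitted with `numpy.polyfit` as 𝐹; the notes do not say which regression function is used, so this choice and the data are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data where y is roughly 2 * X and one y value is missing
D = pd.DataFrame({
    "X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, np.nan, 8.0, 10.0],
})

known = D["y"].notna()

# Fit F (here a straight line) on the rows where y is observed
slope, intercept = np.polyfit(D.loc[known, "X"], D.loc[known, "y"], deg=1)

# Fill the missing y values with the prediction F(X)
D.loc[~known, "y"] = slope * D.loc[~known, "X"] + intercept
print(D)
```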

MCAR (Missing Completely At Random) = Missing data is a completely random subset
(no relations) of the entire dataset

MAR (Missing at Random) = Missing data is only related to variables other than the one
having missing data

MNAR (Missing Not At Random) = Missing data is related to the variable that has the
missing data (e.g., sensitive questions)

Classification & regression
Classification = Predicting categories (labels).
Regression = Predicting numerical values (numbers).

Classification
- E.g. checking whether something is spam or not
- Many examples are needed to train the model
- Extract features (information) using human knowledge
- Use the features x to plot a message as a data point

Fit a line to the points (linear classifier)
- First define an error metric (how good or bad is the line?)
- Sum of distances between the misclassified points and line f
- A point is misclassified when it lies on the wrong side of the line

- Problem: once error = 0 every such line counts as equally good and the algorithm can stop
at any moment, so you do not get the same result every time
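
The error metric described above can be sketched as follows. This is an illustration under assumptions: the line f is written as w1*x1 + w2*x2 + b = 0, the classes are encoded as +1 and -1, and the function name `line_error` and the points are invented.

```python
import math


def line_error(w, b, points, labels):
    """Sum of distances from the misclassified points to the line w.x + b = 0."""
    norm = math.hypot(w[0], w[1])
    error = 0.0
    for (x1, x2), label in zip(points, labels):
        score = w[0] * x1 + w[1] * x2 + b
        if label * score <= 0:            # on the wrong side of the line
            error += abs(score) / norm    # distance from the point to the line
    return error


# Hypothetical points: class -1 on the left, class +1 on the right
points = [(1.0, 1.0), (2.0, 3.0), (4.0, 1.0), (5.0, 2.0)]
labels = [-1, -1, 1, 1]

bad = line_error((0.0, 1.0), -2.5, points, labels)     # horizontal line x2 = 2.5
good_a = line_error((1.0, 0.0), -3.0, points, labels)  # vertical line x1 = 3
good_b = line_error((1.0, 0.0), -3.5, points, labels)  # vertical line x1 = 3.5

print(bad, good_a, good_b)
```

The two vertical lines are different but both reach error 0, which illustrates why an algorithm that stops as soon as the error is 0 does not always return the same line.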

Evaluation metrics (Is the model good or not? Used to compare different models)
- You cannot compare models by their error, because the error is measured differently for each model

- What if the dataset is imbalanced (some classes have far less data)?
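
The preview does not show which metrics the lecture uses. As an assumption, the sketch below uses accuracy, precision and recall, which are common choices, to show why imbalance matters: a model that always predicts the majority class still gets a high accuracy. The data is invented.

```python
# Hypothetical imbalanced dataset: 5 positives, 95 negatives
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a "model" that always predicts the majority class

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)
tn = sum(t == 0 and p == 0 for t, p in pairs)

accuracy = (tp + tn) / len(y_true)
# Precision and recall are undefined when their denominator is 0;
# returning 0.0 in that case is a convention chosen for this sketch
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(accuracy, precision, recall)
```

The accuracy looks good, yet the recall shows that not a single positive case was found.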
