Overview of concepts, weeks 1-3
Lecture 2
Data science pipeline
Frame problems
● In the real world, we first need to define and frame the problem
Collect data
● In the real world, you may need to collect data yourself using sensors, crowdsourcing, or mobile apps
● There are also public dataset repositories, such as Hugging Face, Zenodo, Google Dataset Search, etc.
Preprocess Data
● Filtering → reduces a set of data based on specific criteria
○ e.g. the left table can be reduced to the right table using a population threshold
○ df[df["population"] > 500000]
● Aggregation → reduces a set of data to a descriptive statistic
○ e.g. the left table is reduced to a single number by computing the mean value
○ df["population"].mean()
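A minimal runnable sketch of filtering and aggregation; the city names and population values below are made up for illustration:

import pandas as pd

# hypothetical city table, values only for illustration
df = pd.DataFrame({
    "city": ["Amsterdam", "Rotterdam", "Eindhoven"],
    "population": [905000, 650000, 240000],
})

big_cities = df[df["population"] > 500000]   # filtering: keep rows above a threshold
mean_population = df["population"].mean()    # aggregation: reduce a column to one statistic
print(big_cities)
print(mean_population)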
● Grouping → divides a table into groups by column values, which can be chained with aggregation to produce descriptive statistics for each group
○ e.g. df.groupby("province").sum()
● Sorting → rearranges data based on values in a column, which can be useful for inspection
○ e.g. the right table is sorted by population
○ df.sort_values(by=["population"])
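A small sketch of grouping chained with aggregation, and of sorting; the province and population values are made up:

import pandas as pd

# hypothetical table with a "province" column, values only for illustration
df = pd.DataFrame({
    "city": ["Amsterdam", "Haarlem", "Rotterdam", "Delft"],
    "province": ["Noord-Holland", "Noord-Holland", "Zuid-Holland", "Zuid-Holland"],
    "population": [905000, 162000, 650000, 104000],
})

per_province = df.groupby("province")["population"].sum()            # one statistic per group
by_population = df.sort_values(by=["population"], ascending=False)   # sorted for inspection
print(per_province)
print(by_population)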
● Concatenation → combines multiple datasets that have the same variables
○ e.g. the two left tables can be concatenated into the right table
○ pandas.concat([df_A, df_B])
● Merging and joining → combines multiple data tables that share an overlapping set of instances
○ e.g. use "city" as the key to merge A and B
○ A.merge(B, how="inner", on="city")  # how can also be "left", "right", or "outer"
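A sketch contrasting concatenation and merging, assuming made-up tables that share the "city" key:

import pandas as pd

df_A = pd.DataFrame({"city": ["Amsterdam", "Rotterdam"], "population": [905000, 650000]})
df_B = pd.DataFrame({"city": ["Amsterdam", "Rotterdam"], "area_km2": [219, 324]})
df_C = pd.DataFrame({"city": ["Eindhoven"], "population": [240000]})

stacked = pd.concat([df_A, df_C], ignore_index=True)   # concatenation: same variables, rows stacked
merged = df_A.merge(df_B, how="inner", on="city")      # merging: columns aligned on the shared key
print(stacked)
print(merged)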
● Quantization → transforms a continuous set of values (e.g. integers) into a discrete
set (e.g. categories)
○ e.g. age is quantized into age ranges
○ bin = [0, 20, 50, 200]
○ L = ["1-20", "21-50", "51+"]
○ pandas.cut(df["age"], bin, labels=L)
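A runnable sketch of the quantization example above, with made-up ages:

import pandas as pd

df = pd.DataFrame({"age": [5, 18, 35, 70]})   # hypothetical ages
bin = [0, 20, 50, 200]                        # bin edges: (0, 20], (20, 50], (50, 200]
L = ["1-20", "21-50", "51+"]
df["age_range"] = pd.cut(df["age"], bin, labels=L)
print(df)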
● Scaling → transforms variables to another distribution, which puts variables on the same scale and makes the data work better with many models
○ e.g. Z-score scaling → expresses each value as the number of standard deviations from the mean
○ (df - df.mean()) / df.std()
○ e.g. min-max scaling → maps the value range to between 0 and 1
○ (df - df.min()) / (df.max() - df.min())
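A sketch that applies both scalings to a small made-up numeric table:

import pandas as pd

df = pd.DataFrame({"temperature": [10.0, 15.0, 20.0], "wind_speed": [2.0, 4.0, 8.0]})  # made-up values

z_scaled = (df - df.mean()) / df.std()                    # Z-score: standard deviations from the mean
minmax_scaled = (df - df.min()) / (df.max() - df.min())   # min-max: each column mapped into [0, 1]
print(z_scaled)
print(minmax_scaled)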
● Resampling → converts time series data to a different frequency using an aggregation method
○ e.g. resample to hourly frequency using the mean
○ df.resample("60min", label="right").mean()
● Rolling → transforms time series data with a moving window and an aggregation method
○ e.g. df["new_column"] = df["column1"].rolling(window=3).sum()
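A sketch of resampling and rolling on a made-up 15-minute time series:

import pandas as pd

idx = pd.date_range("2024-01-01 00:00", periods=8, freq="15min")   # hypothetical timestamps
df = pd.DataFrame({"column1": range(8)}, index=idx)

hourly = df.resample("60min", label="right").mean()        # resample to hourly means
df["new_column"] = df["column1"].rolling(window=3).sum()   # rolling sum over a 3-step window
print(hourly)
print(df)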
● Transformation → applies a function to rows or columns in a dataframe
○ df["wind_sine"] = np.sin(np.deg2rad(df["wind_deg"]))
● Extracting data from text → match text patterns with regular expressions, a language for specifying search patterns
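A sketch combining a column transformation with a regular-expression extraction; the wind values and the example text are made up:

import numpy as np
import pandas as pd
import re

df = pd.DataFrame({"wind_deg": [0, 90, 180, 270]})     # hypothetical wind directions in degrees
df["wind_sine"] = np.sin(np.deg2rad(df["wind_deg"]))   # transformation applied to a column

text = "Measured in Leiden in 2023."                   # made-up text
match = re.search(r"\b(\d{4})\b", text)                # regex: search pattern for a 4-digit year
print(df)
print(match.group(1) if match else None)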