Summary

Summary of key topics Data Science week 1-3

Pages
15
Uploaded on
24-02-2025
Written in
2024/2025

A summary for preparing for the data science exam covering the first three weeks.

Overview of concepts, weeks 1-3
Lecture 2

Data science pipeline




Frame problems
● In the real world, the problem must first be defined and framed.


Collect data
● In the real world, you may need to collect the data yourself using sensors, crowdsourcing, or mobile apps.
● Public datasets are also available from sources such as Hugging Face, Zenodo, and Google Dataset Search.



Preprocess Data
● Filtering → reduces a dataset based on specific criteria
○ e.g. the left table can be reduced to the right table using a population threshold
○ df[df["population"] > 500000]
● Aggregation → reduces a dataset to a descriptive statistic
○ e.g. the left table is reduced to a single number by computing the mean value
○ df["population"].mean()
● Grouping → divides a table into groups by column values, and can be chained with aggregation to produce descriptive statistics for each group
○ e.g. df.groupby("province").sum()
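The three operations above can be sketched on a small table. This is a minimal illustration, not the lecture's own example: the city names, provinces, and population numbers are made up, and the variable names `large`, `mean_population`, and `per_province` are hypothetical.

```python
import pandas as pd

# Made-up example data (not from the lecture)
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "province": ["North", "South", "South", "North"],
    "population": [900_000, 650_000, 360_000, 160_000],
})

# Filtering: keep only the rows above a population threshold
large = df[df["population"] > 500_000]

# Aggregation: reduce one column to a single descriptive statistic
mean_population = df["population"].mean()

# Grouping chained with aggregation: one sum per province.
# Selecting the numeric column first avoids summing the text columns.
per_province = df.groupby("province")["population"].sum()

print(large)
print(mean_population)
print(per_province)
```

Selecting `["population"]` before `.sum()` is a design choice in this sketch; calling `.sum()` on the whole grouped table would also try to aggregate the text columns.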

● Sorting → rearranges data based on the values in a column, which can be useful for inspection
○ e.g. the right table is sorted by population
○ df.sort_values(by=["population"])
● Concatenation → combines multiple datasets that have the same variables
○ e.g. the two left tables can be concatenated into the right table
○ pandas.concat([df_A, df_B])
● Merging and joining → combines multiple data tables that have an overlapping set of instances
○ e.g. use "city" as the key to merge A and B
○ A.merge(B, how="inner", on="city"), where how is one of "inner", "left", "right", or "outer"
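As a rough sketch of how sorting, concatenation, and merging fit together, assuming two small made-up tables `df_A` and `df_B` plus a hypothetical `areas` table keyed on "city" (none of these values come from the lecture):

```python
import pandas as pd

# Made-up tables with the same columns (not from the lecture)
df_A = pd.DataFrame({"city": ["A", "C"], "population": [900_000, 360_000]})
df_B = pd.DataFrame({"city": ["B"], "population": [650_000]})

# Concatenation: stack tables that share the same variables
combined = pd.concat([df_A, df_B], ignore_index=True)

# Sorting: order rows by population, largest first
ranked = combined.sort_values(by=["population"], ascending=False)

# Hypothetical second table that only partly overlaps on "city"
areas = pd.DataFrame({"city": ["A", "B", "E"], "area": [10, 20, 30]})

# Merging: "inner" keeps cities present in both tables,
# "outer" keeps cities present in either table
inner = combined.merge(areas, how="inner", on="city")
outer = combined.merge(areas, how="outer", on="city")

print(ranked)
print(inner)
print(outer)
```

With `how="left"` or `how="right"`, all rows of the left or right table are kept, and missing values are filled with NaN.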




● Quantization → transforms a continuous range of values (e.g. integer ages) into a discrete set (e.g. categories)
○ e.g. age is quantized to an age range
○ bins = [0, 20, 50, 200]
○ labels = ["1-20", "21-50", "51+"]
○ pandas.cut(df["age"], bins, labels=labels)
● Scaling → transforms variables to have another distribution, which puts variables on the same scale and makes the data work better with many models
○ e.g. Z-score scaling → expresses each value as the number of standard deviations from the mean
○ (df - df.mean()) / df.std()
○ e.g. min-max scaling → rescales the values to the range between 0 and 1
○ (df - df.min()) / (df.max() - df.min())
● Resampling → converts time series data to a different frequency using an aggregation method
○ e.g. resample to hourly frequency using the mean
○ df.resample("60min", label="right").mean()
● Rolling → transforms time series data by applying an aggregation method over a sliding window
○ e.g. df["new_column"] = df["column1"].rolling(window=3).sum()
● Transformation → applies a function to rows or columns of a dataframe
○ e.g. df["wind_sine"] = np.sin(np.deg2rad(df["wind_deg"]))
● Extracting data → pull data from text or match text patterns with regular expressions, a language for specifying search patterns
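The remaining preprocessing steps can be tried out on toy data. The sketch below is illustrative only: all values are made up, the table names `people`, `values`, `ts`, `wind`, and `text` are hypothetical, and using `Series.str.extract` for the regular-expression step is an assumption, since the notes do not say which function the lecture uses.

```python
import numpy as np
import pandas as pd

# Quantization: map ages to age ranges; bins are (0, 20], (20, 50], (50, 200]
people = pd.DataFrame({"age": [5, 20, 21, 50, 73]})
bins = [0, 20, 50, 200]
labels = ["1-20", "21-50", "51+"]
people["age_range"] = pd.cut(people["age"], bins, labels=labels)

# Scaling: z-score and min-max on a made-up numeric column
values = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
z_score = (values - values.mean()) / values.std()
min_max = (values - values.min()) / (values.max() - values.min())

# Time series with one made-up reading every 30 minutes
index = pd.date_range("2025-01-01 00:00", periods=6, freq="30min")
ts = pd.DataFrame({"column1": [1, 2, 3, 4, 5, 6]}, index=index)

# Resampling: hourly mean, each bin labelled with its right edge
hourly = ts.resample("60min", label="right").mean()

# Rolling: sum over a sliding window of 3 rows (first 2 rows are NaN)
ts["new_column"] = ts["column1"].rolling(window=3).sum()

# Transformation: sine of a wind direction given in degrees
wind = pd.DataFrame({"wind_deg": [0, 90, 180]})
wind["wind_sine"] = np.sin(np.deg2rad(wind["wind_deg"]))

# Extraction: capture the digits before "C" with a regular expression
text = pd.Series(["temp: 21C", "temp: 19C"])
temps = text.str.extract(r"(\d+)C")[0].astype(int)

print(people, z_score, min_max, hourly, ts, wind, temps, sep="\n\n")
```

Note that `df.std()` in pandas uses the sample standard deviation (`ddof=1`) by default, so z-scores computed this way can differ slightly from those of libraries that use the population standard deviation.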