
Financial Services Analytics Lecture 3 Avoiding the “garbage in, garbage out” trap

Pages
56
Uploaded on
29-11-2021
Written in
2021/2022



Document information

Uploaded on
November 29, 2021
File latest updated on
December 1, 2021
Number of pages
56
Written in
2021/2022
Type
Class notes
Professor(s)
Kris Boudt
Contains
Lecture 3

Content preview

All the lectures are about data-driven decision making.
Data brings in information but can also bring in garbage. When we don't distinguish the
information from the garbage, we fall into the "garbage in, garbage out" trap: garbage data
coming in leads to garbage decisions.




FSA Lecture 3 1

You can have the best model, but if your data is garbage, your results will be garbage.
You also need good models: even with perfect data, a garbage model will give garbage
results.
There is one exception. The data may contain some garbage (outliers, duplicates, missing
values), but a good model that can deal with that garbage will still lead to reliable
results.





That is the purpose of robust models: models that are robust to these types of problems in the data.
Either we avoid the garbage by cleaning the data, or we design the models to be robust
so that they still make reliable decisions in the presence of garbage.
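A small illustration of what "robust" means, as a sketch only: the numbers below are made up, and comparing the mean with the median is our own choice of example, not necessarily the one used in the lecture. One garbage value moves the mean a lot, while the median does not move at all.

```python
from statistics import mean, median

# Hypothetical monthly returns in %; in `dirty` the last value is a data-entry error.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
dirty = [1.0, 2.0, 3.0, 4.0, 500.0]


def shift(estimator):
    """How far the estimate moves when the garbage value enters the data."""
    return abs(estimator(dirty) - estimator(clean))


mean_shift = shift(mean)      # the mean is not robust to the outlier
median_shift = shift(median)  # the median is robust to it

print("shift of the mean:  ", mean_shift)
print("shift of the median:", median_shift)
```

The mean plays the role of a model that turns garbage in into garbage out; the median plays the role of a robust model that still gives a reliable answer with some garbage in the data.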




➔ Importance of data cleaning (and of being efficient at it)






Data cleaning is unavoidable when handling data. Fortunately, a big part of data cleaning can
be automated: it can be laid down in routines and therefore delegated to algorithms that do
the work. Even so, data scientists still spend most of their time collecting, cleaning and
organizing the data.
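To make "laid down in routines" concrete, here is a minimal sketch of a reusable cleaning routine. The record layout (a client id and a balance) and the function names are assumptions made for illustration; they do not come from the lecture.

```python
# Each record is a (client_id, balance) tuple; balance is None when it is missing.
# Both the layout and the names are hypothetical.


def drop_duplicates(rows):
    """Keep the first occurrence of each record, preserving the order."""
    return list(dict.fromkeys(rows))


def drop_missing(rows):
    """Remove records whose balance is missing."""
    return [row for row in rows if row[1] is not None]


def clean(rows, steps):
    """Apply every cleaning step in order, so the same routine can be rerun on new data."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [("A", 100.0), ("B", None), ("A", 100.0), ("C", 250.0)]
cleaned = clean(raw, [drop_duplicates, drop_missing])
print(cleaned)
```

Once the steps are written as functions, the cleaning is delegated to the routine: each new batch of data goes through exactly the same steps without manual work.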





Data cleaning is a bit like going to the doctor. First the doctor needs to diagnose
what's going on, and then proposes a solution. Here we are also going to diagnose the type of
dirty data: duplicates, missing values and outliers. Depending on the type of dirty data we
apply a different solution:
- duplicates: we remove them;
- missing values: we remove them or do imputation, meaning that we replace the missing value
with a reasonable number;
- outliers: similarly, we remove them or replace them with a reasonable value.
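The diagnose-then-treat idea can be sketched as follows. The data are made up, and several choices are our own assumptions rather than the lecture's: the median is used as the "reasonable value", and a value is called an outlier when it lies more than 5 median absolute deviations (MAD) from the median.

```python
from statistics import median

# Hypothetical records (id, amount); None marks a missing amount.
records = [
    ("id1", 10.0),
    ("id2", 12.0),
    ("id2", 12.0),   # duplicate
    ("id3", None),   # missing value
    ("id4", 11.0),
    ("id5", 13.0),
    ("id6", 900.0),  # outlier
]


def center_and_mad(values):
    """Median and median absolute deviation of the observed values."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center, mad


def is_outlier(value, center, mad, threshold):
    # Assumed rule. If mad is 0, every value that differs from the median is
    # flagged, so a real routine would need a fallback for that case.
    return abs(value - center) > threshold * mad


def diagnose(rows, threshold=5.0):
    """Diagnosis: count each type of dirty data."""
    unique = list(dict.fromkeys(rows))
    observed = [v for _, v in unique if v is not None]
    center, mad = center_and_mad(observed)
    return {
        "duplicates": len(rows) - len(unique),
        "missing": sum(v is None for _, v in unique),
        "outliers": sum(is_outlier(v, center, mad, threshold) for v in observed),
    }


def treat(rows, threshold=5.0):
    """Treatment: remove duplicates, impute missing values, replace outliers."""
    unique = list(dict.fromkeys(rows))
    observed = [v for _, v in unique if v is not None]
    center, mad = center_and_mad(observed)
    treated = []
    for key, value in unique:
        if value is None or is_outlier(value, center, mad, threshold):
            value = center  # replace with a reasonable value (here: the median)
        treated.append((key, value))
    return treated


print(diagnose(records))
print(treat(records))
```

Removing the rows with missing values or outliers, instead of replacing them, is the other treatment mentioned above; it would only change the loop in `treat`.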



