Lecture notes

All lecture content: Data Mining for Business & Governance

Pages: 51
Uploaded on: 20-03-2024
Written in: 2023/2024

Document containing all the information for the Data Mining for Business & Governance course. This course is given in spring 2024.

Professor(s): Dr. Gonzalo Nápoles
Contains: All classes

Module 1: Introductory Concepts
Missing values:
- Sometimes, we have instances that have missing values for some features.
- It is of paramount importance to deal with this situation before building any machine
learning or data mining model.
- Missing values might result from fields that are not always applicable, incomplete
measurements, or lost values.

Imputation strategies for missing values
- The simplest strategy would be to remove the feature containing missing values. This
strategy is recommended when the majority of the instances have missing values for
that feature.
o However: There are situations in which we have few features, or the feature
we want to remove is deemed relevant.
- If we have scattered missing values and few features, we might want to remove the
instances having missing values.
o However: There are situations in which we have a limited number of
instances.
- The third strategy is the most popular. It consists of replacing the missing values for a
given feature with a representative value such as the mean, the median or the mode
of that feature.
o However: We need to be aware that we are introducing noise.
- Fancier strategies include estimating the missing values with a machine learning
model trained on the non-missing information.
o Remark: More about missing values will be covered in the Statistics course.
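
The instance-removal and representative-value strategies above can be sketched in plain Python. This is a minimal illustration, not the course's implementation: the helper names `drop_incomplete` and `impute`, the use of `None` to mark a missing value, and the toy data are all assumptions made for the example.

```python
from statistics import mean, median, mode

def drop_incomplete(rows):
    """Strategy 2: remove every instance that has at least one missing value."""
    return [row for row in rows if None not in row]

def impute(values, strategy="mean"):
    """Strategy 3: replace missing entries (None) with a representative value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a feature with no observed values")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical toy data: (age, eye color) per instance.
rows = [(20, "blue"), (None, "brown"), (30, None), (40, "blue")]
complete_rows = drop_incomplete(rows)

ages = [age for age, _ in rows]
colors = [color for _, color in rows]
ages_mean = impute(ages, "mean")      # numerical feature: mean or median
colors_mode = impute(colors, "mode")  # categorical feature: mode
```

Note that dropping instances halves this tiny dataset, which is the limitation the notes warn about, while imputation keeps every instance at the cost of introducing noise.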


Normalization
- Feature values are rescaled to lie between 0 and 1.
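
The formula for this step did not survive in the preview. A minimal sketch, assuming the usual min-max definition x' = (x - min) / (max - min), looks as follows; the function name and the sample heights are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

heights = [150, 160, 170, 200]  # hypothetical feature values
normalized = min_max_normalize(heights)
```

The smallest value maps to 0 and the largest to 1, so a single extreme value compresses all other values into a narrow part of the range.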

Standardization
- With boundaries
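
The formula is also missing here. The sketch below assumes the common z-score definition z = (x - mean) / standard deviation and uses the population standard deviation (`statistics.pstdev`); whether the course uses the population or the sample version is not stated in the source.

```python
from statistics import mean, pstdev

def standardize(values):
    """Transform values to z-scores: subtract the mean, divide by the std. dev."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature cannot be scaled; return zeros by convention.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical feature values
z_scores = standardize(scores)
```

After the transformation the feature has mean 0 and standard deviation 1.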


Normalization vs Standardization




Correlation (question part of exam)
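
The preview does not show which correlation measure the lecture uses. Assuming it is Pearson's correlation coefficient for two numerical features, a minimal sketch (with a hypothetical function name and toy data) is:

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

r_positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly increasing together
r_negative = pearson([1, 2, 3], [3, 2, 1])        # one rises as the other falls
```

Values close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate little or no linear relationship.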

χ² association measure

Symbolic feature = categorical feature, such as eye color
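
The formula for this measure is missing from the preview. A sketch under the assumption that it is the standard chi-squared statistic, computed from a contingency table of two categorical features as the sum of (observed - expected)² / expected, could look like this; the function name and the tables are made up for illustration.

```python
def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if the two features were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows = eye color, columns = some other categorical feature.
associated = chi_squared([[10, 20], [20, 10]])
independent = chi_squared([[10, 20], [20, 40]])
```

The statistic is 0 when the observed counts match the counts expected under independence, and it grows as the two features become more strongly associated.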

Encoding categorical features
Some machine learning and data mining algorithms or platforms cannot operate on categorical
features, so we need to encode these features as numerical quantities.

1 Label encoding
- Assigning an integer to each category. This only makes sense if there is an ordinal
relationship among the categories.
o For example: weekdays, months, ratings, etc.
2 One-hot encoding
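
Both encodings can be sketched in a few lines. The description of one-hot encoding is missing from the preview, so the second function assumes the usual meaning: one binary column per category, with a single 1 marking the category of each instance. The function names and sample data are hypothetical.

```python
def label_encode(values, ordered_categories):
    """Map each category to its position in an explicitly ordered list."""
    index = {category: i for i, category in enumerate(ordered_categories)}
    return [index[v] for v in values]

def one_hot_encode(values, categories):
    """Map each value to a binary vector with a single 1 at its category."""
    return [[1 if v == category else 0 for category in categories] for v in values]

# Ordinal feature: the order low < medium < high is meaningful.
ratings = ["low", "high", "medium"]
rating_codes = label_encode(ratings, ["low", "medium", "high"])

# Nominal feature: eye colors have no order, so one-hot avoids implying one.
eyes = ["blue", "brown", "green", "blue"]
eye_vectors = one_hot_encode(eyes, ["blue", "brown", "green"])
```

Label encoding keeps a single column but imposes an order, whereas one-hot encoding adds one column per category and imposes none.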




Class imbalance
A model appears more accurate if it simply predicts the blue class, because that class occurs more frequently.
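
The effect can be illustrated with a baseline that always predicts the most frequent class. This is a sketch with invented class counts (90 blue, 10 red); the function name is hypothetical.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most frequent class and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)

labels = ["blue"] * 90 + ["red"] * 10  # hypothetical imbalanced dataset
predicted_class, accuracy = majority_baseline(labels)
```

With these counts the baseline reaches 90% accuracy without ever identifying a red instance, which is why accuracy alone can be misleading on imbalanced data.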

Seller: matsvandersteen, Tilburg University