Class notes

2024 Machine Learning Notes Highlights (second part)

Pages
39
Uploaded on
09-02-2024
Written in
2023/2024

I achieved a score of 18 out of 20, the greatest distinction, in the 'Machine Learning' course in 2024. I attribute this success to the systematic study material I authored on my own. The second part contains chapters 6 to 9, covering KNN, clustering, recommender systems, ANN, text mining, etc., with a meticulously made navigation pane.

Document information

Uploaded on
February 9, 2024
Type
Class notes
Professor(s)
David martens
Contains
All classes

Content preview

10.31 Lec6 Clustering & Association rules
Significant points of this lecture (SP):
• Revisit: Supervised vs. unsupervised learning
• KNN
• Clustering
• Apriori & association rules
• Recommender system


Highlight:
1 Revisit: Supervised and unsupervised model
A supervised model (= predictive data mining) discovers patterns in a training set in
order to predict the value of a target variable for items in a test set (discrete target
variable: classification; continuous target variable: regression), whereas an unsupervised
model (= descriptive data mining) discovers regularities in the data without any
notion of a target variable.


Classification, regression, and causal modeling generally are solved with supervised
methods. Similarity matching, link prediction, and data reduction could be either.
Clustering, co-occurrence grouping, and profiling generally are unsupervised. The
fundamental principles of data mining that we will present underlie all these types of
techniques.
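2 KNN

• GOAL = find the k instances that are most similar to a given data point
• Attention: [the importance of standardization] Numeric attributes may have
vastly different ranges, and unless they are scaled appropriately the effect of
one attribute with a wide range can swamp the effect of another with a much
smaller range.
• Choice of k and weighted vote: a small k follows the training data closely and is
sensitive to noise, while a large k smooths the decision; with a weighted vote,
closer neighbours count more than distant ones.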
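The two points above (standardization and the weighted vote) can be sketched as follows. This is a minimal illustration, not the lecture's implementation: it assumes Euclidean distance and inverse-distance weights, and the function names and the age/income data are hypothetical.

```python
import math
from collections import defaultdict

def standardize(rows):
    """Z-score every column: subtract the column mean, divide by its std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds

def knn_predict(train_x, train_y, query, k=3, weighted=True):
    """Vote among the k nearest neighbours (assumed Euclidean distance)."""
    neighbours = sorted((math.dist(x, query), y)
                        for x, y in zip(train_x, train_y))[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        # Assumed weighting scheme: 1/distance, so closer neighbours count more.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical data: [age in years, income in euros]; the label follows age.
people = [[25, 50000], [27, 52000], [60, 51000], [62, 49000]]
labels = ["no", "no", "yes", "yes"]
query = [61, 52500]

scaled, means, stds = standardize(people)
q_scaled = [(v - m) / s for v, m, s in zip(query, means, stds)]

raw_answer = knn_predict(people, labels, query, k=1)       # income dominates
scaled_answer = knn_predict(scaled, labels, q_scaled, k=1)  # age counts too

# Hypothetical toy set: one very close "yes", two farther "no" neighbours.
toy_x = [[1.0, 1.0], [3.0, 3.0], [3.0, 1.0], [9.0, 9.0]]
toy_y = ["yes", "no", "no", "yes"]
toy_q = [1.2, 1.0]
majority = knn_predict(toy_x, toy_y, toy_q, k=3, weighted=False)
weighted = knn_predict(toy_x, toy_y, toy_q, k=3, weighted=True)
```

On this toy data the unscaled 1-nearest neighbour is picked almost purely on income, whereas after standardization the person of nearly the same age is nearest. In the second data set the plain majority vote and the weighted vote disagree because the single very close neighbour outweighs the two distant ones.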




2.1 Similarity measures and an example of cosine distance:




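Another example:
For the two data points (2, 2) and (8, 8):
$d = 1 - \frac{2 \cdot 8 + 2 \cdot 8}{\sqrt{2^2 + 2^2} \cdot \sqrt{8^2 + 8^2}}$
$d = 1 - \frac{32}{32}$
$d = 0$
Both points lie in the same direction from the origin, so their cosine distance is 0
even though they are far apart in Euclidean terms.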
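The worked example can be checked with a few lines of code. This is a small sketch; the function name `cosine_distance` is chosen here for illustration, and the Euclidean distance is added only for comparison.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

p, q = (2, 2), (8, 8)
d_cos = cosine_distance(p, q)   # same direction: (numerically) zero
d_euc = math.dist(p, q)         # Euclidean distance, clearly not zero

# Perpendicular vectors have cosine 0, hence cosine distance 1.
d_orth = cosine_distance((1, 0), (0, 1))
```

Because of floating-point rounding, `d_cos` may come out as a tiny value rather than exactly 0, so it is safer to compare it against a small tolerance.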


2.2 Issues/advantages and disadvantages with KNN:
• It's comprehensible: both the model and an individual prediction can be justified
by showing the neighbouring data instances.
• Computational efficiency: Training time = 0. As a "lazy learner", it waits until a
prediction is asked for.
• Curse of dimensionality: KNN always takes all features into account to calculate
the similarity. Having too many attributes, or many that are irrelevant to the
similarity judgment, therefore distorts the result, so [selection of features] is
needed, which demands a data scientist's domain knowledge.
• Nature of attributes: 1) scaling of attributes; 2) dummy encoding
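Dummy encoding is only named in the list above, so here is a small sketch of what it could look like. The helper `dummy_encode` and the colour attribute are hypothetical and serve only as an illustration.

```python
def dummy_encode(values):
    """Turn a categorical attribute into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical categorical attribute.
colours = ["red", "green", "blue", "green"]
categories, encoded = dummy_encode(colours)
# categories -> ['blue', 'green', 'red']
# encoded    -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

After encoding, every pair of different categories is equally far apart, which avoids the artificial ordering that a numeric coding such as red=1, green=2, blue=3 would introduce into the distance calculation.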




The advantages and disadvantages of KNN:

Advantages
1. Simplicity and Intuitiveness: kNN is incredibly straightforward and easy to
understand, making it a good starting point for algorithm learning and
application.
2. No Training Phase: kNN is a lazy learner, meaning it doesn't learn a
discriminative function from the training data but memorizes the training
dataset instead.
3. Versatility: It can be used for both classification and regression problems.



Disadvantages
1. Scalability: kNN can be computationally expensive, especially with large
datasets, as the distance needs to be calculated between each test sample and
all training samples.
2. Curse of Dimensionality: kNN suffers significantly as the dimensionality of the
data increases because distances become less informative in high-dimensional
space: points tend to lie almost equally far from one another.
3. Optimal k Value: Selecting the optimal value of k is crucial for the
performance of the algorithm, and it can be computationally intensive to
find this value.
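One common way to search for k is to score several candidate values with cross-validation. The sketch below assumes leave-one-out validation, Euclidean distance and an unweighted majority vote; the function names and the data set are hypothetical.

```python
import math
from collections import Counter

def knn_label(train, query, k):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_label(rest, x, k) == y
    return hits / len(data)

# Hypothetical data: two well-separated groups of four points each.
data = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.8, 1.1), "a"), ((1.1, 1.3), "a"),
        ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.9, 3.3), "b"), ((3.1, 3.1), "b")]

# Every candidate k costs one full pass over the data, which is why the
# search itself can become expensive on large data sets.
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
```

On this toy set k = 1, 3 and 5 classify every held-out point correctly, while k = 7 always fails: with only three same-class points left, the four points of the other class win the vote.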


3 Clustering


• Goal: Dividing data into clusters such that there is maximal similarity between
items within the same cluster and maximal dissimilarity between items of
different clusters.
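The goal can be made concrete by comparing the average distance within clusters to the average distance between clusters. This is a rough sketch under the assumption of Euclidean distance; the function name, the points and both cluster assignments are hypothetical.

```python
import math
from itertools import combinations

def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        bucket = within if labels[i] == labels[j] else between
        bucket.append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good = [0, 0, 0, 1, 1, 1]   # assignment that follows the two groups
bad = [0, 1, 0, 1, 0, 1]    # assignment that mixes the two groups

good_within, good_between = mean_pairwise_distances(points, good)
bad_within, bad_between = mean_pairwise_distances(points, bad)
```

A good clustering shows a small within-cluster distance relative to the between-cluster distance; for the mixed assignment the within-cluster distance is larger than the between-cluster distance.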




thaboty Universiteit Antwerpen