2024 Machine Learning Notes Highlights (second part)

Pages: 39
Published: 09-02-2024
Written in: 2023/2024

I scored 18 out of 20 (the greatest distinction) in the 'Machine Learning' course in 2024, a result I attribute to the systematic study material I wrote myself. This second part contains chapters 6 to 9, covering kNN, clustering, recommender systems, ANN, text mining, etc., with a carefully made navigation pane.


Document info

Type: Lecture notes
Professor(s): David Martens
Covers: All classes

Content preview

10.31 Lec6 Clustering & Association rules
Significant points of this lecture (SP):
• Revisit: supervised vs. unsupervised learning
• kNN
• Clustering
• Apriori & association rules
• Recommender system


Highlight:
1 Revisit: supervised and unsupervised models
A supervised model (= predictive data mining) discovers patterns in a training set in order to predict the value of a target variable for items in a test set (discrete target variable: classification; continuous target variable: regression), whereas an unsupervised model (= descriptive data mining) discovers regularities in the data without any notion of a target variable.


Classification, regression, and causal modeling generally are solved with supervised
methods. Similarity matching, link prediction, and data reduction could be either.
Clustering, co-occurrence grouping, and profiling generally are unsupervised. The
fundamental principles of data mining that we will present underlie all these types of
technique.
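The supervised/unsupervised distinction can be sketched in a few lines of plain Python. The data, function names, and the threshold-based "clustering" stand-in below are my own illustrative choices, not from the course; the point is only that the supervised function consumes labels while the unsupervised one does not.

```python
# Toy data (hypothetical): two tight groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.9, 8.3)]
labels = ["small", "small", "large", "large"]  # only the supervised task sees these


def nearest_label(query, points, labels):
    """Supervised: predict the label of the closest training point (1-NN)."""
    dists = [((query[0] - p[0]) ** 2 + (query[1] - p[1]) ** 2, lab)
             for p, lab in zip(points, labels)]
    return min(dists)[1]


def two_groups(points):
    """Unsupervised: split points into two groups by a simple threshold on x.
    A crude stand-in for clustering -- note that no labels are used."""
    mid = sum(p[0] for p in points) / len(points)
    return [0 if p[0] < mid else 1 for p in points]


print(nearest_label((1.1, 0.9), points, labels))  # -> small
print(two_groups(points))                         # -> [0, 0, 1, 1]
```

The classifier predicts a value of the target variable for a new item; the grouping function merely describes structure already present in the data.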
2 kNN

• GOAL = find the k instances that are most similar to the data point.
• Attention: [the importance of standardization] Numeric attributes may have
vastly different ranges, and unless they are scaled appropriately the effect of
one attribute with a wide range can swamp the effect of another with a much
smaller range.
• Number of neighbours k and weighted voting:
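The two bullets above, scaling and weighted voting, can be combined in a short sketch. The data set (income vs. age) and all function names are hypothetical; the sketch z-score-standardizes each column so the wide income range cannot swamp age, then lets each neighbour vote with weight 1/distance.

```python
import math

# Hypothetical data: feature 0 is income (wide range), feature 1 is age (narrow range).
train = [([30_000, 25], "no"), ([90_000, 30], "yes"), ([35_000, 60], "no")]


def zscore_scale(rows):
    """Standardize each column to mean 0, std 1 so no feature swamps the others."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def knn_predict(query, train, k=3):
    """Distance-weighted vote: closer neighbours count more (weight = 1/d)."""
    # For simplicity the query is scaled together with the training rows.
    feats = zscore_scale([x for x, _ in train] + [query])
    *scaled_train, scaled_q = feats
    votes = {}
    for row, (_, label) in zip(scaled_train, train):
        d = math.dist(row, scaled_q) or 1e-9  # guard against a zero distance
        votes[label] = votes.get(label, 0.0) + 1.0 / d
    return max(votes, key=votes.get)


print(knn_predict([32_000, 28], train))  # -> no
```

After standardization the query sits near the two "no" instances in both dimensions; without scaling, raw income differences would dominate the distance entirely.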





2.1 Similarity measures and an example of cosine distance:





Another example:
For two data points (2,2) and (8,8):
d = 1 − (2·8 + 2·8) / (√(2² + 2²) · √(8² + 8²))
d = 1 − 32/32
d = 0
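The same computation as a small helper (the function name is my own). Cosine distance is 1 minus the cosine similarity, i.e. the dot product divided by the product of the vector norms; (2,2) and (8,8) point in the same direction, so their distance is (up to floating-point noise) zero.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (dot product over norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


print(cosine_distance((2, 2), (8, 8)))  # ~ 0 (same direction, angle 0)
print(cosine_distance((1, 0), (0, 1)))  # -> 1.0 (orthogonal vectors)
```

Note that cosine distance ignores vector length and compares direction only, which is why the short vector (2,2) and the long vector (8,8) are "identical" under this measure.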


2.2 Issues/advantages and disadvantages with kNN:
• It's comprehensible: a prediction can be justified in terms of the model and the data instances.
• Computational efficiency: training time = 0. As a "lazy learner", it waits until a
prediction is asked for.
• Curse of dimensionality: kNN always takes all features into account when calculating
the similarity. Therefore [feature selection matters]: having too many attributes, or
many that are irrelevant to the similarity judgment, hurts performance, which calls for a data
scientist's domain knowledge.
• Nature of attributes: 1) scaling of attributes; 2) dummy encoding




The advantages and disadvantages of kNN:





Advantages
1. Simplicity and Intuitiveness: kNN is incredibly straightforward and easy to
understand, making it a good starting point for algorithm learning and
application.
2. No Training Phase: kNN is a lazy learner, meaning it doesn't learn a
discriminative function from the training data but memorizes the training
dataset instead.
3. Versatility: It can be used for both classification and regression problems.



Disadvantages
1. Scalability: kNN can be computationally expensive, especially with large
datasets, as the distance needs to be calculated between each test sample and
all training samples.
2. Curse of Dimensionality: kNN suffers significantly as the dimensionality of the
data increases because it becomes difficult to compute distances in high-
dimensional space.
3. Optimal k Value: Selecting the optimal value of k is crucial for the
performance of the algorithm, and it can be computationally intensive to
find this value.
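Point 3 above, finding a good k, is usually approached by trying several candidate values and keeping the one with the best held-out accuracy. Below is a minimal leave-one-out sketch in plain Python; the data, candidate list, and function names are my own illustrative choices, not from the course.

```python
import math


def knn_classify(query, train, k):
    """Plain kNN: majority label among the k nearest training points."""
    neighbours = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    labels = [lab for _, lab in neighbours]
    return max(set(labels), key=labels.count)


def best_k(train, candidates=(1, 3, 5)):
    """Leave-one-out: score each k by how often the held-out point is predicted right."""
    scores = {}
    for k in candidates:
        hits = sum(
            knn_classify(x, train[:i] + train[i + 1:], k) == y
            for i, (x, y) in enumerate(train)
        )
        scores[k] = hits / len(train)
    return max(scores, key=scores.get)


# Two well-separated classes of three points each (toy data).
train = [([0, 0], "a"), ([0, 1], "a"), ([1, 0], "a"),
         ([5, 5], "b"), ([5, 6], "b"), ([6, 5], "b")]
print(best_k(train))
```

On this toy set k = 5 scores zero (with one point held out, the opposite class always outnumbers the remaining two same-class neighbours), which illustrates why k must be validated rather than guessed.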


3 Clustering

• Goal: divide the data into clusters such that there is maximal similarity between
items within a cluster and maximal dissimilarity between items of
different clusters.
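That goal is what k-means pursues: alternate between assigning each point to its nearest centre and moving each centre to the mean of its assigned points. A minimal sketch (naive initialization from the first k points, fixed iteration count; toy data and names are my own):

```python
import math


def kmeans(points, k, iters=10):
    """Minimal k-means sketch: assign each point to its nearest centre,
    then move each centre to the mean of its assigned points."""
    centres = points[:k]  # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        centres = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
    return centres, clusters


# Two obvious groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 8), (8, 9)]
centres, clusters = kmeans(points, 2)
print(centres)
```

Production code would add better initialization (e.g. random restarts or k-means++) and a convergence check instead of a fixed iteration count, but the two alternating steps are the whole algorithm.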



