
ECB3ADAVE2 - Applied Data Analysis and Visualization II - Full Summary

Pages
49
Uploaded on
07-11-2021
Written in
2021/2022

A detailed summary of all the relevant unsupervised learning methods. Based on the book, articles, lecture slides, exercises and assignments, and articles and videos I found through Google. Edit: I was told that the hyperlinks in the document don't work. Once you have bought the summary, please send me a message () and I'll send you the pdf with working hyperlinks through :)

Document information

Uploaded on
November 7, 2021
File latest updated on
November 8, 2021
Type
Summary

Content preview

Applied Data Analysis and Visualization II
Universiteit Utrecht – ECB3ADAVE2

Written by Lisanne Louwerse


Summary

Table of contents
WEEK 1
  SUPERVISED VS. UNSUPERVISED LEARNING
  ASSOCIATION RULE ANALYSIS
WEEK 2
  WHAT IS CLUSTERING?
  K-MEANS CLUSTERING
  HIERARCHICAL CLUSTERING
WEEK 3
  DIMENSION REDUCTION
  PRINCIPAL COMPONENT ANALYSIS (PCA)
WEEK 4
  NON-NEGATIVE MATRIX FACTORIZATION (NMF)
  PROBABILISTIC LATENT SEMANTIC ANALYSIS (PLSA)
WEEK 5
  FACTOR ANALYSIS (FA)
  INDEPENDENT COMPONENT ANALYSIS (ICA)
WEEK 6
  MULTIDIMENSIONAL SCALING (MDS)
WEEK 7
  CONTINGENCY TABLES AND CORRESPONDENCE TABLES
  CORRESPONDENCE ANALYSIS (CA)
KEY TAKEAWAYS
  ASSOCIATION RULE ANALYSIS
  CLUSTER ANALYSIS
  PRINCIPAL COMPONENT ANALYSIS
  NON-NEGATIVE MATRIX FACTORIZATION
  PROBABILISTIC LATENT SEMANTIC ANALYSIS
  FACTOR ANALYSIS
  INDEPENDENT COMPONENT ANALYSIS
  MULTIDIMENSIONAL SCALING
  CORRESPONDENCE ANALYSIS





Week 1
Key Words
▪ Supervised / unsupervised learning
▪ Antecedent and consequent
▪ Support, confidence and lift
▪ Apriori algorithm and Apriori principle

Supervised vs. unsupervised learning

▪ Supervised learning
Building a statistical model for predicting / estimating an output (y) based on one or
more inputs (x).
o Classification: predict to which category an observation belongs (qualitative
outcomes).
o Regression: predict a quantitative outcome.

▪ Unsupervised learning
Inputs (x) but no outputs (y). Try to learn structure and relationships from data, like …
… discovering associations among variable values → association rule analysis
… discovering unknown subgroups of observations → clustering
… dimension reduction → principal components analysis


Association rule analysis
Goal: to find joint values of the variables x1, …, xp that appear together most frequently in the
database.
In the case of binary-valued data, association rule analysis is called ‘market basket’ analysis.
Transactions are represented in a binary incidence matrix:

$$x_{ij} = \begin{cases} 1, & \text{if the } j\text{th item is purchased as part of transaction } i, \\ 0, & \text{if the } j\text{th item is not purchased as part of transaction } i. \end{cases}$$




This matrix can now be used to find association rules.
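As a minimal sketch, the incidence matrix could be built from a list of transactions as follows. The item names and the four transactions are made up purely for illustration, and plain Python lists stand in for whatever matrix type is used in practice.

```python
# Hypothetical transactions (made-up data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer"},
    {"bread", "butter"},
]

# Fix a column order: one column per distinct item, sorted alphabetically.
items = sorted(set().union(*transactions))

# incidence[i][j] = 1 if item j is part of transaction i, and 0 otherwise.
incidence = [[1 if item in t else 0 for item in items] for t in transactions]

for row in incidence:
    print(row)
```

Each row is one transaction and each column one item, so column sums give the number of transactions containing that item.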
An association rule is the implication

A ⇒ B (antecedent ⇒ consequent)
In market basket analysis, it can be seen as an if-then statement:
If you buy A, there is a chance that you buy B as well.

Properties of association rules
The support (or prevalence) of association rule A ⇒ B is the relative frequency of the rule.
It’s the probability of simultaneously observing A and B in a randomly selected market basket,
so Pr(A,B).

$$\mathrm{supp}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}}$$

Note that this is the support of an association rule. The support of just an item (set) A is defined as:

$$\mathrm{supp}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}}$$
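Both support definitions can be sketched with one small helper, since the support of a rule A ⇒ B is simply the support of the combined item set. The function name `support` and the transactions are assumptions made up for this example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Hypothetical transactions (made-up data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer"},
    {"bread", "butter"},
]

# Support of a single item: bread appears in 3 of the 4 transactions.
supp_bread = support({"bread"}, transactions)

# Support of the rule bread => butter: both appear together in 2 of 4.
supp_bread_butter = support({"bread", "butter"}, transactions)

print(supp_bread, supp_bread_butter)
```

Because the rule's support only counts co-occurrence, supp(A ⇒ B) and supp(B ⇒ A) are always equal.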




The confidence of association rule A ⇒ B is the conditional probability of B given A, so
Pr(B|A). It is the likelihood of item B being purchased when item A is purchased.

$$\mathrm{conf}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A}$$


▪ If conf = 1: B is always purchased when A is purchased.
▪ If conf = 0: B is never purchased when A is purchased.
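The confidence formula and its two boundary cases can be illustrated with a short sketch. The helper name `confidence` and the transactions are hypothetical, chosen so that one rule reaches conf = 1 and another conf = 0.

```python
def confidence(antecedent, consequent, transactions):
    """conf(A => B): share of transactions containing A that also contain B."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_ab = sum(1 for t in transactions if both <= t)
    if n_a == 0:
        raise ValueError("confidence is undefined when A never occurs")
    return n_ab / n_a


# Hypothetical transactions (made-up data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer"},
    {"bread", "butter"},
]

conf_bread_butter = confidence({"bread"}, {"butter"}, transactions)  # 2 of 3
conf_butter_bread = confidence({"butter"}, {"bread"}, transactions)  # 2 of 2
conf_beer_bread = confidence({"beer"}, {"bread"}, transactions)      # 0 of 1

print(conf_bread_butter, conf_butter_bread, conf_beer_bread)
```

Unlike support, confidence is not symmetric: here butter ⇒ bread has confidence 1, while bread ⇒ butter only reaches 2/3.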


Drawback: confidence does not account for how frequent the consequent (B) is. A rule with a
very frequent consequent tends to have a high confidence, even if the antecedent (A) is not
frequent, simply because B appears in most transactions anyway. Because of this, a rule containing
two items that actually have a weak association may still have a high confidence value.
To overcome this challenge, lift is introduced.
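A small made-up example illustrates the drawback. In the hypothetical baskets below, milk is in 8 of 10 transactions and buying candles does not change that proportion, yet the rule candles ⇒ milk still has a high confidence.

```python
# Made-up baskets for illustration: milk is very frequent (8 of 10),
# and knowing that candles were bought tells us nothing extra about milk.
transactions = (
    [{"candles", "milk"}] * 4
    + [{"candles"}] * 1
    + [{"milk"}] * 4
    + [{"bread"}] * 1
)

n_total = len(transactions)
n_candles = sum(1 for t in transactions if "candles" in t)
n_milk = sum(1 for t in transactions if "milk" in t)
n_both = sum(1 for t in transactions if {"candles", "milk"} <= t)

conf_candles_milk = n_both / n_candles  # Pr(milk | candles)
supp_milk = n_milk / n_total            # Pr(milk), the baseline frequency

print(conf_candles_milk, supp_milk)
```

Both values come out at 0.8: the confidence looks strong, but it is no higher than the baseline frequency of milk, so the rule carries no real information.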


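The lift of association rule A ⇒ B is the conditional probability of item B given A,
divided by the support (relative frequency) of B, so Pr(B|A) / Pr(B) = Pr(A,B) / (Pr(A) Pr(B)).

$$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)} = \frac{(\text{number of transactions containing } A \text{ and } B) \,/\, (\text{number of transactions containing } A)}{(\text{number of transactions containing } B) \,/\, (\text{total number of transactions})}$$

In other words, lift measures how much the probability of having B in the transaction rises once we know that A is present:

$$\mathrm{lift}(A \Rightarrow B) = \frac{\text{probability of having } B \text{ in the transaction given that } A \text{ is present}}{\text{probability of having } B \text{ in the transaction without any knowledge about the presence of } A}$$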



▪ If lift = 1: A and B are independent.
▪ If lift > 1: A and B often occur together.
▪ If lift < 1: A and B are substitutes for each other. The presence of one item has a
negative effect on the presence of the other item.

Lift can be seen as the “strength” of the rule.
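As a rough sketch, lift can be computed as confidence divided by the support of the consequent. The helper name `lift` and the transactions are assumptions made up for this example.

```python
def lift(antecedent, consequent, transactions):
    """lift(A => B) = conf(A => B) / supp(B)."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_b = sum(1 for t in transactions if b <= t)
    n_ab = sum(1 for t in transactions if (a | b) <= t)
    if n_a == 0 or n_b == 0:
        raise ValueError("lift is undefined when A or B never occurs")
    return (n_ab / n_a) / (n_b / n)


# Hypothetical transactions (made-up data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer"},
    {"bread", "butter"},
]

lift_bread_butter = lift({"bread"}, {"butter"}, transactions)  # above 1
lift_butter_bread = lift({"butter"}, {"bread"}, transactions)  # same value
lift_bread_beer = lift({"bread"}, {"beer"}, transactions)      # 0, never together

print(lift_bread_butter, lift_butter_bread, lift_bread_beer)
```

In this toy data bread and butter have a lift of about 1.33, so they occur together more often than independence would suggest, while bread and beer never co-occur and get a lift of 0. Lift is symmetric in A and B, unlike confidence.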


