ECB3ADAVE2 - Applied Data Analysis and Visualization II - Full Summary


A detailed summary of all the relevant unsupervised learning methods, based on the book, articles, lecture slides, exercises and assignments, and articles and videos I found through Google. Edit: I was told that the hyperlinks in the document don't work. Once you have bought the summary, please send me a message () and I'll send you the pdf with working hyperlinks through :)

Document information

Uploaded on: 7 November 2021
File updated on: 8 November 2021
Number of pages: 49
Written in: 2021/2022
Type: Summary

Applied Data Analysis and Visualization II
Universiteit Utrecht – ECB3ADAVE2

Written by Lisanne Louwerse


Summary

Table of contents
WEEK 1
    SUPERVISED VS. UNSUPERVISED LEARNING
    ASSOCIATION RULE ANALYSIS
WEEK 2
    WHAT IS CLUSTERING?
    K-MEANS CLUSTERING
    HIERARCHICAL CLUSTERING
WEEK 3
    DIMENSION REDUCTION
    PRINCIPAL COMPONENT ANALYSIS (PCA)
WEEK 4
    NON-NEGATIVE MATRIX FACTORIZATION (NMF)
    PROBABILISTIC LATENT SEMANTIC ANALYSIS (PLSA)
WEEK 5
    FACTOR ANALYSIS (FA)
    INDEPENDENT COMPONENT ANALYSIS (ICA)
WEEK 6
    MULTIDIMENSIONAL SCALING (MDS)
WEEK 7
    CONTINGENCY TABLES AND CORRESPONDENCE TABLES
    CORRESPONDENCE ANALYSIS (CA)
KEY TAKEAWAYS
    ASSOCIATION RULE ANALYSIS
    CLUSTER ANALYSIS
    PRINCIPAL COMPONENT ANALYSIS
    NON-NEGATIVE MATRIX FACTORIZATION
    PROBABILISTIC LATENT SEMANTIC ANALYSIS
    FACTOR ANALYSIS
    INDEPENDENT COMPONENT ANALYSIS
    MULTIDIMENSIONAL SCALING
    CORRESPONDENCE ANALYSIS





Week 1
Key Words
▪ Supervised / unsupervised learning
▪ Antecedent and consequent
▪ Support, confidence and lift
▪ Apriori algorithm and Apriori principle

Supervised vs. unsupervised learning

▪ Supervised learning
Building a statistical model for predicting / estimating an output (y) based on one or
more inputs (x).
o Classification: predict to which category an observation belongs (qualitative
outcomes).
o Regression: predict a quantitative outcome.

▪ Unsupervised learning
Inputs (x) but no outputs (y). Try to learn structure and relationships from data, like …
… discovering associations among variable values → association rule analysis
… discovering unknown subgroups of observations → clustering
… dimension reduction → principal components analysis


Association rule analysis
Goal: to find joint values of the variables x_1, …, x_p that appear together most frequently in the
database.
In the case of binary valued data, association rule analysis is called ‘market basket’ analysis.
Transactions are represented in a binary incidence matrix:

$$
x_{ij} =
\begin{cases}
1, & \text{if the $j$th item is purchased as part of transaction $i$,} \\
0, & \text{if the $j$th item is not purchased as part of transaction $i$.}
\end{cases}
$$

This matrix can now be used to find association rules.
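
As a minimal sketch of how such a binary incidence matrix could be built, the Python snippet below uses four hypothetical market baskets. The item names and the variable names (`transactions`, `items`, `incidence`) are assumptions made for illustration and do not come from the course material.

```python
# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

# Fix a column order so that column j always refers to the same item.
items = sorted(set().union(*transactions))

# incidence[i][j] = 1 if item j is part of transaction i, else 0.
incidence = [
    [1 if item in basket else 0 for item in items]
    for basket in transactions
]

print(items)  # ['bread', 'butter', 'milk']
for row in incidence:
    print(row)
```

Each row is one transaction and each column one item, so column sums give the number of transactions that contain an item.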
An association rule is the implication

A ⇒ B        (antecedent ⇒ consequent)
In market basket analysis, it can be seen as an if-then statement:
If you buy A, there is a chance that you buy B as well.

Properties of association rules
The support (or prevalence) of association rule A ⇒ B is the relative frequency of the rule.
It’s the probability of simultaneously observing A and B in a randomly selected market basket,
so Pr(A,B).
$$
\mathrm{supp}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}}
$$

Note that this is the support of an association rule. The support of just an item (set) A is defined as:

number of transactions containing A / total number of transactions.
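
The two support definitions can be sketched in Python as follows. The helper `support` and the four baskets are hypothetical, introduced only to make the counting concrete.

```python
def support(itemset, transactions):
    """Relative frequency of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

supp_bread = support({"bread"}, transactions)         # item support: 3 of 4 baskets
supp_rule = support({"bread", "milk"}, transactions)  # supp(bread => milk): 2 of 4 baskets
print(supp_bread, supp_rule)  # 0.75 0.5
```

The support of a rule A ⇒ B is simply the support of the combined itemset, which is why a single function covers both cases.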




The confidence of association rule A ⇒ B is the conditional probability of B given A, so
Pr(B|A). It is the likelihood of item B being purchased when item A is purchased.
$$
\mathrm{conf}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A}
$$


▪ If conf = 1 : B is always purchased when A is purchased.
▪ If conf = 0 : B is never purchased when A is purchased.
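
A small Python sketch of the confidence calculation, including the two boundary cases listed above. The function `confidence` and the five baskets are hypothetical and serve only as an illustration.

```python
def confidence(antecedent, consequent, transactions):
    """conf(A => B) = (# transactions with A and B) / (# transactions with A)."""
    a, b = set(antecedent), set(consequent)
    n_a = sum(1 for basket in transactions if a <= basket)
    n_ab = sum(1 for basket in transactions if (a | b) <= basket)
    if n_a == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return n_ab / n_a


# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
    {"eggs"},
]

conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)     # 2 of 3 bread baskets
conf_butter_bread = confidence({"butter"}, {"bread"}, transactions) # bread always with butter
conf_eggs_bread = confidence({"eggs"}, {"bread"}, transactions)     # bread never with eggs
print(round(conf_bread_milk, 3), conf_butter_bread, conf_eggs_bread)  # 0.667 1.0 0.0
```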


Drawback: The confidence of an association rule with a very frequent consequent (B) tends to
be high regardless of the antecedent (A), simply because B appears in most transactions.
Because of this, a rule containing two items that actually have a weak association may still
have a high confidence value.
To overcome this challenge, lift is introduced.
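
The drawback can be made concrete with a few made-up counts. The numbers below are hypothetical and chosen only to show that a high confidence can coexist with a consequent that is actually less likely once the antecedent is known.

```python
# Hypothetical counts, invented for illustration only.
n_total = 100  # all transactions
n_b = 90       # transactions containing B (a very frequent item)
n_a = 10       # transactions containing A
n_ab = 8       # transactions containing both A and B

conf = n_ab / n_a    # Pr(B | A)
p_b = n_b / n_total  # Pr(B), ignoring A

print(conf, p_b)  # 0.8 0.9
# The confidence of 0.8 looks high, yet B is bought less often
# with A (80%) than it is overall (90%).
```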


The lift of association rule A ⇒ B calculates the conditional probability of item B given A,
while controlling for the support (relative frequency) of B.

$$
\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)} = \frac{\text{number of transactions containing } A \text{ and } B \;/\; \text{number of transactions containing } A}{\text{number of transactions containing } B \;/\; \text{total number of transactions}}
$$

In other words:

$$
\mathrm{lift}(A \Rightarrow B) = \frac{\text{probability of having } B \text{ in the transaction given that } A \text{ is present}}{\text{probability of having } B \text{ in the transaction without any knowledge about the presence of } A} = \frac{\Pr(B \mid A)}{\Pr(B)}
$$

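▪ If lift = 1 : A and B are independent.
▪ If lift > 1 : A and B occur together more often than expected under independence.
▪ If lift < 1 : A and B occur together less often than expected under independence, for example
because they are substitutes for each other. The presence of one item has a negative effect on
the presence of the other item.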

Lift can be seen as the “strength” of the rule.
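
A minimal Python sketch of lift, written as Pr(A,B) / (Pr(A) · Pr(B)), which is the same quantity as conf(A ⇒ B) divided by supp(B). The helpers `support` and `lift` and the four baskets are hypothetical, and the sketch assumes that A and B each occur in at least one transaction.

```python
def support(itemset, transactions):
    """Relative frequency of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def lift(antecedent, consequent, transactions):
    """lift(A => B) = supp(A and B) / (supp(A) * supp(B))."""
    a, b = set(antecedent), set(consequent)
    # Assumes supp(A) > 0 and supp(B) > 0; otherwise this divides by zero.
    return support(a | b, transactions) / (
        support(a, transactions) * support(b, transactions)
    )


# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

lift_bread_milk = lift({"bread"}, {"milk"}, transactions)      # 0.5 / (0.75 * 0.75)
lift_bread_butter = lift({"bread"}, {"butter"}, transactions)  # 0.5 / (0.75 * 0.5)
print(round(lift_bread_milk, 3), round(lift_bread_butter, 3))  # 0.889 1.333
```

In this toy data, bread and milk have a lift below 1 while bread and butter have a lift above 1. Because the formula is symmetric in A and B, lift(A ⇒ B) equals lift(B ⇒ A), unlike confidence.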


