Deep Learning – session 6: Text Embedding
1 Word Representation
One-hot Encoding Represents words as sparse, high-dimensional vectors, but lacks
any context or semantic relationship between words. All vectors
are mutually orthogonal, so there is no indication of similarity
between related words. Hence the need to represent words based
on their context.
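A minimal sketch of this, assuming a toy four-word vocabulary chosen purely for illustration: distinct one-hot vectors always have dot product 0, so the encoding cannot express that some words are more related than others.

```python
import numpy as np

vocab = ["car", "driver", "bike", "cyclist"]  # toy vocabulary (illustrative)
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return the one-hot vector for `word` over the toy vocabulary."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# Any two distinct one-hot vectors are orthogonal: their dot product is 0,
# so the encoding carries no notion of semantic similarity.
print(one_hot("car") @ one_hot("driver"))  # 0.0
print(one_hot("car") @ one_hot("car"))     # 1.0
```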
Word Embedding Dense vector representations of words that encode semantic
similarity. They overcome one-hot encoding's limitations by
capturing contextual information.
Representation:
If you visualize the word embeddings, you will see that the vectors
for driver and car point in roughly the same direction, and that
driver is related to car in much the same way as cyclist is to
bike: related words have similar vectors, and analogous pairs
share a similar offset in the embedding space.
Figure 1: visualization of word embeddings
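A hand-made sketch of that idea: the 2-D vectors below are invented for illustration (no trained model, and real embeddings have far more dimensions), but they show related words pointing in a similar direction and analogous pairs sharing a similar offset.

```python
import numpy as np

# Hand-picked 2-D "embeddings", invented for illustration only.
emb = {
    "car":     np.array([4.0, 1.0]),
    "driver":  np.array([4.2, 1.1]),
    "bike":    np.array([1.0, 4.0]),
    "cyclist": np.array([1.1, 4.2]),
}

# Related words point in roughly the same direction...
for word, vec in emb.items():
    print(word, vec / np.linalg.norm(vec))  # unit direction

# ...and analogous pairs share a similar offset: driver - car ≈ cyclist - bike.
print(emb["driver"] - emb["car"])    # [0.2 0.1]
print(emb["cyclist"] - emb["bike"])  # [0.1 0.2]
```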
Cosine similarity Measures the similarity between words via the angle between
their vectors: the smaller the angle, the stronger the relation
between the words (see the sketch after these definitions).
Euclidean distance A measure of dissimilarity between words: the larger the
distance, the smaller the similarity.
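Both measures in a short sketch, reusing the illustrative toy vectors from above: cosine similarity compares directions (angle), while Euclidean distance compares positions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: near 1 = small angle = related."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance: the larger it is, the less similar the words."""
    return float(np.linalg.norm(a - b))

car    = np.array([4.0, 1.0])
driver = np.array([4.2, 1.1])
bike   = np.array([1.0, 4.0])

print(cosine_similarity(car, driver))   # ~1.0  -> small angle, related
print(cosine_similarity(car, bike))     # ~0.47 -> large angle, less related
print(euclidean_distance(car, driver))  # ~0.22 -> small distance, similar
print(euclidean_distance(car, bike))    # ~4.24 -> large distance, dissimilar
```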
2 Word embeddings
Word embedding algorithms:
- Word2Vec
- GloVe
- FastText
- LDA2Vec
- Sense2Vec
- etc.
Figure 2: fundamental idea of word embedding
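As a rough usage sketch of one of these algorithms, here is how Word2Vec embeddings might be trained with the gensim library (assuming gensim 4.x is installed; the three-sentence corpus is purely illustrative and far too small for useful embeddings).

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: a list of tokenized sentences.
corpus = [
    ["the", "driver", "starts", "the", "car"],
    ["the", "cyclist", "rides", "the", "bike"],
    ["the", "driver", "parks", "the", "car"],
]

# vector_size = embedding dimensionality; window = context size around a word;
# min_count=1 keeps every word, which only makes sense for a corpus this small.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=100)

print(model.wv["car"])                       # dense embedding vector for "car"
print(model.wv.similarity("car", "driver"))  # cosine similarity of two words
```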