Embeddings, Graphs, RNNs, LSTMs,
Word2Vec, Masked Language Modeling,
Knowledge Distillation, t-SNE, and
Conditional Language Models (2025 Edition)
Overview:
This quiz assesses your understanding of advanced machine learning and natural language
processing concepts. Topics include:
Embeddings and Graph Embeddings: Learning vector representations for entities and
nodes, preserving similarity and structure for downstream tasks.
Recurrent Neural Networks (RNNs) and LSTMs: Sequential modeling,
vanishing/exploding gradients, and gate mechanisms (input, forget, output) to manage
long-term dependencies.
Word2Vec Models (Skip-Gram and CBOW): Learning word vectors, training
objectives, negative sampling, and hierarchical softmax.
Masked Language Modeling (MLM) and Teacher Forcing: Training strategies for language
models, from masked-token pre-training to feeding ground-truth tokens during sequence
training.
Knowledge Distillation: Compressing large models into smaller ones while maintaining
performance.
Evaluation of Embeddings: Intrinsic (analogy and similarity tasks) vs. extrinsic
(downstream tasks) evaluation.
Dimensionality Reduction and Visualization: Using t-SNE to map high-dimensional
embeddings to 2D/3D space.
Conditional Language Models: Predicting sequences of tokens conditioned on previous
tokens and other context, along with the associated training strategies.
Bias Mitigation: Debiasing embeddings to reduce gender or other societal biases.
1. Which of the following defines an embedding?
A. A fixed-length input representation
B. A learned map from entities to vectors that encodes similarity
C. A linear classifier
D. A pre-trained RNN output
Answer: B
Rationale: Embeddings map entities into a vector space to capture
similarity.
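As an illustrative sketch (PyTorch and the sizes below are assumptions, not part of the
quiz), an embedding is simply a learned lookup table from entity IDs to dense vectors
whose geometry encodes similarity:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 10,000 entities, each mapped to a 128-dimensional vector.
num_entities, dim = 10_000, 128
embedding = nn.Embedding(num_entities, dim)   # learnable lookup table

ids = torch.tensor([3, 17, 42])               # three entity IDs
vectors = embedding(ids)                      # shape: (3, 128)

# After training, similar entities should have high cosine similarity.
sim = torch.cosine_similarity(vectors[0], vectors[1], dim=0)
```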
2. Graph embeddings aim to:
A. Reduce graph size
B. Encode connected nodes as more similar vectors than unconnected nodes
C. Train MLPs more efficiently
D. Compute word probabilities
Answer: B
Rationale: Embeddings preserve graph structure for downstream tasks.
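A minimal sketch of this objective (the toy graph and loss below are illustrative
assumptions): score connected node pairs higher than randomly sampled pairs.

```python
import torch
import torch.nn as nn

# Hypothetical toy graph: 5 nodes, edges given as (u, v) index pairs.
edges = torch.tensor([[0, 1], [1, 2], [3, 4]])
node_emb = nn.Embedding(5, 16)

u, v = node_emb(edges[:, 0]), node_emb(edges[:, 1])
neg = node_emb(torch.randint(0, 5, (edges.shape[0],)))  # random non-edge samples

# Pull connected pairs together, push random pairs apart (skip-gram-style loss).
pos_score = (u * v).sum(dim=1)
neg_score = (u * neg).sum(dim=1)
loss = -(torch.sigmoid(pos_score).log() + torch.sigmoid(-neg_score).log()).mean()
loss.backward()                               # gradients update the node vectors
```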
3. Skip-Gram Word2Vec predicts:
A. Center word from context
B. Context words from a center word
C. Entire sentence probability
D. Edge embeddings in graphs
Answer: B
Rationale: Skip-Gram maximizes probability of context words given the
center.
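The sketch below (the tokens and window size are hypothetical) shows exactly which
(input, target) pairs Skip-Gram trains on: the center word predicts each word in its
context window.

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]
window = 2

pairs = []
for i, center in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            pairs.append((center, tokens[j]))  # predict context word given center

# e.g. ("sat", "the"), ("sat", "cat"), ("sat", "on"), ("sat", "the"), ...
```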
4. CBOW Word2Vec predicts:
A. Center word from context words
B. Context words from center word
C. Next word in sequence
D. Hidden state of RNN
Answer: A
Rationale: CBOW uses surrounding words to predict the center word.
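By contrast, a CBOW-style forward pass (PyTorch and the IDs below are illustrative
assumptions) averages the context vectors and scores the center word against the
vocabulary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim = 100, 32
in_emb = nn.Embedding(vocab, dim)            # context-word vectors
out_proj = nn.Linear(dim, vocab)             # scores over the vocabulary

context_ids = torch.tensor([[4, 7, 9, 2]])   # surrounding-word IDs (hypothetical)
center_id = torch.tensor([5])                # word to predict

hidden = in_emb(context_ids).mean(dim=1)     # CBOW: average the context vectors
logits = out_proj(hidden)
loss = F.cross_entropy(logits, center_id)    # maximize p(center | context)
```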
5. Negative sampling in Word2Vec:
A. Reduces computation compared to full softmax
B. Increases vocabulary size
C. Ensures exact probabilities
D. Is used only in LSTMs
Answer: A
Rationale: Approximates softmax efficiently with a small set of negative
samples.
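A rough sketch of the idea (the embedding tables, IDs, and k below are assumptions):
instead of normalizing over the whole vocabulary, score the true context word against
only k sampled noise words.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, k = 10_000, 128, 5                # k negative samples per positive pair
in_emb, out_emb = nn.Embedding(vocab, dim), nn.Embedding(vocab, dim)

center = torch.tensor([42])                   # hypothetical center-word ID
context = torch.tensor([7])                   # true context-word ID
negatives = torch.randint(0, vocab, (k,))     # sampled noise words

c = in_emb(center)                            # shape: (1, dim)
pos = (c * out_emb(context)).sum()            # 1 dot product instead of a full softmax
neg = out_emb(negatives) @ c.squeeze(0)       # k dot products

loss = -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum())
```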
6. Vanilla RNN training challenges:
A. Overfitting
B. Vanishing and exploding gradients
C. Lack of embeddings
D. Fixed input size
Answer: B
Rationale: Multiplicative effects of weights over time steps cause gradients
to vanish or explode.
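A tiny numerical illustration (the linear, one-unit RNN below is an assumption made for
clarity): backpropagating through many time steps multiplies the gradient by the same
recurrent weight repeatedly, so it shrinks toward zero when |w| < 1 and blows up when
|w| > 1.

```python
import torch

for w_val in (0.5, 1.5):
    w = torch.tensor(w_val, requires_grad=True)
    h = torch.tensor(1.0)
    for _ in range(50):              # unroll 50 time steps of h_t = w * h_{t-1}
        h = w * h
    h.backward()                     # d h_50 / d w = 50 * w**49
    print(w_val, w.grad.item())      # ~1e-13 (vanishing) vs ~2e10 (exploding)
```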
7. Input gate in LSTM:
A. Controls what information to forget