Question 1: What does NLP stand for in the context of artificial intelligence?
A) Natural Learning Process
B) Natural Language Processing
C) Network Language Processing
D) Numeric Language Parsing
Correct Answer: B
Explanation: NLP stands for Natural Language Processing, which is the field focused on the
interaction between computers and human language.
Question 2: Which Python library is most commonly used for natural language processing
tasks?
A) NumPy
B) Matplotlib
C) NLTK
D) Pandas
Correct Answer: C
Explanation: NLTK, the Natural Language Toolkit, is a widely used Python library for NLP
tasks.
Question 3: What is the primary purpose of tokenization in text preprocessing?
A) To remove punctuation from text
B) To break text into individual words or tokens
C) To convert text to lowercase
D) To translate text into another language
Correct Answer: B
Explanation: Tokenization splits text into individual words or tokens, making it easier to process.
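For illustration, here is a minimal regex-based sketch of tokenization in plain Python (the helper name simple_tokenize is hypothetical; in practice NLTK's word_tokenize handles punctuation and contractions more carefully):

```python
import re

def simple_tokenize(text):
    # Pull out runs of letters, digits, and apostrophes as tokens;
    # a simplified stand-in for NLTK's word_tokenize
    return re.findall(r"[A-Za-z0-9']+", text)

print(simple_tokenize("Don't stop believing!"))  # ["Don't", 'stop', 'believing']
```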
Question 4: Which of the following is a common text normalization technique?
A) Image resizing
B) Lowercasing
C) Audio filtering
D) Data encryption
Correct Answer: B
Explanation: Lowercasing is a normalization technique used to standardize text for analysis.
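A quick example of this normalization step, using Python's built-in string method:

```python
raw = "The Quick BROWN Fox"
# Lowercasing ensures "The" and "the" are treated as the same token
normalized = raw.lower()
print(normalized)  # the quick brown fox
```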
Question 5: In NLTK, what is stemming used for?
A) To determine the language of a text
B) To reduce words to their root form
C) To detect sentiment
D) To tokenize text
Correct Answer: B
Explanation: Stemming reduces words to their base or root form, which helps in text analysis.
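To illustrate the idea, here is a toy suffix stripper (the function crude_stem is hypothetical; real stemmers such as NLTK's PorterStemmer apply a much more careful sequence of rules):

```python
def crude_stem(word):
    # Strip a common suffix if enough of a stem remains;
    # a deliberately naive sketch of what stemming does
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(crude_stem("running"))  # runn
print(crude_stem("jumped"))   # jump
```

Note that stems need not be dictionary words ("runn"), which is exactly what distinguishes stemming from lemmatization in the next question.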
Question 6: What is the purpose of lemmatization in text processing?
A) To count word frequency
B) To remove stopwords
C) To reduce words to their dictionary form
D) To encrypt text
Correct Answer: C
Explanation: Lemmatization reduces words to their canonical form using vocabulary and
morphological analysis.
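A lookup-table sketch of the concept (the table and function name are hypothetical; real lemmatizers such as NLTK's WordNetLemmatizer use vocabulary and morphological analysis instead of a fixed table):

```python
LEMMA_MAP = {"better": "good", "ran": "run", "mice": "mouse", "was": "be"}

def toy_lemmatize(word):
    # Return the dictionary form if known, otherwise the lowercased word
    return LEMMA_MAP.get(word.lower(), word.lower())

print(toy_lemmatize("Mice"))  # mouse
print(toy_lemmatize("ran"))   # run
```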
Question 7: Which NLTK module is primarily used for part-of-speech tagging?
A) nltk.tokenize
B) nltk.corpus
C) nltk.tag
D) nltk.stem
Correct Answer: C
Explanation: The nltk.tag module is used to assign part-of-speech tags to words in a text.
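To show what tagged output looks like, here is a crude rule-based tagger (the rules and function name are hypothetical; nltk.tag's pos_tag uses a trained statistical model and is far more accurate):

```python
def toy_pos_tag(tokens):
    # Assign a Penn Treebank-style tag from simple surface cues
    tags = []
    for tok in tokens:
        if tok.endswith("ing"):
            tags.append((tok, "VBG"))   # gerund/present participle
        elif tok.endswith("ly"):
            tags.append((tok, "RB"))    # adverb
        elif tok[0].isupper():
            tags.append((tok, "NNP"))   # proper noun
        else:
            tags.append((tok, "NN"))    # default: noun
    return tags

print(toy_pos_tag(["Alice", "quickly", "running", "home"]))
```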
Question 8: Which technique is used to remove common words that may not contribute
much meaning in text analysis?
A) Tokenization
B) Stopword removal
C) Stemming
D) POS tagging
Correct Answer: B
Explanation: Stopword removal eliminates common words such as "the" and "is" that usually do
not add significant meaning.
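A miniature version of stopword removal (the stopword set here is a small hand-picked sample; NLTK ships fuller lists in nltk.corpus.stopwords):

```python
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def remove_stopwords(tokens):
    # Keep only tokens that are not in the stopword set
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["The", "cat", "is", "on", "the", "mat"]))  # ['cat', 'on', 'mat']
```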
Question 9: What is the main function of Named Entity Recognition (NER) in NLP?
A) To count the number of words
B) To extract names of people, places, organizations, etc.
C) To perform sentiment analysis
D) To translate text
Correct Answer: B
Explanation: NER identifies and classifies proper nouns and entities within the text.
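As a very rough illustration of entity extraction, the sketch below grabs runs of capitalized words (a crude proxy only; real NER, e.g. NLTK's ne_chunk or spaCy, uses trained models and also assigns entity types such as PERSON or ORG):

```python
import re

def naive_entities(text):
    # Match one or more consecutive capitalized words
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text)

print(naive_entities("Ada Lovelace worked with Charles Babbage in London."))
```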
Question 10: Which Python library would you use to handle regular expressions for text
pattern matching?
A) re
B) json
C) csv
D) xml
Correct Answer: A
Explanation: The built-in re module in Python is used for working with regular expressions.
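A short demonstration of the re module pulling structured pieces out of free text (the sample string is made up for illustration):

```python
import re

text = "Order #4521 shipped on 2024-03-15 to jane.doe@example.com"
order = re.search(r"#(\d+)", text).group(1)          # digits after '#'
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group(0)  # ISO-style date
email = re.search(r"[\w.]+@[\w.]+", text).group(0)     # simple email pattern
print(order, date, email)
```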
Question 11: What does the term “corpus” refer to in NLP?
A) A single document
B) A collection of texts
C) A text processing algorithm
D) A visualization tool
Correct Answer: B
Explanation: A corpus is a large collection of texts used for linguistic research and analysis.
Question 12: In Python’s NLTK, what is the purpose of the 'punkt' tokenizer?
A) To remove punctuation
B) To identify sentence boundaries
C) To tag parts of speech
D) To lemmatize words
Correct Answer: B
Explanation: The 'punkt' tokenizer is designed to identify sentence boundaries for tokenization.
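A simplified stand-in for sentence boundary detection, splitting on sentence-final punctuation (punkt is much smarter, e.g. it learns not to split after abbreviations like "Dr."):

```python
import re

def naive_sent_tokenize(text):
    # Split wherever '.', '!' or '?' is followed by whitespace
    return re.split(r"(?<=[.!?])\s+", text.strip())

print(naive_sent_tokenize("It works. Does it? Yes!"))
```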
Question 13: Which of the following best describes text encoding?
A) Converting text into numerical representations
B) Translating text into another language
C) Compressing text data
D) Tokenizing text
Correct Answer: A
Explanation: Text encoding involves converting text into numerical representations that can be
processed by algorithms.
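The simplest form of such a numerical representation is an integer id per word (the helper name build_vocab is hypothetical):

```python
def build_vocab(tokens):
    # Map each distinct token to an integer id, in order of first appearance
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

tokens = ["to", "be", "or", "not", "to", "be"]
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(vocab)  # {'to': 0, 'be': 1, 'or': 2, 'not': 3}
print(ids)    # [0, 1, 2, 3, 0, 1]
```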
Question 14: What is the Bag-of-Words (BoW) model primarily used for?
A) Capturing word order in text
B) Representing text as a frequency distribution of words
C) Generating summaries
D) Parsing grammatical structure
Correct Answer: B
Explanation: The BoW model represents text by counting the occurrence of each word, ignoring
grammar and word order.
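A Bag-of-Words representation is essentially a word-count dictionary, which Python's collections.Counter gives directly:

```python
from collections import Counter

def bag_of_words(tokens):
    # Word order is discarded; only per-word counts remain
    return Counter(tokens)

print(bag_of_words(["the", "cat", "sat", "on", "the", "mat"]))
```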
Question 15: Which method calculates the importance of a word in a document relative to
a collection of documents?
A) Word2Vec
B) TF-IDF
C) Cosine Similarity
D) LDA
Correct Answer: B
Explanation: TF-IDF (Term Frequency-Inverse Document Frequency) measures the importance
of a word in a document relative to the entire corpus.
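One common formulation of TF-IDF, written out by hand (variants with smoothing exist; libraries like scikit-learn's TfidfVectorizer use a slightly different default):

```python
import math

def tf_idf(term, doc, corpus):
    # tf: relative frequency of the term in this document
    tf = doc.count(term) / len(doc)
    # idf: log of (number of documents / documents containing the term)
    n_containing = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / n_containing)
    return tf * idf

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "slept"]]
# "cat" appears in 2 of 3 documents, so its idf is log(3/2)
print(round(tf_idf("cat", docs[0], docs), 4))
```

A term that appears in every document gets idf = log(1) = 0, which is how TF-IDF downweights ubiquitous words.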
Question 16: What are word embeddings in NLP?
A) Graphical representations of text
B) Dense vector representations of words
C) A method for tokenizing text
D) A type of classification algorithm
Correct Answer: B
Explanation: Word embeddings are dense vector representations that capture semantic meanings
of words.
Question 17: Which algorithm is used to generate word embeddings by predicting
surrounding words?
A) TF-IDF
B) Word2Vec
C) LSTM
D) NER
Correct Answer: B
Explanation: Word2Vec uses neural networks to generate word embeddings by predicting
context words.
Question 18: What is cosine similarity used for in document comparison?
A) Measuring the distance between two vectors
B) Sorting documents alphabetically
C) Counting word frequency
D) Tokenizing text
Correct Answer: A
Explanation: Cosine similarity measures the cosine of the angle between two vectors to
determine how similar they are.
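The formula is cos(θ) = (a · b) / (|a| |b|), which is short to write out directly:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # ≈ 1.0 (same direction)
print(cosine_similarity([1, 0], [0, 1]))        # 0.0 (orthogonal)
```

Because it measures the angle rather than the magnitude, two documents of very different lengths can still score as highly similar.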
Question 19: Which clustering algorithm is most commonly used for grouping similar text
documents?
A) Linear Regression
B) K-means Clustering
C) Decision Trees
D) Naïve Bayes
Correct Answer: B
Explanation: K-means clustering is a popular algorithm for grouping similar documents based on
feature similarity.
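To show the assign-then-update loop at the heart of K-means, here is a minimal 1-D sketch (real document clustering runs this on high-dimensional feature vectors, typically via scikit-learn's KMeans):

```python
def kmeans_1d(points, centroids, iterations=10):
    # A deliberately tiny 1-D k-means for illustration
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each point joins its nearest centroid
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Two clear clusters around 1.0 and 9.0
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0]))  # ≈ [1.0, 9.0]
```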
Question 20: In text classification, what does F1-score represent?
A) The harmonic mean of precision and recall
B) The average word count per document
C) The ratio of correct to total predictions
D) The time taken for classification
Correct Answer: A
Explanation: The F1-score is the harmonic mean of precision and recall, providing a balanced
measure of a classifier’s performance.
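The computation from raw prediction counts, written out (tp/fp/fn stand for true positives, false positives, and false negatives):

```python
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many were found
    # Harmonic mean penalizes imbalance between precision and recall
    return 2 * precision * recall / (precision + recall)

# precision = 8/10 = 0.8, recall = 8/12 ≈ 0.667
print(round(f1_score(8, 2, 4), 4))
```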
Question 21: Which machine learning algorithm is often used for text classification tasks?
A) Support Vector Machines (SVM)
B) K-means Clustering