CORRECT ANSWERS
(7.1) Which of the following word pairs are paradigmatically related? Check all that apply.
[] car, drive
[] computer, laptop
[] car, vehicle
[] computer, keyboard - ans-
* computer, laptop: these words tend to occur in the same contexts. We would generally not obtain a meaningful sentence by substituting "keyboard" for "laptop", which is why "computer, keyboard" is not paradigmatic.
* car, vehicle

(7.2) Which of the following are syntagmatically related? Check all that apply.
* computer, keyboard
* car, vehicle
* computer, laptop
* car, drive - ans-
* computer, keyboard: these words tend to co-occur in the same sentences
* car, drive
(7.3) Suppose the pseudo-document representations for the contexts of terms A and B in the vector space model are given as follows:
dA = (0.3, 0.2, 0.4, 0.05, 0, 0.05)
dB = (0.4, 0.1, 0.3, 0, 0.2, 0)
What is the EOWC similarity score?
* 0.26
* 0.32
* 0.40
* 0.22 - ans-0.3*0.4 + 0.2*0.1 + 0.4*0.3 + 0 + 0 + 0 = 0.26
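The answer above is just the dot product of the two pseudo-document vectors, which can be checked directly:

```python
# EOWC (Expected Overlap of Words in Context) is the dot product of the
# two pseudo-document vectors from question 7.3.
dA = (0.3, 0.2, 0.4, 0.05, 0, 0.05)
dB = (0.4, 0.1, 0.3, 0, 0.2, 0)

eowc = sum(a * b for a, b in zip(dA, dB))
print(round(eowc, 2))  # 0.26
```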
(7.4) T/F: adding IDF weighting to the EOWC similarity function will penalize common words in the corpus - ans-True: IDF favors rare words and penalizes common words

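A minimal sketch of why this is true, assuming the common IDF form log((M + 1) / df(w)); the corpus size and document frequencies below are made up for illustration:

```python
import math

# Hypothetical corpus statistics (not from the source).
M = 1000                             # total number of documents
df = {"the": 990, "algorithm": 12}   # documents containing each word

def idf(word):
    # IDF(w) = log((M + 1) / df(w)): large for rare words, near 0 for common ones.
    return math.log((M + 1) / df[word])

# A match on a common word contributes far less to an IDF-weighted
# similarity than a match on a rare word.
print(idf("the") < idf("algorithm"))  # True
```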
(7.5) T/F: the EOWC score is non-negative and cannot exceed 1 - ans-True: the EOWC score is the probability that two randomly selected words from the two pseudo-documents are the same. Since it is a probability, its value ranges between 0 and 1 (inclusive).

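This bound can be sanity-checked with the vectors from 7.3 and the two extreme cases (a short sketch; the eowc helper just recomputes the dot product):

```python
# EOWC is P(two independently drawn words match), so it must lie in [0, 1].
dA = (0.3, 0.2, 0.4, 0.05, 0, 0.05)
dB = (0.4, 0.1, 0.3, 0, 0.2, 0)

def eowc(x, y):
    return sum(a * b for a, b in zip(x, y))

assert 0 <= eowc(dA, dB) <= 1
assert eowc((1, 0), (1, 0)) == 1  # identical point masses: maximum score
assert eowc((1, 0), (0, 1)) == 0  # disjoint supports: minimum score
```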
(7.6) Which of the following is NOT a reason that NLP is difficult?
* word-level ambiguity, such as the word "design" being either a noun or a verb
* syntactic ambiguity, such as PP attachment
* coreference resolution, such as resolving the referent of "he", "she", or "it"
* difficulty of transforming non-text data into text data - ans-
* difficulty of transforming non-text data into text data: this is not a core problem of NLP

(7.7) Which of the following statements about NLP is correct?
* for English, parsing and POS-tagging with 100% accuracy is possible if computing resources and time are unlimited
* POS-tagging is considered to be deep NLP
* shallow NLP tends to be robust, while deep NLP doesn't scale up well
* summarization and translation are considered to be shallow NLP - ans-
* shallow NLP tends to be robust, while deep NLP doesn't scale up well: deep NLP is much more complicated and difficult

(7.8) For sentiment analysis (opinion mining), which kind of text representation is generally used?
* words, syntactic structures, entities, and relations
* string
* words and syntactic structures
* words - ans-* words, syntactic structures, entities, and relations

(7.9) Paradigmatically related words have:
* high co-occurrence
* high context similarity - ans-* high context similarity
ex: Monday, Tuesday, Wednesday

(7.10) Which of the following help to improve EOWC? Check all that apply:
[] use sublinear transformation of term frequency
[] reward matching rare words and discount matching frequent ones - ans-
* use sublinear transformation of term frequency: the increase in EOWC is smaller for higher TF
* reward matching rare words and discount matching frequent ones: this favors rare words
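A quick illustration of the first improvement, assuming a sublinear transform such as log(1 + tf) (one common choice; the source does not fix a specific formula):

```python
import math

# A sublinear TF transform grows more slowly than raw counts, so the
# similarity gain from one more occurrence shrinks as frequency grows.
def sublinear(tf):
    return math.log(1 + tf)

# Going from 1 -> 2 occurrences adds more than going from 10 -> 11.
gain_low = sublinear(2) - sublinear(1)
gain_high = sublinear(11) - sublinear(10)
print(gain_low > gain_high)  # True
```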