Solutions Manual for Introduction to Natural Language Processing by Jacob Eisenstein

Type: Exam (elaborations)
Pages: 84
Grade: A+
Uploaded on: 29-04-2025
Written in: 2024/2025

Content preview

Exercise solutions

Linear text classification

1. Let x be a bag-of-words vector such that Σ_{j=1}^V x_j = 1. Verify that the multinomial
   probability p_mult(x; φ), as defined in Equation 2.12, is identical to the probability of
   the same document under a categorical distribution, p_cat(w; φ).
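As a quick sanity check (not part of the printed solution), the multinomial pmf with a one-hot count vector reduces to the categorical probability of the single word. The parameter values below are assumed toy numbers:

```python
# Minimal sketch: multinomial pmf vs. categorical probability for a one-hot
# count vector. phi is an assumed toy parameter vector, not from the book.
from math import factorial

def p_mult(x, phi):
    """Multinomial pmf: (sum_j x_j)! / prod_j(x_j!) * prod_j(phi_j ** x_j)."""
    coef = factorial(sum(x))
    for xj in x:
        coef //= factorial(xj)
    prob = 1.0
    for xj, pj in zip(x, phi):
        prob *= pj ** xj
    return coef * prob

phi = [0.5, 0.3, 0.2]   # assumed toy parameters
x = [0, 1, 0]           # one-hot: the document is the single word w = 1
assert abs(p_mult(x, phi) - phi[1]) < 1e-12
```

With Σ_j x_j = 1, the multinomial coefficient is 1!/1! = 1 and the product collapses to φ_w, matching the categorical probability.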

2. Suppose you have a single feature x, with the following conditional distribution:

                ⎧ α,      X = 0, Y = 0
                ⎪ 1 − α,  X = 1, Y = 0
     p(x | y) = ⎨                                                    [B.23]
                ⎪ 1 − β,  X = 0, Y = 1
                ⎩ β,      X = 1, Y = 1.

   Further suppose that the prior is uniform, Pr(Y = 0) = Pr(Y = 1) = 1/2, and that
   both α > 1/2 and β > 1/2. Given a Naïve Bayes classifier with accurate parameters,
   what is the probability of making an error?
   Answer:

     ŷ(X = 0) = 0                                                    [B.24]
     ŷ(X = 1) = 1                                                    [B.25]
     Pr(ŷ = 0 | Y = 1) = Pr(X = 0 | Y = 1) = 1 − β                   [B.26]
     Pr(ŷ = 1 | Y = 0) = Pr(X = 1 | Y = 0) = 1 − α                   [B.27]
     Pr(ŷ ≠ y) = (1/2)((1 − β) + (1 − α))                            [B.28]
               = 1 − (1/2)(α + β)                                    [B.29]
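The result can be checked numerically by enumerating the joint distribution. The values α = 0.8 and β = 0.7 below are assumed for illustration:

```python
# Numeric check of [B.28]-[B.29] for assumed values alpha = 0.8, beta = 0.7.
alpha, beta = 0.8, 0.7
p_x_given_y = {(0, 0): alpha, (1, 0): 1 - alpha,
               (0, 1): 1 - beta, (1, 1): beta}
prior = {0: 0.5, 1: 0.5}

def y_hat(x):
    # Naive Bayes decision rule; with alpha, beta > 1/2 it picks y = x.
    return max((0, 1), key=lambda y: prior[y] * p_x_given_y[(x, y)])

# Error probability: mass of (x, y) pairs where the prediction differs from y.
error = sum(prior[y] * p_x_given_y[(x, y)]
            for x in (0, 1) for y in (0, 1) if y_hat(x) != y)
assert abs(error - (1 - (alpha + beta) / 2)) < 1e-12   # 0.25 for these values
```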



3. Derive the maximum-likelihood estimate for the parameter µ in Naïve Bayes.



Answer:

     L(µ) = Σ_{i=1}^N log p_cat(y^(i); µ)                            [B.30]
          = Σ_{i=1}^N log µ_{y^(i)}                                  [B.31]

   Adding a Lagrange multiplier for the constraint Σ_{y=1}^K µ_y = 1:

     ℓ(µ) = Σ_{i=1}^N log µ_{y^(i)} − λ(Σ_{y=1}^K µ_y − 1)           [B.32]

     ∂ℓ(µ)/∂µ_y = Σ_{i=1}^N δ(y^(i) = y)/µ_y − λ                     [B.33]

   Setting the derivative to zero and normalizing gives

     µ_y ∝ Σ_{i=1}^N δ(y^(i) = y).                                   [B.34]
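In code, [B.34] says the estimate of µ_y is just the relative frequency of label y. A minimal sketch on an assumed toy label sequence:

```python
# Illustration of [B.34]: the MLE of mu_y is the relative frequency of label y.
from collections import Counter

labels = [0, 1, 1, 2, 1, 0]    # assumed toy label sequence y^(1), ..., y^(N)
counts = Counter(labels)
N = len(labels)
mu = {y: counts[y] / N for y in counts}

assert abs(sum(mu.values()) - 1.0) < 1e-12   # normalization constraint holds
assert mu[1] == 0.5                          # label 1 appears 3 times out of 6
```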




4. The classification models in the text have a vector of weights for each possible label.
While this is notationally convenient, it is overdetermined: for any linear classifier
that can be obtained with K × V weights, an equivalent classifier can be constructed
using (K − 1) × V weights.




   a) Describe how to construct this classifier. Specifically, if given a set of weights
      θ and a feature function f(x, y), explain how to construct alternative weights
      θ′ and a feature function f′(x, y), such that,

        ∀y, y′ ∈ Y,  θ · f(x, y) − θ · f(x, y′) = θ′ · f′(x, y) − θ′ · f′(x, y′).   [B.35]




   b) Explain how your construction justifies the well-known alternative form for
      binary logistic regression, Pr(Y = 1 | x; θ) = 1/(1 + exp(−θ′ · x)) = σ(θ′ · x),
      where σ is the sigmoid function.

Jacob Eisenstein. Draft of January 16, 2019.



   Answer:
   a) Let θ_{K,j} indicate the weight for base feature j in class K. Then set
      θ′_{k,j} = θ_{k,j} − θ_{K,j}, and f′(x, y) = f(x, y) for all y < K. This means
      that θ′ · f′(x, K) = 0.


   b) In binary classification, θ′ = θ_0 − θ_1:

        Pr(Y = 0 | x; θ) = exp(θ · f(x, 0)) / (exp(θ · f(x, 0)) + exp(θ · f(x, 1)))   [B.36]
                         = 1 / (1 + exp(θ · f(x, 1) − θ · f(x, 0)))                   [B.37]
                         = 1 / (1 + exp(−θ′ · x)).                                    [B.38]
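The chain [B.36]–[B.38] can be verified numerically: the two-class softmax equals a sigmoid applied to the weight difference. The weight vectors below are assumed toy values:

```python
# Numeric check of [B.36]-[B.38]: two-class softmax == sigmoid of the weight
# difference theta_prime = theta_0 - theta_1. Assumed toy weights and input.
from math import exp

theta0 = [0.5, -1.0, 2.0]
theta1 = [1.5, 0.5, -0.5]
x = [1.0, 2.0, 0.5]

dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
s0, s1 = dot(theta0, x), dot(theta1, x)
softmax0 = exp(s0) / (exp(s0) + exp(s1))          # Pr(Y = 0 | x), eq. [B.36]

theta_prime = [a - b for a, b in zip(theta0, theta1)]
sigmoid0 = 1.0 / (1.0 + exp(-dot(theta_prime, x)))  # eq. [B.38]
assert abs(softmax0 - sigmoid0) < 1e-12
```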




5. Suppose you have two labeled datasets D1 and D2, with the same features and labels.



   • Let θ^(1) be the unregularized logistic regression (LR) coefficients from training
     on dataset D1.
   • Let θ^(2) be the unregularized LR coefficients (same model) from training on
     dataset D2.
   • Let θ* be the unregularized LR coefficients from training on the combined
     dataset D1 ∪ D2.



Under these conditions, prove that for any feature j,


     θ*_j ≥ min(θ^(1)_j, θ^(2)_j)
     θ*_j ≤ max(θ^(1)_j, θ^(2)_j).
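This preview does not include a printed answer, but the claim can be illustrated numerically in the single-feature case (a hedged sanity check, not a proof): train unregularized LR on two assumed toy datasets and on their union, and confirm the combined coefficient lies between the individual ones.

```python
# One-feature illustration of exercise 5 (assumed toy data, not from the book):
# the coefficient trained on D1 + D2 lies between the D1 and D2 coefficients.
from math import exp

def fit(data, lr=0.1, steps=10000):
    """1-D logistic regression (no bias) by gradient ascent on the log-likelihood."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(x * (y - 1.0 / (1.0 + exp(-theta * x))) for x, y in data)
        theta += lr * grad
    return theta

# Each dataset mixes labels at overlapping x values, so the unregularized
# optimum is finite (the data are not linearly separable).
D1 = [(1.0, 1), (1.0, 0), (2.0, 1), (-1.0, 0)]
D2 = [(1.0, 0), (2.0, 1), (2.0, 0), (-1.0, 0), (3.0, 1)]

t1, t2, t_star = fit(D1), fit(D2), fit(D1 + D2)
assert min(t1, t2) <= t_star <= max(t1, t2)
```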




6. Let θ̂ be the solution to an unregularized logistic regression problem, and let θ* be
   the solution to the same problem, with L2 regularization. Prove that ||θ*||₂² ≤ ||θ̂||₂².
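No answer for this exercise appears in the preview; one possible proof sketch compares objective values at the two optima (with L the unregularized loss and λ > 0 the regularization weight):

```latex
% theta-hat minimizes L; theta-star minimizes L + lambda * ||theta||_2^2.
\begin{align}
L(\theta^*) + \lambda \|\theta^*\|_2^2
  &\leq L(\hat{\theta}) + \lambda \|\hat{\theta}\|_2^2
  && \text{(optimality of } \theta^* \text{ for the regularized objective)} \\
L(\hat{\theta}) &\leq L(\theta^*)
  && \text{(optimality of } \hat{\theta} \text{ for the unregularized objective)} \\
\lambda \|\theta^*\|_2^2 &\leq \lambda \|\hat{\theta}\|_2^2
  && \text{(add the two inequalities)} \\
\|\theta^*\|_2^2 &\leq \|\hat{\theta}\|_2^2
  && \text{(divide by } \lambda > 0 \text{).}
\end{align}
```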

Under contract with MIT Press, shared under CC-BY-NC-ND license.
