Exam (elaborations)

Stanford University STATS 231 hw1-solutions.

Grade

A+

Uploaded on

06-07-2021

Written in

2020/2021


Written for

Institution: Stanford University
Course: STATS 231

Document information

Uploaded on: July 6, 2021
Number of pages: 10
Written in: 2020/2021
Type: Exam (elaborations)
Contains: Questions & answers

Content preview

Homework 1 solutions
CS229T/STATS231 (Fall 2018–2019)
Note: please do not copy or distribute.

Due date: Wed, Oct 10, 11pm

1. Value of labeled data (11 points)
In many applications, labeled data is expensive and therefore limited, while unlabeled data is cheap and
therefore abundant. For example, there are tons of images on the web, but getting labeled images is much
harder. What is the statistical value of having labeled data versus unlabeled data? This problem will explore
this formally using asymptotics.
Specifically, suppose we have an exponential family model over a discrete latent variable h and a discrete
observed variable x:
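$$p_\theta(h, x) = \exp\{\theta \cdot \phi(h, x) - A(\theta)\},$$
where $A(\theta) = \log \sum_{h,x} \exp\{\theta \cdot \phi(h, x)\}$ is the usual log-partition function.
Suppose that $n$ examples $(h^{(1)}, x^{(1)}), \ldots, (h^{(n)}, x^{(n)})$ are drawn i.i.d. from some true distribution $p_{\theta^*}$.
Define the following two estimators:
$$\hat\theta_{\text{sup}} = \arg\max_{\theta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} \log p_\theta(h^{(i)}, x^{(i)}) \tag{1}$$
$$\hat\theta_{\text{unsup}} = \arg\max_{\theta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} \log \sum_{h} p_\theta(h, x^{(i)}). \tag{2}$$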

The supervised estimator $\hat\theta_{\text{sup}}$ uses the variable $h^{(i)}$ and maximizes the joint likelihood, while the unsupervised estimator $\hat\theta_{\text{unsup}}$ marginalizes out the latent variable $h$.
One important caveat: our results will hold when we assume that data is actually generated from our
model family and that unsupervised learning is possible. Otherwise, labeled data is worth a lot more.
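The two objectives in (1) and (2) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the toy model (binary h and x, a single feature phi(h, x) = h * x, scalar theta), the function names, and the sample are all made up here and are not part of the assignment.

```python
import math

# Hypothetical toy model (an assumption for illustration): h, x in {0, 1}
# and a single feature phi(h, x) = h * x, so theta is a scalar.
STATES = [(h, x) for h in (0, 1) for x in (0, 1)]

def phi(h, x):
    return h * x

def log_partition(theta):
    # A(theta) = log sum_{h,x} exp(theta * phi(h, x))
    return math.log(sum(math.exp(theta * phi(h, x)) for h, x in STATES))

def log_joint(theta, h, x):
    # log p_theta(h, x)
    return theta * phi(h, x) - log_partition(theta)

def log_marginal(theta, x):
    # log sum_h p_theta(h, x): the latent variable is summed out
    return math.log(sum(math.exp(log_joint(theta, h, x)) for h in (0, 1)))

def supervised_objective(theta, data):
    # objective (1): average joint log-likelihood over (h, x) pairs
    return sum(log_joint(theta, h, x) for h, x in data) / len(data)

def unsupervised_objective(theta, data):
    # objective (2): only the x component of each pair is used
    return sum(log_marginal(theta, x) for _, x in data) / len(data)

data = [(1, 1), (0, 1), (1, 0), (1, 1), (0, 0)]  # made-up sample
sup_value = supervised_objective(0.5, data)
unsup_value = unsupervised_objective(0.5, data)
```

For any fixed theta the unsupervised objective is at least as large as the supervised one, because summing over h can only add probability mass. The two estimators differ in which objective they maximize, not in the model.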

a. (2 points) (supervised asymptotic variance) Compute the asymptotic variance of $\hat\theta_{\text{sup}}$: that is, given that $\sqrt{n}(\hat\theta_{\text{sup}} - \theta^*) \xrightarrow{d} \mathcal{N}(0, V_{\text{sup}})$, write an expression for $V_{\text{sup}}$ that depends on expectations/variances involving $\phi$.

Solution:
Using notation from class, let $\ell$ denote the log-likelihood and $L$ be the expected log-likelihood. Recall that

$$V_{\text{sup}} = \nabla^2 L(\theta^*)^{-1} \, \mathrm{Cov}_{\theta^*}[\nabla \ell(z, \theta^*)] \, \nabla^2 L(\theta^*)^{-1} = \mathrm{Cov}_{\theta^*}[\nabla \ell(z, \theta^*)]^{-1},$$

where the second equality follows from Bartlett's identity since we have assumed the model is well specified.
Now, letting $z = (h, x)$,

$$\nabla \ell(z, \theta^*) = \nabla(\theta^* \cdot \phi(h, x) - A(\theta^*)) = \phi(h, x) - \mathbb{E}_{\theta^*}[\phi(h, x)],$$

so

$$V_{\text{sup}} = \mathrm{Cov}_{\theta^*}\bigl[\phi(h, x) - \mathbb{E}_{\theta^*}[\phi(h, x)]\bigr]^{-1} = \mathrm{Cov}_{\theta^*}[\phi(h, x)]^{-1}.$$
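The result of part (a) can be checked numerically. In an exponential family the Hessian of the expected log-likelihood is minus the Hessian of A, so Bartlett's identity amounts to A''(theta) = Var[phi]. The sketch below tests this by finite differences on a hypothetical one-parameter model with phi(h, x) = h * x. The model, the value of theta_star, and the step size are assumptions chosen for illustration.

```python
import math

# Hypothetical toy model: h, x in {0, 1}, phi(h, x) = h * x, scalar theta.
STATES = [(h, x) for h in (0, 1) for x in (0, 1)]

def phi(h, x):
    return h * x

def log_partition(theta):
    return math.log(sum(math.exp(theta * phi(h, x)) for h, x in STATES))

def var_phi(theta):
    # Var_theta[phi(h, x)] under the joint distribution p_theta
    a = log_partition(theta)
    probs = {s: math.exp(theta * phi(*s) - a) for s in STATES}
    mean = sum(p * phi(*s) for s, p in probs.items())
    return sum(p * (phi(*s) - mean) ** 2 for s, p in probs.items())

def second_derivative_A(theta, eps=1e-4):
    # central finite difference approximation of A''(theta)
    return (log_partition(theta + eps) - 2.0 * log_partition(theta)
            + log_partition(theta - eps)) / eps ** 2

theta_star = 0.7              # arbitrary "true" parameter
fisher = var_phi(theta_star)  # Cov[phi] reduces to a scalar variance here
v_sup = 1.0 / fisher          # asymptotic variance of the supervised MLE
```

With a scalar parameter the matrix inverse is an ordinary reciprocal. For d > 1 the same check would compare the Hessian of A against the covariance matrix of phi.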

b. (2 points) (unsupervised asymptotic variance) Compute the asymptotic variance of $\hat\theta_{\text{unsup}}$: that is, given that $\sqrt{n}(\hat\theta_{\text{unsup}} - \theta^*) \xrightarrow{d} \mathcal{N}(0, V_{\text{unsup}})$, write an expression for $V_{\text{unsup}}$ that depends on expectations/variances involving $\phi$.

Solution:
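Similar to the previous part, we have

$$\nabla \ell(z, \theta^*) = \nabla \log \sum_{h} \exp\{\theta^* \cdot \phi(h, x) - A(\theta^*)\} = \frac{\sum_{h} p_{\theta^*}(h, x) \cdot (\phi(h, x) - \mathbb{E}_{\theta^*}[\phi(h, x)])}{\sum_{h} p_{\theta^*}(h, x)} = \mathbb{E}_{\theta^*}[\phi(h, x) \mid x] - \mathbb{E}_{\theta^*}[\phi(h, x)].$$

Therefore,

$$V_{\text{unsup}} = \mathrm{Cov}_{\theta^*}\bigl[\mathbb{E}_{\theta^*}[\phi(h, x) \mid x] - \mathbb{E}_{\theta^*}[\phi(h, x)]\bigr]^{-1} = \mathrm{Cov}_{\theta^*}\bigl[\mathbb{E}_{\theta^*}[\phi(h, x) \mid x]\bigr]^{-1}.$$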
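The middle step of the gradient computation in part (b) can be unpacked as follows. This is a sketch that uses only the definitions in the problem statement together with the standard exponential-family fact that $\nabla A(\theta) = \mathbb{E}_\theta[\phi(h, x)]$. Everything is evaluated at $\theta = \theta^*$ at the end.

```latex
\begin{align*}
\nabla_\theta \log \sum_h \exp\{\theta \cdot \phi(h, x) - A(\theta)\}
  &= \frac{\sum_h \nabla_\theta \exp\{\theta \cdot \phi(h, x) - A(\theta)\}}
          {\sum_h \exp\{\theta \cdot \phi(h, x) - A(\theta)\}} \\
  &= \frac{\sum_h p_\theta(h, x)\,\bigl(\phi(h, x) - \nabla A(\theta)\bigr)}
          {\sum_h p_\theta(h, x)} \\
  &= \sum_h p_\theta(h \mid x)\,\phi(h, x) - \mathbb{E}_\theta[\phi(h, x)] \\
  &= \mathbb{E}_\theta[\phi(h, x) \mid x] - \mathbb{E}_\theta[\phi(h, x)].
\end{align*}
```

The third line uses $p_\theta(h \mid x) = p_\theta(h, x) / \sum_{h'} p_\theta(h', x)$, and the constant $\nabla A(\theta)$ factors out of the weighted average over $h$.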

c. (3 points) (comparing estimators) Prove that $\hat\theta_{\text{sup}}$ has lower (or equal) asymptotic variance compared to $\hat\theta_{\text{unsup}}$. That is, show that

$$V_{\text{sup}} \preceq V_{\text{unsup}}.$$

Solution:
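We have, by the law of total covariance,

$$V_{\text{sup}}^{-1} = \mathrm{Cov}_{\theta^*}[\phi(h, x)] = \mathbb{E}_{\theta^*}\bigl[\mathrm{Cov}_{\theta^*}[\phi(h, x) \mid x]\bigr] + \mathrm{Cov}_{\theta^*}\bigl[\mathbb{E}_{\theta^*}[\phi(h, x) \mid x]\bigr] = \mathbb{E}_{\theta^*}\bigl[\mathrm{Cov}_{\theta^*}[\phi(h, x) \mid x]\bigr] + V_{\text{unsup}}^{-1} \succeq V_{\text{unsup}}^{-1},$$

where the last inequality follows since the covariance is positive semi-definite. Matrix inversion reverses the positive semi-definite order on positive definite matrices, so $V_{\text{sup}} \preceq V_{\text{unsup}}$; i.e., $\hat\theta_{\text{sup}}$ has lower (or equal) asymptotic variance.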
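The decomposition used in part (c) can be verified numerically on a small example. The sketch below assumes a hypothetical one-parameter model with binary h and x and feature phi(h, x) = h * x, so all covariances are scalar variances. The grid of theta values is arbitrary.

```python
import math

def moments(theta):
    # Returns (Var[phi], E[Var[phi | x]], Var[E[phi | x]])
    # for the assumed feature phi(h, x) = h * x with h, x in {0, 1}.
    w = {(h, x): math.exp(theta * h * x) for h in (0, 1) for x in (0, 1)}
    z = sum(w.values())
    p = {s: v / z for s, v in w.items()}
    mean = sum(q * h * x for (h, x), q in p.items())
    total = sum(q * (h * x - mean) ** 2 for (h, x), q in p.items())
    within, between = 0.0, 0.0
    for x in (0, 1):
        px = p[(0, x)] + p[(1, x)]                                   # p(x)
        m = sum(p[(h, x)] * h * x for h in (0, 1)) / px              # E[phi | x]
        v = sum(p[(h, x)] * (h * x - m) ** 2 for h in (0, 1)) / px   # Var[phi | x]
        within += px * v
        between += px * (m - mean) ** 2
    return total, within, between

rows = []
for theta in (-2.0, -0.5, 0.0, 0.5, 2.0):
    total, within, between = moments(theta)
    v_sup = 1.0 / total        # part (a), scalar case
    v_unsup = 1.0 / between    # part (b), scalar case
    rows.append((theta, total, within, between, v_sup, v_unsup))
```

In every row the total variance splits into the within-x and between-x parts, and v_sup never exceeds v_unsup. The gap is the information lost by not observing h.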

d. (4 points)
Consider the exponential family
$$p_\theta(h, x) = \frac{1}{Z} \exp(\theta h x),$$
where $h, x \in \{0, 1\}$ and $Z = \sum_{(h, x) \in \{0, 1\}^2} \exp(\theta h x)$.¹ Essentially, $(h, x)$ is a pair of correlated biased coin
flips, where $p_\theta(1, 1) = \exp(\theta)/Z$ and $p_\theta(0, 0) = p_\theta(0, 1) = p_\theta(1, 0) = 1/Z$.
¹ Z is often referred to as the partition function.
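The probabilities stated for this family are easy to tabulate. The following minimal sketch computes Z and the four cell probabilities. The function names and the value theta = 1.0 are arbitrary choices for illustration.

```python
import math

def partition(theta):
    # Z = sum over (h, x) in {0, 1}^2 of exp(theta * h * x) = 3 + exp(theta)
    return sum(math.exp(theta * h * x) for h in (0, 1) for x in (0, 1))

def prob(theta, h, x):
    # p_theta(h, x) = exp(theta * h * x) / Z
    return math.exp(theta * h * x) / partition(theta)

theta = 1.0  # arbitrary value chosen for the example
table = {(h, x): prob(theta, h, x) for h in (0, 1) for x in (0, 1)}
```

Only the cell (1, 1) depends on theta in the numerator. The other three cells share the same weight, which is why Z simplifies to 3 + exp(theta).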
