Exam (elaborations)
Stanford University STATS 231: hw0-solutions
10 pages. Grade: A+. Uploaded 06-07-2021. Written in 2020/2021.
2. Subgradients of loss functions (0 points)

Consider the prediction problem of mapping some input x ∈ Rd to an output y (in regression, we have y ∈ R; in classification, we have y ∈ {−1, +1}). A linear predictor is governed by a weight vector w ∈ Rd, and we typically wish to choose w to minimize the cumulative loss over a set of training examples. Two popular loss functions for classification and regression are defined (on a single example (x, y)) as follows:

• Squared loss: ℓ(w; x, y) = (1/2)(y − w · x)².
• Hinge loss: ℓ(w; x, y) = max{1 − y w · x, 0}.

Let's study some properties of these loss functions. These will be used throughout the entire class, so it's important to obtain a good intuition for them.

a (convexity of loss functions) Show that each of the two loss functions is convex. Hint: whenever possible, use the compositional properties of convexity (e.g., the sum of two convex functions is convex).

Solution:
(Squared loss). First, the function g(t) = (1/2)t² is convex since g′′(t) = 1 ≥ 0. Recall the following composition property of convexity: if a function g(t) is convex, then so is the function f(w) = g(aᵀw + b). In other words, the composition of a convex function with an affine mapping is still convex. Consequently, ℓ(w; x, y) = g(y − w · x) is convex.
(Hinge loss). First, the function g(t) = max(t, 0) is convex, as it is the supremum of two linear functions (namely g₁(t) = t and g₂(t) = 0). Since ℓ(w; x, y) = g(1 − y w · x) is the composition of g with an affine mapping of w, we conclude that ℓ(w; x, y) is convex.

b (subgradients of loss functions) Compute the subgradient of each of the two loss functions with respect to w. Recall that the subgradient of a convex function f(w) at a point w, denoted ∂f(w), is the set of vectors z such that f(w′) ≥ f(w) + z · (w′ − w) for all w′.
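The subgradients asked for in part (b) can be sanity-checked numerically. The sketch below is a hedged illustration, not the official solution: the function names are my own, NumPy is assumed, and it adopts the common convention of returning 0 at the hinge's kink (any element of the subdifferential is equally valid there). The loop verifies the defining subgradient inequality f(w′) ≥ f(w) + z · (w′ − w).

```python
import numpy as np

def squared_loss_subgrad(w, x, y):
    # Squared loss is differentiable, so the subdifferential is the
    # single gradient: -(y - w.x) x.
    return -(y - w @ x) * x

def hinge_loss_subgrad(w, x, y):
    # Where the hinge is active (1 - y w.x > 0) the loss is locally linear
    # with gradient -y x; elsewhere it is constant 0.  At the kink, both
    # choices (and anything in between) lie in the subdifferential.
    return -y * x if 1 - y * (w @ x) > 0 else np.zeros_like(x)

# Sanity check: a subgradient z at w must satisfy
# loss(w') >= loss(w) + z.(w' - w) for every w'.
rng = np.random.default_rng(0)
w, x, y = rng.normal(size=3), rng.normal(size=3), -1.0
loss = lambda v: max(1 - y * (v @ x), 0.0)
z = hinge_loss_subgrad(w, x, y)
for _ in range(100):
    w2 = rng.normal(size=3)
    assert loss(w2) >= loss(w) + z @ (w2 - w) - 1e-9
```

The check passes for every trial point because the hinge loss is convex and z was chosen from its subdifferential; replacing z with an arbitrary vector would make the inequality fail for some w′.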


Homework 0 solutions
CS229T/STATS231 (Fall 2018–2019)
Note: please do not copy or distribute.

Due date: 10/03/2018, 11pm

This is a diagnostic homework and will not count towards your grade (but the bonus points do count).
It should give you an idea of the types of concepts and skills required for the course, and also give you an
opportunity to practice some things in case you’re rusty. It also will allow you to see how we grade.




1. Linear algebra (0 points)



a (dual norm of L1 norm) The L1 norm ‖·‖_1 of a vector v ∈ Rn is defined as

    ‖v‖_1 = ∑_{i=1}^n |v_i|.    (1)

The dual norm ‖·‖_* of a norm ‖·‖ is defined as

    ‖v‖_* = sup_{‖w‖≤1} (v · w).    (2)

Compute the dual norm of the L1 norm. (Here v · w denotes the inner product between v and w: v · w ≜ ∑_{i=1}^n v_i w_i.)


Solution:
We will prove that

    sup_{‖w‖_1≤1} (v · w) = max_{i∈[n]} |v_i| = ‖v‖_∞,    (3)

which implies that the dual norm of the L1 norm is the L∞ norm.
Towards proving (3), we first observe that

    v · w = ∑_{i=1}^n v_i w_i ≤ ∑_{i=1}^n |v_i| · |w_i|    (4)
          ≤ ∑_{i=1}^n ‖v‖_∞ · |w_i|    (5)
          = ‖v‖_∞ ‖w‖_1.    (6)

Therefore,

    sup_{‖w‖_1≤1} (v · w) ≤ ‖v‖_∞.    (8)

We argue that equality is attained: let i⋆ = arg max_i |v_i|, so that |v_{i⋆}| = ‖v‖_∞. Setting w = sign(v_{i⋆}) e_{i⋆} (where e_i denotes the vector with 1 in the i-th coordinate and 0 elsewhere) gives ‖w‖_1 = 1 and v · w = |v_{i⋆}| = ‖v‖_∞. This completes the proof of equation (3).
Remarks: dual norms are useful for bounding inner products: u · v ≤ ‖u‖ ‖v‖_*, which follows directly from the definition of the dual norm. This is a generalization of the Cauchy–Schwarz inequality (the special case of the L2 norm, which is its own dual).
In general, the Lp norm and the Lq norm are dual when 1/p + 1/q = 1.
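The equality (3) can be checked numerically. A minimal sketch, assuming NumPy (the dimension, seed, and number of random trial points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=5)

# The supremum is attained at w = sign(v_i*) e_i*, where i* maximizes |v_i|;
# the value v.w should equal the L-infinity norm of v.
i_star = np.argmax(np.abs(v))
w = np.zeros_like(v)
w[i_star] = np.sign(v[i_star])
assert np.isclose(v @ w, np.max(np.abs(v)))

# Sanity check against random feasible points: no w with ||w||_1 = 1
# should achieve a larger inner product.
W = rng.normal(size=(10000, 5))
W /= np.abs(W).sum(axis=1, keepdims=True)  # normalize rows onto the L1 sphere
assert (W @ v).max() <= np.max(np.abs(v)) + 1e-12
```

The random points never beat the explicit maximizer, matching the bound (8) and the attainment argument above.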




b (trace is sum of singular values) The nuclear norm of a matrix A ∈ Rn×n is defined as ∑_{i=1}^n |σ_i(A)|, where σ_1(A), …, σ_n(A) are the singular values of A. Show that the nuclear norm of a symmetric positive semi-definite matrix A is equal to its trace (tr(A) = ∑_{i=1}^n A_ii). (For this reason, the nuclear norm is sometimes called the trace norm.) (Hint: use the fact that tr(AB) = tr(BA).)

Solution:
As A is PSD, the SVD of A has the form A = USUᵀ with U orthogonal. Using the trace rotation trick,

    tr(A) = tr(USUᵀ) = tr(UᵀUS) = tr(IS) = ∑_i σ_i(A) = ∑_i |σ_i(A)|.    (9)

The last equality uses that singular values are non-negative.
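A quick numerical check of (9), assuming NumPy (forming B Bᵀ is just one convenient way to generate a symmetric PSD matrix; the size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B @ B.T  # symmetric positive semi-definite by construction

# Nuclear norm = sum of singular values; for PSD A this equals the trace.
nuclear = np.linalg.svd(A, compute_uv=False).sum()
assert np.isclose(nuclear, np.trace(A))
```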



c. (3 bonus points) (trace is bounded by nuclear norm) Show that the trace of a square matrix A ∈ Rn×n is always less than or equal to its nuclear norm.

Solution:
Suppose the SVD of A is A = UΣVᵀ. Using the trace rotation trick,

    tr(A) = tr(UΣVᵀ) = tr(VᵀUΣ).    (10)

Let R = VᵀU, and let U_i and V_i denote the i-th columns of U and V respectively. Since U_i and V_i are unit vectors by the properties of the SVD, we have |R_ii| = |⟨U_i, V_i⟩| ≤ 1. Therefore,

    tr(A) = tr(RΣ) = ∑_{i=1}^n R_ii Σ_ii    (because Σ is a diagonal matrix)
          ≤ ∑_{i=1}^n |Σ_ii|    (because |R_ii| ≤ 1)
          = ‖A‖_*.

Remark: Equality is achieved when the left and right singular subspaces are aligned (U_i = V_i), which is exactly the case in part (b).

The SVD is generally a very powerful tool for dealing with various linear-algebraic quantities.
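Unlike part (b), this bound needs no symmetry or positivity, which a quick NumPy sketch illustrates (matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))  # generic square matrix: not symmetric, not PSD

# tr(A) <= ||A||_* always; equality would require aligned singular subspaces.
nuclear = np.linalg.svd(A, compute_uv=False).sum()
assert np.trace(A) <= nuclear + 1e-12
```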



