Exam (elaborations)
Stanford University STATS 231: hw0-solutions
10 pages. Grade: A+. Uploaded 06-07-2021. Written in 2020/2021.
2. Subgradients of loss functions (0 points)

Consider the prediction problem of mapping some input x ∈ Rd to an output y (in regression, we have y ∈ R; in classification, we have y ∈ {−1, +1}). A linear predictor is governed by a weight vector w ∈ Rd, and we typically wish to choose w to minimize the cumulative loss over a set of training examples. Two popular loss functions for classification and regression are defined (on a single example (x, y)) as follows:

• Squared loss: ℓ(w; x, y) = (1/2)(y − w · x)².
• Hinge loss: ℓ(w; x, y) = max{1 − y w · x, 0}.

Let's study some properties of these loss functions. These will be used throughout the entire class, so it's important to obtain a good intuition for them.

a (convexity of loss functions) Show that each of the two loss functions is convex. Hint: whenever possible, use the compositional properties of convexity (e.g., the sum of two convex functions is convex).

Solution:
(Squared loss). First, the function g(t) = (1/2)t² is convex since g′′(t) = 1 ≥ 0. Recall the following composition property of convexity: if a function g(t) is convex, then so is the function f(w) = g(aᵀw + b). In other words, the composition of a convex function with an affine mapping is still convex. Consequently, ℓ(w; x, y) = g(y − w · x) is convex.
(Hinge loss). First, the function g(t) = max(t, 0) is convex, as it is the supremum of two linear functions (namely g₁(t) = t and g₂(t) = 0). Since ℓ(w; x, y) = g(1 − y w · x) is the composition of g with an affine mapping of w, we conclude that ℓ(w; x, y) is convex.

b (subgradients of loss functions) Compute the subgradient of each of the two loss functions with respect to w. Recall that the subgradient of a convex function f(w) at a point w, denoted ∂f(w), is the set of vectors z such that f(w′) ≥ f(w) + z · (w′ − w) for all w′.
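The subgradients asked for in part (b) can be sanity-checked numerically. The sketch below is a hedged illustration, not the official solution: the function names are my own, NumPy is assumed, and it adopts the common convention of returning 0 at the hinge's kink (any element of the subdifferential is equally valid there). The loop verifies the defining subgradient inequality f(w′) ≥ f(w) + z · (w′ − w).

```python
import numpy as np

def squared_loss_subgrad(w, x, y):
    # Squared loss is differentiable, so the subdifferential is the
    # single gradient: -(y - w.x) x.
    return -(y - w @ x) * x

def hinge_loss_subgrad(w, x, y):
    # Where the hinge is active (1 - y w.x > 0) the loss is locally linear
    # with gradient -y x; elsewhere it is constant 0.  At the kink, both
    # choices (and anything in between) lie in the subdifferential.
    return -y * x if 1 - y * (w @ x) > 0 else np.zeros_like(x)

# Sanity check: a subgradient z at w must satisfy
# loss(w') >= loss(w) + z.(w' - w) for every w'.
rng = np.random.default_rng(0)
w, x, y = rng.normal(size=3), rng.normal(size=3), -1.0
loss = lambda v: max(1 - y * (v @ x), 0.0)
z = hinge_loss_subgrad(w, x, y)
for _ in range(100):
    w2 = rng.normal(size=3)
    assert loss(w2) >= loss(w) + z @ (w2 - w) - 1e-9
```

The check passes for every trial point because the hinge loss is convex and z was chosen from its subdifferential; replacing z with an arbitrary vector would make the inequality fail for some w′.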


Homework 0 solutions
CS229T/STATS231 (Fall 2018–2019)
Note: please do not copy or distribute.

Due date: 10/03/2018, 11pm

This is a diagnostic homework and will not count towards your grade (but the bonus points do count).
It should give you an idea of the types of concepts and skills required for the course, and also give you an
opportunity to practice some things in case you’re rusty. It also will allow you to see how we grade.




1. Linear algebra (0 points)



a (dual norm of L1 norm) The L1 norm ‖·‖_1 of a vector v ∈ Rn is defined as

    ‖v‖_1 = ∑_{i=1}^n |v_i|.    (1)

The dual norm ‖·‖_* of a norm ‖·‖ is defined as

    ‖v‖_* = sup_{‖w‖≤1} (v · w).    (2)

Compute the dual norm of the L1 norm. (Here v · w denotes the inner product between v and w: v · w ≜ ∑_{i=1}^n v_i w_i.)


Solution:
We will prove that

    sup_{‖w‖_1≤1} (v · w) = max_{i∈[n]} |v_i| = ‖v‖_∞,    (3)

which implies that the dual norm of the L1 norm is the L∞ norm.
Towards proving (3), we first observe that

    v · w = ∑_{i=1}^n v_i w_i ≤ ∑_{i=1}^n |v_i| · |w_i|    (4)
          ≤ ∑_{i=1}^n ‖v‖_∞ · |w_i|    (5)
          = ‖v‖_∞ ‖w‖_1.    (6)

Therefore,

    sup_{‖w‖_1≤1} (v · w) ≤ ‖v‖_∞.    (8)

We argue that equality is attained: let i⋆ = arg max_i |v_i|, so that |v_{i⋆}| = ‖v‖_∞. Setting w = sign(v_{i⋆}) e_{i⋆} (where e_i denotes the vector with 1 in the i-th coordinate and 0 elsewhere) gives ‖w‖_1 = 1 and v · w = |v_{i⋆}| = ‖v‖_∞. This completes the proof of equation (3).
Remarks: dual norms are useful for bounding inner products: u · v ≤ ‖u‖ ‖v‖_*, which follows directly from the definition of the dual norm. This is a generalization of the Cauchy–Schwarz inequality (the special case of the L2 norm, which is its own dual).
In general, the Lp norm and the Lq norm are dual when 1/p + 1/q = 1.
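The equality (3) can be checked numerically. A minimal sketch, assuming NumPy (the dimension, seed, and number of random trial points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=5)

# The supremum is attained at w = sign(v_i*) e_i*, where i* maximizes |v_i|;
# the value v.w should equal the L-infinity norm of v.
i_star = np.argmax(np.abs(v))
w = np.zeros_like(v)
w[i_star] = np.sign(v[i_star])
assert np.isclose(v @ w, np.max(np.abs(v)))

# Sanity check against random feasible points: no w with ||w||_1 = 1
# should achieve a larger inner product.
W = rng.normal(size=(10000, 5))
W /= np.abs(W).sum(axis=1, keepdims=True)  # normalize rows onto the L1 sphere
assert (W @ v).max() <= np.max(np.abs(v)) + 1e-12
```

The random points never beat the explicit maximizer, matching the bound (8) and the attainment argument above.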




b (trace is sum of singular values) The nuclear norm of a matrix A ∈ Rn×n is defined as ∑_{i=1}^n |σ_i(A)|, where σ_1(A), …, σ_n(A) are the singular values of A. Show that the nuclear norm of a symmetric positive semi-definite matrix A is equal to its trace (tr(A) = ∑_{i=1}^n A_ii). (For this reason, the nuclear norm is sometimes called the trace norm.) (Hint: use the fact that tr(AB) = tr(BA).)

Solution:
As A is PSD, the SVD of A has the form A = USUᵀ with U orthogonal. Using the trace rotation trick,

    tr(A) = tr(USUᵀ) = tr(UᵀUS) = tr(IS) = ∑_i σ_i(A) = ∑_i |σ_i(A)|.    (9)

The last equality uses that singular values are non-negative.
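A quick numerical check of (9), assuming NumPy (forming B Bᵀ is just one convenient way to generate a symmetric PSD matrix; the size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B @ B.T  # symmetric positive semi-definite by construction

# Nuclear norm = sum of singular values; for PSD A this equals the trace.
nuclear = np.linalg.svd(A, compute_uv=False).sum()
assert np.isclose(nuclear, np.trace(A))
```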



c. (3 bonus points) (trace is bounded by nuclear norm) Show that the trace of a square matrix A ∈ Rn×n is always less than or equal to its nuclear norm.

Solution:
Suppose the SVD of A is A = UΣVᵀ. Using the trace rotation trick,

    tr(A) = tr(UΣVᵀ) = tr(VᵀUΣ).    (10)

Let R = VᵀU, and let U_i and V_i denote the i-th columns of U and V respectively. Since U_i and V_i are unit vectors by the properties of the SVD, we have |R_ii| = |⟨U_i, V_i⟩| ≤ 1. Therefore,

    tr(A) = tr(RΣ) = ∑_{i=1}^n R_ii Σ_ii    (because Σ is a diagonal matrix)
          ≤ ∑_{i=1}^n |Σ_ii|    (because |R_ii| ≤ 1)
          = ‖A‖_*.

Remark: Equality is achieved when the left and right singular subspaces are aligned (U_i = V_i), which is exactly the case in part (b).

The SVD is generally a very powerful tool for dealing with various linear-algebraic quantities.
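Unlike part (b), this bound needs no symmetry or positivity, which a quick NumPy sketch illustrates (matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))  # generic square matrix: not symmetric, not PSD

# tr(A) <= ||A||_* always; equality would require aligned singular subspaces.
nuclear = np.linalg.svd(A, compute_uv=False).sum()
assert np.trace(A) <= nuclear + 1e-12
```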



