Exam (solutions)
Stanford University · STATS 231 · hw0-solutions


Document information

Uploaded on: July 6, 2021
Number of pages: 10
Grade: A+
Written in: academic year 2020/2021
Type: Exam (solutions)
Contains: Questions and answers

Preview of the content

Homework 0 solutions
CS229T/STATS231 (Fall 2018–2019)
Note: please do not copy or distribute.

Due date: 10/03/2018, 11pm

This is a diagnostic homework and will not count towards your grade (but the bonus points do count).
It should give you an idea of the types of concepts and skills required for the course, and also give you an
opportunity to practice some things in case you're rusty. It will also allow you to see how we grade.




1. Linear algebra (0 points)



a (dual norm of the L1 norm) The L1 norm \|\cdot\|_1 of a vector v ∈ R^n is defined as

    \|v\|_1 = \sum_{i=1}^n |v_i|.    (1)

The dual norm \|\cdot\|_* of a norm \|\cdot\| is defined as

    \|v\|_* = \sup_{\|w\| \le 1} (v \cdot w).    (2)

Compute the dual norm of the L1 norm. (Here v · w denotes the inner product between v and w: v \cdot w := \sum_{i=1}^n v_i w_i.)


Solution:
We will prove that

    \sup_{\|w\|_1 \le 1} (v \cdot w) = \max_{i \in [n]} |v_i| = \|v\|_\infty,    (3)

which implies that the dual norm of the L1 norm is the L∞ norm.
Towards proving (3), we first observe that for any w with \|w\|_1 \le 1,

    v \cdot w = \sum_{i=1}^n v_i w_i \le \sum_{i=1}^n |v_i| \cdot |w_i|    (4)
              \le \sum_{i=1}^n \|v\|_\infty \cdot |w_i|    (5)
              = \|v\|_\infty \|w\|_1 \le \|v\|_\infty.    (6)

Therefore,

    \sup_{\|w\|_1 \le 1} (v \cdot w) \le \|v\|_\infty.    (7)

We argue that equality is attained: let i^* \in \arg\max_i |v_i|, so that |v_{i^*}| = \|v\|_\infty, and set w = \mathrm{sign}(v_{i^*}) e_{i^*} (where e_i denotes the vector with 1 in the i-th coordinate and 0 elsewhere). Then \|w\|_1 = 1 and v \cdot w = |v_{i^*}| = \|v\|_\infty. This completes the proof of equation (3).
Remarks: Dual norms are useful for bounding inner products: u \cdot v \le \|u\| \|v\|_*, which follows directly from the definition of the dual norm. This generalizes the Cauchy–Schwarz inequality (the special case of the L2 norm, which is its own dual).
In general, the Lp norm and the Lq norm are dual when 1/p + 1/q = 1.
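
As a quick sanity check of (3), the following sketch (illustrative only, not part of the original solution; it assumes NumPy is available) compares v · w over many random points of the unit L1 ball against the maximizer constructed in the proof:

import numpy as np

rng = np.random.default_rng(0)
n = 5
v = rng.normal(size=n)

# Random candidates w scaled onto the boundary of the L1 ball (||w||_1 = 1).
ws = rng.normal(size=(100_000, n))
ws /= np.abs(ws).sum(axis=1, keepdims=True)
best_random = (ws @ v).max()

# The maximizer from the proof: a signed standard basis vector at i* = argmax_i |v_i|.
i_star = np.argmax(np.abs(v))
w_star = np.zeros(n)
w_star[i_star] = np.sign(v[i_star])

print(best_random, v @ w_star, np.abs(v).max())
# best_random is at most v @ w_star, and v @ w_star equals ||v||_inf exactly.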




b (trace is sum of singular values) The nuclear norm of a matrix A ∈ R^{n×n} is defined as \sum_{i=1}^n |\sigma_i(A)|, where \sigma_1(A), ..., \sigma_n(A) are the singular values of A. Show that the nuclear norm of a symmetric positive semi-definite matrix A is equal to its trace (tr(A) = \sum_{i=1}^n A_{ii}). (For this reason, the nuclear norm is sometimes called the trace norm.) (Hint: use the fact that tr(AB) = tr(BA).)

Solution:
As A is PSD, the SVD of A has the form A = U S U^T, where U is orthogonal and S is the diagonal matrix of singular values. Using the trace rotation trick,

    tr(A) = tr(U S U^T) = tr(U^T U S) = tr(S) = \sum_i \sigma_i(A) = \sum_i |\sigma_i(A)|.    (8)

The last equality uses that singular values are non-negative.
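
To see part (b) concretely, here is a small numerical check (illustrative only, assuming NumPy): build a random symmetric PSD matrix and compare the sum of its singular values with its trace.

import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = B @ B.T  # B B^T is symmetric PSD by construction

singular_values = np.linalg.svd(A, compute_uv=False)
print(singular_values.sum(), np.trace(A))  # the two numbers agree up to floating-point error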



c (3 bonus points) (trace is bounded by nuclear norm) Show that the trace of a square matrix A ∈ R^{n×n} is always less than or equal to its nuclear norm.

Solution:
Suppose the SVD of A is A = U \Sigma V^T. Using the trace rotation trick,

    tr(A) = tr(V^T U \Sigma).    (9)

Let R = V^T U, and let U_i and V_i denote the i-th columns of U and V respectively. Since U_i and V_i are unit vectors by the properties of the SVD, we have |R_{ii}| = |\langle U_i, V_i \rangle| \le 1. Therefore,

    tr(A) = tr(R \Sigma) = \sum_{i=1}^n R_{ii} \Sigma_{ii}    (because \Sigma is a diagonal matrix)
          \le \sum_{i=1}^n |\Sigma_{ii}|    (because |R_{ii}| \le 1)
          = \|A\|_*.

Remark: Equality is achieved when the left and right singular subspaces are aligned (U_i = V_i), which is exactly the case in part (b). The SVD is generally a very powerful tool for dealing with various linear-algebraic quantities.
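
The inequality in part (c) can likewise be spot-checked numerically (an illustrative sketch assuming NumPy, not a proof): for random square matrices, tr(A) should never exceed the sum of the singular values.

import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.normal(size=(5, 5))
    nuclear_norm = np.linalg.svd(A, compute_uv=False).sum()
    assert np.trace(A) <= nuclear_norm + 1e-9  # small tolerance for floating point
print("tr(A) <= ||A||_* held for all sampled matrices")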

2. Subgradients of loss functions (0 points)

Consider the prediction problem of mapping some input x ∈ R^d to an output y (in regression, we have y ∈ R; in classification, we have y ∈ {-1, +1}). A linear predictor is governed by a weight vector w ∈ R^d, and we typically wish to choose w to minimize the cumulative loss over a set of training examples. Two popular loss functions for classification and regression are defined (on a single example (x, y)) as follows:

• Squared loss: \ell(w; x, y) = \frac{1}{2}(y - w \cdot x)^2.
• Hinge loss: \ell(w; x, y) = \max\{1 - y (w \cdot x), 0\}.

Let's study some properties of these loss functions. These will be used throughout the entire class, so it's important to obtain a good intuition for them.

a (convexity of loss functions) Show that each of the two loss functions is convex. Hint: whenever possible, use the compositional properties of convexity (e.g., the sum of two convex functions is convex).

Solution:
(Squared loss.) First, the function g(t) = \frac{1}{2} t^2 is convex since g''(t) = 1 \ge 0. Recall the following composition property of convexity: if a function g(t) is convex, then so is f(w) = g(a^T w + b); in other words, the composition of a convex function with an affine mapping is still convex. Consequently, \ell(w; x, y) = g(y - w \cdot x) is convex as a function of w.
(Hinge loss.) First, the function g(t) = \max(t, 0) is convex, as it is the supremum of two linear functions (namely g_1(t) = t and g_2(t) = 0). Since \ell(w; x, y) = g(1 - y (w \cdot x)) is the composition of g with an affine mapping of w, we conclude that \ell(w; x, y) is convex.
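
As an illustrative sanity check of the convexity claims (not part of the original solution; it assumes NumPy), one can verify the midpoint inequality \ell((w_1 + w_2)/2) \le (\ell(w_1) + \ell(w_2))/2 on random weight vectors for both losses:

import numpy as np

def squared_loss(w, x, y):
    return 0.5 * (y - w @ x) ** 2

def hinge_loss(w, x, y):
    return max(1.0 - y * (w @ x), 0.0)

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)
for loss, y in [(squared_loss, rng.normal()), (hinge_loss, 1.0)]:
    for _ in range(1000):
        w1, w2 = rng.normal(size=d), rng.normal(size=d)
        midpoint = loss(0.5 * (w1 + w2), x, y)
        average = 0.5 * (loss(w1, x, y) + loss(w2, x, y))
        assert midpoint <= average + 1e-9  # convexity: f(midpoint) <= average of f
print("midpoint convexity inequality held for both losses")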

b (subgradients of loss functions) Compute the subgradient of each of the two loss functions with respect to w. Recall that the subgradient of a convex function f(w) at a point w, denoted \partial f(w), is the set

    \partial f(w) = \{ g ∈ R^d : f(w') \ge f(w) + g \cdot (w' - w) for all w' ∈ R^d \}.
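
As an illustration of this definition (a hedged sketch assuming NumPy, not the assigned solution): on the branch where 1 - y (w · x) > 0 the hinge loss is differentiable with gradient -y x, so g = -y x is a natural candidate subgradient at such a w. The check below verifies the defining inequality numerically.

import numpy as np

def hinge_loss(w, x, y):
    return max(1.0 - y * (w @ x), 0.0)

rng = np.random.default_rng(0)
d, y = 4, 1.0
x = rng.normal(size=d)
w = np.zeros(d)   # at w = 0 we have 1 - y (w . x) = 1 > 0, so the hinge is active
g = -y * x        # candidate subgradient on the active branch

for _ in range(1000):
    w_prime = 5.0 * rng.normal(size=d)
    # Defining inequality of the subgradient: f(w') >= f(w) + g . (w' - w).
    assert hinge_loss(w_prime, x, y) >= hinge_loss(w, x, y) + g @ (w_prime - w) - 1e-9
print("subgradient inequality held for g = -y x at the chosen w")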