Contents
1 Linear Algebra and Optimization: An Introduction
2 Linear Transformations and Linear Systems
3 Diagonalizable Matrices and Eigenvectors
4 Optimization Basics: A Machine Learning View
5 Optimization Challenges and Advanced Solutions
6 Lagrangian Relaxation and Duality
7 Singular Value Decomposition
8 Matrix Factorization
9 The Linear Algebra of Similarity
10 The Linear Algebra of Graphs
11 Optimization in Computational Graphs
Chapter 1
Linear Algebra and Optimization: An Introduction
1. For any two vectors $x$ and $y$, which are each of length $a$, show that (i) $x - y$ is orthogonal to $x + y$, and (ii) the dot product of $x - 3y$ and $x + 3y$ is negative.
(i) The first is simply $x \cdot x - y \cdot y$ using the distributive property of matrix multiplication. The dot product of a vector with itself is its squared length. Since both vectors are of the same length, it follows that the result is 0. (ii) In the second case, one can use a similar argument to show that the result is $a^2 - 9a^2$, which is negative.
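As a numerical sanity check of both parts, the sketch below evaluates the two dot products for one pair of vectors. The vectors `x` and `y` and the helper `dot` are hypothetical choices made for illustration (both vectors have length $a = 5$); they are not taken from the exercise.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical example vectors, both of length a = 5.
x = [3.0, 4.0]
y = [5.0, 0.0]
a = math.sqrt(dot(x, x))

# Part (i): (x - y) . (x + y) reduces to x.x - y.y.
diff = [xi - yi for xi, yi in zip(x, y)]
total = [xi + yi for xi, yi in zip(x, y)]
part_i = dot(diff, total)

# Part (ii): (x - 3y) . (x + 3y) reduces to x.x - 9 y.y = a^2 - 9a^2.
diff3 = [xi - 3 * yi for xi, yi in zip(x, y)]
total3 = [xi + 3 * yi for xi, yi in zip(x, y)]
part_ii = dot(diff3, total3)

print(part_i, part_ii, a * a - 9 * a * a)
```

For these particular vectors the first value is 0 and the second equals $a^2 - 9a^2 = -200$. A single example does not replace the algebraic argument, but it helps confirm the signs.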
2. Consider a situation in which you have three matrices $A$, $B$, and $C$, of sizes $10 \times 2$, $2 \times 10$, and $10 \times 10$, respectively.
(a) Suppose you had to compute the matrix product $ABC$. From an efficiency perspective, would it computationally make more sense to compute $(AB)C$ or would it make more sense to compute $A(BC)$?
(b) If you had to compute the matrix product $CAB$, would it make more sense to compute $(CA)B$ or $C(AB)$?
The main point is to keep the size of the intermediate matrix as small as possible in order to reduce both computational and space requirements. In the case of $ABC$, it makes sense to compute $BC$ first. In the case of $CAB$, it makes sense to compute $CA$ first. This type of associativity property is used frequently in machine learning in order to reduce computational requirements.
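The comparison can be made concrete by counting scalar multiplications for each parenthesization. The sketch below assumes the schoolbook multiplication algorithm, in which a $p \times q$ by $q \times r$ product costs $pqr$ multiplications. The function names `matmul_cost`, `cost_left`, and `cost_right` are hypothetical helpers introduced for this illustration.

```python
def matmul_cost(p, q, r):
    """Scalar multiplications for a (p x q) times (q x r) schoolbook product."""
    return p * q * r

def cost_left(s1, s2, s3):
    """Cost of (XY)Z, where X, Y, Z have shapes s1, s2, s3."""
    (p, q), (q2, r), (r2, s) = s1, s2, s3
    assert q == q2 and r == r2, "inner dimensions must match"
    return matmul_cost(p, q, r) + matmul_cost(p, r, s)

def cost_right(s1, s2, s3):
    """Cost of X(YZ), where X, Y, Z have shapes s1, s2, s3."""
    (p, q), (q2, r), (r2, s) = s1, s2, s3
    assert q == q2 and r == r2, "inner dimensions must match"
    return matmul_cost(q, r, s) + matmul_cost(p, q, s)

# Shapes from the exercise.
A, B, C = (10, 2), (2, 10), (10, 10)

abc_left, abc_right = cost_left(A, B, C), cost_right(A, B, C)   # (AB)C vs A(BC)
cab_left, cab_right = cost_left(C, A, B), cost_right(C, A, B)   # (CA)B vs C(AB)

print("(AB)C:", abc_left, " A(BC):", abc_right)
print("(CA)B:", cab_left, " C(AB):", cab_right)
```

Under this cost model, $A(BC)$ and $(CA)B$ each need 400 multiplications, while $(AB)C$ and $C(AB)$ each need 1200. The cheaper orderings are the ones whose intermediate result has 20 entries ($2 \times 10$ or $10 \times 2$) rather than 100 ($10 \times 10$).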
3. Show that if a matrix $A$ satisfies $A = -A^T$, then all the diagonal elements of the matrix are 0.
Note that $A + A^T = 0$. However, this matrix also contains twice the diagonal elements of $A$ on its diagonal. Therefore, the diagonal elements of $A$ must be 0.
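Written entry by entry, the argument reads as follows. The notation $a_{ii}$ for the $i$th diagonal entry of $A$ is an assumption introduced here for illustration. Transposition leaves diagonal entries in place, so:

```latex
\begin{align*}
0 = (A + A^T)_{ii} = a_{ii} + a_{ii} = 2a_{ii}
\quad\Longrightarrow\quad a_{ii} = 0 \quad \text{for every } i.
\end{align*}
```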
4. Show that if we have a matrix satisfying $A = -A^T$, then for any column vector $x$, we have $x^T A x = 0$.
Note that the transpose of the scalar $x^T A x$ remains unchanged. Therefore, we have
$x^T A x = (x^T A x)^T = x^T A^T x = -x^T A x$. Therefore, we have $2 x^T A x = 0$.
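The identity can be checked numerically on a small example. In the sketch below, the matrix `A`, the vector `x`, and the helpers `is_skew_symmetric` and `quad_form` are hypothetical choices for illustration; integer entries are used so that the arithmetic is exact.

```python
def is_skew_symmetric(A):
    """Return True if A equals the negative of its transpose."""
    n = len(A)
    return all(A[i][j] == -A[j][i] for i in range(n) for j in range(n))

def quad_form(A, x):
    """Compute the scalar x^T A x as a double sum over the entries of A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical 3 x 3 matrix satisfying A = -A^T (note the zero diagonal).
A = [
    [0, 2, -1],
    [-2, 0, 4],
    [1, -4, 0],
]
x = [3, -1, 2]

skew = is_skew_symmetric(A)
value = quad_form(A, x)
print(skew, value)
```

Each off-diagonal term $x_i a_{ij} x_j$ is cancelled by its mirror term $x_j a_{ji} x_i$, and the diagonal terms vanish because the diagonal of $A$ is zero. This is the same cancellation that the transpose argument expresses in one line.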