Linear Algebra and Optimization for Machine Learning
1st Edition, by Charu Aggarwal. Chapters 1–11
Contents

1 Linear Algebra and Optimization: An Introduction
2 Linear Transformations and Linear Systems
3 Diagonalizable Matrices and Eigenvectors
4 Optimization Basics: A Machine Learning View
5 Optimization Challenges and Advanced Solutions
6 Lagrangian Relaxation and Duality
7 Singular Value Decomposition
8 Matrix Factorization
9 The Linear Algebra of Similarity
10 The Linear Algebra of Graphs
11 Optimization in Computational Graphs
Chapter 1

Linear Algebra and Optimization: An Introduction
1. For any two vectors x and y, which are each of length a, show that (i) x − y is orthogonal to x + y, and (ii) the dot product of x − 3y and x + 3y is negative.
(i) By the distributive property of the dot product, (x − y) · (x + y) = x · x − y · y. The dot product of a vector with itself is its squared length, and since both vectors have the same length a, the result is a^2 − a^2 = 0. (ii) In the second case, a similar expansion shows that the result is a^2 − 9a^2 = −8a^2, which is negative.
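A quick numerical sanity check of both claims (a minimal NumPy sketch, not part of the original solution; the dimension 5 and the common length a = 3 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Scale two random vectors to the same length a, matching the premise
# ||x|| = ||y|| = a of the exercise.
a = 3.0
x = rng.standard_normal(5)
x *= a / np.linalg.norm(x)
y = rng.standard_normal(5)
y *= a / np.linalg.norm(y)

# (i) (x - y) . (x + y) = ||x||^2 - ||y||^2 = a^2 - a^2 = 0
print(np.dot(x - y, x + y))        # ~0 up to floating-point rounding

# (ii) (x - 3y) . (x + 3y) = ||x||^2 - 9||y||^2 = -8a^2 < 0
print(np.dot(x - 3*y, x + 3*y))    # approximately -8 * a**2 = -72.0
```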
2. Consider a situation in which you have three matrices A, B, and C, of sizes 10 × 2, 2 × 10, and 10 × 10, respectively.
(a) Suppose you had to compute the matrix product ABC. From an efficiency perspective, would it computationally make more sense to compute (AB)C or would it make more sense to compute A(BC)?
(b) If you had to compute the matrix product CAB, would it make more sense to compute (CA)B or C(AB)?
The main point is to keep the size of the intermediate matrix as small as possible, in order to reduce both computational and space requirements. In the case of ABC, it makes sense to compute BC first: the intermediate result BC is only of size 2 × 10, whereas AB would be of size 10 × 10. In the case of CAB, it makes sense to compute CA first, since CA is of size 10 × 2, whereas AB would be of size 10 × 10. This type of associativity property is used frequently in machine learning in order to reduce computational requirements. A concrete operation count is shown below.
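The sketch below counts scalar multiplications under the standard cost model, in which multiplying an m × k matrix by a k × n matrix takes m·k·n multiplications (an illustrative sketch, not from the book):

```python
# Cost model: an (m x k) times (k x n) product takes m*k*n scalar
# multiplications and produces an (m x n) intermediate result.
def cost(m, k, n):
    return m * k * n

# ABC with A: 10x2, B: 2x10, C: 10x10
print(cost(10, 2, 10) + cost(10, 10, 10))  # (AB)C: 200 + 1000 = 1200
print(cost(2, 10, 10) + cost(10, 2, 10))   # A(BC): 200 + 200  = 400

# CAB with C: 10x10, A: 10x2, B: 2x10
print(cost(10, 10, 2) + cost(10, 2, 10))   # (CA)B: 200 + 200  = 400
print(cost(10, 2, 10) + cost(10, 10, 10))  # C(AB): 200 + 1000 = 1200
```

The cheaper order in each case is also the one with the smaller intermediate matrix: BC is 2 × 10 and CA is 10 × 2, while AB is 10 × 10.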
3. Show that if a matrix A satisfies A = −A^T, then all the diagonal elements of the matrix are 0.
Note that A + A^T = 0. However, the diagonal of this matrix contains twice the diagonal elements of A, since the (i, i)th entry of A + A^T is a_ii + a_ii = 2a_ii. Therefore, the diagonal elements of A must be 0.
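A small numerical illustration (an added sketch; any matrix of the form M − M^T satisfies A = −A^T by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M - M.T                 # skew-symmetric by construction: A = -A^T
assert np.allclose(A, -A.T)
print(np.diag(A))           # exactly [0. 0. 0. 0.]
```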
4. Show that if we have a matrix satisfying A = −A^T, then for any column vector x, we have x^T A x = 0.
Note that the transpose of the scalar x^T A x remains unchanged. Therefore, we have x^T A x = (x^T A x)^T = x^T A^T x = −x^T A x. It follows that 2 x^T A x = 0, and hence x^T A x = 0.
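The identity can also be checked numerically (an added sketch, reusing the M − M^T construction from the previous exercise to build a skew-symmetric A):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M - M.T                  # satisfies A = -A^T
x = rng.standard_normal(5)

# The scalar x^T A x equals its own transpose, which forces it to be 0.
print(x @ A @ x)             # ~0 up to floating-point rounding
```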