Linear Algebra and Optimization for Machine Learning
1st Edition by Charu Aggarwal. Chapters 1–11
Contents
1 Linear Algebra and Optimization: An Introduction
2 Linear Transformations and Linear Systems
3 Diagonalizable Matrices and Eigenvectors
4 Optimization Basics: A Machine Learning View
5 Optimization Challenges and Advanced Solutions
6 Lagrangian Relaxation and Duality
7 Singular Value Decomposition
8 Matrix Factorization
9 The Linear Algebra of Similarity
10 The Linear Algebra of Graphs
11 Optimization in Computational Graphs
Chapter 1
Linear Algebra and Optimization: An Introduction
1. For any two vectors x and y, which are each of length a, show that (i) x − y is orthogonal to x + y, and (ii) the dot product of x − 3y and x + 3y is negative.

(i) Using the distributive property of matrix multiplication, (x − y) · (x + y) = x · x − y · y. The dot product of a vector with itself is its squared length. Since both vectors have the same length a, the result is a^2 − a^2 = 0. (ii) In the second case, a similar argument shows that (x − 3y) · (x + 3y) = x · x − 9(y · y) = a^2 − 9a^2, which is negative.
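A quick numeric check of both claims (a minimal sketch using NumPy; the dimension 5 and the common length a = 2 are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.0  # common length of both vectors (arbitrary choice)

# Draw two random vectors and rescale each to length a.
x = rng.standard_normal(5)
y = rng.standard_normal(5)
x *= a / np.linalg.norm(x)
y *= a / np.linalg.norm(y)

# (i) (x - y) is orthogonal to (x + y): dot product should be ~0.
print(np.dot(x - y, x + y))

# (ii) (x - 3y) . (x + 3y) = a^2 - 9a^2 = -8a^2 < 0.
print(np.dot(x - 3*y, x + 3*y), -8 * a**2)
```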
2. Consider a situation in which you have three matrices A, B, and C, of sizes 10 × 2, 2 × 10, and 10 × 10, respectively.

(a) Suppose you had to compute the matrix product ABC. From an efficiency perspective, would it computationally make more sense to compute (AB)C or would it make more sense to compute A(BC)?

(b) If you had to compute the matrix product CAB, would it make more sense to compute (CA)B or C(AB)?

The main point is to keep the size of the intermediate matrix as small as possible in order to reduce both computational and space requirements. In the case of ABC, it makes sense to compute BC first, since BC is a small 2 × 10 matrix, whereas AB is 10 × 10. In the case of CAB, it makes sense to compute CA first, since CA is 10 × 2, whereas AB is again 10 × 10. This type of associativity property is used frequently in machine learning in order to reduce computational requirements.
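To make the cost difference concrete, the sketch below counts scalar multiplications under the standard cost model (multiplying an m × k matrix by a k × n matrix takes m·k·n multiplications); the helper cost() is our own illustration, not from the text:

```python
def cost(m, k, n):
    """Scalar multiplications to multiply an (m x k) matrix by a (k x n) matrix."""
    return m * k * n

# A: 10 x 2, B: 2 x 10, C: 10 x 10
print("(AB)C:", cost(10, 2, 10) + cost(10, 10, 10))  # 200 + 1000 = 1200
print("A(BC):", cost(2, 10, 10) + cost(10, 2, 10))   # 200 + 200  = 400

print("(CA)B:", cost(10, 10, 2) + cost(10, 2, 10))   # 200 + 200  = 400
print("C(AB):", cost(10, 2, 10) + cost(10, 10, 10))  # 200 + 1000 = 1200
```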
3. Show that if a matrix A satisfies A = −A^T, then all the diagonal elements of the matrix are 0.

Note that A + A^T = 0. However, this matrix also contains twice the diagonal elements of A on its diagonal. Therefore, the diagonal elements of A must be 0.
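A small numeric illustration (a sketch; building A as M − M^T is one standard way to obtain a matrix satisfying A = −A^T, and the 4 × 4 size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                   # satisfies A = -A^T by construction

print(np.allclose(A, -A.T))   # True
print(np.diag(A))             # diagonal entries are all 0
```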
4. Show that if we have a matrix satisfying A = −A^T, then for any column vector x, we have x^T A x = 0.

Note that the transpose of the scalar x^T A x remains unchanged. Therefore, we have x^T A x = (x^T A x)^T = x^T A^T x = −x^T A x. Therefore, we have 2 x^T A x = 0, and so x^T A x = 0.
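The same construction used above verifies this identity numerically (a sketch; the random vector and matrix are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M - M.T                 # satisfies A = -A^T

x = rng.standard_normal(4)
print(x @ A @ x)            # ~0 up to floating-point error
```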