Instructor's Solution Manual for "Linear Algebra and Optimization for Machine Learning"

Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY

March 21, 2021

Contents

1. Linear Algebra and Optimization: An Introduction
2. Linear Transformations and Linear Systems
3. Diagonalizable Matrices and Eigenvectors
4. Optimization Basics: A Machine Learning View
5. Optimization Challenges and Advanced Solutions
6. Lagrangian Relaxation and Duality
7. Singular Value Decomposition
8. Matrix Factorization
9. The Linear Algebra of Similarity
10. The Linear Algebra of Graphs
11. Optimization in Computational Graphs

Chapter 1: Linear Algebra and Optimization: An Introduction

1. For any two vectors x and y, which are each of length a, show that (i) x − y is orthogonal to x + y, and (ii) the dot product of x − 3y and x + 3y is negative.

(i) The dot product (x − y) · (x + y) expands to x · x − y · y by the distributive property of the dot product. The dot product of a vector with itself is its squared length. Since both vectors are of the same length, the result is 0.

(ii) A similar expansion shows that the dot product of x − 3y and x + 3y is a^2 − 9a^2 = −8a^2, which is negative.

2. Consider a situation in which you have three matrices A, B, and C, of sizes 10 × 2, 2 × 10, and 10 × 10, respectively.

(a) Suppose you had to compute the matrix product ABC. From an efficiency perspective, would it computationally make more sense to compute (AB)C or would it make more sense to compute A(BC)?

(b) If you had to compute the matrix product CAB, would it make more sense to compute (CA)B or C(AB)?

The main point is to keep the size of the intermediate matrix as small as possible, in order to reduce both computational and space requirements. In the case of ABC, it makes sense to compute BC first, since the intermediate matrix BC has size 2 × 10, whereas AB has size 10 × 10. In the case of CAB, it makes sense to compute CA first, since CA has size 10 × 2, whereas AB has size 10 × 10. This use of associativity is applied frequently in machine learning to reduce computational requirements.

3. Show that if a matrix A satisfies A = −A^T, then all the diagonal elements of the matrix are 0.

Note that A + A^T = 0. However, this matrix also contains twice the diagonal elements of A on its diagonal. Therefore, the diagonal elements of A must be 0.

4. Show that if we have a matrix satisfying A = −A^T, then for any column vector x, we have x^T A x = 0.

Note that the transpose of the scalar x^T A x leaves it unchanged. Therefore, we have x^T A x = (x^T A x)^T = x^T A^T x = −x^T A x. Therefore, we have 2 x^T A x = 0.

5. Show that if we have a matrix A, which can be written as A = D D^T for some matrix D, then we have x^T A x ≥ 0 for any column vector x.

The scalar x^T A x can be shown to be equal to ||D^T x||^2, which is nonnegative.

6. Show that the matrix product AB remains unchanged if we scale the ith column of A and the ith row of B by respective factors that are inverses of each other.

The idea is to express the matrix multiplication as the sum of outer products of columns of A and rows of B:

AB = Σ_k A_k B_k

Here, A_k is the kth column of A and B_k is the kth row of B. The expression on the right does not change if we multiply A_i by α and divide B_i by α: each component of the sum remains unchanged, including the ith component, in which the two scaling factors cancel each other out.
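This outer-product identity and its scaling invariance are easy to check numerically. Below is a minimal numpy sketch (an illustration added to this preview, not part of the original manual):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

# AB as the sum of outer products of the columns of A with the rows of B.
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
print(np.allclose(outer_sum, A @ B))   # True

# Scaling column i of A by alpha and row i of B by 1/alpha leaves AB unchanged.
alpha, i = 2.5, 1
A2, B2 = A.copy(), B.copy()
A2[:, i] *= alpha
B2[i, :] /= alpha
print(np.allclose(A2 @ B2, A @ B))     # True
```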
7. Show that any matrix product AB can be expressed in the form A′ΔB′, where A′ is a matrix in which the sum of the squares of the entries in each column is 1, B′ is a matrix in which the sum of the squares of the entries in each row is 1, and Δ is an appropriately chosen diagonal matrix with nonnegative entries on the diagonal.

After expressing the matrix product as the sum of outer products, we can scale each vector in an outer product to unit norm, while pulling out a scalar multiple from that outer-product component. The matrices A′ and B′ contain these normalized vectors, whereas Δ contains the scalar multiples. In other words, start with the product expressed in terms of the kth column A_k of A and the kth row B_k of B:

AB = Σ_k A_k B_k

One can express this matrix product in the following form:

AB = Σ_k δ_kk (A_k / ||A_k||) (B_k / ||B_k||), where δ_kk = ||A_k|| ||B_k||

We create a diagonal matrix Δ in which the kth diagonal entry is δ_kk, and then create A′ and B′ from the normalized columns A_k/||A_k|| and the normalized rows B_k/||B_k||, respectively.

8. Discuss how a permutation matrix A can be converted to the identity matrix using at most d elementary row operations of a single type. Use this fact to express A as the product of at most d elementary matrix operators.

Only row-interchange operations are required to convert a permutation matrix to the identity matrix. In particular, in the ith iteration, we interchange the ith row of A with whatever row contains the ith row of the identity matrix. A permutation matrix will always contain such a row. The matrix can therefore be represented as the product of at most d elementary row-interchange operators, by treating each interchange operation as a matrix multiplication.

9. Suppose that you reorder all the columns of an invertible matrix A using some random permutation, and you know A^{−1} for the original matrix. Show how you can (simply) compute the inverse of the reordered matrix from A^{−1} without having to invert the new matrix from scratch. Provide an argument in terms of elementary matrices.

All the rows of A^{−1} are interchanged using exactly the same permutation as was applied to the columns of A. This is because if P is the permutation matrix that creates AP, then P^T A^{−1} is the inverse of AP. However, P^T performs exactly the same reordering on the rows of A^{−1} as P performs on the columns of A.

10. Suppose that you have approximately factorized an n × d matrix D as D ≈ UV^T, where U is an n × k matrix and V is a d × k matrix. Show how you can derive an infinite number of alternative factorizations U′V′^T of D, which satisfy U′V′^T = UV^T.

Let P be any invertible matrix of size k × k. Then, we set U′ = UP and V′ = V(P^{−1})^T. It is easy to show that U′V′^T = UP P^{−1} V^T = UV^T.
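This non-uniqueness is also easy to confirm numerically. A minimal numpy sketch (again an added illustration, not part of the original manual):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 6, 4, 2
U = rng.standard_normal((n, k))
V = rng.standard_normal((d, k))

P = rng.standard_normal((k, k))   # a random k x k matrix is invertible with probability 1
U2 = U @ P                        # U' = U P
V2 = V @ np.linalg.inv(P).T       # V' = V (P^{-1})^T

# Both factorizations reconstruct exactly the same matrix.
print(np.allclose(U2 @ V2.T, U @ V.T))   # True
```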
11. Either prove each of the following statements or provide a counterexample:

(a) The order in which you apply two elementary row operations to a matrix does not affect the final result.

(b) The order in which you apply an elementary row operation and an elementary column operation does not affect the final result.

It is best to think of these problems in terms of elementary matrix operations.

(a) False. If you start with the matrix A, then two successive row operations corresponding to the matrices E_1 and E_2 create the matrix E_2 E_1 A. Matrix multiplication is not commutative, so this is not in general the same as E_1 E_2 A. For example, rotation matrices do not commute with scaling matrices. Scaling the first row by 2 and then interchanging the first and second rows creates a different result than applying these operations in the reverse order.

(b) True. If the row and column operators are E_r and E_c, the final result is E_r A E_c. Because of the associativity of matrix multiplication, (E_r A)E_c and E_r(A E_c) are the same. It follows that the order does not matter.

12. Discuss why some power of a permutation matrix is always the identity matrix.

There are a finite number of permutation matrices of a given size. Therefore, among the powers P, P^2, P^3, ..., some pair must be equal, say P^a = P^b with a < b. Since a permutation matrix is invertible, multiplying both sides by (P^{−1})^a yields P^{b−a} = I.

13. Consider the matrix polynomial Σ_{i=0}^{t} a_i A^i. A straightforward evaluation of this polynomial will require O(t^2) matrix multiplications. Discuss how you can reduce the number of multiplications to O(t) by rearranging the polynomial.

The matrix polynomial can be written as a_0 I + A(Σ_{i=1}^{t} a_i A^{i−1}). This can be further expanded as follows:

a_0 I + A(Σ_{i=1}^{t} a_i A^{i−1}) = a_0 I + A(a_1 I + A(Σ_{i=2}^{t} a_i A^{i−2}))
= a_0 I + A(a_1 I + A(a_2 I + A(Σ_{i=3}^{t} a_i A^{i−3})))

Using this type of expansion recursively (Horner's method), each level of nesting requires a single multiplication by A, so O(t) matrix multiplications suffice.

14. Let A = [a_ij] be a 2 × 2 matrix with a_12 = 1, and 0s in all other entries. Show that A^{1/2} does not exist even after allowing complex-valued entries.

Suppose that such a matrix exists. If the entries of the 2 × 2 matrix A^{1/2}, listed row-wise, are a, b, c, and d, then squaring it and equating the result to A yields the following system of equations:

a^2 + bc = 0
cb + d^2 = 0
ab + bd = 1
ca + dc = 0

Using the first two equations, we obtain a^2 − d^2 = 0, which means either a = −d or a = d. Note that a = −d is not possible, because the third equation, b(a + d) = 1, would be violated. Using a = d to eliminate d, we obtain the following system:

a^2 + bc = 0
2ab = 1
ac = 0

Since ab is nonzero and ac is zero, a cannot be zero, and c must be zero. However, if c is zero, then the first equation implies that a is zero as well. Therefore, we reach a contradiction.

15. Parallelogram law: The parallelogram law states that the sum of the squares of the sides of a parallelogram is equal to the sum of the squares of its diagonals. Write this law as a vector identity in terms of vectors A and B. Now use vector algebra to show why this vector identity must hold.

The identity is as follows:

2||A||^2 + 2||B||^2 = ||A − B||^2 + ||A + B||^2

One can expand the right-hand side using dot products, and then apply the distributive property:

||A − B||^2 + ||A + B||^2 = A · A − 2A · B + B · B + A · A + 2A · B + B · B

After canceling the terms involving A · B and consolidating the others, we get the desired result.

16. Write the first four terms of the Taylor expansion of the following univariate functions about x = a: (i) log(x); (ii) sin(x); (iii) 1/x; (iv) exp(x).

(i) log(a) + (x − a)/a − (x − a)^2/(2a^2) + (x − a)^3/(3a^3)
(ii) sin(a) + (x − a) cos(a) − ((x − a)^2/2) sin(a) − ((x − a)^3/6) cos(a)
(iii) 1/a − (x − a)/a^2 + (x − a)^2/a^3 − (x − a)^3/a^4
(iv) exp(a) + (x − a) exp(a) + ((x − a)^2/2) exp(a) + ((x − a)^3/6) exp(a)

17. Use the multivariate Taylor expansion to provide a quadratic approximation of sin(x + y) in the vicinity of [x, y] = [0, 0]. Confirm that this approximation loses its accuracy with increasing distance from the origin.

The quadratic Taylor expansion about the origin is:

sin(x + y) ≈ sin(0) + [1, 1][x, y]^T + (1/2)[x, y] H [x, y]^T

Here, the gradient [cos(x + y), cos(x + y)] evaluates to [1, 1] at the origin, and the Hessian H, each entry of which is −sin(x + y), evaluates to the 2 × 2 zero matrix. The resulting approximation is therefore x + y. If x = π/200 and y = π/200 radians, which are small angles, we have sin(x + y) ≈ π/200 + π/200. However, if we choose large values of x and y like x = y = π, then sin(x + y) = 0 ≠ π + π.
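The loss of accuracy away from the origin is easy to confirm numerically. A small sketch (an added illustration, not part of the original manual):

```python
import numpy as np

# The quadratic Taylor approximation of sin(x + y) at the origin reduces to
# x + y, because the Hessian vanishes there. Compare it to the exact value
# at increasing distances from the origin.
for x in [np.pi / 200, np.pi / 8, np.pi / 2, np.pi]:
    y = x
    exact = np.sin(x + y)
    approx = x + y
    print(f"x = y = {x:.4f}:  sin(x+y) = {exact:+.4f}  approximation = {approx:+.4f}")
```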
18. Consider a case where a d × k matrix P is initialized by setting all values randomly to either −1 or +1 with equal probability, and then dividing all entries by √d. Discuss why the columns of P will be (roughly) mutually orthogonal for large values of d of the order of 10^6. This trick is used frequently in machine learning for rapidly generating the random projection of an n × d data matrix D as D′ = DP.

The dot product between any pair of columns has mean 0 and standard deviation 1/√d, since it is the sum of d i.i.d. random variables, each of which takes on the values +1/d and −1/d with equal probability. For large values of d like 10^6, the standard deviation will be of the order of 10^{−3}, and the distribution will be close to normal. Since most of the density of a normal distribution is captured within ±3 standard deviations, and since each column of P is a unit vector, the cosine of the angle between each pair of columns will lie between −0.003 and 0.003 with high probability. This means that the vectors are very nearly orthogonal. In particular, the pairwise angles will lie between 89.83° and 90.17° with high probability.

19. Consider the perturbed matrix A_ε = A + εB, where the value of ε is small and A, B are d × d matrices. Show the following approximation:

A_ε^{−1} ≈ A^{−1} − εA^{−1}BA^{−1}

This approximation is useful when A^{−1} is already known.

A_ε^{−1} = (A + εB)^{−1} = [A(I + εA^{−1}B)]^{−1} = (I + εA^{−1}B)^{−1}A^{−1}
= (I − εA^{−1}B + ε^2(A^{−1}B)^2 − ...)A^{−1} ≈ A^{−1} − εA^{−1}BA^{−1}

One can verify that the product of A_ε with (A^{−1} − εA^{−1}BA^{−1}) differs from the identity matrix only by a term that depends on ε^2, which is assumed to be negligible. The approach is particularly efficient when B is very sparse, such as when it contains a small number of nonzero columns. (A numerical check of this approximation appears after Exercise 23 below.)

20. Suppose that you have a 5 × 5 matrix A, in which the rows/columns correspond to people in a social network in the order John, Mary, Jack, Tim, and Robin. The entry (i, j) corresponds to the number of times person i sent a message to person j. Define a matrix P, so that PAP^T contains the same information, but with the rows/columns in the order Mary, Tim, John, Robin, and Jack.

The permutation matrix P is as follows:

P = [ 0 1 0 0 0 ]
    [ 0 0 0 1 0 ]
    [ 1 0 0 0 0 ]
    [ 0 0 0 0 1 ]
    [ 0 0 1 0 0 ]

21. Suppose that the vectors x, y, and x − y have lengths 2, 3, and 4, respectively. Find the length of x + y using only vector algebra (and no Euclidean geometry).

This follows from the parallelogram law discussed in an earlier exercise, which gives ||x + y||^2 = 2||x||^2 + 2||y||^2 − ||x − y||^2. The corresponding length is √(2·2^2 + 2·3^2 − 4^2) = √10.

22. Show that the inverse of a symmetric matrix is symmetric.

Suppose that A is symmetric and B is the inverse of A. Then, we have AB = BA = I. Taking the transpose, we obtain (AB)^T = (BA)^T = I. Using the symmetry of A, this is the same as saying that B^T A = AB^T = I. In other words, B^T is the inverse of A as well. However, since the inverse is unique, we must have B^T = B.

Alternatively, we can use AB = AB^T = I to assert that A(B − B^T) = 0. Left-multiplying by B, we get (BA)(B − B^T) = 0. Since BA = I, we have B − B^T = 0. In other words, B = B^T.

23. Let A_1, A_2, ..., A_d be d × d matrices that are strictly upper triangular. Show that the product of A_1, A_2, ..., A_d is the zero matrix.

Let B_i be given by A_1 A_2 ... A_i. It can be shown inductively that B_i is also strictly upper triangular and, in addition, has at least its first i columns and last i rows filled with zeros. Therefore, B_d will be the zero matrix.
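Returning to Exercise 19, the first-order inverse approximation can be checked with a minimal numpy sketch (an added illustration, not part of the original manual); the error shrinks quadratically in ε:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
A = rng.standard_normal((d, d)) + d * np.eye(d)  # boosted diagonal keeps A well-conditioned
B = rng.standard_normal((d, d))
A_inv = np.linalg.inv(A)                         # assumed to be known in advance

for eps in [1e-1, 1e-2, 1e-3]:
    approx = A_inv - eps * A_inv @ B @ A_inv     # first-order approximation of (A + eps*B)^{-1}
    exact = np.linalg.inv(A + eps * B)
    err = np.abs(approx - exact).max()
    print(f"eps = {eps:.0e}:  max entrywise error = {err:.2e}")  # scales like eps^2
```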
24. Apollonius's identity: Let ABC be a triangle, and AD be the median from A to BC. Show the following using only vector algebra and no Euclidean geometry:

AB^2 + AC^2 = 2(AD^2 + BD^2)

You will get the simplest algebra by orienting your triangle properly with respect to the origin.

Place the vertex A at position a and the midpoint D at the origin. Then the vertices B and C are located at b and c = −b. The identity then reduces to showing the following:

||a − b||^2 + ||a + b||^2 = 2(||a||^2 + ||b||^2)

This is easy to show using dot products. In fact, Apollonius's identity reduces to the parallelogram law in this case!

25. Sine law: Express the sine of the interior angle between a and b (i.e., the angle not greater than 180 degrees) purely in terms of a · a, b · b, and a · b. You are allowed to use sin^2(x) + cos^2(x) = 1. Consider a triangle, two sides of which are the vectors a and b. The opposite angles to these vectors are A and B, respectively. Show the following using only vector algebra and no Euclidean geometry:

||a||/sin(A) = ||b||/sin(B)

The sine of the interior angle between a and b is √(1 − (a · b / (||a|| ||b||))^2). This is essentially obtained using sin(x) = √(1 − cos^2(x)). Since we are only looking at interior angles, the sine is a nonnegative quantity; therefore, we can ignore the possibility sin(x) = −√(1 − cos^2(x)).

The angle A is formed between the vectors a − b and 0 − b. The angle B is formed between the vectors b − a and 0 − a. Therefore, we obtain the following for the first ratio:

||a||/sin(A) = ||a|| / √(1 − [(−b) · (a − b) / (||b|| ||a − b||)]^2) = ||a|| ||b|| ||a − b|| / √(||a||^2 ||b||^2 − (a · b)^2)

The second expression is obtained by applying the same approach to the angle B:

||b||/sin(B) = ||b|| / √(1 − [(−a) · (b − a) / (||a|| ||b − a||)]^2) = ||a|| ||b|| ||b − a|| / √(||a||^2 ||b||^2 − (a · b)^2)

The two expressions are the same because ||a − b|| = ||b − a||. This proves the result.

26. Trigonometry with vector algebra: Consider a unit vector x = [1, 0]^T. The vector v1 is obtained by rotating x counter-clockwise by the angle θ1, and v2 is obtained by rotating x clockwise by the angle θ2. Use the rotation matrix to obtain the coordinates of the unit vectors v1 and v2. Use this setup to show the following well-known trigonometric identity:

cos(θ1 + θ2) = cos(θ1)cos(θ2) − sin(θ1)sin(θ2)

On using the rotation matrix, one obtains the vectors v1 = [cos(θ1), sin(θ1)]^T and v2 = [cos(θ2), −sin(θ2)]^T:

v1 = [ cos(θ1)  −sin(θ1) ] [ 1 ]     v2 = [ cos(−θ2)  −sin(−θ2) ] [ 1 ]
     [ sin(θ1)   cos(θ1) ] [ 0 ]          [ sin(−θ2)   cos(−θ2) ] [ 0 ]

The value of cos(θ1 + θ2) is the cosine of the angle between the two vectors, which can also be obtained algebraically as the dot product of the two rotated versions of [1, 0]^T. As a result, we obtain:

cos(θ1 + θ2) = v1 · v2 = cos(θ1)cos(θ2) − sin(θ1)sin(θ2)

27. Coordinate geometry with matrix algebra: Consider the two lines y = 3x + 4 and y = 5x + 2 in the 2-dimensional plane. Find the intersection of the two lines by writing the equations in the following form for appropriately chosen A and b:

A [x, y]^T = b

Find the intersection coordinates (x, y) of the two lines by inverting the matrix A.

The linear system is as follows:

[ −3  1 ] [ x ]   [ 4 ]
[ −5  1 ] [ y ] = [ 2 ]

One can show that the inverse of the matrix is as follows:

A^{−1} = (1/2) [ 1  −1 ]
               [ 5  −3 ]

On computing A^{−1}b, one obtains x = 1 and y = 7.

28. Use the matrix inversion lemma to invert a 10×10 matrix with 1s in each
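The statement of Exercise 28 is truncated at this point of the preview. For reference, the rank-one (Sherman-Morrison) form of the matrix inversion lemma that it invokes is (A + uv^T)^{−1} = A^{−1} − A^{−1}uv^T A^{−1}/(1 + v^T A^{−1}u). The sketch below applies it to the identity plus an all-ones matrix; this is only an assumed completion of the truncated statement, not necessarily the matrix the full exercise specifies:

```python
import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    """Inverse of (A + u v^T) via the matrix inversion lemma, given A^{-1}."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

# Assumed example: I + u u^T with u the all-ones vector, i.e., a 10 x 10
# matrix with 2s on the diagonal and 1s elsewhere. (The matrix in the
# truncated exercise statement may differ.)
d = 10
u = np.ones(d)
M = np.eye(d) + np.outer(u, u)

M_inv = sherman_morrison_inverse(np.eye(d), u, u)  # here A = I, so A^{-1} = I
print(np.allclose(M_inv, np.linalg.inv(M)))        # True
```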
