Chapter 1
Linear and Matrix Algebra
This chapter summarizes some important results of linear and matrix algebra that are
instrumental in deriving many statistical results in subsequent chapters. Our emphasis
is on special matrices and their properties. Although the coverage of these mathe-
matical topics is rather brief, it is self-contained. Readers may also consult other linear
and matrix algebra textbooks for more detailed discussions; see e.g., Anton (1981),
Basilevsky (1983), Graybill (1969), and Noble and Daniel (1977).
1.1 Basic Notations
A matrix is an array of numbers. In what follows, a matrix is denoted by an upper-case
letter in boldface (e.g., A), and its (i, j) th element (the element in the i th row and
j th column) is denoted by the corresponding lower-case letter with subscript ij
(e.g., aij ). Specifically, an m × n matrix A contains m rows and n columns and can be
expressed as
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}.
\]
An n × 1 (1 × n) matrix is an n-dimensional column (row) vector. Every vector will be
denoted by a lower-case letter in boldface (e.g., z), and its i th element is denoted
by the corresponding lower-case letter with subscript i (e.g., zi ). A 1 × 1 matrix is
just a scalar. For a matrix A, its i th column is denoted as ai .
A matrix is square if its number of rows equals the number of columns. A matrix is
said to be diagonal if its off-diagonal elements (i.e., aij with i ≠ j) are all zeros and at least
one of its diagonal elements is non-zero, i.e., aii ≠ 0 for some i = 1, . . . , n. A diagonal
matrix whose diagonal elements are all ones is an identity matrix, denoted as I; we also
write the n × n identity matrix as I n . A matrix A is said to be lower (upper) triangular
if aij = 0 for i < (>) j. We let 0 denote the matrix whose elements are all zeros.
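For concreteness, these special matrices can be constructed in NumPy as follows (the sizes chosen here are arbitrary):

```python
import numpy as np

D = np.diag([1.0, 2.0, 3.0])     # diagonal matrix with the given diagonal elements
I3 = np.eye(3)                   # the 3 x 3 identity matrix I_3
L = np.tril(np.ones((3, 3)))     # lower triangular: zeros above the diagonal
U = np.triu(np.ones((3, 3)))     # upper triangular: zeros below the diagonal
Z = np.zeros((2, 3))             # a 2 x 3 zero matrix
```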
For a vector-valued function f : Rm → Rn , ∇θ f (θ) is the m × n matrix of the
first-order derivatives of f with respect to the elements of θ:
\[
\nabla_{\theta} f(\theta) =
\begin{bmatrix}
\frac{\partial f_1(\theta)}{\partial \theta_1} & \frac{\partial f_2(\theta)}{\partial \theta_1} & \cdots & \frac{\partial f_n(\theta)}{\partial \theta_1} \\
\frac{\partial f_1(\theta)}{\partial \theta_2} & \frac{\partial f_2(\theta)}{\partial \theta_2} & \cdots & \frac{\partial f_n(\theta)}{\partial \theta_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_1(\theta)}{\partial \theta_m} & \frac{\partial f_2(\theta)}{\partial \theta_m} & \cdots & \frac{\partial f_n(\theta)}{\partial \theta_m}
\end{bmatrix}.
\]
When n = 1, ∇θ f (θ) is the (column) gradient vector of f (θ). The m × m Hessian
matrix of the second-order derivatives of the real-valued function f (θ) is
\[
\nabla^2_{\theta} f(\theta) = \nabla_{\theta}\bigl(\nabla_{\theta} f(\theta)\bigr) =
\begin{bmatrix}
\frac{\partial^2 f(\theta)}{\partial \theta_1 \partial \theta_1} & \frac{\partial^2 f(\theta)}{\partial \theta_1 \partial \theta_2} & \cdots & \frac{\partial^2 f(\theta)}{\partial \theta_1 \partial \theta_m} \\
\frac{\partial^2 f(\theta)}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 f(\theta)}{\partial \theta_2 \partial \theta_2} & \cdots & \frac{\partial^2 f(\theta)}{\partial \theta_2 \partial \theta_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(\theta)}{\partial \theta_m \partial \theta_1} & \frac{\partial^2 f(\theta)}{\partial \theta_m \partial \theta_2} & \cdots & \frac{\partial^2 f(\theta)}{\partial \theta_m \partial \theta_m}
\end{bmatrix}.
\]
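As a concrete illustration (the function below is an arbitrary example, not one used elsewhere in these notes), take the real-valued f(θ) = θ1²θ2 + θ2 with m = 2; its gradient and Hessian follow directly from the definitions above and can be checked numerically:

```python
import numpy as np

# Arbitrary example function f: R^2 -> R, used only for illustration.
def f(theta):
    t1, t2 = theta
    return t1**2 * t2 + t2

def gradient(theta):
    t1, t2 = theta
    # column gradient vector (df/dtheta1, df/dtheta2)'
    return np.array([2.0 * t1 * t2, t1**2 + 1.0])

def hessian(theta):
    t1, t2 = theta
    # symmetric 2 x 2 matrix of second-order derivatives
    return np.array([[2.0 * t2, 2.0 * t1],
                     [2.0 * t1, 0.0]])

theta = np.array([1.0, 3.0])
eps = 1e-6
num_grad = np.array([(f(theta + eps * e) - f(theta - eps * e)) / (2.0 * eps)
                     for e in np.eye(2)])
print(np.allclose(num_grad, gradient(theta), atol=1e-5))   # True
print(np.allclose(hessian(theta), hessian(theta).T))       # True: the Hessian is symmetric
```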
1.2 Matrix Operations
Two matrices are said to be of the same size if they have the same number of rows and
same number of columns. Matrix equality is defined for two matrices of the same size.
Given two m × n matrices A and B, A = B if aij = bij for every i, j. The transpose of
an m × n matrix A, denoted as A′, is the n × m matrix whose (i, j) th element is the
(j, i) th element of A. The transpose of a column vector is a row vector; the transpose
of a scalar is just the scalar itself. A matrix A is said to be symmetric if A = A′, i.e.,
aij = aji for all i, j. Clearly, a diagonal matrix is symmetric, but a triangular matrix
is not (unless it is also diagonal).
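A brief NumPy sketch of transposition and the symmetry check (again with arbitrary matrices):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
At = A.T                               # the 3 x 2 transpose A'; At[j, i] equals A[i, j]

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.array_equal(S, S.T))          # True: S is symmetric
```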
Matrix addition is also defined for two matrices of the same size. Given two m × n
matrices A and B, their sum, C = A + B, is the m × n matrix with the (i, j) th element
cij = aij + bij . Note that matrix addition, if defined, is commutative:
A + B = B + A,
and associative:
A + (B + C) = (A + B) + C.
Also, A + 0 = A.
The scalar multiplication of the scalar c and matrix A is the matrix cA whose (i, j) th
element is caij . Clearly, cA = Ac, and −A = −1 × A. Thus, A + (−A) = A − A = 0.
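These rules are easy to confirm numerically; a minimal sketch with arbitrary 2 × 2 matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = np.array([[0.5, 1.5], [2.5, 3.5]])

print(np.allclose(A + B, B + A))                 # commutativity
print(np.allclose(A + (B + C), (A + B) + C))     # associativity
print(np.allclose(2.0 * A, A * 2.0))             # cA = Ac
print(np.allclose(A + (-A), np.zeros((2, 2))))   # A + (-A) = 0
```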
Given two matrices A and B, the matrix multiplication AB is defined only when the
number of columns of A is the same as the number of rows of B. Specifically, when A
is m × n and B is n × p, their product, C = AB, is the m × p matrix whose (i, j) th
element is
\[
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.
\]
Matrix multiplication is not commutative, i.e., AB ≠ BA in general; in fact, when AB is defined,
BA need not be defined. On the other hand, matrix multiplication is associative:
A(BC) = (AB)C,
and distributive with respect to matrix addition:
A(B + C) = AB + AC.
It is easy to verify that (AB)′ = B′ A′. For an m × n matrix A, I m A = A I n = A.
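The product rule and the properties above can be illustrated with NumPy's matrix product (the dimensions below are arbitrary):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)       # 2 x 3
B = np.arange(12.0).reshape(3, 4)      # 3 x 4
C = np.ones((4, 2))                    # 4 x 2

AB = A @ B                             # 2 x 4; its (i, j) element is sum_k A[i, k] * B[k, j]
# Note: B @ A is not even defined here (3 x 4 times 2 x 3).
print(np.allclose(A @ (B @ C), (A @ B) @ C))       # associativity
print(np.allclose((A @ B).T, B.T @ A.T))           # (AB)' = B'A'
print(np.allclose(np.eye(2) @ A, A @ np.eye(3)))   # I_m A = A I_n = A
```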
The inner product of two d-dimensional vectors y and z is the scalar
\[
y'z = \sum_{i=1}^{d} y_i z_i.
\]
If y is m-dimensional and z is n-dimensional, their outer product is the m × n matrix yz′
whose (i, j) th element is yi zj . In particular,
\[
z'z = \sum_{i=1}^{d} z_i^2,
\]
which is non-negative and induces the standard Euclidean norm of z as ‖z‖ = (z′z)^{1/2}.
The only vector with Euclidean norm zero is the zero vector; a vector with Euclidean
norm one is referred to as a unit vector. For example,
\[
(1 \;\; 0 \;\; 0), \qquad
\begin{bmatrix} 1/2 \\ 0 \\ \sqrt{3}/2 \end{bmatrix}, \qquad
\Bigl( \tfrac{1}{\sqrt{2}} \;\; \tfrac{1}{\sqrt{3}} \;\; \tfrac{1}{\sqrt{6}} \Bigr)
\]
are all unit vectors.
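Finally, a short NumPy sketch of the inner product, outer product, and Euclidean norm (the vectors are arbitrary):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])
z = np.array([2.0, 0.0, 1.0])

inner = y @ z                      # y'z = 1*2 + 2*0 + 3*1 = 5
outer = np.outer(y, z)             # 3 x 3 matrix whose (i, j) element is y[i] * z[j]
norm_z = np.sqrt(z @ z)            # Euclidean norm (z'z)^(1/2)
print(np.isclose(norm_z, np.linalg.norm(z)))   # True

u = z / norm_z                     # rescaling z gives a unit vector
print(np.isclose(u @ u, 1.0))      # True
```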