Macroeconomics Summary
B.J.H. De Jong
September 23, 2021
Week 1
Lecture 1: The Likelihood Principle
A sample space is the set of all samples you could draw; it is denoted by Y. The parametric statistical
model (or parametric class) F is a set of pdf's with the same given functional form, whose elements
differ only in the value of some finite-dimensional parameter θ:
\[
F := \{\, f(\cdot;\theta) \mid \theta \in \Theta \subseteq \mathbb{R}^k \,\}, \qquad k < \infty.
\]
The likelihood function for the parametric statistical model F is a function L : Θ → R+ that gives the
density value of the data as a function of the parameter:
\[
L(\theta; y) := c(y)\, f(y;\theta) = c(y) \prod_{i=1}^{n} f(y_i;\theta) \tag{1}
\]
Likelihoods for different samples are said to be equivalent if their ratio does not depend on θ; the notation
is L(θ; y) ∝ L(θ; z). That is, we can write L(θ; y) = c(y, z) L(θ; z), where the function c does not depend on
the parameter.
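As a brief illustration (an assumed example, not taken from the lecture): for an i.i.d. Bernoulli(θ) sample of size n,
\[
L(\theta; y) = \prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i}
             = \theta^{\sum_i y_i}\,(1-\theta)^{\,n-\sum_i y_i}.
\]
Two samples y and z of the same size with \(\sum_i y_i = \sum_i z_i\) give L(θ; y) = L(θ; z), so their likelihoods are equivalent (here c(y, z) = 1).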
Lecture 2: Sufficiency (35)
A statistic is a function T : Y → R^r, r ∈ N+, such that T(y) does not depend on θ, with t = T(y) its
realization, or sample value. Some remarks about statistics: 1) A statistic can be multi-dimensional. 2) The
collection of order statistics is a statistic. 3) The likelihood function is not a statistic, as it depends on the
parameter θ. 4) The maximum of the likelihood function can be a statistic. We do not want unnecessarily
large statistics, hence: for some F, a statistic T(y) is sufficient for θ if it takes the same value at two points
y, z ∈ Y only if y and z have equivalent likelihoods:
\[
T(y) = T(z) \implies L(\theta;y) \propto L(\theta;z) \qquad \forall \theta \in \Theta.
\]
We can also say: if T(y) is sufficient for θ, then it contains all the information necessary to compute the
likelihood. Notice that the 'trivial' statistic T(y) = y is always sufficient. Neyman's Factorization
Theorem: for some F, T(·) is sufficient for θ iff we can factorize
\[
f(y;\theta) = h(y)\, g(T(y);\theta). \tag{2}
\]
This also implies that a one-to-one function of a sufficient statistic is itself a sufficient statistic. We could
also check sufficiency by showing that the conditional distribution f(y | T(y) = t) does not depend on θ,
but this approach is not discussed much.
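A small worked example (assumed, for illustration): in the Bernoulli model above, the joint density factorizes as
\[
f(y;\theta) = \underbrace{1}_{h(y)} \cdot
\underbrace{\theta^{T(y)}(1-\theta)^{\,n-T(y)}}_{g(T(y);\theta)},
\qquad T(y) = \sum_{i=1}^{n} y_i,
\]
so by the Factorization Theorem \(T(y)=\sum_i y_i\) is sufficient for θ; by the remark above, the one-to-one function \(\bar{y} = T(y)/n\) is sufficient as well.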
For some F, a sufficient statistic T(y) is minimal sufficient for θ if it takes distinct values only at points
in Y with non-equivalent likelihoods:
\[
T(y) = T(z) \iff L(\theta;y) \propto L(\theta;z) \qquad \forall \theta \in \Theta.
\]
We can also say: a minimal sufficient statistic is the minimum amount of information we need to characterize the likelihood. This is
equivalent to the condition that
\[
\frac{L(\theta;y)}{L(\theta;z)} \tag{3}
\]
is free of θ iff T(y) = T(z).
For a statistic T(y) we can partition Y into subsets Y_t on which T(y) = t, where t is in the range of T(·).
We can also partition Y using the notion of equivalent likelihood.
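Continuing the (assumed) Bernoulli illustration, the ratio criterion in (3) gives
\[
\frac{L(\theta;y)}{L(\theta;z)}
 = \theta^{\,\sum_i y_i - \sum_i z_i}\,(1-\theta)^{\,\sum_i z_i - \sum_i y_i},
\]
which is free of θ if and only if \(\sum_i y_i = \sum_i z_i\). Hence \(T(y)=\sum_i y_i\) is minimal sufficient, and the corresponding partition of Y groups together all samples with the same number of successes.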
Week 2
Lecture 3: Exponential families (66)
A parametric family is said to be exponential of order r if the density of an observation y_j can be written as
\[
f(y_j;\theta) = q(y_j)\exp\!\left( \sum_{i=1}^{r} \psi_i(\theta)\, t_i(y_j) - \tau(\theta) \right) \tag{1}
\]
where the t_i(y_j) do not depend on θ and the ψ_i(θ) and τ(θ) do not depend on y_j. If the exponential family
(1) is in reduced form, then T = (t_1(y), ..., t_r(y)) is minimal sufficient for θ.
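For instance (an assumed example, not from the lecture): the Poisson(θ) density can be written in the form (1) with r = 1,
\[
f(y_j;\theta) = \frac{\theta^{y_j} e^{-\theta}}{y_j!}
 = \underbrace{\tfrac{1}{y_j!}}_{q(y_j)}
   \exp\!\big( \underbrace{\log\theta}_{\psi_1(\theta)}\,\underbrace{y_j}_{t_1(y_j)} - \underbrace{\theta}_{\tau(\theta)} \big),
\]
so \(T(y) = \sum_{j=1}^{n} y_j\) is minimal sufficient for θ.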
An exponential family is regular if:
1. The parameter space Θ is natural, i.e.
\[
\Theta = \left\{ \theta : \int_{Y} q(y)\exp\!\left( \sum_{i=1}^{r} \psi_i(\theta)\, t_i(y) \right) d\nu(y) < \infty \right\}.
\]
2. dim Θ = k = r, the dimension of the minimal sufficient statistic.
3. The function θ ↦ ψ(θ) = (ψ_1(θ), ..., ψ_r(θ)) is invertible.
4. The functions ψ_1(θ), ..., ψ_r(θ) are infinitely often differentiable in θ.
If we know that a density belongs to an exponential family (here of order 1), the mean of the sufficient statistic follows directly from τ and ψ:
\[
\mathbb{E}[t(Y_j)] = \frac{\tau'(\theta)}{\psi'(\theta)} \tag{2}
\]
Equation (2) simplifies if ψ(θ) = θ. Such a parametrization is called canonical, with ψ the canonical
parameter.
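In the (assumed) Poisson example above, τ(θ) = θ and ψ(θ) = log θ, so (2) gives
\[
\mathbb{E}[t(Y_j)] = \frac{\tau'(\theta)}{\psi'(\theta)} = \frac{1}{1/\theta} = \theta,
\]
which indeed equals the Poisson mean. Under the canonical parametrization η = log θ we have τ(η) = e^η and ψ(η) = η, and the formula reduces to E[t(Y_j)] = τ'(η) = e^η = θ.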
Lecture 4: Maximum Likelihood (83)
Estimation is finding an estimator θ̂ for the true parameter that generated our data, say θ0 . Definition: An
estimator is a function θ̂ : Y → Θ. The maximum likelihood estimator (MLE) of θ is an element θ̂ ∈ Θ
which attains the maximum value of the likelihood L(θ) in Θ, i.e.
\[
L(\hat\theta) = \max_{\theta\in\Theta} L(\theta) \tag{3}
\]
The basic idea of maximum likelihood estimation is to find the parameter value that maximizes the chance of seeing
the sample we have. The likelihood and the log-likelihood attain their maximum at the same point.
If L(θ) is differentiable and Θ is an open subset of Rk , then the MLE must satisfy:
\[
\frac{\partial}{\partial\theta}\,\ell(\theta)\Big|_{\theta=\hat\theta} = 0 \tag{4}
\]
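As an (assumed) illustration: for the Poisson model above the log-likelihood is, up to a constant not depending on θ,
\[
\ell(\theta) = \sum_{i=1}^{n} y_i \log\theta - n\theta + \text{const},
\qquad
\frac{\partial \ell(\theta)}{\partial \theta}\bigg|_{\theta=\hat\theta}
 = \frac{\sum_i y_i}{\hat\theta} - n = 0
\;\Longrightarrow\;
\hat\theta = \bar{y},
\]
so the MLE is the sample mean, a function of the minimal sufficient statistic \(\sum_i y_i\).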