State space models

Consider an $n_x$-dimensional VAR(1) process:

$$x_t = A\,x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim WN(0, \Sigma)$$

However, some variables of $x_t$ are not observed, i.e. no data exists for them

We can define a matrix $C$ that maps $x_t$ into an $n_z$-dimensional vector $z_t$ collecting the observed variables in $x_t$:

$$z_t = C\,x_t$$

Example:

  • $x_t = [x_{1,t}, x_{2,t}]'$

  • $x_{1,t}$ - unobserved, $x_{2,t}$ - observed

$$z_t = \underbrace{[0, \; 1]}_{C} \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}$$

The case where $z_t$ is a subset of $x_t$ is one example of a linear state space model

$$x_t = A\,x_{t-1} + \varepsilon_t, \qquad z_t = C\,x_t + \nu_t$$

which, in turn, is a special case of the class of (non-linear) state space models

$$x_t = f(x_{t-1}, \varepsilon_t), \qquad z_t = g(x_t, \nu_t)$$

Simplest example: AR(1) model with measurement error:

$$x_t = \alpha\,x_{t-1} + \varepsilon_t, \qquad z_t = x_t + \nu_t$$
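
To make this concrete, here is a minimal simulation sketch of the AR(1)-plus-measurement-error model in Python (the parameter values and variable names are illustrative assumptions, not from the notes):

```python
# Minimal sketch: simulate x_t = alpha x_{t-1} + eps_t and z_t = x_t + nu_t
# (T, alpha, sigma_eps, sigma_nu below are assumed values for illustration)
import numpy as np

rng = np.random.default_rng(0)
T, alpha, sigma_eps, sigma_nu = 200, 0.9, 1.0, 0.5

x = np.zeros(T + 1)                        # x[0] plays the role of the initial state x_0
eps = rng.normal(0.0, sigma_eps, size=T)
nu = rng.normal(0.0, sigma_nu, size=T)
for t in range(1, T + 1):
    x[t] = alpha * x[t - 1] + eps[t - 1]   # state (transition) equation
z = x[1:] + nu                             # observation equation: only z is "observed"
```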

Gaussian linear state space model

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon) && \text{state (transition) equation}\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu) && \text{observation equation}\\
x_0 &\sim N(0, \Sigma_0) && \text{initial state}
\end{aligned}
$$

Note 1: This is a time-invariant model. This can be relaxed, with some or all of the matrices $A$, $C$, $\Sigma_\varepsilon$, and $\Sigma_\nu$ being functions of $t$.

Note 2: We can add an intercept in one or both of the state and observation equations.

Note 3: $A$ can be a companion matrix, so the unobserved variables could follow a general VAR(p)

Note 4: $\varepsilon_t$ and $\nu_t$ are assumed to be independent, but that can be relaxed.

Note 5: $x_0$ is independent of all $\varepsilon_t$ and $\nu_t$

Autocovariances of $z_t$

$$\Gamma_z(0) = \mathrm{cov}(z_t, z_t) = \mathrm{cov}(C x_t, C x_t) + \Sigma_\nu = C\,\Gamma_x(0)\,C' + \Sigma_\nu$$
$$\Gamma_z(k) = \mathrm{cov}(z_t, z_{t-k}) = \mathrm{cov}(C x_t, C x_{t-k}) = C\,\Gamma_x(k)\,C'$$

Note: We get $\Gamma_x(0)$ and $\Gamma_x(k)$ as in the last lecture
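
As a sketch of that computation in Python, $\Gamma_x(0)$ can be obtained from the discrete Lyapunov equation $\Gamma_x(0) = A\,\Gamma_x(0)\,A' + \Sigma_\varepsilon$ and then mapped into the autocovariances of $z_t$ (the matrices below are assumed example values, not from the notes):

```python
# Sketch: autocovariances of z_t from those of x_t (example matrices are assumptions)
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.7, 0.2],
              [0.0, 0.5]])
Sigma_eps = np.eye(2)
C = np.array([[0.0, 1.0]])                 # observe only the second variable
Sigma_nu = np.array([[0.3]])

Gamma_x0 = solve_discrete_lyapunov(A, Sigma_eps)      # solves G = A G A' + Sigma_eps

def Gamma_x(k):
    return np.linalg.matrix_power(A, k) @ Gamma_x0    # Gamma_x(k) = A^k Gamma_x(0)

Gamma_z0 = C @ Gamma_x0 @ C.T + Sigma_nu              # Gamma_z(0)

def Gamma_z(k):
    return C @ Gamma_x(k) @ C.T                       # Gamma_z(k), k >= 1
```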

Stationarity of $x_t$

$$x_0 \sim N(0, \Sigma_0)$$

requires

$$\Sigma_0 = \Gamma_x(0)$$

Marginal distribution of $X = [x_1, x_2, \ldots, x_T]$

$$
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_T \end{bmatrix}}_{X}
=
\underbrace{\begin{bmatrix}
A & I & 0 & 0 & \cdots & 0 & 0 \\
A^2 & A & I & 0 & \cdots & 0 & 0 \\
A^3 & A^2 & A & I & \cdots & 0 & 0 \\
A^4 & A^3 & A^2 & A & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
A^T & A^{T-1} & A^{T-2} & A^{T-3} & \cdots & A & I
\end{bmatrix}}_{\mathcal{A}}
\underbrace{\begin{bmatrix} x_0 \\ \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_{T-1} \\ \varepsilon_T \end{bmatrix}}_{E}
$$
$$X \sim N\!\left(0,\; \mathcal{A}\,\Sigma_E\,\mathcal{A}'\right), \qquad \Sigma_E = \mathrm{cov}(E) = \mathrm{blockdiag}(\Sigma_0, \Sigma_\varepsilon, \ldots, \Sigma_\varepsilon)$$

Note: see HW4 part 1 for an alternative way to write the system. Check Efficient simulation and integrated likelihood estimation in state space models for applications of that approach.
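
A small sketch of how the stacked system could be built numerically (the matrix values and the small $T$ are assumptions for illustration; $\mathcal{A}$ is written as script_A):

```python
# Sketch: build script_A and Sigma_E, then Sigma_X = script_A Sigma_E script_A'
import numpy as np
from scipy.linalg import block_diag

T = 5                                          # small horizon, for illustration only
A = np.array([[0.7, 0.2],
              [0.0, 0.5]])
Sigma_eps = np.eye(2)
Sigma_0 = np.eye(2)                            # assumed covariance of the initial state x_0
n = A.shape[0]

# block (t, 0) of script_A is A^t and block (t, s) is A^{t-s} for 1 <= s <= t
script_A = np.zeros((T * n, (T + 1) * n))
for t in range(1, T + 1):
    for s in range(0, t + 1):
        script_A[(t - 1) * n:t * n, s * n:(s + 1) * n] = np.linalg.matrix_power(A, t - s)

Sigma_E = block_diag(Sigma_0, *([Sigma_eps] * T))   # cov of E = [x_0, eps_1, ..., eps_T]
Sigma_X = script_A @ Sigma_E @ script_A.T           # X ~ N(0, Sigma_X)
```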

Marginal distribution of $Z = [z_1, z_2, \ldots, z_T]$

$$
\underbrace{\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_T \end{bmatrix}}_{Z}
=
\underbrace{\begin{bmatrix} C & 0 & \cdots & 0 \\ 0 & C & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & C \end{bmatrix}}_{\mathcal{C}}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_T \end{bmatrix}}_{X}
+
\underbrace{\begin{bmatrix} \nu_1 \\ \nu_2 \\ \vdots \\ \nu_T \end{bmatrix}}_{V}
$$

Question: How would you compute $\mathcal{C}$?

$$Z \sim N\!\left(0,\; \mathcal{C}\,\Sigma_X\,\mathcal{C}' + \Sigma_V\right)$$
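
One way to answer the question above in code: $\mathcal{C}$ is block diagonal with $C$ repeated $T$ times, i.e. the Kronecker product $I_T \otimes C$. The sketch below continues the previous one (it reuses T and Sigma_X from there; the values of $C$ and $\Sigma_\nu$ are assumptions):

```python
# Sketch: script_C = I_T kron C, Sigma_V = I_T kron Sigma_nu, then Sigma_Z
import numpy as np

C = np.array([[0.0, 1.0]])
Sigma_nu = np.array([[0.3]])

script_C = np.kron(np.eye(T), C)                       # block diagonal with C on the diagonal
Sigma_V = np.kron(np.eye(T), Sigma_nu)                 # cov of V = [nu_1, ..., nu_T]
Sigma_Z = script_C @ Sigma_X @ script_C.T + Sigma_V    # Z ~ N(0, Sigma_Z)
```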

Applications:

  • likelihood function: distribution of Z

  • forecasting: distribution of $z_{t+h}$, given $Z_{1:t} = [z_1, z_2, \ldots, z_t]$

Joint distribution of [X,Z]

$$\begin{bmatrix} X \\ Z \end{bmatrix} \sim N\!\left(0,\; \begin{bmatrix} \Sigma_X & \Sigma_{XZ} \\ \Sigma_{ZX} & \Sigma_Z \end{bmatrix}\right)$$

Moments of the conditional distribution of X given Z

  • $E(X \mid Z) = \Sigma_{XZ}\,\Sigma_Z^{-1}\,Z$

  • $\mathrm{cov}(X \mid Z) = \Sigma_X - \Sigma_{XZ}\,\Sigma_Z^{-1}\,\Sigma_{ZX}$

Note: The conditional variance of X given Z does not depend on the data Z
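
Continuing the stacked-system sketches above, these conditional moments can be computed by direct Gaussian conditioning (Z_obs below stands for a vector of observed data; the function name is mine, not from the notes):

```python
# Sketch: E(X|Z) and cov(X|Z) by brute-force Gaussian conditioning
# (reuses Sigma_X, script_C, Sigma_Z from the earlier sketches)
import numpy as np

Sigma_XZ = Sigma_X @ script_C.T        # cov(X, Z), since Z = script_C X + V with V independent of X
Sigma_ZX = Sigma_XZ.T

def conditional_moments(Z_obs):
    Sigma_Z_inv = np.linalg.inv(Sigma_Z)   # for large T, prefer solving linear systems instead
    E_X_given_Z = Sigma_XZ @ Sigma_Z_inv @ Z_obs
    cov_X_given_Z = Sigma_X - Sigma_XZ @ Sigma_Z_inv @ Sigma_ZX
    return E_X_given_Z, cov_X_given_Z
```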

Moments of the conditional distribution

of $X_{t_1:t_2} = [x_{t_1}, \ldots, x_{t_2}]$ given $Z_{t_3:t_4} = [z_{t_3}, \ldots, z_{t_4}]$

are obtained from the same formulae using the corresponding sub-blocks:

$$\{\Sigma_{XX}\}_{(t_1:t_2),(t_1:t_2)}, \quad \{\Sigma_{ZZ}\}_{(t_3:t_4),(t_3:t_4)}, \quad \{\Sigma_{XZ}\}_{(t_1:t_2),(t_3:t_4)}, \quad \{\Sigma_{ZX}\}_{(t_3:t_4),(t_1:t_2)}$$

Applications:

  • filtering: distribution of $x_t$, given $Z_{1:t} = [z_1, z_2, \ldots, z_t]$

  • state prediction: distribution of $x_{t+h}$, given $Z_{1:t}$

  • smoothing: distribution of $x_t$, given $Z_{1:T}$

Kalman filter

Let

$$
\begin{aligned}
x_{t|t-1} &= E(x_t \mid Z_{1:t-1}), &\quad \Sigma^x_{t|t-1} &= \mathrm{cov}(x_t \mid Z_{1:t-1})\\
x_{t|t} &= E(x_t \mid Z_{1:t}), &\quad \Sigma^x_{t|t} &= \mathrm{cov}(x_t \mid Z_{1:t})\\
z_{t|t-1} &= E(z_t \mid Z_{1:t-1}), &\quad \Sigma^z_{t|t-1} &= \mathrm{cov}(z_t \mid Z_{1:t-1})
\end{aligned}
$$

and

$$x_{0|0} = 0, \qquad \Sigma^x_{0|0} = \Sigma_0$$
  • optimal one-step ahead forecast of x

$$x_{t|t-1} = A\,x_{t-1|t-1}, \qquad \Sigma^x_{t|t-1} = A\,\Sigma^x_{t-1|t-1}\,A' + \Sigma_\varepsilon$$
  • optimal one-step ahead forecast of z

$$z_{t|t-1} = C\,x_{t|t-1}, \qquad \Sigma^z_{t|t-1} = C\,\Sigma^x_{t|t-1}\,C' + \Sigma_\nu$$
  • optimal update of the forecast of x

$$x_{t|t} = x_{t|t-1} + K_t\,(z_t - z_{t|t-1}), \qquad \Sigma^x_{t|t} = \Sigma^x_{t|t-1} - K_t\,\Sigma^z_{t|t-1}\,K_t'$$

where $K_t = \Sigma^x_{t|t-1}\,C'\,(\Sigma^z_{t|t-1})^{-1}$ is called the Kalman gain - it shows how to optimally update the forecast of $x_t$ after the new observation of $z_t$ is seen
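
A minimal Python sketch of these recursions (function and variable names are mine, not from the notes; Z is a $T \times n_z$ array of observations):

```python
# Sketch of the Kalman filter recursions above
import numpy as np

def kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    T, n_x, n_z = Z.shape[0], A.shape[0], C.shape[0]
    x_pred = np.zeros((T, n_x)); P_pred = np.zeros((T, n_x, n_x))   # x_{t|t-1}, Sigma^x_{t|t-1}
    x_filt = np.zeros((T, n_x)); P_filt = np.zeros((T, n_x, n_x))   # x_{t|t},   Sigma^x_{t|t}
    z_pred = np.zeros((T, n_z)); F = np.zeros((T, n_z, n_z))        # z_{t|t-1}, Sigma^z_{t|t-1}
    x_prev, P_prev = np.zeros(n_x), Sigma_0                         # x_{0|0} = 0, Sigma^x_{0|0} = Sigma_0
    for t in range(T):
        # one-step-ahead forecast of x
        x_pred[t] = A @ x_prev
        P_pred[t] = A @ P_prev @ A.T + Sigma_eps
        # one-step-ahead forecast of z
        z_pred[t] = C @ x_pred[t]
        F[t] = C @ P_pred[t] @ C.T + Sigma_nu
        # update after observing z_t
        K = P_pred[t] @ C.T @ np.linalg.inv(F[t])                   # Kalman gain K_t
        x_filt[t] = x_pred[t] + K @ (Z[t] - z_pred[t])
        P_filt[t] = P_pred[t] - K @ F[t] @ K.T
        x_prev, P_prev = x_filt[t], P_filt[t]
    return x_filt, P_filt, x_pred, P_pred, z_pred, F
```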

Derivation for t=1

  • step 1 Compute the joint distribution of $[x_1, z_1]$ using

$$\begin{bmatrix} x_1 \\ z_1 \end{bmatrix} = \begin{bmatrix} A \\ CA \end{bmatrix} x_0 + \begin{bmatrix} I & 0 \\ C & I \end{bmatrix} \begin{bmatrix} \varepsilon_1 \\ \nu_1 \end{bmatrix}$$

Note that $x_0$, $\varepsilon_1$, and $\nu_1$ are independent

  • step 2 Compute the marginal distributions of $x_1$ and $z_1$

  • step 3 Compute the conditional distribution of $x_1$ given $z_1$

Derivation for any $t$: use induction: assume the optimal update formulae hold for $t-1$ and show that the one-step-ahead ones are true:

  • write $[x_t, z_t]$ in terms of $x_{t-1}$, $\varepsilon_t$, and $\nu_t$

  • using the assumed conditional distribution of $x_{t-1}$ given $Z_{1:t-1}$, compute the joint and marginal conditional distributions of $x_t$ and $z_t$ given $Z_{1:t-1}$.

  • from the joint (conditional) distribution compute the conditional distribution of $x_t$ given $z_t$. This will give you the conditional distribution of $x_t$ given $Z_{1:t} = [Z_{1:t-1}, z_t]$

Likelihood function with the Kalman filter

The joint distribution of $Z = [z_1, z_2, \ldots, z_T]$ can be factorized as

$$p(Z; \theta) = p(z_1; \theta)\,\prod_{t=2}^{T} p(z_t \mid Z_{1:t-1}; \theta)$$

where $p(z_t \mid Z_{1:t-1}; \theta)$ is Gaussian with moments $z_{t|t-1}$ and $\Sigma^z_{t|t-1}$ given by the Kalman filter.

Note: equivalent to, but much more efficient than computing the joint distribution of Z as

$$Z \sim N\!\left(0,\; \mathcal{C}\,\Sigma_X\,\mathcal{C}' + \Sigma_V\right)$$
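
A sketch of the corresponding computation, reusing the kalman_filter function sketched earlier (names are mine, not from the notes):

```python
# Sketch: Gaussian log-likelihood via the prediction-error decomposition
import numpy as np

def log_likelihood(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    _, _, _, _, z_pred, F = kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0)
    ll = 0.0
    for t in range(Z.shape[0]):
        e = Z[t] - z_pred[t]                   # one-step-ahead forecast error
        _, logdet = np.linalg.slogdet(F[t])    # log det of Sigma^z_{t|t-1}
        ll += -0.5 * (len(e) * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(F[t], e))
    return ll
```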

Kalman smoother

$$x_{t-1|T} = x_{t-1|t-1} + J_{t-1}\,(x_{t|T} - x_{t|t-1}), \qquad \Sigma^x_{t-1|T} = \Sigma^x_{t-1|t-1} + J_{t-1}\,(\Sigma^x_{t|T} - \Sigma^x_{t|t-1})\,J_{t-1}'$$

where $J_{t-1} = \Sigma^x_{t-1|t-1}\,A'\,(\Sigma^x_{t|t-1})^{-1}$

Note 1: for $t = T+1$, $x_{t-1|T}$ and $\Sigma^x_{t-1|T}$ are given by the Kalman filter. After that, going backwards, all necessary objects are provided by the previous smoothing step and by the Kalman filter

Note 2: equivalent to, but much more efficient than, computing the block diagonal of the conditional distribution of $X$ given $Z$
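
A sketch of the backward recursion, reusing the filter output from the earlier sketch (names are mine, not from the notes):

```python
# Sketch of the Kalman smoother: backward pass over the filtered and predicted moments
import numpy as np

def kalman_smoother(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    x_filt, P_filt, x_pred, P_pred, _, _ = kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0)
    T = Z.shape[0]
    x_smooth, P_smooth = x_filt.copy(), P_filt.copy()   # at t = T the smoother equals the filter
    for t in range(T - 1, 0, -1):                       # backwards: t = T-1, ..., 1 (0-based)
        J = P_filt[t - 1] @ A.T @ np.linalg.inv(P_pred[t])              # J_{t-1}
        x_smooth[t - 1] = x_filt[t - 1] + J @ (x_smooth[t] - x_pred[t])
        P_smooth[t - 1] = P_filt[t - 1] + J @ (P_smooth[t] - P_pred[t]) @ J.T
    return x_smooth, P_smooth
```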

Estimation

What are we estimating?

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon) && \text{state (transition) equation}\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu) && \text{observation equation}
\end{aligned}
$$

Collect the unknown parameters of $A$, $C$, $\Sigma_\varepsilon$, $\Sigma_\nu$ in $\theta$

MLE

$$\ell(\theta \mid Z) = \log L(\theta \mid Z) = \log p(Z; \theta)$$
$$\hat{\theta} = \arg\max_\theta \; \ell(\theta \mid Z)$$
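
A sketch of MLE for the simplest case, the AR(1)-plus-noise model, by numerically maximizing the Kalman-filter likelihood (the parameterization of $\theta$ and the starting values are assumptions for illustration; it reuses the log_likelihood function sketched above):

```python
# Sketch: MLE for x_t = alpha x_{t-1} + eps_t, z_t = x_t + nu_t
# theta = (alpha, log sigma_eps^2, log sigma_nu^2)
import numpy as np
from scipy.optimize import minimize

def unpack(theta):
    alpha, log_s2_eps, log_s2_nu = theta
    A = np.array([[alpha]]); C = np.array([[1.0]])
    Sigma_eps = np.array([[np.exp(log_s2_eps)]])
    Sigma_nu = np.array([[np.exp(log_s2_nu)]])
    Sigma_0 = Sigma_eps / (1.0 - alpha**2)      # stationary initial variance (requires |alpha| < 1)
    return A, C, Sigma_eps, Sigma_nu, Sigma_0

def neg_loglik(theta, Z):
    return -log_likelihood(Z, *unpack(theta))

# with Z_data a (T, 1) array of observations:
# result = minimize(neg_loglik, x0=np.array([0.5, 0.0, 0.0]), args=(Z_data,), method="Nelder-Mead")
# theta_hat = result.x
```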

Identification

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon) && \text{state (transition) equation}\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu) && \text{observation equation}
\end{aligned}
$$
  • If we replace $x_t$ with $x^*_t = T x_t$, $\varepsilon_t$ with $\varepsilon^*_t = T \varepsilon_t$, $A$ with $A^* = T A T^{-1}$, $C$ with $C^* = C T^{-1}$, and $\Sigma_\varepsilon$ with $\Sigma^*_\varepsilon = T \Sigma_\varepsilon T'$, the process for $z_t$ remains unchanged, and the likelihood function remains the same.

  • Therefore, unless there are (sufficient) restrictions on $A$, $C$, and $\Sigma_\varepsilon$, their parameters cannot be identified - multiple values of $\theta$ imply the same value of the likelihood.

  • a simple way to check for local identification at a given value of $\theta$ is to compute the Jacobian matrix of $\Sigma_Z$ w.r.t. $\theta$ and check that it has full (column) rank; a numerical sketch follows below.
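
A sketch of that check: compute a numerical Jacobian of the distinct elements of $\Sigma_Z$ with respect to $\theta$ and inspect its rank (sigma_Z_of_theta is a hypothetical function, assumed to map $\theta$ to the implied covariance matrix of $Z$):

```python
# Sketch: local identification check via the rank of the Jacobian of vech(Sigma_Z) w.r.t. theta
import numpy as np

def jacobian_rank(sigma_Z_of_theta, theta, h=1e-6):
    theta = np.asarray(theta, dtype=float)
    base = sigma_Z_of_theta(theta)
    rows = np.triu_indices(base.shape[0])          # distinct (upper-triangular) elements of Sigma_Z
    J = np.zeros((rows[0].size, theta.size))
    for j in range(theta.size):                    # forward-difference derivative w.r.t. theta_j
        theta_j = theta.copy()
        theta_j[j] += h
        J[:, j] = (sigma_Z_of_theta(theta_j)[rows] - base[rows]) / h
    return np.linalg.matrix_rank(J)

# theta is locally identified (by this first-order check) if jacobian_rank(...) == len(theta)
```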

Forecasting

Optimal forecast given information at T:

$$E(z_{T+1} \mid Z) = C\,E(x_{T+1} \mid Z) = C\,x_{T+1|T} = C\,A\,x_{T|T}, \qquad E(z_{T+h} \mid Z) = C\,A^h\,x_{T|T}$$

Computing the variance of the forecast errors

Using that $x_t$ is a VAR(1) process,

$$x_{T+h} = A^h x_T + A^{h-1}\varepsilon_{T+1} + \cdots + A\,\varepsilon_{T+h-1} + \varepsilon_{T+h}$$

and

$$x_{T+h|T} = A^h\,x_{T|T}$$

Therefore, the forecast error is

$$x_{T+h} - x_{T+h|T} = A^h\,(x_T - x_{T|T}) + A^{h-1}\varepsilon_{T+1} + \cdots + A\,\varepsilon_{T+h-1} + \varepsilon_{T+h}$$

The MSE of $x_{T+h|T}$ is

$$\Sigma^x_{T+h|T} = A^h\,\Sigma^x_{T|T}\,(A^h)' + A^{h-1}\Sigma_\varepsilon (A^{h-1})' + \cdots + A\,\Sigma_\varepsilon\,A' + \Sigma_\varepsilon$$

and the MSE of $z_{T+h|T}$ is

$$\Sigma^z_{T+h|T} = C\,\Sigma^x_{T+h|T}\,C' + \Sigma_\nu$$
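
A sketch of the forecast recursions, starting from the filtered moments at $T$ (x_TT and P_TT stand for $x_{T|T}$ and $\Sigma^x_{T|T}$ from the Kalman filter; the function name is mine, not from the notes):

```python
# Sketch: h-step-ahead forecasts of z and their MSE matrices
import numpy as np

def forecast_z(x_TT, P_TT, A, C, Sigma_eps, Sigma_nu, h):
    x_fc, P_fc = x_TT.copy(), P_TT.copy()
    z_fc, z_mse = [], []
    for _ in range(h):
        x_fc = A @ x_fc                            # x_{T+j|T} = A x_{T+j-1|T}
        P_fc = A @ P_fc @ A.T + Sigma_eps          # recursive form of Sigma^x_{T+j|T}
        z_fc.append(C @ x_fc)                      # E(z_{T+j} | Z_{1:T})
        z_mse.append(C @ P_fc @ C.T + Sigma_nu)    # Sigma^z_{T+j|T}
    return np.array(z_fc), np.array(z_mse)
```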

Accounting for parameter uncertainty

So far, forecast errors were always computed assuming that the parameters were known. However, they are estimated, and parameter uncertainty is also present (it disappears only asymptotically)

total uncertainty

$$
\begin{aligned}
E\!\left([x_{T+h} - x_{T+h|T}(\hat{\theta})][x_{T+h} - x_{T+h|T}(\hat{\theta})]'\right)
&= E\!\left([x_{T+h} - x_{T+h|T}(\theta)][x_{T+h} - x_{T+h|T}(\theta)]'\right)\\
&\quad + E\!\left([x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})][x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})]'\right)\\
&= E\!\left(\Sigma^x_{T+h|T}\right) + E\!\left([x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})][x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})]'\right)
\end{aligned}
$$

Linear state space model

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon)\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu)\\
x_0 &\sim N(0, \Sigma_0)
\end{aligned}
$$

reduced-form (statistical) models

  • the parameters are the unrestricted elements of $A$, $C$, $\Sigma_\varepsilon$, $\Sigma_\nu$

  • of little (no) interest on their own

structural (theoretical) models

  • reduced-form model parameters are functions of a (often much) smaller number of structural parameters $\theta$

  • $\theta$ have economic meaning and are (or could be) of interest on their own

  • estimation is usually harder (non-linear functions)

Non-linear state space models in Python

docs