State space models

Consider an \(n_{x}\)-dimensional VAR(1) process:

\[ \mathbf{x}_{t} = A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\; \boldsymbol \varepsilon_t \sim \operatorname{WN} \left( 0, \;\mathbf{\Sigma}\right) \]

Suppose, however, that some of the variables in \(\mathbf{x}_{t}\) are not observed, i.e. no data exist for them.

We can define a matrix \(C\) that maps \(\mathbf{x}_{t}\) into a \(n_{z}\)-dimensional \(\mathbf{z}_{t}\) collecting the observed variables in \(\mathbf{x}_{t}\):

\[ \mathbf{z}_{t} = C \mathbf{x}_{t}\]

Example:

  • \(\mathbf{x}_{t} = [x_{1,t}, x_{2,t}]^{\prime}\)

  • \(x_{1,t}\) - unobserved, \(x_{2,t}\) - observed

\[\begin{split}z_t = \underset{C}{\underbrace{\left[\begin{array}{cc} 0 & 1 \end{array} \right]}} \left[\begin{array}{c} x_{1,t}\\ x_{2,t}\\ \end{array}\right] \end{split}\]

The case where \(\mathbf{z}_{t}\) is a subset of \(\mathbf{x}_{t}\) is one example of a linear state space model

\[\begin{split} \begin{align} \mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \\ \mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t} \end{align} \end{split}\]

which, in turn, is a special case of the class of (non-linear) state space models

\[\begin{split} \begin{align} \mathbf{x}_{t} &= f\left(\mathbf{x}_{t-1}, \boldsymbol \varepsilon_{t} \right), \\ \mathbf{z}_{t} &= g\left( \mathbf{x}_{t}, \boldsymbol \nu_{t} \right) \end{align} \end{split}\]

Simplest example: AR(1) model with measurement error:

\[\begin{split} \begin{align} x_{t} &= \alpha x_{t-1} + \varepsilon_{t}, \\ z_{t} &= x_{t} + \nu_{t} \end{align} \end{split}\]
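A minimal simulation of this model in Python (a sketch; the values of \(\alpha\) and the noise standard deviations below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
T, alpha, sig_e, sig_v = 200, 0.9, 1.0, 0.5   # illustrative parameter values

x = np.zeros(T + 1)                            # latent AR(1) state, x_0 = 0
for t in range(1, T + 1):
    x[t] = alpha * x[t - 1] + sig_e * rng.standard_normal()

z = x[1:] + sig_v * rng.standard_normal(T)     # observed series: x_t plus measurement error
```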

Gaussian linear state space model

\[\begin{split} \begin{align} \mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \;\;\; \text{state (transition) equation}\\\\ \mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \;\;\; \text{observation equation} \\\\ \mathbf{x}_{0} & \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{0} \right), \;\;\; \text{initial state} \end{align} \end{split}\]

Note 1: This is a time-invariant model. This can be relaxed by letting some or all of the matrices \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), and \(\mathbf{\Sigma}_{\nu}\) be functions of \(t\).

Note 2: We can add an intercept in one or both of the state and observation equations.

Note 3: \(A\) can be a companion matrix, so the unobserved variables could follow a general VAR(p)

Note 4: \(\varepsilon_{t}\) and \(\nu_t\) are assumed to be independent, but that can be relaxed.

Note 5: \(\mathbf{x}_{0}\) is independent of all \(\varepsilon_{t}\) and \(\nu_t\)
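A generic simulator for this model, as a sketch (the function name and interface are my own; the user supplies the system matrices):

```python
import numpy as np

def simulate_lgss(A, C, Sig_eps, Sig_nu, Sig_0, T, seed=0):
    """Draw x_{1:T} and z_{1:T} from the Gaussian linear state space model."""
    rng = np.random.default_rng(seed)
    nx, nz = A.shape[0], C.shape[0]
    X = np.zeros((T + 1, nx))
    Z = np.zeros((T, nz))
    X[0] = rng.multivariate_normal(np.zeros(nx), Sig_0)   # initial state x_0
    for t in range(1, T + 1):
        X[t] = A @ X[t - 1] + rng.multivariate_normal(np.zeros(nx), Sig_eps)
        Z[t - 1] = C @ X[t] + rng.multivariate_normal(np.zeros(nz), Sig_nu)
    return X[1:], Z                                       # states and observations for t = 1..T
```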

Autocovariances of \(\mathbf{z}_{t}\)

\[\begin{split} \begin{align} \Gamma_{z}(0) & = \operatorname{cov}(\mathbf{z}_t, \mathbf{z}_{t})\\ & = \operatorname{cov}(C\mathbf{x}_t, C \mathbf{x}_{t}) + \mathbf{\Sigma}_{\nu} \\ & = C \Gamma_{x}(0) C^{\prime} + \mathbf{\Sigma}_{\nu} \end{align} \end{split}\]
\[\begin{split} \begin{align} \Gamma_{z}(k) & = \operatorname{cov}(\mathbf{z}_t, \mathbf{z}_{t-k})\\ & = \operatorname{cov}(C\mathbf{x}_t, C \mathbf{x}_{t-k}) \\ & = C \Gamma_{x}(k) C^{\prime} \end{align} \end{split}\]

Note: We get \(\Gamma_{x}(0)\) and \(\Gamma_{x}(k)\) as in the last lecture
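As in the last lecture, \(\Gamma_{x}(0)\) solves the discrete Lyapunov equation \(\Gamma_{x}(0) = A \Gamma_{x}(0) A^{\prime} + \mathbf{\Sigma}_{\varepsilon}\), and \(\Gamma_{x}(k) = A^{k}\Gamma_{x}(0)\). A sketch of the computation:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def autocov_z(A, C, Sig_eps, Sig_nu, k):
    """k-th autocovariance of z_t in the stationary linear state space model."""
    Gx0 = solve_discrete_lyapunov(A, Sig_eps)        # Gamma_x(0) = A Gamma_x(0) A' + Sig_eps
    if k == 0:
        return C @ Gx0 @ C.T + Sig_nu                # Gamma_z(0)
    Gxk = np.linalg.matrix_power(A, k) @ Gx0         # Gamma_x(k) = A^k Gamma_x(0)
    return C @ Gxk @ C.T                             # Gamma_z(k)
```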

Stationarity of \(\mathbf{x}_{t}\)

\[ \mathbf{x}_{0}\sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{0} \right), \]

requires

\[ \mathbf{\Sigma}_{0} = \Gamma_{x}(0)\]

Marginal distribution of \(\boldsymbol X = [\mathbf{x}^{\prime}_{1}, \mathbf{x}^{\prime}_{2}, \cdots, \mathbf{x}^{\prime}_{T}]^{\prime}\)

\[\begin{split} \underset{\boldsymbol X}{\underbrace{ \left[\begin{array}{c} \mathbf{x}_{1}\\ \mathbf{x}_{2}\\ \mathbf{x}_{3}\\ \mathbf{x}_{4}\\ \vdots\\ \mathbf{x}_{T} \end{array}\right]}} = \underset{\boldsymbol A}{\underbrace{ \left[\begin{array}{cccccccc} A & I & 0 & 0 & \cdots & 0 & 0 & 0\\ A^2 & A & I & 0 & \cdots & 0 & 0 & 0\\ A^3 & A^2 & A & I & \cdots & 0 & 0 & 0\\ A^4 & A^3 & A^2 & A & \cdots & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots\\ A^T & A^{T-1} & A^{T-2} & A^{T-3} & \cdots & A^2 & A & I \end{array}\right]}} \underset{\boldsymbol E}{\underbrace{ \left[\begin{array}{c} \mathbf{x}_{0}\\ \boldsymbol \varepsilon_{1}\\ \boldsymbol \varepsilon_{2}\\ \boldsymbol \varepsilon_{3}\\ \vdots\\ \boldsymbol \varepsilon_{T-1}\\ \boldsymbol \varepsilon_{T} \end{array}\right]}} \end{split}\]
\[ \boldsymbol X \sim \mathcal{N} \left( 0, \; \boldsymbol A \mathbf{\Sigma}_{\boldsymbol E} \boldsymbol A^{\prime}\right) \]

Note: see HW4 part 1 for an alternative way to write the system. Check "Efficient simulation and integrated likelihood estimation in state space models" for applications of that approach.
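A direct (if inefficient for large \(T\)) construction of \(\mathbf{\Sigma}_{\boldsymbol X} = \boldsymbol A \mathbf{\Sigma}_{\boldsymbol E} \boldsymbol A^{\prime}\), following the block structure above (a sketch; the function name is mine):

```python
import numpy as np
from scipy.linalg import block_diag

def joint_cov_X(A, Sig_eps, Sig_0, T):
    """Covariance of X = [x_1', ..., x_T']' as A_big Sig_E A_big'."""
    nx = A.shape[0]
    A_big = np.zeros((T * nx, (T + 1) * nx))
    for t in range(1, T + 1):          # block row t is [A^t, A^{t-1}, ..., A, I, 0, ..., 0]
        for j in range(t + 1):
            A_big[(t - 1) * nx:t * nx, j * nx:(j + 1) * nx] = np.linalg.matrix_power(A, t - j)
    Sig_E = block_diag(Sig_0, *[Sig_eps] * T)   # E = [x_0', eps_1', ..., eps_T']'
    return A_big @ Sig_E @ A_big.T
```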

Marginal distribution of \(\boldsymbol Z = \left[\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{T} \right]^{\prime}\)

\[\begin{split} \underset{\boldsymbol Z}{\underbrace{ \left[\begin{array}{c} \mathbf{z}_{1}\\ \mathbf{z}_{2}\\ \vdots\\ \mathbf{z}_{T} \end{array}\right]}} = \underset{\boldsymbol C}{\underbrace{ \left[\begin{array}{cccc} C & 0 & \cdots & 0\\ 0 & C & \cdots & 0\\ \vdots & \vdots & \cdots & \vdots\\ 0 & 0 & \cdots & C\\ \end{array}\right]}} \underset{\boldsymbol X}{\underbrace{ \left[\begin{array}{c} \mathbf{x}_{1}\\ \mathbf{x}_{2}\\ \vdots\\ \mathbf{x}_{T} \end{array}\right]}} + \underset{\boldsymbol V}{\underbrace{ \left[\begin{array}{c} \boldsymbol \nu_{1}\\ \boldsymbol \nu_{2}\\ \vdots\\ \boldsymbol \nu_{T} \end{array}\right]}} \end{split}\]

Question: How would you compute \(\boldsymbol C\)?

\[ \boldsymbol Z \sim \mathcal{N} \left( 0, \; \boldsymbol C \mathbf{\Sigma}_{\boldsymbol X} \boldsymbol C^{\prime} + \mathbf{\Sigma}_{\boldsymbol V}\right) \]
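One answer to the question above: since \(\boldsymbol C\) is block diagonal with \(C\) repeated \(T\) times, it is the Kronecker product \(I_{T} \otimes C\). A sketch, reusing joint_cov_X from the previous code block:

```python
import numpy as np

def joint_cov_Z(A, C, Sig_eps, Sig_nu, Sig_0, T):
    """Covariance of Z = [z_1', ..., z_T']'."""
    C_big = np.kron(np.eye(T), C)                 # block-diagonal C: I_T (x) C
    Sig_X = joint_cov_X(A, Sig_eps, Sig_0, T)     # from the sketch above
    Sig_V = np.kron(np.eye(T), Sig_nu)            # covariance of V = [nu_1', ..., nu_T']'
    return C_big @ Sig_X @ C_big.T + Sig_V
```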

Applications:

  • likelihood function: distribution of \(Z\)

  • forecasting: distribution of \(\mathbf{z}_{t+h}\), given \(Z_{1:t} = [\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{t}]^{\prime}\)

Joint distribution of \(\left[\boldsymbol X^{\prime}, \boldsymbol Z^{\prime} \right]^{\prime}\)

\[\begin{split} \begin{bmatrix} \boldsymbol X \\ \boldsymbol Z \end{bmatrix} \sim \mathcal{N} \left( 0, \begin{bmatrix} \mathbf{\Sigma}_{\boldsymbol X} & \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z} \\ \mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X} & \mathbf{\Sigma}_{\boldsymbol Z} \end{bmatrix} \right) \end{split}\]

Moments of the conditional distribution of \(\boldsymbol X\) given \(\boldsymbol Z\)

  • \(\operatorname{E}(\boldsymbol X | \boldsymbol Z) = \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z} \mathbf{\Sigma}^{-1}_{\boldsymbol Z} \boldsymbol Z\)

  • \(\operatorname{cov}(\boldsymbol X | \boldsymbol Z) = \mathbf{\Sigma}_{\boldsymbol X} - \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z}\mathbf{\Sigma}_{\boldsymbol Z}^{-1}\mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X }\)

Note: The conditional variance of \(\boldsymbol X\) given \(\boldsymbol Z\) does not depend on the data \(\boldsymbol Z\)
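A sketch of these formulas in code (zero means throughout, as in the model above):

```python
import numpy as np

def conditional_moments(Sig_X, Sig_XZ, Sig_Z, Z_obs):
    """E(X | Z) and cov(X | Z) for jointly Gaussian, zero-mean X and Z."""
    W = np.linalg.solve(Sig_Z, Sig_XZ.T).T   # Sig_XZ @ inv(Sig_Z), via a linear solve
    mean = W @ Z_obs
    cov = Sig_X - W @ Sig_XZ.T               # independent of the observed data Z_obs
    return mean, cov
```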

Moments of the conditional distribution of \(\boldsymbol X_{t_1:t_2}=[\mathbf{x}^{\prime}_{t_1}, \cdots, \mathbf{x}^{\prime}_{t_2}]^{\prime}\) given \(\boldsymbol Z_{t_3:t_4} = [\mathbf{z}^{\prime}_{t_3}, \cdots, \mathbf{z}^{\prime}_{t_4}]^{\prime}\)

Apply the formulas above to the corresponding blocks of the joint covariance matrix:

\[ \{\mathbf{\Sigma}_{\boldsymbol X}\}_{(t_1:t_2), (t_1:t_2)}, \;\; \{\mathbf{\Sigma}_{\boldsymbol Z}\}_{(t_3:t_4), (t_3:t_4)}, \;\; \{\mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z}\}_{(t_1:t_2), (t_3:t_4)}, \;\; \{\mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X}\}_{(t_3:t_4), (t_1:t_2)} \]

Applications:

  • filtering: distribution of \(\mathbf{x}_{t}\), given \(\boldsymbol Z_{1:t} = [\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{t}]^{\prime}\)

  • state prediction: distribution of \(\mathbf{x}_{t+h}\), given \(\boldsymbol Z_{1:t}\)

  • smoothing: distribution of \(\mathbf{x}_{t}\), given \(\boldsymbol Z_{1:T}\)

Kalman filter

Let

\[\begin{split} \begin{align} \mathbf{x}_{t|t-1} &= \operatorname{E}(\mathbf{x}_t | \mathbf{Z}_{1:t-1}), \;\; \mathbf{\Sigma}^{x}_{t|t-1} = \operatorname{cov}(\mathbf{x}_t | \mathbf{Z}_{1:t-1}) \\ \mathbf{x}_{t|t} &= \operatorname{E}(\mathbf{x}_t | \mathbf{Z}_{1:t}), \;\; \mathbf{\Sigma}^{x}_{t|t} = \operatorname{cov}(\mathbf{x}_t | \mathbf{Z}_{1:t}) \\ \mathbf{z}_{t|t-1} &= \operatorname{E}(\mathbf{z}_t | \mathbf{Z}_{1:t-1}), \;\; \mathbf{\Sigma}^{z}_{t|t-1} = \operatorname{cov}(\mathbf{z}_t | \mathbf{Z}_{1:t-1}) \\ \end{align} \end{split}\]

and

\[\mathbf{x}_{0|0} = 0, \;\; \mathbf{\Sigma}^{x}_{0|0} = \mathbf{\Sigma}_{0}\]
  • optimal one-step ahead forecast of \(\mathbf{x}\)

\[\begin{split} \begin{align} \mathbf{x}_{t|t-1} &= A \mathbf{x}_{t-1|t-1} \\ \mathbf{\Sigma}^{x}_{t|t-1} &= A \mathbf{\Sigma}^{x}_{t-1|t-1} A^{\prime} + \mathbf{\Sigma}_{\varepsilon}\\ \end{align} \end{split}\]
  • optimal one-step ahead forecast of \(\mathbf{z}\)

\[\begin{split} \begin{align} \mathbf{z}_{t|t-1} &= C \mathbf{x}_{t|t-1} \\ \mathbf{\Sigma}^{z}_{t|t-1} &= C \mathbf{\Sigma}^{x}_{t|t-1} C^{\prime} + \mathbf{\Sigma}_{\nu} \end{align} \end{split}\]
  • optimal update of the forecast of \(\mathbf{x}\)

\[\begin{split} \begin{align} \mathbf{x}_{t|t} &= \mathbf{x}_{t|t-1} + \mathbf{K}_{t}\left(\mathbf{z}_{t} - \mathbf{z}_{t|t-1} \right) \\ \mathbf{\Sigma}^{x}_{t|t} &= \mathbf{\Sigma}^{x}_{t|t-1} - \mathbf{K}_{t} \mathbf{\Sigma}^{z}_{t|t-1} \mathbf{K}_{t}^{\prime} \end{align} \end{split}\]

where \( \mathbf{K}_{t} = \mathbf{\Sigma}^{x}_{t|t-1} C^{\prime} (\mathbf{\Sigma}^{z}_{t|t-1})^{-1} \) is called the Kalman gain; it determines how the forecast of \(\mathbf{x}_{t}\) is optimally updated once the new observation \(\mathbf{z}_{t}\) is seen
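A minimal implementation of these recursions (a sketch; no attempt is made at numerical refinements such as Joseph-form updates or square-root filtering):

```python
import numpy as np

def kalman_filter(Z, A, C, Sig_eps, Sig_nu, Sig_0):
    """Kalman filter for the Gaussian linear state space model.
    Z is (T, nz); returns filtered and one-step-ahead predicted moments of x."""
    T, nx = Z.shape[0], A.shape[0]
    x_filt = np.zeros((T, nx)); P_filt = np.zeros((T, nx, nx))
    x_pred = np.zeros((T, nx)); P_pred = np.zeros((T, nx, nx))
    x, P = np.zeros(nx), Sig_0                      # x_{0|0} = 0, Sigma^x_{0|0} = Sigma_0
    for t in range(T):
        x, P = A @ x, A @ P @ A.T + Sig_eps         # forecast: x_{t|t-1}, Sigma^x_{t|t-1}
        x_pred[t], P_pred[t] = x, P
        S = C @ P @ C.T + Sig_nu                    # Sigma^z_{t|t-1}
        K = P @ C.T @ np.linalg.inv(S)              # Kalman gain K_t
        x = x + K @ (Z[t] - C @ x)                  # update after observing z_t
        P = P - K @ S @ K.T
        x_filt[t], P_filt[t] = x, P
    return x_filt, P_filt, x_pred, P_pred
```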

Derivation for \(t=1\)

  • step 1 Compute joint distribution of \([\mathbf{x}_1^{\prime}, \mathbf{z}_1^{\prime}]^{\prime}\) using

\[\begin{split} \begin{bmatrix} \mathbf{x}_1\\ \mathbf{z}_1 \end{bmatrix} = \begin{bmatrix} A\\ C A \end{bmatrix} \mathbf{x}_0 + \begin{bmatrix} I & 0 \\ C & I \end{bmatrix} \begin{bmatrix} \boldsymbol \varepsilon_{1}\\ \boldsymbol \nu_{1} \end{bmatrix} \end{split}\]

Note that \(\mathbf{x}_0\), \(\boldsymbol \varepsilon_{1}\), and \(\boldsymbol \nu_{1}\) are independent

  • step 2 Compute the marginal distribution of \(\mathbf{x}_1\) and \(\mathbf{z}_1\)

  • step 3 Compute the conditional distribution of \(\mathbf{x}_1\) given \(\mathbf{z}_1\)

Derivation for any \(t\): proceed by induction. Assume the optimal update formulae hold at \(t-1\) and show that the one-step-ahead formulae hold at \(t\):

  • write \([\mathbf{x}^{\prime}_t, \mathbf{z}^{\prime}_t]^{\prime}\) in terms of \(\mathbf{x}_{t-1}\), \(\boldsymbol \varepsilon_{t}\), and \(\boldsymbol \nu_{t}\)

  • using the assumed conditional distribution of \(\mathbf{x}_{t-1}\) given \(Z_{1:t-1}\), compute the joint and marginal conditional distributions of \(\mathbf{x}_t\) and \(\mathbf{z}_t\) given \(Z_{1:t-1}\).

  • from the joint (conditional) distribution compute the conditional distribution of \(\mathbf{x}_t\) given \(\mathbf{z}_t\). This will give you the conditional distribution of \(\mathbf{x}_t\) given \(Z_{1:t} = [Z_{1:t-1}, \mathbf{z}_t]\)

Likelihood function with the Kalman filter

The joint distribution of \( Z = \left[\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{T} \right]^{\prime}\) factorizes as

\[ p(\boldsymbol Z; \boldsymbol \theta) = p(\mathbf{z}_{1}; \boldsymbol \theta) \prod_{t=2}^{T} p(\mathbf{z}_{t}|Z_{1:t-1}; \boldsymbol \theta) \]

where \(p(\mathbf{z}_{t}|Z_{1:t-1}; \boldsymbol \theta)\) is Gaussian with moments \(\mathbf{z}_{t|t-1}\) and \(\mathbf{\Sigma}^{z}_{t|t-1} \) given by the Kalman filter (and \(p(\mathbf{z}_{1}; \boldsymbol \theta)\) is Gaussian with moments \(\mathbf{z}_{1|0}\) and \(\mathbf{\Sigma}^{z}_{1|0}\)).

Note: this is equivalent to, but much more efficient than, computing the joint distribution of \(\boldsymbol Z\) directly as

\[ \boldsymbol Z \sim \mathcal{N} \left( 0, \; \boldsymbol C \mathbf{\Sigma}_{\boldsymbol X} \boldsymbol C^{\prime} + \mathbf{\Sigma}_{\boldsymbol V}\right) \]
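A sketch of this prediction-error decomposition of the log-likelihood, using the same recursions as the kalman_filter sketch above:

```python
import numpy as np

def kalman_loglik(Z, A, C, Sig_eps, Sig_nu, Sig_0):
    """Gaussian log-likelihood of Z via the Kalman filter."""
    T, nz = Z.shape
    x, P = np.zeros(A.shape[0]), Sig_0
    ll = 0.0
    for t in range(T):
        x, P = A @ x, A @ P @ A.T + Sig_eps          # predict
        e = Z[t] - C @ x                             # prediction error z_t - z_{t|t-1}
        S = C @ P @ C.T + Sig_nu                     # Sigma^z_{t|t-1}
        ll -= 0.5 * (nz * np.log(2 * np.pi)
                     + np.linalg.slogdet(S)[1]
                     + e @ np.linalg.solve(S, e))    # Gaussian log-density of e
        K = P @ C.T @ np.linalg.inv(S)               # update
        x, P = x + K @ e, P - K @ S @ K.T
    return ll
```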

Kalman smoother

\[\begin{split} \begin{align} \mathbf{x}_{t-1|T} &= \mathbf{x}_{t-1|t-1} + \mathbf{J}_{t-1}\left(\mathbf{x}_{t|T} - \mathbf{x}_{t|t-1} \right) \\ \mathbf{\Sigma}^{x}_{t-1|T} &= \mathbf{\Sigma}^{x}_{t-1|t-1} + \mathbf{J}_{t-1} \left( \mathbf{\Sigma}^{x}_{t|T} - \mathbf{\Sigma}^{x}_{t|t-1} \right) \mathbf{J}_{t-1}^{\prime} \end{align} \end{split}\]

where \( \mathbf{J}_{t-1} = \mathbf{\Sigma}^{x}_{t-1|t-1} A^{\prime} (\mathbf{\Sigma}^{x}_{t|t-1})^{-1} \)

Note 1: for \(t=T+1\), \(\mathbf{x}_{t-1|T}\) and \(\mathbf{\Sigma}^{x}_{t-1|T}\) are given by the Kalman filter. After that, going backwards, all necessary objects are provided by the previous smoothing step and by the Kalman filter

Note 2: equivalent to, but much more efficient than, computing the means and the diagonal blocks of the covariance of the conditional distribution of \(\boldsymbol X\) given \(\boldsymbol Z\) directly
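A sketch of the backward recursion, consuming the arrays returned by the kalman_filter sketch above (0-based array index \(i\) corresponds to time \(t = i+1\)):

```python
import numpy as np

def kalman_smoother(x_filt, P_filt, x_pred, P_pred, A):
    """Backward smoothing pass; inputs are the Kalman filter outputs above."""
    T = x_filt.shape[0]
    x_sm, P_sm = x_filt.copy(), P_filt.copy()                # at t = T, smoothed = filtered
    for i in range(T - 1, 0, -1):                            # i indexes time t = i + 1
        J = P_filt[i - 1] @ A.T @ np.linalg.inv(P_pred[i])   # J_{t-1}
        x_sm[i - 1] = x_filt[i - 1] + J @ (x_sm[i] - x_pred[i])
        P_sm[i - 1] = P_filt[i - 1] + J @ (P_sm[i] - P_pred[i]) @ J.T
    return x_sm, P_sm
```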

Estimation

What are we estimating?

\[\begin{split} \begin{align} \mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \;\;\; \text{state (transition) equation}\\ \mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \;\;\; \text{observation equation} \\ \end{align} \end{split}\]

Collect the unknown parameters of \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), \(\mathbf{\Sigma}_{\nu}\) in \(\boldsymbol \theta\)

MLE

\[ \ell(\boldsymbol \theta | \boldsymbol Z) = \log \mathcal{L}(\boldsymbol \theta | \boldsymbol Z) = \log p(\boldsymbol Z; \boldsymbol \theta) \]
\[ \begin{align} \hat {\boldsymbol \theta}&= \underset{\boldsymbol \theta}{\mathrm{argmax}}~~\ell(\boldsymbol \theta | \boldsymbol Z) \end{align} \]
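A sketch for the scalar AR(1)-plus-noise example, reusing the kalman_loglik sketch above; the variances are parameterized in logs so the optimizer works on an unconstrained space (the parameterization and the stationary initialization are my choices):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, z):
    """Negative log-likelihood; theta = (alpha, log s2_eps, log s2_nu)."""
    alpha, s2e, s2n = theta[0], np.exp(theta[1]), np.exp(theta[2])
    A, C = np.array([[alpha]]), np.array([[1.0]])
    Se, Sn = np.array([[s2e]]), np.array([[s2n]])
    S0 = np.array([[s2e / (1 - alpha ** 2)]])    # stationary variance as Sigma_0 (assumes |alpha| < 1)
    return -kalman_loglik(z.reshape(-1, 1), A, C, Se, Sn, S0)

# usage, given an observed series z:
# res = minimize(neg_loglik, x0=np.array([0.5, 0.0, 0.0]), args=(z,), method="BFGS")
# theta_hat = res.x
```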

Identification

\[\begin{split} \begin{align} \mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \;\;\; \text{state (transition) equation}\\ \mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \;\;\; \text{observation equation} \\ \end{align} \end{split}\]
  • If we replace \(\mathbf{x}_{t}\) with \(\mathbf{x}_{t}^{*} = T \mathbf{x}_{t}\), \(\varepsilon_{t}\) with \(\varepsilon_{t}^{*} = T \varepsilon_{t}\), \(A\) with \(A^{*} = T A T^{-1}\), \(C\) with \(C^{*} = C T^{-1}\), and \(\mathbf{\Sigma}_{\varepsilon}\) with \(\mathbf{\Sigma}_{\varepsilon}^{*} = T\mathbf{\Sigma}_{\varepsilon}T'\), for any invertible matrix \(T\), the process for \(\mathbf{z}_{t}\) remains unchanged, and the likelihood function remains the same.

  • Therefore, unless there are (sufficient) restrictions on \(A\), \(C\), and \(\mathbf{\Sigma}_{\varepsilon}\), their parameters cannot be identified: multiple values of \(\boldsymbol \theta\) imply the same value of the likelihood.

  • A simple way to check for local identification at a given value of \(\boldsymbol \theta\) is to compute the Jacobian matrix of \(\mathbf{\Sigma}_{\boldsymbol Z}\) w.r.t. \(\boldsymbol \theta\) and check that it has full column rank (a numerical sketch follows below).
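A numerical version of this check, as a sketch (sigma_z_of_theta is a hypothetical user-supplied function mapping \(\boldsymbol \theta\) to \(\mathbf{\Sigma}_{\boldsymbol Z}\), e.g. built from the joint_cov_Z sketch above):

```python
import numpy as np

def locally_identified(sigma_z_of_theta, theta, h=1e-6):
    """Check that the finite-difference Jacobian of vec(Sigma_Z) w.r.t. theta has full column rank."""
    f0 = sigma_z_of_theta(theta).ravel()
    J = np.zeros((f0.size, theta.size))
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = h
        J[:, i] = (sigma_z_of_theta(theta + d).ravel() - f0) / h   # forward difference
    return np.linalg.matrix_rank(J) == theta.size
```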

Forecasting

Optimal forecast given information at \(T\):

\[\begin{split} \begin{align} \operatorname{E}(\mathbf{z}_{T+1} | \boldsymbol Z ) & = C \operatorname{E}(\mathbf{x}_{T+1} | \boldsymbol Z ) = C \mathbf{x}_{T+1|T} = CA \mathbf{x}_{T|T} \\ \operatorname{E}(\mathbf{z}_{T+h} | \boldsymbol Z ) & = C A^{h} \mathbf{x}_{T|T} \end{align} \end{split}\]

Computing the variance of the forecast errors

Using that \(\mathbf{x}_{t}\) is a VAR(1) process,

\[ \mathbf{x}_{T+h} = A^{h}\mathbf{x}_{T} + A^{h-1}\boldsymbol \varepsilon_{T+1} + \cdots + A\boldsymbol \varepsilon_{T+h-1} + \boldsymbol \varepsilon_{T+h} \]

and

\[ \mathbf{x}_{T+h|T} = A^{h}\mathbf{x}_{T|T} \]

Therefore, the forecast error is

\[ \mathbf{x}_{T+h} - \mathbf{x}_{T+h|T} = A^{h}\left(\mathbf{x}_{T} - \mathbf{x}_{T|T} \right) + A^{h-1}\boldsymbol \varepsilon_{T+1} + \cdots + A\boldsymbol \varepsilon_{T+h-1} + \boldsymbol \varepsilon_{T+h} \]

The MSE of \(\mathbf{x}_{T+h|T}\) is

\[ \mathbf{\Sigma}^{x}_{T+h|T} = A^{h} \mathbf{\Sigma}^{x}_{T|T} (A^{h})^{\prime} + A^{h-1} \Sigma_{\varepsilon} (A^{h-1})^{\prime} + \cdots + A \Sigma_{\varepsilon} A^{\prime} + \Sigma_{\varepsilon} \]

and the MSE of \(\mathbf{z}_{T+h|T}\) is

\[ \mathbf{\Sigma}^{z}_{T+h|T} = C \mathbf{\Sigma}^{x}_{T+h|T} C^{\prime} + \Sigma_{\nu} \]
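A sketch of these forecast formulas; iterating \(P \leftarrow A P A^{\prime} + \mathbf{\Sigma}_{\varepsilon}\) for \(h\) steps reproduces the sum above:

```python
import numpy as np

def forecast_z(x_TT, P_TT, A, C, Sig_eps, Sig_nu, h):
    """h-step-ahead forecast of z and its MSE, from the filtered x_{T|T} and Sigma^x_{T|T}."""
    x, P = x_TT, P_TT
    for _ in range(h):
        x, P = A @ x, A @ P @ A.T + Sig_eps   # accumulates the A^j Sig_eps (A^j)' terms
    return C @ x, C @ P @ C.T + Sig_nu        # z_{T+h|T} and Sigma^z_{T+h|T}
```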

Accounting for parameter uncertainty

So far, forecast errors were computed assuming that the parameters are known. In practice, however, they are estimated, so parameter uncertainty is also present (it disappears only asymptotically)

total uncertainty

\[\begin{split} \begin{align} \operatorname{E}\left([\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right) &= \operatorname{E}\left([\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T}(\theta)][\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T}(\theta)]^{\prime} \right) \\ & + \operatorname{E}\left([\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right) \\ &= \operatorname{E}\left(\mathbf{\Sigma}^{x}_{T+h|T} \right) + \operatorname{E}\left([\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right) \end{align} \end{split}\]
  • Evaluate \(\operatorname{E}\left(\mathbf{\Sigma}^{x}_{T+h|T} \right)\) by generating \(N\) draws from the asymptotic distribution of \(\hat{\theta}\) and computing the average value of \(\mathbf{\Sigma}^{x}_{T+h|T}\) over the draws

  • Evaluate \(\operatorname{E}\left([\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right) \) as the average of \(\left([\mathbf{x}_{T+h|T}(\theta_i) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta_i) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right)\) over the same \(N\) draws \(\theta_i\) from the asymptotic distribution of \(\hat{\theta}\) (a sketch follows after this list)

  • see Hamilton, James D. “A standard error for the estimated state vector of a state-space model.” Journal of Econometrics 33.3 (1986)
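A sketch of this Monte Carlo scheme. Here V_hat stands for the estimated asymptotic covariance of \(\hat{\theta}\), and forecast_x is a hypothetical helper returning \(\mathbf{x}_{T+h|T}(\theta)\) and \(\mathbf{\Sigma}^{x}_{T+h|T}(\theta)\) for a given parameter vector:

```python
import numpy as np

def total_forecast_mse(theta_hat, V_hat, forecast_x, N=1000, seed=0):
    """Filtering uncertainty plus parameter uncertainty, averaged over draws of theta."""
    rng = np.random.default_rng(seed)
    x_hat, _ = forecast_x(theta_hat)                     # x_{T+h|T}(theta_hat)
    filt_term = 0.0
    param_term = 0.0
    for _ in range(N):
        th = rng.multivariate_normal(theta_hat, V_hat)   # draw from the asymptotic distribution
        x_i, Sig_i = forecast_x(th)
        filt_term += Sig_i / N                           # estimates E(Sigma^x_{T+h|T})
        d = (x_i - x_hat)[:, None]
        param_term += (d @ d.T) / N                      # parameter-uncertainty term
    return filt_term + param_term
```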

Linear state space model

\[\begin{split} \begin{align} \mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \\ \mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \\ \mathbf{x}_{0} & \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{0} \right), \end{align} \end{split}\]

reduced-form (statistical) models

  • the parameters are the unrestricted elements of \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), \(\mathbf{\Sigma}_{\nu}\)

  • of little (or no) interest on their own

structural (theoretical) models

  • reduced-form model parameters are functions of a smaller (often much smaller) number of structural parameters \(\boldsymbol \theta\)

  • \(\boldsymbol \theta\) have economic meaning and are (or could be) of interest on their own

  • estimation is usually harder (the reduced-form matrices are non-linear functions of \(\boldsymbol \theta\))

Non-linear state space models in Python

docs