\[
\mathbf{x}_{t} = A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\; \boldsymbol \varepsilon_t \sim \operatorname{WN} \left( 0, \;\mathbf{\Sigma}\right)
\]
\[\begin{split}z_t = \underset{C}{\underbrace{\left[\begin{array}{cc} 0 & 1 \end{array} \right]}}
\left[\begin{array}{c}
x_{1,t}\\
x_{2,t}\\
\end{array}\right]
\end{split}\]
\[\begin{split}
\begin{align}
\mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \\
\mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}
\end{align}
\end{split}\]
\[\begin{split}
\begin{align}
\mathbf{x}_{t} &= f\left(\mathbf{x}_{t-1}, \boldsymbol \varepsilon_{t} \right), \\
\mathbf{z}_{t} &= g\left( \mathbf{x}_{t}, \boldsymbol \nu_{t} \right)
\end{align}
\end{split}\]
\[\begin{split}
\begin{align}
x_{t} &= \alpha x_{t-1} + \varepsilon_{t}, \\
z_{t} &= x_{t} + \nu_{t}
\end{align}
\end{split}\]
Gaussian linear state space model
\[\begin{split}
\begin{align}
\mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \;\;\; \text{state (transition) equation}\\\\
\mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \;\;\; \text{observation equation} \\\\
\mathbf{x}_{0} & \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{0} \right), \;\;\; \text{initial state}
\end{align}
\end{split}\]
Note 1: This is a time-invariant model. This can be relaxed with some or all of the matrices \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), and \(\mathbf{\Sigma}_{\nu}\) being functions of \(t\).
Note 2: We can add an intercept in one or both of the state and observation equations.
Note 3: \(A\) can be a companion matrix, so the unobserved variables could follow a general VAR(p) process.
Note 4: \(\varepsilon_{t}\) and \(\nu_t\) are assumed to be independent, but that can be relaxed.
Note 5: \(\mathbf{x}_{0}\) is independent of all \(\varepsilon_{t}\) and \(\nu_t\).
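To make the model concrete, here is a minimal simulation sketch (assuming NumPy; the particular matrices are illustrative choices, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state, 1-observable model (all numbers are assumptions)
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
C = np.array([[0.0, 1.0]])
Sigma_eps = 0.1 * np.eye(2)    # state noise covariance
Sigma_nu = np.array([[0.2]])   # observation noise covariance
Sigma_0 = np.eye(2)            # initial state covariance

T = 200
n, m = A.shape[0], C.shape[0]
x = np.zeros((T + 1, n))
z = np.zeros((T, m))

x[0] = rng.multivariate_normal(np.zeros(n), Sigma_0)       # draw x_0
for t in range(1, T + 1):
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(n), Sigma_eps)
    z[t - 1] = C @ x[t] + rng.multivariate_normal(np.zeros(m), Sigma_nu)
```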
Autocovariances of \(\mathbf{z}_{t}\)
\[\begin{split}
\begin{align}
\Gamma_{z}(0) & = \operatorname{cov}(\mathbf{z}_t, \mathbf{z}_{t})\\
& = \operatorname{cov}(C\mathbf{x}_t, C \mathbf{x}_{t}) + \mathbf{\Sigma}_{\nu} \\
& = C \Gamma_{x}(0) C^{\prime} + \mathbf{\Sigma}_{\nu}
\end{align}
\end{split}\]
\[\begin{split}
\begin{align}
\Gamma_{z}(k) & = \operatorname{cov}(\mathbf{z}_t, \mathbf{z}_{t-k})\\
& = \operatorname{cov}(C\mathbf{x}_t, C \mathbf{x}_{t-k}) \\
& = C \Gamma_{x}(k) C^{\prime}
\end{align}
\end{split}\]
Note: We get \(\Gamma_{x}(0)\) and \(\Gamma_{x}(k)\) as in the last lecture: \(\Gamma_{x}(0)\) solves the discrete Lyapunov equation \(\Gamma_{x}(0) = A \Gamma_{x}(0) A^{\prime} + \mathbf{\Sigma}_{\varepsilon}\), and \(\Gamma_{x}(k) = A^{k} \Gamma_{x}(0)\).
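A hedged sketch of that computation (using SciPy's `solve_discrete_lyapunov`; the helper name `autocov_z` is my own):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def autocov_z(A, C, Sigma_eps, Sigma_nu, k):
    """Gamma_z(k) implied by the state space model (k >= 0)."""
    Gamma_x0 = solve_discrete_lyapunov(A, Sigma_eps)    # Gamma_x(0)
    Gamma_xk = np.linalg.matrix_power(A, k) @ Gamma_x0  # Gamma_x(k)
    Gamma_zk = C @ Gamma_xk @ C.T
    if k == 0:
        Gamma_zk = Gamma_zk + Sigma_nu                  # measurement noise only at lag 0
    return Gamma_zk
```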
Stationarity of \(\mathbf{x}_{t}\)
\[
\mathbf{x}_{0}\sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{0} \right),
\]
requires
\[ \mathbf{\Sigma}_{0} = \Gamma_{x}(0)\]
Marginal distribution of \(\boldsymbol X = [\mathbf{x}^{\prime}_{1}, \mathbf{x}^{\prime}_{2}, \cdots, \mathbf{x}^{\prime}_{T}]^{\prime}\)
\[\begin{split}
\underset{\boldsymbol X}{\underbrace{
\left[\begin{array}{c}
\mathbf{x}_{1}\\
\mathbf{x}_{2}\\
\mathbf{x}_{3}\\
\mathbf{x}_{4}\\
\vdots\\
\mathbf{x}_{T}
\end{array}\right]}}
=
\underset{\boldsymbol A}{\underbrace{
\left[\begin{array}{cccccccc}
A & I & 0 & 0 & \cdots & 0 & 0 & 0\\
A^2 & A & I & 0 & \cdots & 0 & 0 & 0\\
A^3 & A^2 & A & I & \cdots & 0 & 0 & 0\\
A^4 & A^3 & A^2 & A & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots\\
A^T & A^{T-1} & A^{T-2} & A^{T-3} & \cdots & A^2 & A & I
\end{array}\right]}}
\underset{\boldsymbol E}{\underbrace{
\left[\begin{array}{c}
\mathbf{x}_{0}\\
\boldsymbol \varepsilon_{1}\\
\boldsymbol \varepsilon_{2}\\
\boldsymbol \varepsilon_{3}\\
\vdots\\
\boldsymbol \varepsilon_{T-1}\\
\boldsymbol \varepsilon_{T}
\end{array}\right]}}
\end{split}\]
\[
\boldsymbol X \sim \mathcal{N} \left( 0, \; \boldsymbol A \mathbf{\Sigma}_{\boldsymbol E} \boldsymbol A^{\prime}\right)
\]
Note: see HW4 part 1 for an alternative way to write the system, and see "Efficient simulation and integrated likelihood
estimation in state space models" for applications of that approach.
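For completeness, a sketch of how the stacked matrix \(\boldsymbol A\) could be assembled in NumPy (the helper name `stack_state_equation` is my own):

```python
import numpy as np
from numpy.linalg import matrix_power

def stack_state_equation(A, T):
    """Build the (T*n) x ((T+1)*n) matrix mapping E = [x_0', e_1', ..., e_T']' to X."""
    n = A.shape[0]
    big = np.zeros((T * n, (T + 1) * n))
    for i in range(1, T + 1):                  # block row i corresponds to x_i
        big[(i - 1) * n:i * n, :n] = matrix_power(A, i)   # loading on x_0 is A^i
        for j in range(1, i + 1):              # loading on eps_j is A^(i-j), with A^0 = I
            big[(i - 1) * n:i * n, j * n:(j + 1) * n] = matrix_power(A, i - j)
    return big
```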
Marginal distribution of \(\boldsymbol Z = \left[\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{T} \right]^{\prime}\)
\[\begin{split}
\underset{\boldsymbol Z}{\underbrace{
\left[\begin{array}{c}
\mathbf{z}_{1}\\
\mathbf{z}_{2}\\
\vdots\\
\mathbf{z}_{T}
\end{array}\right]}}
=
\underset{\boldsymbol C}{\underbrace{
\left[\begin{array}{cccc}
C & 0 & \cdots & 0\\
0 & C & \cdots & 0\\
\vdots & \vdots & \cdots & \vdots\\
0 & 0 & \cdots & C\\
\end{array}\right]}}
\underset{\boldsymbol X}{\underbrace{
\left[\begin{array}{c}
\mathbf{x}_{1}\\
\mathbf{x}_{2}\\
\vdots\\
\mathbf{x}_{T}
\end{array}\right]}}
+
\underset{\boldsymbol V}{\underbrace{
\left[\begin{array}{c}
\boldsymbol \nu_{1}\\
\boldsymbol \nu_{2}\\
\vdots\\
\boldsymbol \nu_{T}
\end{array}\right]}}
\end{split}\]
Question: How would you compute \(\boldsymbol C\)?
\[
\boldsymbol Z \sim \mathcal{N} \left( 0, \; \boldsymbol C \mathbf{\Sigma}_{\boldsymbol X} \boldsymbol C^{\prime} + \mathbf{\Sigma}_{\boldsymbol V}\right)
\]
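One possible answer to the question above: since \(\boldsymbol C = I_T \otimes C\) and \(\mathbf{\Sigma}_{\boldsymbol V} = I_T \otimes \mathbf{\Sigma}_{\nu}\), Kronecker products build both directly (a sketch assuming NumPy; the helper name is my own):

```python
import numpy as np

def stack_observation(C, Sigma_nu, T):
    """Return the block-diagonal C and the covariance of V = [nu_1', ..., nu_T']'."""
    C_big = np.kron(np.eye(T), C)            # block-diagonal with C repeated T times
    Sigma_V = np.kron(np.eye(T), Sigma_nu)   # nu_t are i.i.d. across t
    return C_big, Sigma_V
```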
Applications:
likelihood function: distribution of \(\boldsymbol Z\)
forecasting: distribution of \(\mathbf{z}_{t+h}\), given \(\boldsymbol Z_{1:t} = [\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{t}]^{\prime}\)
Joint distribution of \(\left[\boldsymbol X^{\prime}, \boldsymbol Z^{\prime} \right]^{\prime}\)
\[\begin{split}
\begin{bmatrix}
\boldsymbol X \\
\boldsymbol Z
\end{bmatrix}
\sim \mathcal{N} \left( 0,
\begin{bmatrix}
\mathbf{\Sigma}_{\boldsymbol X} & \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z} \\
\mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X} & \mathbf{\Sigma}_{\boldsymbol Z}
\end{bmatrix}
\right)
\end{split}\]
Moments of the conditional distribution of \(\boldsymbol X\) given \(\boldsymbol Z\)
\(\operatorname{E}(\boldsymbol X | \boldsymbol Z) = \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z} \mathbf{\Sigma}^{-1}_{\boldsymbol Z} \boldsymbol Z\)
\(\operatorname{cov}(\boldsymbol X | \boldsymbol Z) = \mathbf{\Sigma}_{\boldsymbol X} - \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z}\mathbf{\Sigma}_{\boldsymbol Z}^{-1}\mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X }\)
Note: The conditional variance of \(\boldsymbol X\) given \(\boldsymbol Z\) does not depend on the data \(\boldsymbol Z\)
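These are the standard Gaussian conditioning formulas; a direct sketch (fine for small \(T\), inefficient for large systems; the helper name is my own):

```python
import numpy as np

def condition_on_Z(Sigma_X, Sigma_XZ, Sigma_Z, Z):
    """E(X | Z) and cov(X | Z) for jointly Gaussian, zero-mean X and Z."""
    # Solve Sigma_Z w = Z rather than forming the inverse explicitly
    mean = Sigma_XZ @ np.linalg.solve(Sigma_Z, Z)
    # Sigma_ZX = Sigma_XZ' by symmetry of the joint covariance
    cov = Sigma_X - Sigma_XZ @ np.linalg.solve(Sigma_Z, Sigma_XZ.T)
    return mean, cov
```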
Moments of the conditional distribution
of \(\boldsymbol X_{t_1:t_2}=[\mathbf{x}^{\prime}_{t_1}, \cdots, \mathbf{x}^{\prime}_{t_2}]^{\prime}\) given \(\boldsymbol Z_{t_3:t_4} = [\mathbf{z}^{\prime}_{t_3}, \cdots, \mathbf{z}^{\prime}_{t_4}]^{\prime}\): apply the same conditioning formulas to the blocks
\[
\{\mathbf{\Sigma}_{\boldsymbol X}\}_{(t_1:t_2), (t_1:t_2)}, \;\;
\{\mathbf{\Sigma}_{\boldsymbol Z}\}_{(t_3:t_4), (t_3:t_4)}, \;\;
\{\mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z}\}_{(t_1:t_2), (t_3:t_4)}, \;\;
\{\mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X}\}_{(t_3:t_4), (t_1:t_2)}
\]
Applications:
filtering: distribution of \(\mathbf{x}_{t}\), given \(\boldsymbol Z_{1:t} = [\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{t}]^{\prime}\)
state prediction: distribution of \(\mathbf{x}_{t+h}\), given \(\boldsymbol Z_{1:t}\)
smoothing: distribution of \(\mathbf{x}_{t}\), given \(\boldsymbol Z_{1:T}\)
Kalman filter
Let
\[\begin{split}
\begin{align}
\mathbf{x}_{t|t-1} &= \operatorname{E}(\mathbf{x}_t | \mathbf{Z}_{1:t-1}), \;\; \mathbf{\Sigma}^{x}_{t|t-1} = \operatorname{cov}(\mathbf{x}_t | \mathbf{Z}_{1:t-1}) \\
\mathbf{x}_{t|t} &= \operatorname{E}(\mathbf{x}_t | \mathbf{Z}_{1:t}), \;\; \mathbf{\Sigma}^{x}_{t|t} = \operatorname{cov}(\mathbf{x}_t | \mathbf{Z}_{1:t}) \\
\mathbf{z}_{t|t-1} &= \operatorname{E}(\mathbf{z}_t | \mathbf{Z}_{1:t-1}), \;\; \mathbf{\Sigma}^{z}_{t|t-1} = \operatorname{cov}(\mathbf{z}_t | \mathbf{Z}_{1:t-1}) \\
\end{align}
\end{split}\]
and
\[\mathbf{x}_{0|0} = 0, \;\; \mathbf{\Sigma}^{x}_{0|0} = \mathbf{\Sigma}_{0}\]
\[\begin{split}
\begin{align}
\mathbf{x}_{t|t-1} &= A \mathbf{x}_{t-1|t-1} \\
\mathbf{\Sigma}^{x}_{t|t-1} &= A \mathbf{\Sigma}^{x}_{t-1|t-1} A^{\prime} + \mathbf{\Sigma}_{\varepsilon}\\
\end{align}
\end{split}\]
\[\begin{split}
\begin{align}
\mathbf{z}_{t|t-1} &= C \mathbf{x}_{t|t-1}
\\
\mathbf{\Sigma}^{z}_{t|t-1} &= C \mathbf{\Sigma}^{x}_{t|t-1} C^{\prime} + \mathbf{\Sigma}_{\nu}
\end{align}
\end{split}\]
\[\begin{split}
\begin{align}
\mathbf{x}_{t|t} &= \mathbf{x}_{t|t-1} + \mathbf{K}_{t}\left(\mathbf{z}_{t} - \mathbf{z}_{t|t-1} \right)
\\
\mathbf{\Sigma}^{x}_{t|t} &= \mathbf{\Sigma}^{x}_{t|t-1} - \mathbf{K}_{t} \mathbf{\Sigma}^{z}_{t|t-1} \mathbf{K}_{t}^{\prime}
\end{align}
\end{split}\]
where
\(
\mathbf{K}_{t} = \mathbf{\Sigma}^{x}_{t|t-1} C^{\prime} (\mathbf{\Sigma}^{z}_{t|t-1})^{-1}
\)
is called the Kalman gain; it shows how to optimally update the forecast of \(\mathbf{x}_{t}\) after the new observation \(\mathbf{z}_{t}\) is seen
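A minimal NumPy implementation of these recursions (a sketch under the assumptions above; all names are my own):

```python
import numpy as np

def kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    """Run the Kalman filter on data Z of shape (T, m). For each t, store the
    filtered, predicted, and one-step-ahead observation moments."""
    T, n = Z.shape[0], A.shape[0]
    x_tt, P_tt = np.zeros(n), Sigma_0.copy()   # x_{0|0}, Sigma^x_{0|0}
    out = []
    for t in range(T):
        # prediction step
        x_pred = A @ x_tt                      # x_{t|t-1}
        P_pred = A @ P_tt @ A.T + Sigma_eps    # Sigma^x_{t|t-1}
        z_pred = C @ x_pred                    # z_{t|t-1}
        S = C @ P_pred @ C.T + Sigma_nu        # Sigma^z_{t|t-1}
        # update step
        K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain K_t
        x_tt = x_pred + K @ (Z[t] - z_pred)    # x_{t|t}
        P_tt = P_pred - K @ S @ K.T            # Sigma^x_{t|t}
        out.append((x_tt, P_tt, x_pred, P_pred, z_pred, S))
    return out
```

Storing the predicted moments alongside the filtered ones is deliberate: the likelihood evaluation and the smoother below both reuse them.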
Derivation for \(t=1\)
\[\begin{split}
\begin{bmatrix}
\mathbf{x}_1\\
\mathbf{z}_1
\end{bmatrix}
=
\begin{bmatrix}
A\\
C A
\end{bmatrix} \mathbf{x}_0 +
\begin{bmatrix}
I & 0 \\
C & I
\end{bmatrix}
\begin{bmatrix}
\boldsymbol \varepsilon_{1}\\
\boldsymbol \nu_{1}
\end{bmatrix}
\end{split}\]
Note that \(\mathbf{x}_0\), \(\boldsymbol \varepsilon_{1}\), and \(\boldsymbol \nu_{1}\) are independent, so the joint Gaussian distribution of \(\mathbf{x}_1\) and \(\mathbf{z}_1\) follows directly from this representation.
Derivation for any \(t\): use induction. Assume the optimal update formulae hold for \(t-1\) and show that the one-step-ahead ones are true:
write \([\mathbf{x}_t, \mathbf{z}_t]\) in terms of \(\mathbf{x}_{t-1}\), \(\boldsymbol \varepsilon_{t}\), and \(\boldsymbol \nu_{t}\)
using the assumed conditional distribution of \(\mathbf{x}_{t-1}\) given \(\boldsymbol Z_{1:t-1}\), compute the joint and marginal conditional distributions of \(\mathbf{x}_t\) and \(\mathbf{z}_t\) given \(\boldsymbol Z_{1:t-1}\)
from the joint (conditional) distribution, compute the conditional distribution of \(\mathbf{x}_t\) given \(\mathbf{z}_t\). This yields the conditional distribution of \(\mathbf{x}_t\) given \(\boldsymbol Z_{1:t} = [\boldsymbol Z_{1:t-1}, \mathbf{z}_t]\)
Likelihood function with the Kalman filter
joint distribution of \(\boldsymbol Z = \left[\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{T} \right]^{\prime}\)
\[ p(\boldsymbol Z; \boldsymbol \theta) = \prod_{t=1}^{T} p(\mathbf{z}_{t}|\boldsymbol Z_{1:t-1}; \boldsymbol \theta), \;\;\; \text{with} \;\; p(\mathbf{z}_{1}|\boldsymbol Z_{1:0}; \boldsymbol \theta) = p(\mathbf{z}_{1}; \boldsymbol \theta) \]
where \(p(\mathbf{z}_{t}|\boldsymbol Z_{1:t-1}; \boldsymbol \theta)\) is Gaussian with moments \(\mathbf{z}_{t|t-1}\) and \(\mathbf{\Sigma}^{z}_{t|t-1}\) given by the Kalman filter.
Note: equivalent to, but much more efficient than computing the joint distribution of \(Z\) as
\[
\boldsymbol Z \sim \mathcal{N} \left( 0, \; \boldsymbol C \mathbf{\Sigma}_{\boldsymbol X} \boldsymbol C^{\prime} + \mathbf{\Sigma}_{\boldsymbol V}\right)
\]
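A sketch of the resulting log-likelihood via the prediction-error decomposition, reusing the hypothetical `kalman_filter` from above:

```python
import numpy as np

def log_likelihood(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    """Gaussian log-likelihood from the one-step-ahead forecast errors."""
    m = Z.shape[1]
    ll = 0.0
    results = kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0)
    for t, (x_tt, P_tt, x_pred, P_pred, z_pred, S) in enumerate(results):
        e = Z[t] - z_pred                      # forecast error z_t - z_{t|t-1}
        _, logdet = np.linalg.slogdet(S)       # log det Sigma^z_{t|t-1}
        ll += -0.5 * (m * np.log(2 * np.pi) + logdet
                      + e @ np.linalg.solve(S, e))
    return ll
```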
Kalman smoother
\[\begin{split}
\begin{align}
\mathbf{x}_{t-1|T} &= \mathbf{x}_{t-1|t-1} + \mathbf{J}_{t-1}\left(\mathbf{x}_{t|T} - \mathbf{x}_{t|t-1} \right)
\\
\mathbf{\Sigma}^{x}_{t-1|T} &= \mathbf{\Sigma}^{x}_{t-1|t-1} + \mathbf{J}_{t-1} \left( \mathbf{\Sigma}^{x}_{t|T} - \mathbf{\Sigma}^{x}_{t|t-1} \right) \mathbf{J}_{t-1}^{\prime}
\end{align}
\end{split}\]
where
\(
\mathbf{J}_{t-1} = \mathbf{\Sigma}^{x}_{t-1|t-1} A^{\prime} (\mathbf{\Sigma}^{x}_{t|t-1})^{-1}
\)
Note 1: for \(t=T+1\), \(\mathbf{x}_{t-1|T}\) and \(\mathbf{\Sigma}^{x}_{t-1|T}\) are given by the Kalman filter. After that, going backwards, all necessary objects are provided by the previous smoothing step and by the Kalman filter.
Note 2: equivalent to, but much more efficient than computing the block diagonal of conditional distribution of \(\boldsymbol X\) given \(\boldsymbol Z\)
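A hedged sketch of this backward recursion, consuming the quantities stored by the `kalman_filter` sketch above:

```python
import numpy as np

def kalman_smoother(filter_out, A):
    """Backward (RTS) pass; filter_out is the list returned by kalman_filter,
    whose entry t-1 holds (x_{t|t}, Sigma^x_{t|t}, x_{t|t-1}, Sigma^x_{t|t-1}, ...)."""
    T = len(filter_out)
    # initialize at t = T with the filtered moments x_{T|T}, Sigma^x_{T|T}
    x_sm, P_sm = filter_out[-1][0], filter_out[-1][1]
    smoothed = [(x_sm, P_sm)]
    for t in range(T - 1, 0, -1):
        x_tt, P_tt = filter_out[t - 1][0], filter_out[t - 1][1]  # x_{t|t}, Sigma^x_{t|t}
        x_pred, P_pred = filter_out[t][2], filter_out[t][3]      # x_{t+1|t}, Sigma^x_{t+1|t}
        J = P_tt @ A.T @ np.linalg.inv(P_pred)                   # smoothing gain J_t
        x_sm = x_tt + J @ (x_sm - x_pred)                        # x_{t|T}
        P_sm = P_tt + J @ (P_sm - P_pred) @ J.T                  # Sigma^x_{t|T}
        smoothed.append((x_sm, P_sm))
    smoothed.reverse()                                           # times 1, ..., T
    return smoothed
```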
Estimation
What are we estimating?
\[\begin{split}
\begin{align}
\mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \;\;\; \text{state (transition) equation}\\
\mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \;\;\; \text{observation equation} \\
\end{align}
\end{split}\]
Collect the unknown parameters of \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), \(\mathbf{\Sigma}_{\nu}\) in \(\boldsymbol \theta\)
MLE
\[ \ell(\boldsymbol \theta | \boldsymbol Z) = \log \mathcal{L}(\boldsymbol \theta | \boldsymbol Z) = \log p(\boldsymbol Z; \boldsymbol \theta) \]
\[
\begin{align}
\hat {\boldsymbol \theta}&= \underset{\boldsymbol \theta}{\mathrm{argmax}}~~\ell(\boldsymbol \theta | \boldsymbol Z)
\end{align}
\]
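Numerically, \(\hat{\boldsymbol \theta}\) is typically found by minimizing the negative log-likelihood. A sketch for the scalar AR(1)-plus-noise example from earlier, reusing the hypothetical `log_likelihood` above (the unconstrained log-variance parameterization is an assumption of the sketch, and the stationary initialization presumes \(|\alpha| < 1\)):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import solve_discrete_lyapunov

def neg_loglik(theta, Z):
    """theta = (alpha, log sigma2_eps, log sigma2_nu); Z has shape (T, 1)."""
    A = np.array([[theta[0]]])                        # requires |alpha| < 1
    C = np.array([[1.0]])
    Sigma_eps = np.array([[np.exp(theta[1])]])
    Sigma_nu = np.array([[np.exp(theta[2])]])
    Sigma_0 = solve_discrete_lyapunov(A, Sigma_eps)   # stationary Gamma_x(0)
    return -log_likelihood(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0)

# theta_hat = minimize(neg_loglik, x0=np.zeros(3), args=(Z,), method="BFGS").x
```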
Identification
\[\begin{split}
\begin{align}
\mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \;\;\; \text{state (transition) equation}\\
\mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \;\;\; \text{observation equation} \\
\end{align}
\end{split}\]
If, for any invertible matrix \(T\), we replace \(\mathbf{x}_{t}\) with \(\mathbf{x}_{t}^{*} = T \mathbf{x}_{t}\), \(\varepsilon_{t}\) with \(\varepsilon_{t}^{*} = T \varepsilon_{t}\), \(A\) with \(A^{*} = T A T^{-1}\), \(C\) with \(C^{*} = C T^{-1}\), and \(\mathbf{\Sigma}_{\varepsilon}\) with \(\mathbf{\Sigma}_{\varepsilon}^{*} = T\mathbf{\Sigma}_{\varepsilon}T^{\prime}\), the process for \(\mathbf{z}_{t}\) remains unchanged, and the likelihood function remains the same.
Therefore, unless there are (sufficient) restrictions on \(A\), \(C\), and \(\mathbf{\Sigma}_{\varepsilon}\), their parameters cannot be identified: multiple values of \(\boldsymbol \theta\) imply the same value of the likelihood.
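A quick numerical illustration of this equivalence, reusing the model matrices `A`, `C`, `Sigma_eps`, `Sigma_nu`, `Sigma_0` and the data `z` from the simulation sketch, plus the `log_likelihood` sketch (the transformation matrix is arbitrary; note that \(\mathbf{\Sigma}_{0}\) must be transformed as well):

```python
import numpy as np

rng = np.random.default_rng(1)
Tmat = rng.normal(size=(2, 2))            # an arbitrary invertible matrix
Tinv = np.linalg.inv(Tmat)

A_star = Tmat @ A @ Tinv
C_star = C @ Tinv
Sigma_eps_star = Tmat @ Sigma_eps @ Tmat.T
Sigma_0_star = Tmat @ Sigma_0 @ Tmat.T    # the initial state transforms too

# Both parameterizations imply the same distribution for the observed data
ll_orig = log_likelihood(z, A, C, Sigma_eps, Sigma_nu, Sigma_0)
ll_star = log_likelihood(z, A_star, C_star, Sigma_eps_star, Sigma_nu, Sigma_0_star)
assert np.isclose(ll_orig, ll_star)
```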
Forecasting
Optimal forecast given information at \(T\):
\[\begin{split}
\begin{align}
\operatorname{E}(\mathbf{z}_{T+1} | \boldsymbol Z ) & = C \operatorname{E}(\mathbf{x}_{T+1} | \boldsymbol Z ) = C \mathbf{x}_{T+1|T} = CA \mathbf{x}_{T|T} \\
\operatorname{E}(\mathbf{z}_{T+h} | \boldsymbol Z ) & = C A^{h} \mathbf{x}_{T|T}
\end{align}
\end{split}\]
Computing the variance of the forecast errors
Using that \(\mathbf{x}_{t}\) is a VAR(1) process,
\[
\mathbf{x}_{T+h} = A^{h}\mathbf{x}_{T} + A^{h-1}\boldsymbol \varepsilon_{T+1} + \cdots + A\boldsymbol \varepsilon_{T+h-1} + \boldsymbol \varepsilon_{T+h}
\]
and
\[
\mathbf{x}_{T+h|T} = A^{h}\mathbf{x}_{T|T}
\]
Therefore, the forecast error is
\[
\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T} = A^{h}\left(\mathbf{x}_{T} - \mathbf{x}_{T|T} \right) + A^{h-1}\boldsymbol \varepsilon_{T+1} + \cdots + A\boldsymbol \varepsilon_{T+h-1} + \boldsymbol \varepsilon_{T+h}
\]
The MSE of \(\mathbf{x}_{T+h|T}\) is
\[
\mathbf{\Sigma}^{x}_{T+h|T} = A^{h} \mathbf{\Sigma}^{x}_{T|T} (A^{h})^{\prime} + A^{h-1} \Sigma_{\varepsilon} (A^{h-1})^{\prime} + \cdots + A \Sigma_{\varepsilon} A^{\prime} + \Sigma_{\varepsilon}
\]
and the MSE of \(\mathbf{z}_{T+h|T}\) is
\[
\mathbf{\Sigma}^{z}_{T+h|T} = C \mathbf{\Sigma}^{x}_{T+h|T} C^{\prime} + \Sigma_{\nu}
\]
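A sketch implementing these forecast formulas, given the filtered moments \(\mathbf{x}_{T|T}\) and \(\mathbf{\Sigma}^{x}_{T|T}\) (the helper name is my own):

```python
import numpy as np
from numpy.linalg import matrix_power

def forecast(A, C, Sigma_eps, Sigma_nu, x_TT, P_TT, h):
    """Point forecast of z_{T+h} and its MSE, given x_{T|T} and Sigma^x_{T|T}."""
    Ah = matrix_power(A, h)
    x_fc = Ah @ x_TT                          # x_{T+h|T}
    P_fc = Ah @ P_TT @ Ah.T                   # A^h Sigma^x_{T|T} (A^h)'
    for j in range(h):                        # add A^j Sigma_eps (A^j)', j = 0..h-1
        Aj = matrix_power(A, j)
        P_fc += Aj @ Sigma_eps @ Aj.T
    z_fc = C @ x_fc                           # E(z_{T+h} | Z)
    S_fc = C @ P_fc @ C.T + Sigma_nu          # Sigma^z_{T+h|T}
    return z_fc, S_fc
```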
Accounting for parameter uncertainty
So far, forecast errors were computed assuming that the parameters are known. In practice, however, they are estimated, so parameter uncertainty is also present (it disappears only asymptotically).
total uncertainty
\[\begin{split}
\begin{align}
\operatorname{E}\left([\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right) &= \operatorname{E}\left([\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T}(\theta)][\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T}(\theta)]^{\prime} \right) \\
& + \operatorname{E}\left([\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right) \\
&= \operatorname{E}\left(\mathbf{\Sigma}^{x}_{T+h|T} \right)
+ \operatorname{E}\left([\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right)
\end{align}
\end{split}\]
Evaluate \(\operatorname{E}\left(\mathbf{\Sigma}^{x}_{T+h|T} \right)\) by generating \(N\) draws \(\theta_i\) from the asymptotic distribution of \(\hat{\theta}\) and computing the average value of \(\mathbf{\Sigma}^{x}_{T+h|T}\) over the draws
Evaluate \(\operatorname{E}\left([\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right)\) as the average of \([\mathbf{x}_{T+h|T}(\theta_i) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta_i) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime}\) over the same \(N\) draws (a sketch follows the reference below)
see Hamilton, James D. “A standard error for the estimated state vector of a state-space model.” Journal of Econometrics 33.3 (1986)
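A hedged sketch of this Monte Carlo procedure (it assumes `theta_hat` and `V_hat` hold \(\hat{\theta}\) and an estimate of its asymptotic covariance, and a hypothetical `forecast_given_theta` that maps a parameter draw to \(\mathbf{x}_{T+h|T}\) and \(\mathbf{\Sigma}^{x}_{T+h|T}\)):

```python
import numpy as np

def total_mse(theta_hat, V_hat, forecast_given_theta, N=1000, seed=0):
    """Hamilton (1986)-style Monte Carlo: average filtering/forecast MSE plus
    the dispersion of the point forecast across parameter draws."""
    rng = np.random.default_rng(seed)
    x_hat, _ = forecast_given_theta(theta_hat)    # x_{T+h|T}(theta_hat)
    filt, disp = 0.0, 0.0
    for _ in range(N):
        theta_i = rng.multivariate_normal(theta_hat, V_hat)
        x_i, P_i = forecast_given_theta(theta_i)
        filt += P_i                               # term E(Sigma^x_{T+h|T})
        d = x_i - x_hat
        disp += np.outer(d, d)                    # parameter-uncertainty term
    return (filt + disp) / N
```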
Linear state space model
\[\begin{split}
\begin{align}
\mathbf{x}_{t} &= A \mathbf{x}_{t-1} + \boldsymbol \varepsilon_{t}, \;\;\;\;\; \varepsilon_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\varepsilon}\right) \\
\mathbf{z}_{t} &= C \mathbf{x}_{t} + \boldsymbol \nu_{t}, \;\;\;\;\;\;\; \nu_{t} \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{\nu}\right) \\
\mathbf{x}_{0} & \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}_{0} \right),
\end{align}
\end{split}\]
reduced-form (statistical) models
the parameters are the unrestricted elements of \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), \(\mathbf{\Sigma}_{\nu}\)
of little (or no) interest on their own
structural (theoretical) models
reduced-form model parameters are functions of (often much) smaller number of structural parameters \(\boldsymbol \theta\)
\(\boldsymbol \theta\) have economic meaning and are (or could be) of interest on their own
estimation is usually harder (non-linear functions)
Non-linear state space models in Python