State space models¶
Consider an \(n_{x}\)-dimensional VAR(1) process:
\[\mathbf{x}_{t} = A \mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_{t}\]
However, some of the variables in \(\mathbf{x}_{t}\) are not observed, i.e. no data exists for them.
We can define an \(n_{z} \times n_{x}\) matrix \(C\) that maps \(\mathbf{x}_{t}\) into an \(n_{z}\)-dimensional vector \(\mathbf{z}_{t}\) collecting the observed variables in \(\mathbf{x}_{t}\):
\[\mathbf{z}_{t} = C \mathbf{x}_{t}\]
Example:
\(\mathbf{x}_{t} = [x_{1,t}, x_{2,t}]^{\prime}\)
\(x_{1,t}\) is unobserved, \(x_{2,t}\) is observed, so
\[\mathbf{z}_{t} = x_{2,t} = \underbrace{\begin{bmatrix} 0 & 1 \end{bmatrix}}_{C} \mathbf{x}_{t}\]
The case where \(\mathbf{z}_{t}\) is a subset of \(\mathbf{x}_{t}\) is one example of a linear state space model, which, in turn, is a special case of the broader class of (non-linear) state space models.
Simplest example: AR(1) model with measurement error:
\[x_{t} = a x_{t-1} + \varepsilon_{t}\]
\[z_{t} = x_{t} + \nu_{t}\]
Gaussian linear state space model¶
state equation:
\[\mathbf{x}_{t} = A \mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_{t}, \qquad \boldsymbol{\varepsilon}_{t} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{\Sigma}_{\varepsilon}\right)\]
observation equation:
\[\mathbf{z}_{t} = C \mathbf{x}_{t} + \boldsymbol{\nu}_{t}, \qquad \boldsymbol{\nu}_{t} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{\Sigma}_{\nu}\right)\]
Note 1: This is a time-invariant model. This can be relaxed with some or all of the matrices \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), and \(\mathbf{\Sigma}_{\nu}\) being functions of \(t\).
Note 2: We can add an intercept in one or both of the state and observation equations.
Note 3: \(A\) can be a companion matrix, so the unobserved variables could follow a general VAR(p).
Note 4: \(\varepsilon_{t}\) and \(\nu_t\) are assumed to be independent, but that can be relaxed.
Note 5: \(\mathbf{x}_{0}\) is independent from all \(\varepsilon_{t}\) and \(\nu_t\)
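To fix ideas, here is a minimal NumPy sketch that simulates the model above; the particular values of \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), and \(\mathbf{\Sigma}_{\nu}\) are illustrative choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])          # state transition matrix
C = np.array([[0.0, 1.0]])          # only x_{2,t} is observed
Sigma_eps = 0.1 * np.eye(2)         # covariance of the state shocks
Sigma_nu = np.array([[0.05]])       # covariance of the measurement errors
T = 200

x = np.zeros((T + 1, 2))            # x[0] holds the initial state x_0
z = np.zeros((T, 1))
for t in range(1, T + 1):
    # state equation: x_t = A x_{t-1} + eps_t
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(2), Sigma_eps)
    # observation equation: z_t = C x_t + nu_t
    z[t - 1] = C @ x[t] + rng.multivariate_normal(np.zeros(1), Sigma_nu)
```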
Autocovariances of \(\mathbf{z}_{t}\):
\[\Gamma_{z}(0) = C \Gamma_{x}(0) C^{\prime} + \mathbf{\Sigma}_{\nu}\]
\[\Gamma_{z}(k) = C \Gamma_{x}(k) C^{\prime}, \qquad k \geq 1\]
Note: We get \(\Gamma_{x}(0)\) and \(\Gamma_{x}(k)\) as in the last lecture
Stationarity of \(\mathbf{x}_{t}\) requires all eigenvalues of \(A\) to be strictly less than 1 in modulus.
Marginal distribution of \(\boldsymbol X = [\mathbf{x}^{\prime}_{1}, \mathbf{x}^{\prime}_{2}, \cdots, \mathbf{x}^{\prime}_{T}]^{\prime}\)¶
Under stationarity,
\[\boldsymbol X \sim \mathcal{N}\left(\mathbf{0}, \mathbf{\Sigma}_{\boldsymbol X}\right), \qquad \mathbf{\Sigma}_{\boldsymbol X} = \begin{bmatrix} \Gamma_{x}(0) & \Gamma_{x}(1)^{\prime} & \cdots & \Gamma_{x}(T-1)^{\prime} \\ \Gamma_{x}(1) & \Gamma_{x}(0) & \cdots & \Gamma_{x}(T-2)^{\prime} \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_{x}(T-1) & \Gamma_{x}(T-2) & \cdots & \Gamma_{x}(0) \end{bmatrix}\]
Note: see HW4 part 1 for an alternative way to write the system. Check Efficient simulation and integrated likelihood estimation in state space models for applications of that approach.
Marginal distribution of \(\boldsymbol Z = \left[\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{T} \right]^{\prime}\)¶
Stacking the observation equations, \(\boldsymbol Z = \boldsymbol C \boldsymbol X + \boldsymbol V\) with \(\boldsymbol V = \left[\boldsymbol\nu^{\prime}_{1}, \cdots, \boldsymbol\nu^{\prime}_{T} \right]^{\prime}\), so
\[\boldsymbol Z \sim \mathcal{N}\left(\mathbf{0}, \mathbf{\Sigma}_{\boldsymbol Z}\right), \qquad \mathbf{\Sigma}_{\boldsymbol Z} = \boldsymbol C \mathbf{\Sigma}_{\boldsymbol X} \boldsymbol C^{\prime} + \mathbf{\Sigma}_{\boldsymbol V}\]
Question: How would you compute \(\boldsymbol C\)?
Applications:¶
likelihood function: distribution of \(\boldsymbol Z\)
forecasting: distribution of \(\mathbf{z}_{t+h}\), given \(\boldsymbol Z_{1:t} = [\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{t}]^{\prime}\)
Joint distribution of \(\left[\boldsymbol X^{\prime}, \boldsymbol Z^{\prime} \right]^{\prime}\)¶
\[\begin{bmatrix}\boldsymbol X \\ \boldsymbol Z \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix}\mathbf{\Sigma}_{\boldsymbol X} & \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z} \\ \mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X} & \mathbf{\Sigma}_{\boldsymbol Z}\end{bmatrix}\right), \qquad \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z} = \mathbf{\Sigma}_{\boldsymbol X} \boldsymbol C^{\prime}\]
Moments of the conditional distribution of \(\boldsymbol X\) given \(\boldsymbol Z\)¶
\(\operatorname{E}(\boldsymbol X | \boldsymbol Z) = \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z} \mathbf{\Sigma}^{-1}_{\boldsymbol Z} \boldsymbol Z\)
\(\operatorname{cov}(\boldsymbol X | \boldsymbol Z) = \mathbf{\Sigma}_{\boldsymbol X} - \mathbf{\Sigma}_{\boldsymbol X \boldsymbol Z}\mathbf{\Sigma}_{\boldsymbol Z}^{-1}\mathbf{\Sigma}_{\boldsymbol Z \boldsymbol X }\)
Note: The conditional variance of \(\boldsymbol X\) given \(\boldsymbol Z\) does not depend on the data \(\boldsymbol Z\)
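A quick numerical illustration of these formulas; the covariance blocks below are made-up numbers, used only to show the mechanics and that \(\operatorname{cov}(\boldsymbol X | \boldsymbol Z)\) involves no data.

```python
import numpy as np

# made-up covariance blocks of a jointly Gaussian (X, Z) with zero means
S_X = np.array([[2.0, 0.5],
                [0.5, 1.0]])
S_XZ = np.array([[0.8],
                 [0.3]])
S_Z = np.array([[1.5]])
Z = np.array([0.7])                                        # an "observed" value

E_X_given_Z = S_XZ @ np.linalg.solve(S_Z, Z)               # Sigma_XZ Sigma_Z^{-1} Z
cov_X_given_Z = S_X - S_XZ @ np.linalg.solve(S_Z, S_XZ.T)  # free of the data Z
```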
Moments of the conditional distribution¶
of \(\boldsymbol X_{t_1:t_2}=[\mathbf{x}^{\prime}_{t_1}, \cdots, \mathbf{x}^{\prime}_{t_2}]^{\prime}\) given \(\boldsymbol Z_{t_3:t_4} = [\mathbf{z}^{\prime}_{t_3}, \cdots, \mathbf{z}^{\prime}_{t_4}]^{\prime}\)
Applications:¶
filtering: distribution of \(\mathbf{x}_{t}\), given \(\boldsymbol Z_{1:t} = [\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{t}]^{\prime}\)
state prediction: distribution of \(\mathbf{x}_{t+h}\), given \(\boldsymbol Z_{1:t}\)
smoothing: distribution of \(\mathbf{x}_{t}\), given \(\boldsymbol Z_{1:T}\)
Kalman filter¶
Let
\[\mathbf{x}_{t|s} = \operatorname{E}\left(\mathbf{x}_{t} | \boldsymbol Z_{1:s}\right), \qquad \mathbf{\Sigma}^{x}_{t|s} = \operatorname{cov}\left(\mathbf{x}_{t} | \boldsymbol Z_{1:s}\right)\]
and
\[\mathbf{z}_{t|t-1} = \operatorname{E}\left(\mathbf{z}_{t} | \boldsymbol Z_{1:t-1}\right), \qquad \mathbf{\Sigma}^{z}_{t|t-1} = \operatorname{cov}\left(\mathbf{z}_{t} | \boldsymbol Z_{1:t-1}\right)\]
For \(t = 1, \ldots, T\), the filter computes:
optimal one-step ahead forecast of \(\mathbf{x}\):
\[\mathbf{x}_{t|t-1} = A \mathbf{x}_{t-1|t-1}, \qquad \mathbf{\Sigma}^{x}_{t|t-1} = A \mathbf{\Sigma}^{x}_{t-1|t-1} A^{\prime} + \mathbf{\Sigma}_{\varepsilon}\]
optimal one-step ahead forecast of \(\mathbf{z}\):
\[\mathbf{z}_{t|t-1} = C \mathbf{x}_{t|t-1}, \qquad \mathbf{\Sigma}^{z}_{t|t-1} = C \mathbf{\Sigma}^{x}_{t|t-1} C^{\prime} + \mathbf{\Sigma}_{\nu}\]
optimal update of the forecast of \(\mathbf{x}\):
\[\mathbf{x}_{t|t} = \mathbf{x}_{t|t-1} + \mathbf{K}_{t}\left(\mathbf{z}_{t} - \mathbf{z}_{t|t-1}\right), \qquad \mathbf{\Sigma}^{x}_{t|t} = \mathbf{\Sigma}^{x}_{t|t-1} - \mathbf{K}_{t} C \mathbf{\Sigma}^{x}_{t|t-1}\]
where \( \mathbf{K}_{t} = \mathbf{\Sigma}^{x}_{t|t-1} C^{\prime} (\mathbf{\Sigma}^{z}_{t|t-1})^{-1} \) is called the Kalman gain: it shows how to optimally update the forecast of \(\mathbf{x}_{t}\) after the new observation \(\mathbf{z}_{t}\) is seen.
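A minimal NumPy sketch of these recursions (the function name and array layout are my own; the lecture provides no code). It assumes the moments of the initial state, \(\mathbf{x}_{0|0}\) and \(\mathbf{\Sigma}^{x}_{0|0}\), are supplied, e.g. the stationary moments.

```python
import numpy as np

def kalman_filter(z, A, C, Sigma_eps, Sigma_nu, x0, P0):
    """Kalman filter for the Gaussian linear state space model.
    z: (T, n_z) observations; x0, P0: mean and covariance of x_0.
    Returns predicted (t|t-1) and filtered (t|t) means and covariances."""
    T, n_x = z.shape[0], x0.shape[0]
    x_pred = np.zeros((T, n_x)); P_pred = np.zeros((T, n_x, n_x))
    x_filt = np.zeros((T, n_x)); P_filt = np.zeros((T, n_x, n_x))
    x, P = x0, P0
    for t in range(T):
        # one-step ahead forecast of x_t
        x_pred[t] = A @ x
        P_pred[t] = A @ P @ A.T + Sigma_eps
        # one-step ahead forecast of z_t
        z_pred = C @ x_pred[t]
        S = C @ P_pred[t] @ C.T + Sigma_nu
        # Kalman gain and update
        K = P_pred[t] @ C.T @ np.linalg.inv(S)
        x_filt[t] = x_pred[t] + K @ (z[t] - z_pred)
        P_filt[t] = P_pred[t] - K @ C @ P_pred[t]
        x, P = x_filt[t], P_filt[t]
    return x_pred, P_pred, x_filt, P_filt
```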
Derivation for \(t=1\)
step 1 Compute the joint distribution of \([\mathbf{x}_1^{\prime}, \mathbf{z}_1^{\prime}]^{\prime}\) using
\[\mathbf{x}_{1} = A\mathbf{x}_{0} + \boldsymbol\varepsilon_{1}, \qquad \mathbf{z}_{1} = C\mathbf{x}_{1} + \boldsymbol\nu_{1} = CA\mathbf{x}_{0} + C\boldsymbol\varepsilon_{1} + \boldsymbol\nu_{1}\]
Note that \(\mathbf{x}_0\), \(\boldsymbol \varepsilon_{1}\), and \(\boldsymbol \nu_{1}\) are independent
step 2 Compute the marginal distribution of \(\mathbf{x}_1\) and \(\mathbf{z}_1\)
step 3 Compute the conditional distribution of \(\mathbf{x}_1\) given \(\mathbf{z}_1\)
Derivation for any \(t\): use induction. Assume the optimal update formulae hold for \(t-1\) and show that the one-step ahead ones follow:
write \([\mathbf{x}_t, \mathbf{z}_t]\) in terms of \(\mathbf{x}_{t-1}\), \(\boldsymbol \varepsilon_{t}\), and \(\boldsymbol \nu_{t}\)
using the assumed conditional distribution of \(\mathbf{x}_{t-1}\) given \(\boldsymbol Z_{1:t-1}\), compute the joint and marginal conditional distributions of \(\mathbf{x}_t\) and \(\mathbf{z}_t\) given \(\boldsymbol Z_{1:t-1}\) (see the display after this list).
from the joint (conditional) distribution compute the conditional distribution of \(\mathbf{x}_t\) given \(\mathbf{z}_t\). This will give you the conditional distribution of \(\mathbf{x}_t\) given \(\boldsymbol Z_{1:t} = [\boldsymbol Z_{1:t-1}^{\prime}, \mathbf{z}_t^{\prime}]^{\prime}\)
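For reference, the joint conditional distribution computed in the second step takes the form below, using \(\operatorname{cov}(\mathbf{x}_t, \mathbf{z}_t | \boldsymbol Z_{1:t-1}) = \mathbf{\Sigma}^{x}_{t|t-1}C^{\prime}\), which follows from the observation equation:
\[\begin{bmatrix} \mathbf{x}_t \\ \mathbf{z}_t \end{bmatrix} \Bigg|\, \boldsymbol Z_{1:t-1} \sim \mathcal{N}\left( \begin{bmatrix} \mathbf{x}_{t|t-1} \\ \mathbf{z}_{t|t-1} \end{bmatrix}, \begin{bmatrix} \mathbf{\Sigma}^{x}_{t|t-1} & \mathbf{\Sigma}^{x}_{t|t-1} C^{\prime} \\ C \mathbf{\Sigma}^{x}_{t|t-1} & \mathbf{\Sigma}^{z}_{t|t-1} \end{bmatrix} \right)\]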
Likelihood function with the Kalman filter¶
joint distribution of \( \boldsymbol Z = \left[\mathbf{z}^{\prime}_{1}, \mathbf{z}^{\prime}_{2}, \cdots, \mathbf{z}^{\prime}_{T} \right]^{\prime}\):
\[p(\boldsymbol Z; \boldsymbol\theta) = p(\mathbf{z}_{1}; \boldsymbol\theta)\prod_{t=2}^{T} p(\mathbf{z}_{t}|Z_{1:t-1}; \boldsymbol \theta)\]
where \(p(\mathbf{z}_{t}|Z_{1:t-1}; \boldsymbol \theta)\) is Gaussian with moments \(\mathbf{z}_{t|t-1}\) and \(\mathbf{\Sigma}^{z}_{t|t-1}\) given by the Kalman filter.
Note: this is equivalent to, but much more efficient than, computing the joint distribution of \(\boldsymbol Z\) as \(\mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_{\boldsymbol Z})\) and evaluating that \(T n_{z}\)-dimensional Gaussian density directly.
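To illustrate the inefficient alternative, here is a sketch that builds \(\mathbf{\Sigma}_{\boldsymbol Z}\) explicitly (assuming a stationary \(\mathbf{x}_{0}\)) and evaluates one big Gaussian density; the Kalman filter replaces this \(O((Tn_{z})^3)\) computation with \(T\) small recursions. The function name is mine.

```python
import numpy as np
from numpy.linalg import matrix_power
from scipy.linalg import solve_discrete_lyapunov
from scipy.stats import multivariate_normal

def direct_loglik(z, A, C, Sigma_eps, Sigma_nu):
    """Evaluate log p(Z) by building Sigma_Z directly (feasible only for small T)."""
    T, n_x = z.shape[0], A.shape[0]
    G0 = solve_discrete_lyapunov(A, Sigma_eps)   # Gamma_x(0): G0 = A G0 A' + Sigma_eps
    Sigma_X = np.zeros((T * n_x, T * n_x))
    for i in range(T):
        for j in range(T):
            # cov(x_{i+1}, x_{j+1}) = A^{i-j} Gamma_x(0) for i >= j, transposed otherwise
            G = matrix_power(A, i - j) @ G0 if i >= j else G0 @ matrix_power(A.T, j - i)
            Sigma_X[i*n_x:(i+1)*n_x, j*n_x:(j+1)*n_x] = G
    C_big = np.kron(np.eye(T), C)                # the block-diagonal "big C"
    Sigma_Z = C_big @ Sigma_X @ C_big.T + np.kron(np.eye(T), Sigma_nu)
    return multivariate_normal(np.zeros(Sigma_Z.shape[0]), Sigma_Z).logpdf(z.ravel())
```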
Kalman smoother¶
Going backwards, for \(t = T+1, T, \ldots, 2\):
\[\mathbf{x}_{t-1|T} = \mathbf{x}_{t-1|t-1} + \mathbf{J}_{t-1}\left(\mathbf{x}_{t|T} - \mathbf{x}_{t|t-1}\right)\]
\[\mathbf{\Sigma}^{x}_{t-1|T} = \mathbf{\Sigma}^{x}_{t-1|t-1} + \mathbf{J}_{t-1}\left(\mathbf{\Sigma}^{x}_{t|T} - \mathbf{\Sigma}^{x}_{t|t-1}\right)\mathbf{J}^{\prime}_{t-1}\]
where \( \mathbf{J}_{t-1} = \mathbf{\Sigma}^{x}_{t-1|t-1} A^{\prime} (\mathbf{\Sigma}^{x}_{t|t-1})^{-1} \)
Note 1: for \(t=T+1\), \(\mathbf{x}_{t-1|T}\) and \(\mathbf{\Sigma}^{x}_{t-1|T}\) are given by the Kalman filter. After that, going backwards, all necessary objects are provided by the previous smoothing step and by the Kalman filter.
Note 2: equivalent to, but much more efficient than, computing the block diagonal of the conditional distribution of \(\boldsymbol X\) given \(\boldsymbol Z\).
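A backward-recursion sketch that consumes the output of the filter sketch above (argument names are mine):

```python
import numpy as np

def kalman_smoother(A, x_pred, P_pred, x_filt, P_filt):
    """Kalman smoother; inputs are the (t|t-1) and (t|t) moments from the filter.
    Index t-1 of each array corresponds to time t."""
    T = x_filt.shape[0]
    x_sm, P_sm = x_filt.copy(), P_filt.copy()   # at time T, smoother = filter
    for t in range(T - 1, 0, -1):               # compute moments for time t, given t+1
        J = P_filt[t - 1] @ A.T @ np.linalg.inv(P_pred[t])
        x_sm[t - 1] = x_filt[t - 1] + J @ (x_sm[t] - x_pred[t])
        P_sm[t - 1] = P_filt[t - 1] + J @ (P_sm[t] - P_pred[t]) @ J.T
    return x_sm, P_sm
```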
Estimation¶
What are we estimating?
Collect the unknown parameters of \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), \(\mathbf{\Sigma}_{\nu}\) in \(\boldsymbol \theta\)
MLE:
\[\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta}\, \log p(\boldsymbol Z; \boldsymbol\theta)\]
with the log-likelihood evaluated using the Kalman filter as above.
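Numerically, the maximization is typically handed to a generic optimizer. A minimal sketch, where neg_loglik is a hypothetical user-supplied function that maps \(\boldsymbol\theta\) into \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), \(\mathbf{\Sigma}_{\nu}\), runs the Kalman filter, and returns the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

def mle(z, theta0, neg_loglik):
    """Minimize -log p(Z; theta) starting from the 1-D parameter array theta0.
    neg_loglik(theta, z) is assumed to be built from a Kalman filter pass."""
    res = minimize(neg_loglik, theta0, args=(z,), method="L-BFGS-B")
    return res.x, -res.fun    # point estimate and maximized log-likelihood
```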
Identification¶
If, for some invertible matrix \(T\), we replace \(\mathbf{x}_{t}\) with \(\mathbf{x}_{t}^{*} = T \mathbf{x}_{t}\), \(\varepsilon_{t}\) with \(\varepsilon_{t}^{*} = T \varepsilon_{t}\), \(A\) with \(A^{*} = T A T^{-1}\), \(C\) with \(C^{*} = C T^{-1}\), and \(\mathbf{\Sigma}_{\varepsilon}\) with \(\mathbf{\Sigma}_{\varepsilon}^{*} = T\mathbf{\Sigma}_{\varepsilon}T^{\prime}\), the process for \(\mathbf{z}_{t}\) remains unchanged, and the likelihood function remains the same.
Therefore, unless there are (sufficient) restrictions on \(A\), \(C\), and \(\mathbf{\Sigma}_{\varepsilon}\), their parameters cannot be identified: multiple values of \(\boldsymbol \theta\) imply the same value of the likelihood.
A simple way to check for local identification at a given value of \(\boldsymbol \theta\) is to compute the Jacobian matrix of \(\mathbf{\Sigma}_{\boldsymbol Z}\) w.r.t. \(\boldsymbol \theta\) and check that it has full column rank.
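A numerical version of this check, sketched under the assumption that a user-supplied function sigma_Z(theta) returns \(\mathbf{\Sigma}_{\boldsymbol Z}\) (e.g. built as in the likelihood sketch above):

```python
import numpy as np

def locally_identified(sigma_Z, theta0, eps=1e-6):
    """Finite-difference Jacobian of the unique elements of Sigma_Z w.r.t. theta;
    full column rank at theta0 indicates local identification."""
    S0 = sigma_Z(theta0)
    iu = np.triu_indices(S0.shape[0])   # unique elements of the symmetric matrix
    f0 = S0[iu]
    J = np.zeros((f0.size, theta0.size))
    for k in range(theta0.size):
        th = theta0.copy()
        th[k] += eps
        J[:, k] = (sigma_Z(th)[iu] - f0) / eps
    return np.linalg.matrix_rank(J) == theta0.size
```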
Forecasting¶
Optimal forecast given information at \(T\):
\[\mathbf{x}_{T+h|T} = A^{h}\mathbf{x}_{T|T}, \qquad \mathbf{z}_{T+h|T} = C \mathbf{x}_{T+h|T}\]
Computing the variance of the forecast errors:
Using that \(\mathbf{x}_{t}\) is VAR(1),
\[\mathbf{x}_{T+h} = A^{h}\mathbf{x}_{T} + \sum_{j=0}^{h-1} A^{j} \boldsymbol\varepsilon_{T+h-j}\]
and
\[\mathbf{z}_{T+h} = C \mathbf{x}_{T+h} + \boldsymbol\nu_{T+h}\]
Therefore, the forecast error is
\[\mathbf{x}_{T+h} - \mathbf{x}_{T+h|T} = A^{h}\left(\mathbf{x}_{T} - \mathbf{x}_{T|T}\right) + \sum_{j=0}^{h-1} A^{j} \boldsymbol\varepsilon_{T+h-j}\]
The MSE of \(\mathbf{x}_{T+h|T}\) is
\[\mathbf{\Sigma}^{x}_{T+h|T} = A^{h}\mathbf{\Sigma}^{x}_{T|T}\left(A^{h}\right)^{\prime} + \sum_{j=0}^{h-1} A^{j}\mathbf{\Sigma}_{\varepsilon}\left(A^{j}\right)^{\prime}\]
and the MSE of \(\mathbf{z}_{T+h|T}\) is
\[\mathbf{\Sigma}^{z}_{T+h|T} = C\,\mathbf{\Sigma}^{x}_{T+h|T} C^{\prime} + \mathbf{\Sigma}_{\nu}\]
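A small sketch of the forecast recursion, starting from the filtered moments at \(T\) (function and argument names are mine):

```python
import numpy as np

def forecast(A, C, Sigma_eps, Sigma_nu, x_TT, P_TT, h):
    """h-step ahead forecasts of x and z with their MSEs."""
    x, P = x_TT, P_TT
    for _ in range(h):
        x = A @ x                       # builds A^h x_{T|T}
        P = A @ P @ A.T + Sigma_eps     # builds A^h Sig_{T|T} A^h' + sum_j A^j Sig_eps A^j'
    return x, P, C @ x, C @ P @ C.T + Sigma_nu
```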
Accounting for parameter uncertainty¶
So far, forecast errors were computed assuming that the parameters are known. In practice they are estimated, so parameter uncertainty is also present (it disappears only asymptotically).
total uncertainty:
\[\operatorname{E}\left(\mathbf{\Sigma}^{x}_{T+h|T}\right) + \operatorname{E}\left(\left[\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T}(\hat{\theta})\right]\left[\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T}(\hat{\theta})\right]^{\prime}\right)\]
Evaluate \(\operatorname{E}\left(\mathbf{\Sigma}^{x}_{T+h|T} \right)\) by generating \(N\) draws from the asymptotic distribution of \(\hat{\theta}\) and computing the average value of \(\mathbf{\Sigma}^{x}_{T+h|T}\) across the draws
Evaluate \(\operatorname{E}\left([\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right) \) as the average of \(\left([\mathbf{x}_{T+h|T}(\theta_i) - \mathbf{x}_{T+h|T} (\hat{\theta})][\mathbf{x}_{T+h|T}(\theta_i) - \mathbf{x}_{T+h|T} (\hat{\theta})]^{\prime} \right)\) using the \(N\) draws \(\theta_i\) from the asymptotic distribution of \(\hat{\theta}\)
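A Monte Carlo sketch of the two averages; forecast_mean and forecast_cov are hypothetical callables returning \(\mathbf{x}_{T+h|T}(\theta)\) and \(\mathbf{\Sigma}^{x}_{T+h|T}(\theta)\), e.g. wrapping the filter and forecast sketches above.

```python
import numpy as np

def total_forecast_mse(theta_hat, V_hat, forecast_mean, forecast_cov, N=1000, seed=0):
    """Average the filter term and the parameter-uncertainty term over N draws
    from the asymptotic distribution N(theta_hat, V_hat)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, V_hat, size=N)
    x_hat = forecast_mean(theta_hat)
    n = x_hat.shape[0]
    filter_term = np.zeros((n, n))
    param_term = np.zeros((n, n))
    for theta_i in draws:
        filter_term += forecast_cov(theta_i) / N   # E[Sigma^x_{T+h|T}]
        d = forecast_mean(theta_i) - x_hat
        param_term += np.outer(d, d) / N           # parameter-uncertainty term
    return filter_term + param_term
```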
Linear state space model¶
reduced-form (statistical) models
the parameters are the unrestricted elements of \(A\), \(C\), \(\mathbf{\Sigma}_{\varepsilon}\), \(\mathbf{\Sigma}_{\nu}\)
of little (or no) interest on their own
structural (theoretical) models
reduced-form model parameters are functions of (often much) smaller number of structural parameters \(\boldsymbol \theta\)
\(\boldsymbol \theta\) have economic meaning and are (or could be) of interest on their own
estimation is usually harder (the reduced-form parameters are non-linear functions of \(\boldsymbol \theta\))