State space models

Consider an $n_x$-dimensional VAR(1) process:

$$x_t = A\,x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim WN(0, \Sigma)$$

However, some variables of $x_t$ are not observed, i.e. no data exists for them

We can define a matrix $C$ that maps $x_t$ into an $n_z$-dimensional vector $z_t$ collecting the observed variables in $x_t$:

$$z_t = C\,x_t$$

Example:

  • $x_t = [x_{1,t}, x_{2,t}]'$

  • $x_{1,t}$ - unobserved, $x_{2,t}$ - observed

$$z_t = \underbrace{[0, \; 1]}_{C} \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}$$

The case where $z_t$ is a subset of $x_t$ is one example of a linear state space model

$$x_t = A\,x_{t-1} + \varepsilon_t, \qquad z_t = C\,x_t + \nu_t$$

which, in turn, is a special case of the class of (non-linear) state space models

$$x_t = f(x_{t-1}, \varepsilon_t), \qquad z_t = g(x_t, \nu_t)$$

Simplest example: AR(1) model with measurement error:

$$x_t = \alpha\,x_{t-1} + \varepsilon_t, \qquad z_t = x_t + \nu_t$$
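
To make this concrete, here is a minimal simulation sketch of the AR(1)-plus-measurement-error model in Python (the parameter values and variable names are illustrative assumptions, not from the notes):

```python
# Minimal sketch: simulate x_t = alpha x_{t-1} + eps_t and z_t = x_t + nu_t
# (T, alpha, sigma_eps, sigma_nu below are assumed values for illustration)
import numpy as np

rng = np.random.default_rng(0)
T, alpha, sigma_eps, sigma_nu = 200, 0.9, 1.0, 0.5

x = np.zeros(T + 1)                        # x[0] plays the role of the initial state x_0
eps = rng.normal(0.0, sigma_eps, size=T)
nu = rng.normal(0.0, sigma_nu, size=T)
for t in range(1, T + 1):
    x[t] = alpha * x[t - 1] + eps[t - 1]   # state (transition) equation
z = x[1:] + nu                             # observation equation: only z is "observed"
```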

Gaussian linear state space model

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon) && \text{state (transition) equation}\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu) && \text{observation equation}\\
x_0 &\sim N(0, \Sigma_0) && \text{initial state}
\end{aligned}
$$

Note 1: This is a time-invariant model. This can be relaxed, with some or all of the matrices $A$, $C$, $\Sigma_\varepsilon$, and $\Sigma_\nu$ being functions of $t$.

Note 2: We can add an intercept in one or both of the state and observation equations.

Note 3: $A$ can be a companion matrix, so the unobserved variables could follow a general VAR(p)

Note 4: $\varepsilon_t$ and $\nu_t$ are assumed to be independent, but that can be relaxed.

Note 5: $x_0$ is independent of all $\varepsilon_t$ and $\nu_t$

Autocovariances of $z_t$

$$\Gamma_z(0) = \mathrm{cov}(z_t, z_t) = \mathrm{cov}(C x_t, C x_t) + \Sigma_\nu = C\,\Gamma_x(0)\,C' + \Sigma_\nu$$
$$\Gamma_z(k) = \mathrm{cov}(z_t, z_{t-k}) = \mathrm{cov}(C x_t, C x_{t-k}) = C\,\Gamma_x(k)\,C'$$

Note: We get $\Gamma_x(0)$ and $\Gamma_x(k)$ as in the last lecture
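
As a sketch of that computation in Python, $\Gamma_x(0)$ can be obtained from the discrete Lyapunov equation $\Gamma_x(0) = A\,\Gamma_x(0)\,A' + \Sigma_\varepsilon$ and then mapped into the autocovariances of $z_t$ (the matrices below are assumed example values, not from the notes):

```python
# Sketch: autocovariances of z_t from those of x_t (example matrices are assumptions)
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.7, 0.2],
              [0.0, 0.5]])
Sigma_eps = np.eye(2)
C = np.array([[0.0, 1.0]])                 # observe only the second variable
Sigma_nu = np.array([[0.3]])

Gamma_x0 = solve_discrete_lyapunov(A, Sigma_eps)      # solves G = A G A' + Sigma_eps

def Gamma_x(k):
    return np.linalg.matrix_power(A, k) @ Gamma_x0    # Gamma_x(k) = A^k Gamma_x(0)

Gamma_z0 = C @ Gamma_x0 @ C.T + Sigma_nu              # Gamma_z(0)

def Gamma_z(k):
    return C @ Gamma_x(k) @ C.T                       # Gamma_z(k), k >= 1
```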

Stationarity of $x_t$

$$x_0 \sim N(0, \Sigma_0)$$

requires

$$\Sigma_0 = \Gamma_x(0)$$

Marginal distribution of $X = [x_1, x_2, \ldots, x_T]$

$$
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_T \end{bmatrix}}_{X}
=
\underbrace{\begin{bmatrix}
A & I & 0 & 0 & \cdots & 0 & 0 \\
A^2 & A & I & 0 & \cdots & 0 & 0 \\
A^3 & A^2 & A & I & \cdots & 0 & 0 \\
A^4 & A^3 & A^2 & A & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
A^T & A^{T-1} & A^{T-2} & A^{T-3} & \cdots & A & I
\end{bmatrix}}_{\mathcal{A}}
\underbrace{\begin{bmatrix} x_0 \\ \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_{T-1} \\ \varepsilon_T \end{bmatrix}}_{E}
$$
$$X \sim N\!\left(0,\; \mathcal{A}\,\Sigma_E\,\mathcal{A}'\right), \qquad \Sigma_E = \mathrm{cov}(E) = \mathrm{blockdiag}(\Sigma_0, \Sigma_\varepsilon, \ldots, \Sigma_\varepsilon)$$

Note: see HW4 part 1 for an alternative way to write the system. Check Efficient simulation and integrated likelihood estimation in state space models for applications of that approach.
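
A small sketch of how the stacked system could be built numerically (the matrix values and the small $T$ are assumptions for illustration; $\mathcal{A}$ is written as script_A):

```python
# Sketch: build script_A and Sigma_E, then Sigma_X = script_A Sigma_E script_A'
import numpy as np
from scipy.linalg import block_diag

T = 5                                          # small horizon, for illustration only
A = np.array([[0.7, 0.2],
              [0.0, 0.5]])
Sigma_eps = np.eye(2)
Sigma_0 = np.eye(2)                            # assumed covariance of the initial state x_0
n = A.shape[0]

# block (t, 0) of script_A is A^t and block (t, s) is A^{t-s} for 1 <= s <= t
script_A = np.zeros((T * n, (T + 1) * n))
for t in range(1, T + 1):
    for s in range(0, t + 1):
        script_A[(t - 1) * n:t * n, s * n:(s + 1) * n] = np.linalg.matrix_power(A, t - s)

Sigma_E = block_diag(Sigma_0, *([Sigma_eps] * T))   # cov of E = [x_0, eps_1, ..., eps_T]
Sigma_X = script_A @ Sigma_E @ script_A.T           # X ~ N(0, Sigma_X)
```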

Marginal distribution of $Z = [z_1, z_2, \ldots, z_T]$

$$
\underbrace{\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_T \end{bmatrix}}_{Z}
=
\underbrace{\begin{bmatrix} C & 0 & \cdots & 0 \\ 0 & C & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & C \end{bmatrix}}_{\mathcal{C}}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_T \end{bmatrix}}_{X}
+
\underbrace{\begin{bmatrix} \nu_1 \\ \nu_2 \\ \vdots \\ \nu_T \end{bmatrix}}_{V}
$$

Question: How would you compute $\mathcal{C}$?

$$Z \sim N\!\left(0,\; \mathcal{C}\,\Sigma_X\,\mathcal{C}' + \Sigma_V\right)$$
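
One way to answer the question above in code: $\mathcal{C}$ is block diagonal with $C$ repeated $T$ times, i.e. the Kronecker product $I_T \otimes C$. The sketch below continues the previous one (it reuses T and Sigma_X from there; the values of $C$ and $\Sigma_\nu$ are assumptions):

```python
# Sketch: script_C = I_T kron C, Sigma_V = I_T kron Sigma_nu, then Sigma_Z
import numpy as np

C = np.array([[0.0, 1.0]])
Sigma_nu = np.array([[0.3]])

script_C = np.kron(np.eye(T), C)                       # block diagonal with C on the diagonal
Sigma_V = np.kron(np.eye(T), Sigma_nu)                 # cov of V = [nu_1, ..., nu_T]
Sigma_Z = script_C @ Sigma_X @ script_C.T + Sigma_V    # Z ~ N(0, Sigma_Z)
```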

Applications:

  • likelihood function: distribution of Z

  • forecasting: distribution of $z_{t+h}$, given $Z_{1:t} = [z_1, z_2, \ldots, z_t]$

Joint distribution of [X,Z]

$$\begin{bmatrix} X \\ Z \end{bmatrix} \sim N\!\left(0,\; \begin{bmatrix} \Sigma_X & \Sigma_{XZ} \\ \Sigma_{ZX} & \Sigma_Z \end{bmatrix}\right)$$

Moments of the conditional distribution of X given Z

  • $E(X \mid Z) = \Sigma_{XZ}\,\Sigma_Z^{-1}\,Z$

  • $\mathrm{cov}(X \mid Z) = \Sigma_X - \Sigma_{XZ}\,\Sigma_Z^{-1}\,\Sigma_{ZX}$

Note: The conditional variance of X given Z does not depend on the data Z
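
Continuing the stacked-system sketches above, these conditional moments can be computed by direct Gaussian conditioning (Z_obs below stands for a vector of observed data; the function name is mine, not from the notes):

```python
# Sketch: E(X|Z) and cov(X|Z) by brute-force Gaussian conditioning
# (reuses Sigma_X, script_C, Sigma_Z from the earlier sketches)
import numpy as np

Sigma_XZ = Sigma_X @ script_C.T        # cov(X, Z), since Z = script_C X + V with V independent of X
Sigma_ZX = Sigma_XZ.T

def conditional_moments(Z_obs):
    Sigma_Z_inv = np.linalg.inv(Sigma_Z)   # for large T, prefer solving linear systems instead
    E_X_given_Z = Sigma_XZ @ Sigma_Z_inv @ Z_obs
    cov_X_given_Z = Sigma_X - Sigma_XZ @ Sigma_Z_inv @ Sigma_ZX
    return E_X_given_Z, cov_X_given_Z
```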

Moments of the conditional distribution

of $X_{t_1:t_2} = [x_{t_1}, \ldots, x_{t_2}]$ given $Z_{t_3:t_4} = [z_{t_3}, \ldots, z_{t_4}]$

are obtained from the same formulae using the corresponding sub-blocks:

$$\{\Sigma_{XX}\}_{(t_1:t_2),(t_1:t_2)}, \quad \{\Sigma_{ZZ}\}_{(t_3:t_4),(t_3:t_4)}, \quad \{\Sigma_{XZ}\}_{(t_1:t_2),(t_3:t_4)}, \quad \{\Sigma_{ZX}\}_{(t_3:t_4),(t_1:t_2)}$$

Applications:

  • filtering: distribution of $x_t$, given $Z_{1:t} = [z_1, z_2, \ldots, z_t]$

  • state prediction: distribution of $x_{t+h}$, given $Z_{1:t}$

  • smoothing: distribution of $x_t$, given $Z_{1:T}$

Kalman filter

Let

$$
\begin{aligned}
x_{t|t-1} &= E(x_t \mid Z_{1:t-1}), &\quad \Sigma^x_{t|t-1} &= \mathrm{cov}(x_t \mid Z_{1:t-1})\\
x_{t|t} &= E(x_t \mid Z_{1:t}), &\quad \Sigma^x_{t|t} &= \mathrm{cov}(x_t \mid Z_{1:t})\\
z_{t|t-1} &= E(z_t \mid Z_{1:t-1}), &\quad \Sigma^z_{t|t-1} &= \mathrm{cov}(z_t \mid Z_{1:t-1})
\end{aligned}
$$

and

$$x_{0|0} = 0, \qquad \Sigma^x_{0|0} = \Sigma_0$$
  • optimal one-step ahead forecast of x

$$x_{t|t-1} = A\,x_{t-1|t-1}, \qquad \Sigma^x_{t|t-1} = A\,\Sigma^x_{t-1|t-1}\,A' + \Sigma_\varepsilon$$
  • optimal one-step ahead forecast of z

$$z_{t|t-1} = C\,x_{t|t-1}, \qquad \Sigma^z_{t|t-1} = C\,\Sigma^x_{t|t-1}\,C' + \Sigma_\nu$$
  • optimal update of the forecast of x

$$x_{t|t} = x_{t|t-1} + K_t\,(z_t - z_{t|t-1}), \qquad \Sigma^x_{t|t} = \Sigma^x_{t|t-1} - K_t\,\Sigma^z_{t|t-1}\,K_t'$$

where $K_t = \Sigma^x_{t|t-1}\,C'\,(\Sigma^z_{t|t-1})^{-1}$ is called the Kalman gain - it shows how to optimally update the forecast of $x_t$ after the new observation of $z_t$ is seen
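
A minimal Python sketch of these recursions (function and variable names are mine, not from the notes; Z is a $T \times n_z$ array of observations):

```python
# Sketch of the Kalman filter recursions above
import numpy as np

def kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    T, n_x, n_z = Z.shape[0], A.shape[0], C.shape[0]
    x_pred = np.zeros((T, n_x)); P_pred = np.zeros((T, n_x, n_x))   # x_{t|t-1}, Sigma^x_{t|t-1}
    x_filt = np.zeros((T, n_x)); P_filt = np.zeros((T, n_x, n_x))   # x_{t|t},   Sigma^x_{t|t}
    z_pred = np.zeros((T, n_z)); F = np.zeros((T, n_z, n_z))        # z_{t|t-1}, Sigma^z_{t|t-1}
    x_prev, P_prev = np.zeros(n_x), Sigma_0                         # x_{0|0} = 0, Sigma^x_{0|0} = Sigma_0
    for t in range(T):
        # one-step-ahead forecast of x
        x_pred[t] = A @ x_prev
        P_pred[t] = A @ P_prev @ A.T + Sigma_eps
        # one-step-ahead forecast of z
        z_pred[t] = C @ x_pred[t]
        F[t] = C @ P_pred[t] @ C.T + Sigma_nu
        # update after observing z_t
        K = P_pred[t] @ C.T @ np.linalg.inv(F[t])                   # Kalman gain K_t
        x_filt[t] = x_pred[t] + K @ (Z[t] - z_pred[t])
        P_filt[t] = P_pred[t] - K @ F[t] @ K.T
        x_prev, P_prev = x_filt[t], P_filt[t]
    return x_filt, P_filt, x_pred, P_pred, z_pred, F
```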

Derivation for t=1

  • step 1 Compute the joint distribution of $[x_1, z_1]$ using

$$\begin{bmatrix} x_1 \\ z_1 \end{bmatrix} = \begin{bmatrix} A \\ CA \end{bmatrix} x_0 + \begin{bmatrix} I & 0 \\ C & I \end{bmatrix} \begin{bmatrix} \varepsilon_1 \\ \nu_1 \end{bmatrix}$$

Note that $x_0$, $\varepsilon_1$, and $\nu_1$ are independent

  • step 2 Compute the marginal distributions of $x_1$ and $z_1$

  • step 3 Compute the conditional distribution of $x_1$ given $z_1$

Derivation for any $t$: use induction: assume the optimal update formulae hold for $t-1$ and show that the one-step-ahead ones are true:

  • write $[x_t, z_t]$ in terms of $x_{t-1}$, $\varepsilon_t$, and $\nu_t$

  • using the assumed conditional distribution of $x_{t-1}$ given $Z_{1:t-1}$, compute the joint and marginal conditional distributions of $x_t$ and $z_t$ given $Z_{1:t-1}$.

  • from the joint (conditional) distribution compute the conditional distribution of $x_t$ given $z_t$. This will give you the conditional distribution of $x_t$ given $Z_{1:t} = [Z_{1:t-1}, z_t]$

Likelihood function with the Kalman filter

The joint distribution of $Z = [z_1, z_2, \ldots, z_T]$ can be factorized as

$$p(Z; \theta) = p(z_1; \theta)\,\prod_{t=2}^{T} p(z_t \mid Z_{1:t-1}; \theta)$$

where $p(z_t \mid Z_{1:t-1}; \theta)$ is Gaussian with moments $z_{t|t-1}$ and $\Sigma^z_{t|t-1}$ given by the Kalman filter.

Note: equivalent to, but much more efficient than computing the joint distribution of Z as

$$Z \sim N\!\left(0,\; \mathcal{C}\,\Sigma_X\,\mathcal{C}' + \Sigma_V\right)$$
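
A sketch of the corresponding computation, reusing the kalman_filter function sketched earlier (names are mine, not from the notes):

```python
# Sketch: Gaussian log-likelihood via the prediction-error decomposition
import numpy as np

def log_likelihood(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    _, _, _, _, z_pred, F = kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0)
    ll = 0.0
    for t in range(Z.shape[0]):
        e = Z[t] - z_pred[t]                   # one-step-ahead forecast error
        _, logdet = np.linalg.slogdet(F[t])    # log det of Sigma^z_{t|t-1}
        ll += -0.5 * (len(e) * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(F[t], e))
    return ll
```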

Kalman smoother

$$x_{t-1|T} = x_{t-1|t-1} + J_{t-1}\,(x_{t|T} - x_{t|t-1}), \qquad \Sigma^x_{t-1|T} = \Sigma^x_{t-1|t-1} + J_{t-1}\,(\Sigma^x_{t|T} - \Sigma^x_{t|t-1})\,J_{t-1}'$$

where $J_{t-1} = \Sigma^x_{t-1|t-1}\,A'\,(\Sigma^x_{t|t-1})^{-1}$

Note 1: for $t = T+1$, $x_{t-1|T}$ and $\Sigma^x_{t-1|T}$ are given by the Kalman filter. After that, going backwards, all necessary objects are provided by the previous smoothing step and by the Kalman filter

Note 2: equivalent to, but much more efficient than, computing the block diagonal of the conditional distribution of $X$ given $Z$
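
A sketch of the backward recursion, reusing the filter output from the earlier sketch (names are mine, not from the notes):

```python
# Sketch of the Kalman smoother: backward pass over the filtered and predicted moments
import numpy as np

def kalman_smoother(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0):
    x_filt, P_filt, x_pred, P_pred, _, _ = kalman_filter(Z, A, C, Sigma_eps, Sigma_nu, Sigma_0)
    T = Z.shape[0]
    x_smooth, P_smooth = x_filt.copy(), P_filt.copy()   # at t = T the smoother equals the filter
    for t in range(T - 1, 0, -1):                       # backwards: t = T-1, ..., 1 (0-based)
        J = P_filt[t - 1] @ A.T @ np.linalg.inv(P_pred[t])              # J_{t-1}
        x_smooth[t - 1] = x_filt[t - 1] + J @ (x_smooth[t] - x_pred[t])
        P_smooth[t - 1] = P_filt[t - 1] + J @ (P_smooth[t] - P_pred[t]) @ J.T
    return x_smooth, P_smooth
```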

Estimation

What are we estimating?

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon) && \text{state (transition) equation}\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu) && \text{observation equation}
\end{aligned}
$$

Collect the unknown parameters of $A$, $C$, $\Sigma_\varepsilon$, $\Sigma_\nu$ in $\theta$

MLE

$$\ell(\theta \mid Z) = \log L(\theta \mid Z) = \log p(Z; \theta)$$
$$\hat{\theta} = \arg\max_\theta \; \ell(\theta \mid Z)$$
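
A sketch of MLE for the simplest case, the AR(1)-plus-noise model, by numerically maximizing the Kalman-filter likelihood (the parameterization of $\theta$ and the starting values are assumptions for illustration; it reuses the log_likelihood function sketched above):

```python
# Sketch: MLE for x_t = alpha x_{t-1} + eps_t, z_t = x_t + nu_t
# theta = (alpha, log sigma_eps^2, log sigma_nu^2)
import numpy as np
from scipy.optimize import minimize

def unpack(theta):
    alpha, log_s2_eps, log_s2_nu = theta
    A = np.array([[alpha]]); C = np.array([[1.0]])
    Sigma_eps = np.array([[np.exp(log_s2_eps)]])
    Sigma_nu = np.array([[np.exp(log_s2_nu)]])
    Sigma_0 = Sigma_eps / (1.0 - alpha**2)      # stationary initial variance (requires |alpha| < 1)
    return A, C, Sigma_eps, Sigma_nu, Sigma_0

def neg_loglik(theta, Z):
    return -log_likelihood(Z, *unpack(theta))

# with Z_data a (T, 1) array of observations:
# result = minimize(neg_loglik, x0=np.array([0.5, 0.0, 0.0]), args=(Z_data,), method="Nelder-Mead")
# theta_hat = result.x
```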

Identification

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon) && \text{state (transition) equation}\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu) && \text{observation equation}
\end{aligned}
$$
  • If we replace $x_t$ with $x^*_t = T x_t$, $\varepsilon_t$ with $\varepsilon^*_t = T \varepsilon_t$, $A$ with $A^* = T A T^{-1}$, $C$ with $C^* = C T^{-1}$, and $\Sigma_\varepsilon$ with $\Sigma^*_\varepsilon = T \Sigma_\varepsilon T'$, the process for $z_t$ remains unchanged, and the likelihood function remains the same.

  • Therefore, unless there are (sufficient) restrictions on $A$, $C$, and $\Sigma_\varepsilon$, their parameters cannot be identified - multiple values of $\theta$ imply the same value of the likelihood.

  • a simple way to check for local identification at a given value of $\theta$ is to compute the Jacobian matrix of $\Sigma_Z$ w.r.t. $\theta$ and check that it has full (column) rank; a numerical sketch follows below.
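
A sketch of that check: compute a numerical Jacobian of the distinct elements of $\Sigma_Z$ with respect to $\theta$ and inspect its rank (sigma_Z_of_theta is a hypothetical function, assumed to map $\theta$ to the implied covariance matrix of $Z$):

```python
# Sketch: local identification check via the rank of the Jacobian of vech(Sigma_Z) w.r.t. theta
import numpy as np

def jacobian_rank(sigma_Z_of_theta, theta, h=1e-6):
    theta = np.asarray(theta, dtype=float)
    base = sigma_Z_of_theta(theta)
    rows = np.triu_indices(base.shape[0])          # distinct (upper-triangular) elements of Sigma_Z
    J = np.zeros((rows[0].size, theta.size))
    for j in range(theta.size):                    # forward-difference derivative w.r.t. theta_j
        theta_j = theta.copy()
        theta_j[j] += h
        J[:, j] = (sigma_Z_of_theta(theta_j)[rows] - base[rows]) / h
    return np.linalg.matrix_rank(J)

# theta is locally identified (by this first-order check) if jacobian_rank(...) == len(theta)
```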

Forecasting

Optimal forecast given information at T:

$$E(z_{T+1} \mid Z) = C\,E(x_{T+1} \mid Z) = C\,x_{T+1|T} = C\,A\,x_{T|T}, \qquad E(z_{T+h} \mid Z) = C\,A^h\,x_{T|T}$$

Computing the variance of the forecast errors

Using that $x_t$ is a VAR(1) process,

$$x_{T+h} = A^h x_T + A^{h-1}\varepsilon_{T+1} + \cdots + A\,\varepsilon_{T+h-1} + \varepsilon_{T+h}$$

and

$$x_{T+h|T} = A^h\,x_{T|T}$$

Therefore, the forecast error is

$$x_{T+h} - x_{T+h|T} = A^h\,(x_T - x_{T|T}) + A^{h-1}\varepsilon_{T+1} + \cdots + A\,\varepsilon_{T+h-1} + \varepsilon_{T+h}$$

The MSE of $x_{T+h|T}$ is

$$\Sigma^x_{T+h|T} = A^h\,\Sigma^x_{T|T}\,(A^h)' + A^{h-1}\Sigma_\varepsilon (A^{h-1})' + \cdots + A\,\Sigma_\varepsilon\,A' + \Sigma_\varepsilon$$

and the MSE of $z_{T+h|T}$ is

$$\Sigma^z_{T+h|T} = C\,\Sigma^x_{T+h|T}\,C' + \Sigma_\nu$$
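
A sketch of the forecast recursions, starting from the filtered moments at $T$ (x_TT and P_TT stand for $x_{T|T}$ and $\Sigma^x_{T|T}$ from the Kalman filter; the function name is mine, not from the notes):

```python
# Sketch: h-step-ahead forecasts of z and their MSE matrices
import numpy as np

def forecast_z(x_TT, P_TT, A, C, Sigma_eps, Sigma_nu, h):
    x_fc, P_fc = x_TT.copy(), P_TT.copy()
    z_fc, z_mse = [], []
    for _ in range(h):
        x_fc = A @ x_fc                            # x_{T+j|T} = A x_{T+j-1|T}
        P_fc = A @ P_fc @ A.T + Sigma_eps          # recursive form of Sigma^x_{T+j|T}
        z_fc.append(C @ x_fc)                      # E(z_{T+j} | Z_{1:T})
        z_mse.append(C @ P_fc @ C.T + Sigma_nu)    # Sigma^z_{T+j|T}
    return np.array(z_fc), np.array(z_mse)
```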

Accounting for parameter uncertainty

So far, forecast errors were always computed assuming that the parameters were known. However, they are estimated, and parameter uncertainty is also present (it disappears only asymptotically)

total uncertainty

$$
\begin{aligned}
E\!\left([x_{T+h} - x_{T+h|T}(\hat{\theta})][x_{T+h} - x_{T+h|T}(\hat{\theta})]'\right)
&= E\!\left([x_{T+h} - x_{T+h|T}(\theta)][x_{T+h} - x_{T+h|T}(\theta)]'\right)\\
&\quad + E\!\left([x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})][x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})]'\right)\\
&= E\!\left(\Sigma^x_{T+h|T}\right) + E\!\left([x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})][x_{T+h|T}(\theta) - x_{T+h|T}(\hat{\theta})]'\right)
\end{aligned}
$$

Linear state space model

$$
\begin{aligned}
x_t &= A\,x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon)\\
z_t &= C\,x_t + \nu_t, \quad \nu_t \sim N(0, \Sigma_\nu)\\
x_0 &\sim N(0, \Sigma_0)
\end{aligned}
$$

reduced-form (statistical) models

  • the parameters are the unrestricted elements of $A$, $C$, $\Sigma_\varepsilon$, $\Sigma_\nu$

  • of little (no) interest on their own

structural (theoretical) models

  • reduced-form model parameters are functions of a (often much) smaller number of structural parameters $\theta$

  • $\theta$ have economic meaning and are (or could be) of interest on their own

  • estimation is usually harder (non-linear functions)

Non-linear state space models in Python

docs