Multivariate time series

  • univariate - temporal dependence between components of a single series

  • multivariate - temporal (inter)-dependence between components of different series

Definition: A time series process is a sequence of random vectors indexed by time:

\[ \{\mathbf{z}_t: t =~ ..., -2, -1, ~0,~1, ~2,~ ... \} = \{\mathbf{z}_t \}_{t=-\infty}^{\infty} \tag{1} \]

where \(\mathbf{z}_t\) is an \(n\)-dimensional random vector, \(n \geq 1\)


Definition: The process \(\{\mathbf{z}_t\}_{t=-\infty}^{\infty}\) is covariance stationary if the first two moments of the joint distribution exist and are time-invariant

  • mean

\[\begin{split} \operatorname{E}\mathbf{z}_t = \mathbf{\mu}_t = \begin{bmatrix} \mu_{1t} \\ \mu_{2t} \\\vdots \\\mu_{nt} \end{bmatrix} \end{split}\]
  • covariance

\[\begin{split} \begin{align} \operatorname{cov}(\mathbf{z}_t, \mathbf{z}_{t-k}) & = \operatorname{E}(\mathbf{z}_t - \mathbf{\mu}_t)(\mathbf{z}_{t-k} - \mathbf{\mu}_{t-k} )'\\\\ & = \Gamma(t, t-k) = \begin{pmatrix} \gamma_{11}(t, t-k) & \cdots & \gamma_{1n}(t, t-k)\\ \vdots & \ddots & \vdots\\ \gamma_{n1}(t, t-k) & \cdots & \gamma_{nn}(t, t-k)\\ \end{pmatrix} \end{align} \end{split}\]
  • \(\{\mathbf{z}_t\}_{t=-\infty}^{\infty}\) is covariance stationary if

\[\begin{split} \begin{align} \mathbf{\mu}_t & = \mathbf{\mu} \\ \Gamma(t, t-k) & = \Gamma(k) \end{align} \end{split}\]

are not functions of \(t\)


\[ \Gamma(k) = \operatorname{cov}(\mathbf{z}_t, \mathbf{z}_{t-k}) = \operatorname{cov}(\mathbf{z}_{t+k}, \mathbf{z}_{t}) \neq \operatorname{cov}(\mathbf{z}_{t}, \mathbf{z}_{t+k}) = \Gamma(-k) \]
  • \(\Gamma(k)\) is in general not symmetric (unless \(k=0\))

  • but since \(\operatorname{cov}(\mathbf{z}_{t+k}, \mathbf{z}_{t})' = \operatorname{cov}(\mathbf{z}_{t}, \mathbf{z}_{t+k})\)

\[ \Gamma(k) = \Gamma(-k)' \]

which follows from (setting \(\mathbf{\mu}=0\) w.l.o.g.)

\[\Gamma(k) =\operatorname{cov}(\mathbf{z}_{t+k}, \mathbf{z}_{t}) = \operatorname{E}(\mathbf{z}_{t+k} \mathbf{z}_{t}')= \operatorname{E}(\mathbf{z}_{t}\mathbf{z}_{t+k}')' = \left(\operatorname{cov}(\mathbf{z}_{t}, \mathbf{z}_{t+k}) \right)'=\Gamma(-k)' \]


\[\begin{split}\mathbf{Z}_T = \begin{bmatrix}\mathbf{z}_{1}\\ \mathbf{z}_{2}\\ \vdots\\ \mathbf{z}_{T}\end{bmatrix}\end{split}\]


\[\begin{split}\operatorname{E}(\mathbf{Z}_T) = \begin{bmatrix}\mathbf{\mu}\\ \mathbf{\mu}\\ \vdots\\ \mathbf{\mu}\end{bmatrix},\;\;\;\;\; \operatorname{cov}(\mathbf{Z}_T) = \begin{pmatrix} \Gamma(0) & \Gamma(1)' & \cdots & \Gamma(T-1)'\\ \Gamma(1) & \Gamma(0) & \cdots & \Gamma(T-2)'\\ \vdots & \vdots & \vdots& \vdots\\ \Gamma(T-1) & \Gamma(T-2) & \cdots & \Gamma(0)\\ \end{pmatrix} \;\;\; (\text{symmetric block Toeplitz matrix}) \end{split}\]
  • Multivariate Gaussian time series

\[ \mathbf{Z}_T \sim \mathcal{N} \left(\boldsymbol \mu, \boldsymbol \Sigma \right) \]
\[\begin{split} \begin{bmatrix}\mathbf{z}_{1}\\ \mathbf{z}_{2}\\ \vdots\\ \mathbf{z}_{T}\end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix}\mathbf{\mu}\\ \mathbf{\mu}\\ \vdots\\ \mathbf{\mu}\end{bmatrix}, \begin{pmatrix} \Gamma(0) & \Gamma(1)' & \cdots & \Gamma(T-1)'\\ \Gamma(1) & \Gamma(0) & \cdots & \Gamma(T-2)'\\ \vdots & \vdots & \vdots& \vdots\\ \Gamma(T-1) & \Gamma(T-2) & \cdots & \Gamma(0)\\ \end{pmatrix} \right) \end{split}\]

Number of unique parameters

\[\begin{split} \begin{align} \Gamma(0) & : n(n+1)/2 \\ &+\\ \Gamma(1) & : n^2 \\ &+\\ & \vdots \\ &+\\ \Gamma(T-1) & : n^2 \end{align} \end{split}\]

with \(n = 5,\; T = 200 \;\Rightarrow\; 15 + 199 \cdot 25 = 4990\)
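The count above can be reproduced with a few lines of Python. This is only a sketch; the function name `n_unique_moments` is an illustrative choice, not something defined in these notes.

```python
def n_unique_moments(n: int, T: int) -> int:
    """Unique second-moment parameters of a stationary n-variate series of length T."""
    gamma0 = n * (n + 1) // 2     # Gamma(0) is symmetric
    gamma_k = (T - 1) * n * n     # Gamma(1), ..., Gamma(T-1) are unrestricted n x n matrices
    return gamma0 + gamma_k


print(n_unique_moments(5, 200))  # 4990
```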

  • time series models allow us to represent temporal (inter)dependence parsimoniously

  • by imposing restrictions - reducing the number of unique parameters

  • making estimation feasible

VARMA(p, q) model

\[\begin{split} \begin{align} A(L) \mathbf{z}_t &= B(L) \boldsymbol \varepsilon_t, \;\;\;\;\; \boldsymbol \varepsilon_t \sim \operatorname{WN} \left( 0, \;\mathbf{\Sigma}\right) \;\;\; (\text{vector white noise, i.e. } \operatorname{E}(\boldsymbol \varepsilon_t \boldsymbol \varepsilon_{t-k}') = 0 \text{ for } k \neq 0)\\\\ A(L) &= I - A_1 L - \cdots - A_p L^p \\ B(L) &= I + B_1 L + \cdots + B_q L^q \\ \end{align} \end{split}\]

The innovations have zero autocorrelations and zero cross-autocorrelations at all non-zero lags.

\[\begin{split} A_i = \begin{pmatrix} a_{11, i} & \cdots & a_{1n, i}\\ \vdots & \ddots & \vdots\\ a_{n1, i} & \cdots & a_{nn, i}\\ \end{pmatrix}, \;\;\; B_j = \begin{pmatrix} b_{11, j} & \cdots & b_{1n, j}\\ \vdots & \ddots & \vdots\\ b_{n1, j} & \cdots & b_{nn, j}\\ \end{pmatrix} \end{split}\]
  • \(n^2(p+q) + n(n+1)/2\) unknown parameters

  • in general, difficult to estimate

    • not all parameters are identified

    • have to use numerical optimization

VAR(p) model

\[\begin{split} \begin{align} \mathbf{z}_t = A_1 \mathbf{z}_{t-1} + \cdots + A_p \mathbf{z}_{t-p} + \boldsymbol \varepsilon_t, \;\;\;\;\; \boldsymbol \varepsilon_t \sim \operatorname{WN} \left( 0, \;\mathbf{\Sigma}\right)\\\\ \end{align} \end{split}\]
\[\begin{split} \begin{align} A(L) \mathbf{z}_t &= \boldsymbol \varepsilon_t, \;\;\;\;\; \boldsymbol \varepsilon_t \sim \operatorname{WN} \left( 0, \;\mathbf{\Sigma}\right)\\\\ A(L) = I & - A_1 L - \cdots - A_p L^p \\ \end{align} \end{split}\]
  • \(\mathbf{\Sigma}_{ij}\) captures all contemporaneous (time \(t\)) relationships between \(z_i\) and \(z_j\)

  • \(\{A_k\}_{ij}\) captures all dynamic interactions between \(z_{it}\) and \(z_{j, t-k}\)


A VAR(p) is stationary if all roots of the equation

\[| A(x)| = | I - A_1 x - \cdots - A_p x^p|=0\]

are outside the unit circle (\(|x|>1\))

VAR(1) representation of a VAR(p) process

\[ \mathbf{Z}_{t} = \boldsymbol \Phi \mathbf{Z}_{t-1} + \boldsymbol E_t \]
\[\begin{split} \underset{\mathbf{Z}_t}{\underbrace{\left[\begin{array}{c} \mathbf{z}_{t}\\ \mathbf{z}_{t-1}\\ \vdots\\ \mathbf{z}_{t-p+1} \end{array}\right]}} = \underset{ \boldsymbol \Phi}{\underbrace{\left[\begin{array}{cccccccc} A_1 & A_2 & \cdots & A_{p-1} & A_p\\ I & 0 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & I & 0 \end{array}\right]}} \underset{\mathbf{Z}_{t-1}}{\underbrace{\left[\begin{array}{c} \mathbf{z}_{t-1}\\ \mathbf{z}_{t-2}\\ \vdots\\ \mathbf{z}_{t-p} \end{array}\right]}} + \underset{ \boldsymbol E_{t}}{\underbrace{\left[\begin{array}{c} \varepsilon_{t}\\ 0\\ \vdots\\ 0 \end{array}\right]}} \end{split}\]
  • \( \boldsymbol \Phi\) - companion matrix

\(\mathbf{Z}_{t}\) is stationary if all roots \(x\) of

\[|\boldsymbol I - \boldsymbol \Phi x|=0\]

are outside the unit circle (\(|x|>1\)), which is equivalent to all solutions of

\[|\boldsymbol I \lambda - \boldsymbol \Phi|=0\]

being \(|\lambda|<1\), i.e. all eigenvalues of \(\boldsymbol \Phi\) being less than 1 in absolute value.
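A minimal NumPy sketch of this check: stack the coefficient matrices into the companion matrix and inspect the moduli of its eigenvalues. The helper names `companion` and `is_stationary` and the numerical values of `A1`, `A2` are illustrative assumptions.

```python
import numpy as np


def companion(A_list):
    """Stack VAR(p) coefficient matrices A_1, ..., A_p into the (np x np) companion matrix."""
    n = A_list[0].shape[0]
    p = len(A_list)
    Phi = np.zeros((n * p, n * p))
    Phi[:n, :] = np.hstack(A_list)        # first block row: [A_1 ... A_p]
    Phi[n:, :-n] = np.eye(n * (p - 1))    # identity blocks below the first block row
    return Phi


def is_stationary(A_list):
    """True if all eigenvalues of the companion matrix are inside the unit circle."""
    eigenvalues = np.linalg.eigvals(companion(A_list))
    return bool(np.max(np.abs(eigenvalues)) < 1.0)


# Assumed example coefficients for a bivariate VAR(2)
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
print(is_stationary([A1, A2]))  # True
```

For \(p = 1\) the companion matrix is just \(A_1\), so the same function covers the VAR(1) case.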

Eigenvalue decomposition

  • \( \Lambda\) - diagonal matrix of the eigenvalues of \( A\)

  • \( V\) - matrix of the eigenvectors of \( A\)


\[ A V = V \Lambda \]

we have

\[\begin{split} \begin{align} A & = V \Lambda V^{-1}\\ A^2 & = A A = V \Lambda V^{-1} V \Lambda V^{-1} = V \Lambda^2 V^{-1}\\ A^h & = V \Lambda^h V^{-1}\\ \end{align} \end{split}\]

if for all eigenvalues \(|\lambda_i| < 1\),

\[ \Lambda^h \longrightarrow 0 \;\; \text{ and } \;\; A^h \longrightarrow 0 \;\;\; \text{as } h \longrightarrow \infty\]
  • similar to \(|\alpha|<1\) in \(z_t = \alpha z_{t-1} + \varepsilon_t\)
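As a numerical illustration (a sketch with an assumed diagonalizable matrix `A`), the power \(A^h\) computed through the eigenvalue decomposition agrees with direct matrix multiplication and shrinks towards zero when all eigenvalues are inside the unit circle:

```python
import numpy as np

# Assumed example matrix; its eigenvalues are roughly 0.57 and 0.23
A = np.array([[0.5, 0.1], [0.2, 0.3]])

lam, V = np.linalg.eig(A)                      # A V = V diag(lam)
h = 50
A_h = (V @ np.diag(lam ** h) @ np.linalg.inv(V)).real   # V Lambda^h V^{-1}

print(np.max(np.abs(lam)))                     # largest eigenvalue modulus
print(np.max(np.abs(A_h)))                     # entries of A^h are already tiny at h = 50
```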

VAR(1) process

\[ \mathbf{z}_{t} = A \mathbf{z}_{t-1} + \boldsymbol \varepsilon_{t} \]

From VAR parameters to moments of \(\mathbf{z}_t\)

  • what are the VAR parameters?

  • mean

\[\begin{split} \begin{align} \operatorname{E} \mathbf{z}_{t} &= A \operatorname{E} \mathbf{z}_{t-1} + \operatorname{E} \boldsymbol \varepsilon_{t}\\ & = 0 \end{align} \end{split}\]
  • covariance

\[\mathbf{z}_{t} = A \mathbf{z}_{t-1} + \boldsymbol \varepsilon_{t} \;\; \Rightarrow \;\; \mathbf{z}_{t}\mathbf{z}_{t}^{\prime} = \left(A \mathbf{z}_{t-1} + \boldsymbol \varepsilon_{t}\right)\left(A \mathbf{z}_{t-1} + \boldsymbol \varepsilon_{t}\right)^{\prime}\]

Since \(\operatorname{E}(\mathbf{z}_{t-1} \varepsilon_{t}') = 0 \)

\[ \operatorname{E}(\mathbf{z}_{t} \mathbf{z}'_{t}) = A \operatorname{E}(\mathbf{z}_{t-1} \mathbf{z}^{\prime}_{t-1}) A' + \operatorname{E}( \varepsilon_{t} \varepsilon_{t}') \]


\[\begin{split} \begin{align} \Gamma(0) =\; & A \Gamma(0) A ' + \Sigma\\\\ \;\;\; &\text{and}\\\\ \operatorname{vec}\left(\Gamma(0) \right) =\; & \left( I - A \otimes A \right)^{-1} \operatorname{vec}( \Sigma)\\ \end{align} \end{split}\]
  • follows from \(\operatorname{vec}(ABC) = (C' \otimes A) \operatorname{vec}(B)\)

  • autocovariances

\[ \operatorname{E}(\mathbf{z}_{t} \mathbf{z}'_{t-1}) = A \operatorname{E}(\mathbf{z}_{t-1} \mathbf{z}^{\prime}_{t-1}) + \operatorname{E}( \varepsilon_{t} \mathbf{z}_{t-1}') \]
  • lag 1

\[ \Gamma(1) = A \Gamma(0) \]
  • lag 2

\[ \Gamma(2) = A \Gamma(1) = A^2 \Gamma(0) \]
  • lag h

\[ \Gamma(h) = A^h \Gamma(0) \]

From moments of \(\mathbf{z}_t\) to VAR parameters

\[\begin{split} \begin{align} \Gamma(1) = A \Gamma(0) \;\;\; & \Rightarrow \;\;\; A = \Gamma(1)\Gamma(0)^{-1} \\\\ \Gamma(0) = A \Gamma(0) A ' + \Sigma \;\;\; & \Rightarrow \;\;\; \Sigma = \Gamma(0)-\Gamma(1)\Gamma(0)^{-1} \Gamma(1)' \end{align} \end{split}\]
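The two directions of this mapping can be checked numerically. The sketch below uses assumed values for \(A\) and \(\Sigma\); note that \(\operatorname{vec}\) stacks columns, which corresponds to `order="F"` in NumPy.

```python
import numpy as np

# Assumed VAR(1) parameters (stationary: eigenvalues of A are inside the unit circle)
A = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
n = A.shape[0]

# From parameters to moments: vec(Gamma(0)) = (I - A kron A)^{-1} vec(Sigma)
vec_sigma = Sigma.reshape(-1, order="F")
vec_gamma0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), vec_sigma)
Gamma0 = vec_gamma0.reshape(n, n, order="F")
Gamma1 = A @ Gamma0                                   # Gamma(1) = A Gamma(0)

# From moments back to parameters
A_rec = Gamma1 @ np.linalg.inv(Gamma0)
Sigma_rec = Gamma0 - Gamma1 @ np.linalg.inv(Gamma0) @ Gamma1.T

print(np.allclose(A_rec, A), np.allclose(Sigma_rec, Sigma))  # True True
```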

Non-zero mean \(\operatorname{E} \mathbf{z}_t = \mathbf{\mu} \neq 0\)

\[\begin{split} \begin{align} \mathbf{z}_{t} & = \boldsymbol a_0 + A \mathbf{z}_{t-1} + \varepsilon_{t}\\\\ \mathbf{\mu} & = \boldsymbol a_0 + A \mathbf{\mu}\\\\ \mathbf{\mu} & = \left( I - A \right)^{-1} \boldsymbol a_0 \end{align} \end{split}\]


\[\operatorname{E} \bar{\mathbf{z}}_t = \operatorname{E} (\mathbf{z}_t- \mathbf{\mu}) = 0\]

Note: for the moments of a VAR(p) model, use the VAR(1) representation and apply the selection matrix

\[\boldsymbol s = \left[ I, 0, \cdots, 0 \right] \]

to obtain the autocovariances of \(\mathbf{z}_{t}\) from the autocovariances of \(\mathbf{Z}_{t}\)


\[ \mathbf{z}_{t} = \boldsymbol s \mathbf{Z}_{t}\]
  • \( \operatorname{E} \mathbf{z}_{t} = \boldsymbol s \operatorname{E} \mathbf{Z}_{t}\)

  • \( \operatorname{var} (\mathbf{z}_{t}) = \boldsymbol s \operatorname{var} (\mathbf{Z}_{t}) \boldsymbol s'\)

  • \( \operatorname{cov} (\mathbf{z}_{t}, \mathbf{z}_{t-k}) = \boldsymbol s \operatorname{cov} (\mathbf{Z}_{t}, \mathbf{Z}_{t-k}) \boldsymbol s'\)
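A sketch of this recipe for a bivariate VAR(2), with assumed coefficient values: compute \(\operatorname{var}(\mathbf{Z}_t)\) from the companion form, then select the blocks that belong to \(\mathbf{z}_t\). The helper `gamma_z` is an illustrative name.

```python
import numpy as np

# Assumed VAR(2) parameters (not taken from the text)
A1 = np.array([[0.4, 0.1], [0.0, 0.3]])
A2 = np.array([[0.3, 0.0], [0.1, 0.2]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
n, p = 2, 2
m = n * p

# Companion form Z_t = Phi Z_{t-1} + E_t, where cov(E_t) has Sigma in its top-left block
Phi = np.zeros((m, m))
Phi[:n, :] = np.hstack([A1, A2])
Phi[n:, :-n] = np.eye(n * (p - 1))
Sigma_E = np.zeros((m, m))
Sigma_E[:n, :n] = Sigma

# vec(Gamma_Z(0)) = (I - Phi kron Phi)^{-1} vec(Sigma_E), with column-stacking vec
vec_G = np.linalg.solve(np.eye(m * m) - np.kron(Phi, Phi),
                        Sigma_E.reshape(-1, order="F"))
Gamma_Z0 = vec_G.reshape(m, m, order="F")

s = np.hstack([np.eye(n), np.zeros((n, n * (p - 1)))])   # selection matrix [I, 0]


def gamma_z(k):
    """Autocovariance of z_t at lag k >= 0: s Phi^k Gamma_Z(0) s'."""
    return s @ np.linalg.matrix_power(Phi, k) @ Gamma_Z0 @ s.T


print(gamma_z(0))
print(gamma_z(1))
```

The top-right \(n \times n\) block of \(\operatorname{var}(\mathbf{Z}_t)\) is \(\operatorname{cov}(\mathbf{z}_t, \mathbf{z}_{t-1})\), which gives a quick consistency check on `gamma_z(1)`.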


We can write the VAR(p) model

\[ \begin{align} \mathbf{z}_t = \boldsymbol a_0 + A_1 \mathbf{z}_{t-1} + \cdots + A_p \mathbf{z}_{t-p} + \boldsymbol \varepsilon_t, \;\;\;\;\; \boldsymbol \varepsilon_t \sim \operatorname{WN} \left( 0, \;\mathbf{\Sigma}\right) \end{align} \]

in compact form as

\[\mathbf{z}_t = \boldsymbol A \boldsymbol x_{t-1} + \boldsymbol \varepsilon_t \]

where \(\boldsymbol A = [\boldsymbol a_0, A_1, \cdots, A_p ]\), and \(\boldsymbol x_{t-1} = [1, \mathbf{z}_{t-1}', \cdots, \mathbf{z}_{t-p}' ]'\)

Assume a pre-sample \(\mathbf{z}_{0}, \mathbf{z}_{-1}, \cdots, \mathbf{z}_{-p+1}\) is given (alternatively, re-define \(T\)).

Then, we have

\[\boldsymbol Z = \boldsymbol A \boldsymbol X + \boldsymbol U \]


  • \(\boldsymbol Z = [\mathbf{z}_{1}, \cdots, \mathbf{z}_T] \) is \(n \times T\)

  • \(\boldsymbol X = [\mathbf{x}_{0}, \cdots, \mathbf{x}_{T-1}] \) is \((np+1) \times T\)

  • \(\boldsymbol U = [\boldsymbol \varepsilon_{1}, \cdots, \boldsymbol \varepsilon_T] \) is \(n \times T\)

OLS estimation

\[\begin{split} \begin{align} \hat{\boldsymbol A} &= \boldsymbol Z \boldsymbol X' (\boldsymbol X \boldsymbol X')^{-1} \\\\ \hat{\boldsymbol \Sigma} & = \frac{1}{T - np - 1} \hat{\boldsymbol U} \hat{\boldsymbol U}' \\\\ \hat{\boldsymbol U} & = \boldsymbol Z - \hat{\boldsymbol A} \boldsymbol X \end{align} \end{split}\]

Asymptotic distribution

\[ \operatorname{vec}\left(\hat{\boldsymbol A} \right) \overset{a}{\sim} \mathcal{N} \left( \operatorname{vec}\left(\boldsymbol A \right),\; (\boldsymbol X \boldsymbol X')^{-1} \otimes \hat{\boldsymbol \Sigma} \right) \]


  • the rows of \(\boldsymbol A\) can be estimated with OLS equation by equation

  • also equivalent to conditional MLE, assuming that \(\boldsymbol \varepsilon_t \sim \mathcal{N} \left( 0, \;\mathbf{\Sigma}\right)\)

  • \(\hat{\boldsymbol A}\) has a small-sample bias, which can be corrected for analytically (when the VAR has only an intercept) or using the bootstrap (when a deterministic trend is included)
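The estimator can be sketched in NumPy as follows. The function name `fit_var_ols`, the simulated data and all parameter values are assumptions made for illustration; the first \(p\) rows of the data array play the role of the pre-sample.

```python
import numpy as np


def fit_var_ols(z, p):
    """OLS for a VAR(p) with intercept. z is (T + p) x n; the first p rows are the pre-sample."""
    T_total, n = z.shape
    T = T_total - p
    Z = z[p:].T                                                  # n x T
    lags = [z[p - k:T_total - k].T for k in range(1, p + 1)]     # z_{t-1}, ..., z_{t-p}
    X = np.vstack([np.ones((1, T))] + lags)                      # (np + 1) x T
    A_hat = Z @ X.T @ np.linalg.inv(X @ X.T)                     # [a_0, A_1, ..., A_p]
    U_hat = Z - A_hat @ X
    Sigma_hat = U_hat @ U_hat.T / (T - n * p - 1)
    return A_hat, Sigma_hat, U_hat


# Simulate a bivariate VAR(1) with intercept (assumed parameter values)
rng = np.random.default_rng(0)
a0 = np.array([0.5, -0.2])
A = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
T_total = 5001
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=T_total)
z = np.zeros((T_total, 2))
for t in range(1, T_total):
    z[t] = a0 + A @ z[t - 1] + eps[t]

A_hat, Sigma_hat, U_hat = fit_var_ols(z, p=1)
print(A_hat)        # first column estimates a_0, remaining columns estimate A
print(Sigma_hat)
```

With about 5000 observations the estimates should be close to the values used in the simulation, up to sampling error.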

Choice of \(p\)

  • define a set of models - select \(p_{min}\) and \(p_{max}\)

  • estimate each one and compute IC(p)

  • pick the one with the lowest IC(p) value

Most commonly used ICs:

\[\begin{split} \begin{align} AIC &= \operatorname{ln}|\hat{\boldsymbol \Sigma}^{ml}(p)| + \frac{2}{T}(pn^2 + n) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \text{Akaike’s Information Criterion} \\ BIC &= \operatorname{ln}|\hat{\boldsymbol \Sigma}^{ml}(p)| + \frac{\operatorname{ln}(T)}{T}(pn^2 + n) \;\;\;\;\;\;\;\;\; \text{Bayesian Information Criterion} \end{align} \end{split}\]


  • For ICs to be comparable for different \(p\), the sample has to be the same (set \(t = p_{max}+1,\cdots, T\))

  • \(\hat{\boldsymbol \Sigma}^{ml}(p) = \frac{1}{T} \hat{\boldsymbol U} \hat{\boldsymbol U}' = \frac{T-np-1}{T} \hat{\boldsymbol \Sigma}^{ols}(p)\)

  • typically, \(p_{max}=12\) for monthly and \(p_{max}=4\) for quarterly data
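A sketch of the selection procedure on simulated data. The function `var_ic`, the simulated VAR(2) and the choice \(p_{min}=1\), \(p_{max}=4\) are illustrative assumptions; every candidate model is estimated on the same sample \(t = p_{max}+1, \cdots, T\).

```python
import numpy as np


def var_ic(z, p, p_max):
    """AIC and BIC of a VAR(p) with intercept, estimated on the common sample t > p_max."""
    T_total, n = z.shape
    T = T_total - p_max                                   # common effective sample size
    Z = z[p_max:].T
    lags = [z[p_max - k:T_total - k].T for k in range(1, p + 1)]
    X = np.vstack([np.ones((1, T))] + lags)
    U = Z - Z @ X.T @ np.linalg.inv(X @ X.T) @ X          # OLS residuals
    Sigma_ml = U @ U.T / T
    logdet = np.linalg.slogdet(Sigma_ml)[1]
    k = p * n ** 2 + n                                    # number of mean parameters
    return logdet + 2 * k / T, logdet + np.log(T) * k / T


# Simulate a stationary bivariate VAR(2) (assumed coefficients)
rng = np.random.default_rng(1)
A1 = np.array([[0.4, 0.1], [0.0, 0.3]])
A2 = np.array([[0.3, 0.0], [0.1, 0.2]])
T_total = 2000
eps = rng.standard_normal((T_total, 2))
z = np.zeros((T_total, 2))
for t in range(2, T_total):
    z[t] = A1 @ z[t - 1] + A2 @ z[t - 2] + eps[t]

p_max = 4
ics = {p: var_ic(z, p, p_max) for p in range(1, p_max + 1)}
p_aic = min(ics, key=lambda p: ics[p][0])
p_bic = min(ics, key=lambda p: ics[p][1])
print(p_aic, p_bic)
```

Because BIC penalizes additional lags more heavily than AIC, it never selects a longer lag length than AIC on the same sample when \(\ln(T) > 2\).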


\[\begin{split} \begin{align} \mathbf{z}_t &= A \mathbf{z}_{t-1} + \boldsymbol \varepsilon_t, \\\\ \mathbf{z}_{t+1} &= A \mathbf{z}_{t} + \boldsymbol \varepsilon_{t+1}\\\\ \mathbf{z}_{t+2} &= A^2 \mathbf{z}_{t} + A \boldsymbol \varepsilon_{t+1}+ \boldsymbol \varepsilon_{t+2}\\ &\;\vdots \\ \mathbf{z}_{t+h} &= A^h \mathbf{z}_{t} + A^{h-1} \boldsymbol \varepsilon_{t+1} + \cdots + \boldsymbol \varepsilon_{t+h} \end{align} \end{split}\]

Optimal forecast given information at \(T\):

\[\begin{split} \begin{align} \operatorname{E}(\mathbf{z}_{T+1} | \mathbf{z}_{T} ) & = A \mathbf{z}_{T}\\ \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} ) & = A^h \mathbf{z}_{T} \end{align} \end{split}\]

Optimal forecast given information at \(T+1\):

\[\begin{split} \begin{align} \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T+1} ) & = A^{h-1} \mathbf{z}_{T+1}\\ & = A^{h-1} \left( A\mathbf{z}_{T} + \boldsymbol \varepsilon_{T+1}\right)\\ & = A^{h}\mathbf{z}_{T} + A^{h-1} \boldsymbol \varepsilon_{T+1} \\ & = \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} ) + A^{h-1} (\mathbf{z}_{T+1} - A \mathbf{z}_{T} )\\ & = \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} ) + A^{h-1} (\mathbf{z}_{T+1} - \operatorname{E}(\mathbf{z}_{T+1} | \mathbf{z}_{T} ) )\\ \end{align} \end{split}\]

Optimal forecast update:

\[ \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T+1} ) - \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} ) = A^{h-1} \underbrace{(\mathbf{z}_{T+1} - \operatorname{E}(\mathbf{z}_{T+1} | \mathbf{z}_{T} ) )}_{\text{1-step ahead forecast error}} \]

\(h\)-step-ahead forecast error:

\[ \mathbf{z}_{T+h} - \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} ) = A^{h-1} \boldsymbol \varepsilon_{T+1} + \cdots + A \boldsymbol \varepsilon_{T+h-1} + \boldsymbol \varepsilon_{T+h} \]


\[\begin{split} \begin{align} \operatorname{E} \left(\mathbf{z}_{T+h} - \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} )\right) &= 0 \\ \operatorname{cov} \left(\mathbf{z}_{T+h} - \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} )\right) &= A^{h-1} \Sigma (A^{h-1})' + \cdots + A \Sigma A' + \Sigma \end{align} \end{split}\]

Note: as in the univariate case, we can write VAR\((p)\) as VMA\((\infty)\). For VAR(1)

\[\begin{split}\begin{align} \mathbf{z}_t &= A(L)^{-1} \boldsymbol \varepsilon_t\\ &= \varepsilon_t + A\varepsilon_{t-1} + A^2\varepsilon_{t-2} + \cdots \end{align} \end{split}\]


\[ \operatorname{cov} (\mathbf{z}_{t}) = \Gamma(0) = \Sigma + A \Sigma A' + A^2 \Sigma (A^2)' + \cdots \]

As \(h \longrightarrow \infty\)

\[\begin{split} \begin{align} \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} ) &\longrightarrow 0 \;\;\;(\text{unconditional mean of } \mathbf{z}_{t} ) \\ \operatorname{cov} \left(\mathbf{z}_{T+h} - \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{z}_{T} )\right) & \longrightarrow \Gamma(0) \;\;\;(\text{unconditional covariance of } \mathbf{z}_{t} ) \end{align} \end{split}\]
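These limits can be illustrated numerically for a zero-mean VAR(1). The function `forecast_var1`, the parameter values and the starting point `z_T` are assumptions for the sketch.

```python
import numpy as np


def forecast_var1(A, Sigma, z_T, h_max):
    """Point forecasts A^h z_T and forecast-error covariances for h = 1, ..., h_max."""
    n = A.shape[0]
    forecasts, mses = [], []
    A_pow = np.eye(n)                 # holds A^{h-1} at the start of each iteration
    mse = np.zeros((n, n))
    for h in range(1, h_max + 1):
        mse = mse + A_pow @ Sigma @ A_pow.T      # add A^{h-1} Sigma (A^{h-1})'
        A_pow = A_pow @ A                        # now A^h
        forecasts.append(A_pow @ z_T)
        mses.append(mse)
    return np.array(forecasts), np.array(mses)


# Assumed parameters and last observation
A = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
z_T = np.array([1.0, -0.5])
forecasts, mses = forecast_var1(A, Sigma, z_T, h_max=200)

# Unconditional covariance Gamma(0) from vec(Gamma(0)) = (I - A kron A)^{-1} vec(Sigma)
n = A.shape[0]
Gamma0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A),
                         Sigma.reshape(-1, order="F")).reshape(n, n, order="F")

print(np.abs(forecasts[-1]).max())           # close to 0, the unconditional mean
print(np.abs(mses[-1] - Gamma0).max())       # close to 0: the covariance approaches Gamma(0)
```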

For a VAR(p), use the VAR(1) representation

\[ \mathbf{Z}_{t} = \boldsymbol \Phi \mathbf{Z}_{t-1} + \boldsymbol E_t \]
  • forecast of \(\mathbf{z}_{T+h}\) (using the selection matrix \(\boldsymbol s\))

\[\operatorname{E}(\mathbf{z}_{T+h} | \mathbf{Z}_{T} ) = \boldsymbol s \operatorname{E}(\mathbf{Z}_{T+h} | \mathbf{Z}_{T} ) \]
  • variance of forecast errors

\[ \operatorname{cov} \left(\mathbf{z}_{T+h} - \operatorname{E}(\mathbf{z}_{T+h} | \mathbf{Z}_{T} )\right) = \boldsymbol s\operatorname{cov} \left(\mathbf{Z}_{T+h} - \operatorname{E}(\mathbf{Z}_{T+h} | \mathbf{Z}_{T} )\right)\boldsymbol s' \]

Impulse response functions

From the VMA representation of the VAR(1) model

\[\begin{align} \mathbf{z}_{t+h} = \mathbf{\varepsilon}_{t+h} + A \mathbf{\varepsilon}_{t+h-1} + A^2 \mathbf{\varepsilon}_{t+h-2} + \cdots + A^h \mathbf{\varepsilon}_{t} + \cdots \end{align}\]

we have

\[ \frac{\partial \mathbf{z}_{t+h} }{\partial \mathbf{\varepsilon}_{t}} = A^h \]
  • Note that \(\mathbf{\varepsilon}_{t}\) is a vector

  • typically, we want to know the effect of a shock on a variable (for example monetary policy on inflation)

  • here \(\operatorname{cov}(\mathbf{\varepsilon}_t) = \Sigma\), i.e. \(\varepsilon_{it}\) and \(\varepsilon_{jt}\) are correlated

  • \(\mathbf{\varepsilon}_t\) are not structural shocks; they are statistical innovations (residuals, forecast errors)

Orthogonalized impulse response functions

Since \(\Sigma\) is a positive definite matrix, there exists a matrix \(B_0\) such that

\[B_0^{-1} (B_0^{-1})^{\prime} = \Sigma \;\;\; \Rightarrow \;\;\; B_0\Sigma B_0' = I\]


\[ \mathbf{u}_t = B_0 \mathbf{\varepsilon}_t \;\;\; \Rightarrow \;\;\; \mathbf{u}_t \sim \operatorname{WN} \left( 0, \;I\right) \]
  • \(u_{it}\) and \(u_{jt}\) are uncorrelated (orthogonal) for all \(i \neq j\)

Using \(\mathbf{\varepsilon}_t = B_0^{-1} \mathbf{u}_t\) in the MA representation

\[\begin{split}\begin{align} \mathbf{z}_{t+h} & = \mathbf{\varepsilon}_{t+h} + A \mathbf{\varepsilon}_{t+h-1} + A^2 \mathbf{\varepsilon}_{t+h-2} + \cdots + A^h \mathbf{\varepsilon}_{t} + \cdots \\\\ & = B_0^{-1} \mathbf{u}_{t+h} + A B_0^{-1} \mathbf{u}_{t+h-1} + A^2 B_0^{-1} \mathbf{u}_{t+h-2} + \cdots + A^h B_0^{-1} \mathbf{u}_{t} + \cdots \end{align} \end{split}\]

and therefore

\[ \frac{\partial \mathbf{z}_{t+h} }{\partial \mathbf{u}_{t}} = A^h B_0^{-1} \equiv \Psi_h \]

Since \(u_{it}\) and \(u_{jt}\) are uncorrelated and have unit variance, the \((k,l)\) element of \(\Psi_h\) gives the response of \(z_{k,t+h}\) to a one standard deviation shock to \(u_{l,t}\)

\[ \frac{\partial z_{k,t+h} }{\partial u_{lt}} = \psi_{kl,h} \]

and the \(l\) column of \(\Psi_h\) gives the response of \(\mathbf{z}_{t+h}\) to a one standard deviation shock to \(u_{l,t}\)

\[ \frac{\partial \mathbf{z}_{t+h} }{\partial u_{lt}} = \boldsymbol \psi_{l,h} \]

\(\boldsymbol \psi_{l,h}\) is the \(l\)-th column of \(A^h B_0^{-1}\)
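A sketch of these responses for a VAR(1), taking the lower-triangular Cholesky factor of \(\Sigma\) as \(B_0^{-1}\). That choice is an assumption made only for illustration: it is one of many matrices satisfying \(B_0^{-1} (B_0^{-1})^{\prime} = \Sigma\) and corresponds to a recursive ordering of the variables. Parameter values are also assumed.

```python
import numpy as np

# Assumed VAR(1) parameters
A = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

# One admissible B0^{-1}: the lower-triangular Cholesky factor, Sigma = P P'
P = np.linalg.cholesky(Sigma)

# Psi[h] = A^h B0^{-1}; Psi[h, k, l] is the response of z_k at t+h to a unit shock u_l at t
H = 10
Psi = np.array([np.linalg.matrix_power(A, h) @ P for h in range(H + 1)])

print(Psi[0])        # impact responses (equal to P)
print(Psi[:, :, 0])  # responses of all variables to the first shock, h = 0..H
```

With this recursive choice the second shock has no impact effect on the first variable, because the upper-right element of `P` is zero.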

From VAR(p) to Structural VAR(p) (and vice versa)

\[\begin{split} \begin{align} \mathbf{z}_t & = \boldsymbol a_0 + A_1 \mathbf{z}_{t-1} + \cdots + A_p \mathbf{z}_{t-p} + \boldsymbol \varepsilon_t, \;\;\;\;\; \boldsymbol \varepsilon_t \sim \operatorname{WN} \left( 0, \;\mathbf{\Sigma}\right)\\\\ \mathbf{z}_t & = \boldsymbol a_0 + A_1 \mathbf{z}_{t-1} + \cdots + A_p \mathbf{z}_{t-p} + B_0^{-1}\boldsymbol u_t\;\;\;\;\; (\text{using } \mathbf{u}_t = B_0 \mathbf{\varepsilon}_t) \\ & \downarrow\\ B_0\mathbf{z}_t & = B_0\boldsymbol a_0 + B_0 A_1 \mathbf{z}_{t-1} + \cdots + B_0 A_p \mathbf{z}_{t-p} + \mathbf{u}_t, \;\;\;\;\; (\text{pre-multiply by } B_0)\\\\ B_0\mathbf{z}_t & = \boldsymbol b_0 + B_1 \mathbf{z}_{t-1} + \cdots + B_p \mathbf{z}_{t-p} + \mathbf{u}_t, \;\;\;\;\; \boldsymbol u_t \sim \operatorname{WN} \left( 0, \;I\right) \end{align} \end{split}\]
  • \(B_0\) captures contemporaneous (time \(t\)) interactions among variables

  • \(\mathbf{u}_t\) are orthogonal shocks: \(u_{i, t}\) enters only the \(i\)-th structural equation; its time \(t\) effect on the other variables operates through \(B_0^{-1}\)

  • impulse responses

\[ \frac{\partial \mathbf{z}_{t+h} }{\partial u_{it}} \]


  • We can estimate the reduced-form coefficients \(a_0\), \(A_1\), … \(A_p\) and \(\Sigma\)

  • and compute reduced-form MA representation of \(\mathbf{z}\)

  • to compute impulse responses to structural shocks we need \(B_0^{-1}\)

  • \(B_0^{-1}\) is not identified from

\[B_0^{-1} (B_0^{-1})^{\prime} = \Sigma \]

by symmetry, \(\Sigma\) has \(n(n+1)/2\) unique elements \(\Rightarrow n(n+1)/2\) equations, but \(B_0\) has \(n^2\) unknown elements, so at least \(n(n-1)/2\) additional restrictions are needed

  • need to impose restrictions on either \(B_0\) or \(B_0^{-1}\) in order to identify it

  • the restrictions must be implied by economic theory.

Types of identifying restrictions

  • short-run restrictions

  • long-run restrictions

  • sign restrictions

  • combinations of short/long-run restrictions, sign restrictions

Short-run restrictions:

  • time \(t\) impact of structural shocks \(\mathbf{\varepsilon}_t = B_0^{-1} \mathbf{u}_t\)

\[\begin{split} \left[\begin{array}{c} \varepsilon_{1, t}\\ \varepsilon_{2, t}\\ \vdots\\ \varepsilon_{n, t} \end{array}\right] = \underset{ B_0^{-1}}{\underbrace{\left[\begin{array}{cccccccc} b^{1,1}_0 & b^{1,2}_0 & \cdots & b^{1,n}_0\\ b^{2,1}_0 & b^{2,2}_0 & \cdots & b^{2,n}_0\\ \vdots & \vdots & \vdots \\ b^{n,1}_0 & b^{n,2}_0 & \cdots & b^{n,n}_0\\ \end{array}\right]}} \left[\begin{array}{c} u_{1, t}\\ u_{2, t}\\ \vdots\\ u_{n, t} \end{array}\right] \end{split}\]
  • time \(t\) interactions among variables \(B_0\mathbf{z}_t = \cdots + \mathbf{u}_t,\)

\[\begin{split} \underset{B_0}{\underbrace{\left[\begin{array}{cccccccc} b_{0,11} & b_{0,12} & \cdots & b_{0,1n}\\ b_{0,21} & b_{0,22} & \cdots & b_{0,2n}\\ \vdots & \vdots & \vdots \\ b_{0,n1} & b_{0,n2} & \cdots & b_{0,nn}\\ \end{array}\right]}} \left[\begin{array}{c} z_{1, t}\\ z_{2, t}\\ \vdots\\ z_{n, t} \end{array}\right] = \cdots + \left[\begin{array}{c} u_{1, t}\\ u_{2, t}\\ \vdots\\ u_{n, t} \end{array}\right] \end{split}\]

Long-run restrictions:

MA representation of \(\mathbf{z}_t\)

\[\begin{split}\begin{align} \mathbf{z}_{t+h} & = \mathbf{\varepsilon}_{t+h} + A \mathbf{\varepsilon}_{t+h-1} + A^2 \mathbf{\varepsilon}_{t+h-2} + \cdots + A^h \mathbf{\varepsilon}_{t} + \cdots \\\\ & = B_0^{-1} \mathbf{u}_{t+h} + A B_0^{-1} \mathbf{u}_{t+h-1} + A^2 B_0^{-1} \mathbf{u}_{t+h-2} + \cdots + A^h B_0^{-1} \mathbf{u}_{t} + \cdots \end{align} \end{split}\]

The cumulative impulse responses of \(\mathbf{z}_{t}\), \(\mathbf{z}_{t+1}\), … to shocks in \(t\) are given by

\[ (I + A + A^2 + A^3 + \cdots ) B_0^{-1} = A(1)^{-1} B_0^{-1} \]
  • if \(\mathbf{z}_{t}\) contains growth rates (e.g. of GDP), the cumulative response gives the permanent effect on the level

  • common long-run restrictions: some shocks don’t have a permanent effect on some variables (e.g. nominal shocks on real variables)

    • some elements of \(A(1)^{-1} B_0^{-1}\) are 0
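One way to impose such zeros, sketched here for a VAR(1) with assumed parameter values, is to require the long-run matrix \(\Theta = A(1)^{-1} B_0^{-1}\) to be lower triangular. This recursive long-run scheme is an assumption chosen for illustration; the notes only require that some elements are zero. Since \(\Theta \Theta^{\prime} = A(1)^{-1} \Sigma (A(1)^{-1})^{\prime}\), \(\Theta\) can be obtained as a Cholesky factor and \(B_0^{-1} = A(1) \Theta\).

```python
import numpy as np

# Assumed VAR(1) parameters
A = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
n = A.shape[0]

A1_inv = np.linalg.inv(np.eye(n) - A)          # A(1)^{-1} = (I - A)^{-1} for a VAR(1)
long_run_cov = A1_inv @ Sigma @ A1_inv.T       # equals Theta Theta'
Theta = np.linalg.cholesky(long_run_cov)       # lower triangular long-run impact matrix
B0_inv = (np.eye(n) - A) @ Theta               # B0^{-1} = A(1) Theta

print(Theta)      # Theta[0, 1] = 0: shock 2 has no long-run effect on variable 1
print(B0_inv)
```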

Sign restrictions:

If \(B_0\) is such that

\[B_0^{-1} (B_0^{-1})^{\prime} = \Sigma \]

then for any orthogonal matrix \(Q\) (\( QQ^{\prime} =Q^{\prime}Q = I \))

we have

\[(B_0^{-1} Q) (B_0^{-1} Q)^{\prime} = \Sigma \]

There are infinitely many such matrices.

  • find the set of solutions that satisfy sign restrictions implied by theory (monetary policy shock raises \(i_t\) and lowers \(\pi_t\) and \(y_t\))

  • find all matrices \(Q\) such that \(B_0^{-1} = P Q \) meets those restrictions, where \(P\) is the Cholesky factor of \(\Sigma\), and \(Q\) is an orthogonal matrix.

    • every real-valued symmetric positive-definite matrix has a unique Cholesky decomposition (with \(P\) lower triangular and positive diagonal elements):

\[\Sigma = P P^{\prime}\]

There are different ways to generate \(Q\)

For example, for \(n=2\) candidates \(Q\) can be generated using

\[\begin{split} Q = \left[\begin{array}{cc} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \\ \end{array}\right]\end{split}\]

with \(\theta \in [0, 2 \pi)\)

Can also use the QR decomposition of a random matrix \(H\) such that \(H_{ij} \sim N(0, 1)\)

  • with sign restrictions we get a set of impulse responses for each shock and variable (set vs point identification)
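A minimal accept/reject sketch of the QR approach, restricting only the impact responses to the first shock. The function names, the values of `Sigma`, the variable ordering \((i_t, \pi_t, y_t)\) and the number of draws are illustrative assumptions.

```python
import numpy as np


def draw_orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    H = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(H)
    return Q * np.sign(np.diag(R))     # flip column signs so that diag(R) > 0


def accepted_impacts(Sigma, signs, n_draws, rng):
    """Candidate impact matrices P Q whose first column has the required sign pattern."""
    P = np.linalg.cholesky(Sigma)
    kept = []
    for _ in range(n_draws):
        B0_inv = P @ draw_orthogonal(Sigma.shape[0], rng)
        if np.all(np.sign(B0_inv[:, 0]) == signs):
            kept.append(B0_inv)
    return kept


# Assumed reduced-form covariance for (i_t, pi_t, y_t)
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 0.8, 0.3],
                  [0.1, 0.3, 0.6]])
signs = np.array([1, -1, -1])          # shock 1 raises i_t and lowers pi_t and y_t on impact
rng = np.random.default_rng(0)
kept = accepted_impacts(Sigma, signs, n_draws=2000, rng=rng)
print(len(kept), "of 2000 draws accepted")
```

Each accepted matrix reproduces \(\Sigma\) exactly, so the data cannot discriminate among them; the accepted draws together trace out the identified set of impact responses.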