Multivariate Normal Distribution

Time series model

  • a specification of the joint distribution of \(\{z_t\}\)

Gaussian time series model

\[ p(z_1, z_2, \cdots, z_T) = \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}). \]
  • probability density:

\[ p\left(\boldsymbol z; \boldsymbol \mu, \mathbf{\Sigma}\right)=\left(2\pi\right)^{-\frac{T}{2}}\det\left(\mathbf{\Sigma}\right)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\left(\boldsymbol z-\boldsymbol \mu\right)^{\prime}\mathbf{\Sigma}^{-1}\left(\boldsymbol z-\boldsymbol \mu\right)\right) \]
  • where \( \boldsymbol \mu=\operatorname{E} \boldsymbol z \) is the mean of the random vector \( \boldsymbol z\)

  • and \( \mathbf{\Sigma}=\operatorname{E}\left(\boldsymbol z -\boldsymbol \mu\right)\left(\boldsymbol z-\boldsymbol \mu\right)^\prime = \operatorname{cov}(\boldsymbol z) \) is the covariance matrix of \( \boldsymbol z \)
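As a quick numerical check, the density formula can be written out term by term and compared with scipy's implementation (a minimal sketch; the values of \(\boldsymbol \mu\), \(\mathbf{\Sigma}\) and \(\boldsymbol z\) below are arbitrary examples):

```python
import numpy as np
from scipy.stats import multivariate_normal

T = 3
mu = np.array([0.0, 1.0, -1.0])          # example mean vector
Sigma = np.array([[2.0, 0.5, 0.0],       # example covariance matrix
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
z = np.array([0.2, 0.8, -0.5])           # example evaluation point

# the density formula written out explicitly
quad = (z - mu) @ np.linalg.inv(Sigma) @ (z - mu)
pdf_manual = (2 * np.pi) ** (-T / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

# scipy's implementation of the same density
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(z)
print(np.isclose(pdf_manual, pdf_scipy))
```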

Building Gaussian time series models

  • Gaussian innovations \(\{\varepsilon_t\}\):

\[ p(\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_T) = p(\boldsymbol \varepsilon) = \mathcal{N}(\boldsymbol 0, \sigma^2\mathbf{I}). \]
  • affine transformation \(\boldsymbol \varepsilon \rightarrow \boldsymbol z\)

\[ \boldsymbol z = \boldsymbol \mu + \boldsymbol A \boldsymbol \varepsilon \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}) \]

where

\[\mathbf{\Sigma} = \sigma^2 \boldsymbol A \boldsymbol A'\]
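This construction can be verified by simulation (a sketch; the lower-triangular \(\boldsymbol A\) below is an arbitrary example choice):

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma = 3, 1.0
mu = np.zeros(T)
A = np.array([[1.0, 0.0, 0.0],    # example lower-triangular transformation
              [0.4, 1.0, 0.0],
              [0.0, 0.4, 1.0]])

eps = sigma * rng.standard_normal((200_000, T))  # rows are draws of eps ~ N(0, sigma^2 I)
z = mu + eps @ A.T                               # z = mu + A eps, row by row

Sigma_implied = sigma**2 * A @ A.T
Sigma_sample = np.cov(z, rowvar=False)
print(np.abs(Sigma_sample - Sigma_implied).max())  # small sampling error
```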

General result

\[ \boldsymbol z \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}) \]
  • affine transformation \(\boldsymbol z \rightarrow \boldsymbol y\)

\[ \boldsymbol y = \boldsymbol d + \boldsymbol B \boldsymbol z \sim \mathcal{N}(\boldsymbol d + \boldsymbol B \boldsymbol \mu , \boldsymbol B \mathbf{\Sigma} \boldsymbol B') \]

The joint, marginal and conditional distributions

Joint

\[ p(\underbrace{z_1, z_2, \cdots, z_k}_{\mathbf{z}_1}, \underbrace{z_{k+1}, z_{k+2}, \cdots, z_T}_{\mathbf{z}_2}) = p(\mathbf{z}_1, \mathbf{z}_2) = \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}). \]

with:

\[\begin{split} \boldsymbol \mu = \begin{bmatrix} \boldsymbol \mu_1 \\ \boldsymbol \mu_2 \end{bmatrix} \; \; \text{and} \; \; \mathbf{\Sigma} = \begin{bmatrix} \mathbf{\Sigma}_{1 1} & \mathbf{\Sigma}_{1 2} \\ \mathbf{\Sigma}_{2 1} & \mathbf{\Sigma}_{2 2} \end{bmatrix}, \text{ where} \; \; \mathbf{\Sigma}_{2 1} = \mathbf{\Sigma}_{1 2}^{'} \end{split}\]

Marginal

\[\begin{split} p(\mathbf{z}_1) = \int_{\mathbf{z}_2} p(\mathbf{z}_1, \mathbf{z}_2) \text{d} \mathbf{z}_2 = \mathcal{N}(\boldsymbol \mu_1, \mathbf{\Sigma}_{1 1}) \\ p(\mathbf{z}_2) = \int_{\mathbf{z}_1} p(\mathbf{z}_1, \mathbf{z}_2) \text{d} \mathbf{z}_1 = \mathcal{N}(\boldsymbol \mu_2, \mathbf{\Sigma}_{2 2}) \end{split}\]
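Marginalizing a Gaussian amounts to simply dropping components of \(\boldsymbol \mu\) and the corresponding rows and columns of \(\mathbf{\Sigma}\). A simulation sketch (the positive-definite \(\mathbf{\Sigma}\) below is built from an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0, -1.0, 0.5])
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + np.eye(4)          # arbitrary positive-definite covariance

z = rng.multivariate_normal(mu, Sigma, size=500_000)
k = 2
z1 = z[:, :k]                        # marginal draws of z_1: simply drop z_2

# sample moments match mu_1 and Sigma_11 up to simulation error
print(np.abs(z1.mean(axis=0) - mu[:k]).max())
print(np.abs(np.cov(z1, rowvar=False) - Sigma[:k, :k]).max())
```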

Conditional

\[\begin{split} p(\mathbf{z}_1 | \mathbf{z}_2) = \mathcal{N}(\underbrace{\boldsymbol \mu_1 + \mathbf{\Sigma}_{1 2} \mathbf{\Sigma}^{-1}_{2 2} (\mathbf{z}_2 - \boldsymbol \mu_2)}_{\operatorname{E}(\mathbf{z}_1 | \mathbf{z}_2)}, \underbrace{\mathbf{\Sigma}_{1 1} - \mathbf{\Sigma}_{1 2}\mathbf{\Sigma}_{2 2}^{-1}\mathbf{\Sigma}_{2 1}}_{\operatorname{cov}(\mathbf{z}_1 | \mathbf{z}_2)}) %\\ %p(\mathbf{z}_2 | \mathbf{z}_1) = \mathcal{N}(\boldsymbol \mu_2 + \mathbf{\Sigma}_{2 1} \mathbf{\Sigma}^{-1}_{1 1} (\mathbf{z}_1 - \boldsymbol \mu_1), \mathbf{\Sigma}_{2 2} - \mathbf{\Sigma}_{2 1}\mathbf{\Sigma}_{1 1}^{-1}\mathbf{\Sigma}_{1 2}) \end{split}\]
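The conditioning formula translates directly into code. Below is a minimal sketch (the helper name `conditional_gaussian` and the numbers are illustrative, not from the text):

```python
import numpy as np

def conditional_gaussian(mu, Sigma, k, z2):
    """Mean and covariance of z_1 | z_2, with z_1 the first k elements."""
    mu1, mu2 = mu[:k], mu[k:]
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, z2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
    return cond_mean, cond_cov

mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
m, V = conditional_gaussian(mu, Sigma, 1, np.array([1.0, -1.0]))
print(m, V)
```

Using `np.linalg.solve` instead of explicitly inverting \(\mathbf{\Sigma}_{2 2}\) is both faster and numerically more stable.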

Let

\[\begin{split} \mathbf{Q} = \mathbf{\Sigma}^{-1} = \begin{bmatrix} \mathbf{Q}_{1 1} & \mathbf{Q}_{1 2} \\ \mathbf{Q}_{2 1} & \mathbf{Q}_{2 2} \end{bmatrix} \end{split}\]

Then

\[\begin{split} \begin{align} \operatorname{E}(\mathbf{z}_1 | \mathbf{z}_2) &= \boldsymbol \mu_1 - \mathbf{Q}^{-1}_{1 1} \mathbf{Q}_{1 2} (\mathbf{z}_2 - \boldsymbol \mu_2) \\ \operatorname{cov}(\mathbf{z}_1 | \mathbf{z}_2) &= \mathbf{Q}_{1 1}^{-1} \end{align} \end{split}\]

\(\mathbf{Q}\) is called the precision matrix.
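These identities follow from the block-matrix inversion formulas, in particular \(\mathbf{\Sigma}_{1 2}\mathbf{\Sigma}^{-1}_{2 2} = -\mathbf{Q}^{-1}_{1 1}\mathbf{Q}_{1 2}\) (note the minus sign). A numerical check, with an arbitrary example \(\mathbf{\Sigma}\):

```python
import numpy as np

Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
Q = np.linalg.inv(Sigma)   # precision matrix

k = 1
Q11, Q12 = Q[:k, :k], Q[:k, k:]
S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]

# cov(z_1 | z_2) = Q_11^{-1}
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(np.allclose(cond_cov, np.linalg.inv(Q11)))

# Sigma_12 Sigma_22^{-1} = -Q_11^{-1} Q_12 (the coefficient on z_2 - mu_2)
print(np.allclose(S12 @ np.linalg.inv(S22), -np.linalg.inv(Q11) @ Q12))
```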

Independence

\[\begin{split} \text{if} \;\; \mathbf{\Sigma}_{1 2} = \boldsymbol 0, \;\; \text{then}\\ p(\mathbf{z}_1 | \mathbf{z}_2) = \mathcal{N}(\boldsymbol \mu_1 , \mathbf{\Sigma}_{1 1}) = p(\mathbf{z}_1) \end{split}\]

If \(\mathbf{z}_1 \sim \mathcal{N}(\boldsymbol \mu_1 , \mathbf{\Sigma}_{1 1})\) and \(\mathbf{z}_2 \sim \mathcal{N}(\boldsymbol \mu_2 , \mathbf{\Sigma}_{2 2})\) are independent then

\[ \mathbf{z}_1 + \mathbf{z}_2 \sim \mathcal{N}(\boldsymbol \mu_1 + \boldsymbol \mu_2, \mathbf{\Sigma}_{1 1} + \mathbf{\Sigma}_{2 2}) \]

and

\[ p(\mathbf{z}_1, A \mathbf{z}_1 + B\mathbf{z}_2) = \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}). \]

with:

\[\begin{split} \boldsymbol \mu = \begin{bmatrix} \boldsymbol \mu_1 \\ A \boldsymbol \mu_1 + B\boldsymbol \mu_2 \end{bmatrix} \; \; \text{and} \; \; \mathbf{\Sigma} = \begin{bmatrix} \mathbf{\Sigma}_{1 1} & \mathbf{\Sigma}_{1 1} A' \\ A\mathbf{\Sigma}_{1 1} & A\mathbf{\Sigma}_{1 1}A' + B\mathbf{\Sigma}_{2 2}B' \end{bmatrix} \end{split}\]
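The block structure above can be verified by stacking \((\mathbf{z}_1, A\mathbf{z}_1 + B\mathbf{z}_2)\) as an affine map of the independent pair \((\mathbf{z}_1, \mathbf{z}_2)\) and applying the general affine result (the matrices \(A\), \(B\) and the covariances below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)
n1 = n2 = 2
S11 = np.array([[1.0, 0.3], [0.3, 1.0]])   # example Sigma_11
S22 = np.array([[2.0, 0.0], [0.0, 0.5]])   # example Sigma_22
A = rng.standard_normal((n2, n1))
B = rng.standard_normal((n2, n2))

# (z_1, A z_1 + B z_2) is an affine map of the independent pair (z_1, z_2)
M = np.block([[np.eye(n1), np.zeros((n1, n2))],
              [A, B]])
S = np.block([[S11, np.zeros((n1, n2))],
              [np.zeros((n2, n1)), S22]])   # block-diagonal by independence
Sigma_joint = M @ S @ M.T

# compare against the partitioned formula from the text
print(np.allclose(Sigma_joint[:n1, n1:], S11 @ A.T))
print(np.allclose(Sigma_joint[n1:, n1:], A @ S11 @ A.T + B @ S22 @ B.T))
```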

Information

\[\operatorname{cov}(\mathbf{z}_1 | \mathbf{z}_2) = \mathbf{\Sigma}_{1 1} - \mathbf{\Sigma}_{1 2}\mathbf{\Sigma}_{2 2}^{-1}\mathbf{\Sigma}_{2 1} \geq \boldsymbol 0\]
  • stronger correlation \(\Rightarrow\) more information from \(\mathbf{z}_1\) about \(\mathbf{z}_2\) (and from \(\mathbf{z}_2\) about \(\mathbf{z}_1\))

Bivariate Normal

\[ p(z_1, z_2) = \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}). \]

where:

\[\begin{split} \boldsymbol \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \; \; \text{and} \; \; \mathbf{\Sigma} = \begin{bmatrix} \sigma^2_{1} & \rho \sigma_{1}\sigma_2 \\ \rho \sigma_{1}\sigma_2 & \sigma^2_{2} \end{bmatrix} \end{split}\]
\[\begin{split} \begin{align} p(z_1 | z_2) &= \mathcal{N}\left(\mu_1 + \frac{\rho \sigma_{1}\sigma_2}{\sigma^2_{2}} (z_2 - \mu_2), \sigma^2_{1} - \frac{(\rho \sigma_{1}\sigma_2)^2}{\sigma^2_{2}}\right) \\ &= \mathcal{N}\left(\mu_1 + \frac{\rho \sigma_{1}}{\sigma_{2}} (z_2 - \mu_2), (1 - \rho^2)\sigma_1^2\right) \end{align} \end{split}\]

Example: \(\sigma_1 = \sigma_2=1\), \(\mu_1=\mu_2 = 0\)

Case 1: \(\rho = 0.9\)

\[p(z_1) = \mathcal{N}(0, 1)\]
rho = 0.9
sigma1 = 1
sigma2 = 1
mu1 = 0
mu2 = 0

z2 = 1  # observed value of z_2

# conditional mean and variance of z_1 given z_2
(
    mu1 + (z2 - mu2) * (rho * sigma1 * sigma2) / (sigma2**2),
    sigma1**2 - (rho * sigma1 * sigma2)**2 / (sigma2**2),
)
(0.9, 0.18999999999999995)

If \(z_2=1\)

\[p(z_1|z_2=1) = \mathcal{N}(0.9, 0.19)\]

Information about \(z_1\) from observing \(z_2\)

\[\operatorname{var}(z_1) - \operatorname{var}(z_1 | z_2) = 1 - 0.19 = 0.81 \]

Reduction of the uncertainty about \(z_1\) by 81%

Case 2: \(\rho = 0.1\), \(\sigma_1 = \sigma_2 = 1\), \(\mu_1 = \mu_2 = 0\)

\[p(z_1) = \mathcal{N}(0, 1)\]
rho = 0.1
sigma1 = 1
sigma2 = 1
mu1 = 0
mu2 = 0

z2 = 1  # observed value of z_2

# conditional mean and variance of z_1 given z_2
(
    mu1 + (z2 - mu2) * (rho * sigma1 * sigma2) / (sigma2**2),
    sigma1**2 - (rho * sigma1 * sigma2)**2 / (sigma2**2),
)
(0.1, 0.99)

If \(z_2=1\)

\[p(z_1|z_2=1) = \mathcal{N}(0.1, 0.99)\]

Reduction of the uncertainty about \(z_1\) by 1%

Let \(\mathbf{z}\) be jointly Gaussian, and partitioned as (\(z_1\) is a scalar)

\[\mathbf{z} = [z_1, \mathbf{z}_2']' \]
\[\begin{split} \boldsymbol \mu = \begin{bmatrix} \mu_1 \\ \boldsymbol \mu_2 \end{bmatrix} \; \; \text{and} \; \; \mathbf{\Sigma} = \begin{bmatrix} \Sigma_{1 1} & \mathbf{\Sigma}_{1 2} \\ \mathbf{\Sigma}_{2 1} & \mathbf{\Sigma}_{2 2} \end{bmatrix} \end{split}\]

Then

\[\begin{split} p(z_1 | \mathbf{z}_2) = \mathcal{N}(\underbrace{\mu_1 - \mathbf{\Sigma}_{1 2} \mathbf{\Sigma}^{-1}_{2 2} \boldsymbol \mu_2}_{\beta_0} + \underbrace{\mathbf{\Sigma}_{1 2} \mathbf{\Sigma}^{-1}_{2 2}}_{\mathbf{\beta}'_1} \mathbf{z}_2, \underbrace{\Sigma_{1 1} - \mathbf{\Sigma}_{1 2}\mathbf{\Sigma}_{2 2}^{-1}\mathbf{\Sigma}_{2 1}}_{\sigma^2}) %\\ \end{split}\]

therefore,

\[ z_1 = \beta_0 + \mathbf{\beta}'_1 \mathbf{z}_2 + \varepsilon \;\;\; \text{where } \varepsilon \sim \mathcal{N}(0, \sigma^2) \]
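This regression reading can be illustrated by simulation: OLS of \(z_1\) on a constant and \(\mathbf{z}_2\), in a large sample drawn from the joint Gaussian, should recover \(\beta_0\) and \(\boldsymbol \beta_1\) (the \(\boldsymbol \mu\) and \(\mathbf{\Sigma}\) below are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, 0.0, 0.0])
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.2]])

# population coefficients from the partitioned Sigma
beta1 = np.linalg.solve(Sigma[1:, 1:], Sigma[1:, 0])   # Sigma_22^{-1} Sigma_21
beta0 = mu[0] - beta1 @ mu[1:]

# OLS of z_1 on (1, z_2) in a large simulated sample
z = rng.multivariate_normal(mu, Sigma, size=200_000)
X = np.column_stack([np.ones(len(z)), z[:, 1:]])
b = np.linalg.lstsq(X, z[:, 0], rcond=None)[0]
print(np.abs(b - np.r_[beta0, beta1]).max())   # sampling error only
```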

Conditional (in)dependence

Two random variables \(z_1\) and \(z_2\) are conditionally independent given \(\mathbf{z}_3\) if

\[p(z_1, z_2 | \mathbf{z}_3) = p(z_1 | \mathbf{z}_3) p(z_2 | \mathbf{z}_3) \tag{1}\]

If \(z_1\), \(z_2\) and \(\mathbf{z}_3\) are jointly Gaussian, then (1) is true iff

\[\{\mathbf{\Sigma}^{-1}\}_{1,2} = \mathbf{Q}_{1,2} = 0\]

More generally, \(z_i\) and \(z_j\) are conditionally independent given the remaining elements of \(\mathbf{z}\) (denote them with \(\mathbf{z}_{-ij}\)) iff

\[Q_{i,j} = 0\]

And the conditional correlation between \(z_i\) and \(z_j\) is

\[\operatorname{corr}(z_i, z_j | \mathbf{z}_{-ij}) = -\frac{Q_{i,j}}{\sqrt{Q_{i,i} Q_{j,j}}}\]
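For instance, applying this formula to the tridiagonal precision matrix of Example 1 below gives a nonzero partial correlation for neighbors and exactly zero for the \((1, 3)\) pair (the helper name `partial_corr` is illustrative):

```python
import numpy as np

def partial_corr(Q, i, j):
    """Conditional (partial) correlation of z_i and z_j given all other elements."""
    return -Q[i, j] / np.sqrt(Q[i, i] * Q[j, j])

Q = np.array([[1.0, -0.4, 0.0],
              [-0.4, 1.16, -0.4],
              [0.0, -0.4, 1.0]])

print(partial_corr(Q, 0, 1))   # positive: neighbors are conditionally dependent
print(partial_corr(Q, 0, 2))   # zero: conditionally independent
```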

also known as partial correlation between \(z_i\) and \(z_j\)

Example 1

\[\begin{split} \mathbf{Q} = \begin{bmatrix} 1 & -0.4 & 0 \\ -0.4 & 1.16 & -0.4 \\ 0 & -0.4 & 1 \end{bmatrix} \end{split}\]

then

\[\begin{split} \mathbf{\Sigma} = \begin{bmatrix} 1.19047619 & 0.47619048 & 0.19047619 \\ 0.47619048 & 1.19047619 & 0.47619048 \\ 0.19047619 & 0.47619048 & 1.19047619 \end{bmatrix} \end{split}\]

therefore, \(z_1\) and \(z_3\) are unconditionally dependent, but conditionally (given \(z_2\)) independent

Example 2

\[\begin{split} \mathbf{\Sigma} = \begin{bmatrix} 1.09 & 0.3 & 0. \\ 0.3 & 1.09 & 0.3 \\ 0 & 0.3 & 1.09 \end{bmatrix} \end{split}\]

then

\[\begin{split} \mathbf{Q} = \begin{bmatrix} 1. & -0.3 & 0.08 \\ -0.3 & 1.08 & -0.3 \\ 0.08 & -0.3 & 1 \end{bmatrix} \end{split}\]

therefore, \(z_1\) and \(z_3\) are unconditionally independent, but conditionally (given \(z_2\)) dependent

from statsmodels.tsa.arima_process import arma_acovf
from scipy.linalg import toeplitz
import numpy as np

# Example 1: AR(1) with coefficient 0.4; Example 2: MA(1) with coefficient 0.3
acov1 = arma_acovf(ar=[1, -.4], ma=[1], nobs=10, sigma2=1)
Sigma1 = toeplitz(acov1)
acov2 = arma_acovf(ar=[1], ma=[1, .3], nobs=10, sigma2=1)
Sigma2 = toeplitz(acov2)

# precision matrices of the first three elements
print(np.linalg.inv(Sigma1[:3, :3]).round(4))
print(np.linalg.inv(Sigma2[:3, :3]).round(4))
[[ 1.   -0.4   0.  ]
 [-0.4   1.16 -0.4 ]
 [ 0.   -0.4   1.  ]]
[[ 0.9993 -0.2976  0.0819]
 [-0.2976  1.0812 -0.2976]
 [ 0.0819 -0.2976  0.9993]]

Missing values

  • Suppose we have a Gaussian model for \(\mathbf{z}\)

\[ \mathbf{z} \sim \mathcal{N}(\boldsymbol \mu , \mathbf{\Sigma} ) \]
  • but some elements of \(\mathbf{z}\) are not observed, i.e. we observe a vector \(\mathbf{z}_{1}\)

  • for example

\[\begin{split} \mathbf{z}=\left[\begin{array}{c} z_{1}\\ z_{2}\\ \vdots\\ z_{k-1}\\ z_{k}\\ z_{k+1}\\ \vdots\\ z_{T-2}\\ z_{T-1}\\ z_{T} \end{array}\right], \quad \mathbf{z}_{1}=\left[\begin{array}{c} *\\ z_{2}\\ \vdots\\ z_{k-1}\\ *\\ z_{k+1}\\ \vdots\\ z_{T-2}\\ *\\ *\\ \end{array}\right] \end{split}\]

Some examples

  • mixed frequency

    • (univariate) GDP at annual, GDP at quarterly

    • (multivariate) GDP at annual, inflation at monthly

    • etc.

  • forecasting

  • backcasting

  • unobserved (latent) variables

    • state of the economy

    • natural rates (interest, unemployment)

    • economic shocks

observed elements \(\mathbf{z}_{1}\), unobserved elements \(\mathbf{z}_{2}\)

  • \(\mathbf{z}_{1}, ~\mathbf{z}_{2}\) - jointly Gaussian

  • marginal distribution of \(\mathbf{z}_{1}\)

\[ p(\mathbf{z}_{1}) = \mathcal{N}( \boldsymbol \mu_{1} , \mathbf{\Sigma}_{1 1}) \]

example:

\[\begin{split} \mathbf{z} =\left[\begin{array}{c} z_{1}\\ z_{2}\\ z_{3}\\ \end{array}\right] , \quad \mathbf{z}_{1} =\left[\begin{array}{c} z_{1}\\ z_{3}\\ \end{array}\right], \quad \mathbf{z}_{2 }= \left[\begin{array}{c} z_{2} \end{array}\right] \end{split}\]
\[\begin{split} \mathbf{z}_{1} = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix} \left[\begin{array}{c} z_{1}\\ z_{2}\\ z_{3}\\ \end{array}\right] = \boldsymbol B_1 \mathbf{z} \end{split}\]
\[\begin{split} \mathbf{z}_{2} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \left[\begin{array}{c} z_{1}\\ z_{2}\\ z_{3}\\ \end{array}\right] = \boldsymbol B_2 \mathbf{z} \end{split}\]
  • conditional distribution of \(\mathbf{z}_{2}\) given \(\mathbf{z}_{1}\)

\[ p(\mathbf{z}_2 | \mathbf{z}_1) = \mathcal{N}(\underbrace{\boldsymbol \mu_2 + \mathbf{\Sigma}_{2 1} \mathbf{\Sigma}^{-1}_{1 1} (\mathbf{z}_1 - \boldsymbol \mu_1)}_{\operatorname{E}(\mathbf{z}_2 | \mathbf{z}_1)}, \underbrace{\mathbf{\Sigma}_{2 2} - \mathbf{\Sigma}_{2 1}\mathbf{\Sigma}_{1 1}^{-1}\mathbf{\Sigma}_{1 2}}_{\operatorname{cov}(\mathbf{z}_2 | \mathbf{z}_1)}) \]

if \(\mathbf{z}_1\) is the past and \(\mathbf{z}_2\) is the future,

  • \(\operatorname{E}(\mathbf{z}_2 | \mathbf{z}_1)\) - optimal forecast of \(\mathbf{z}_{2}\) given \(\mathbf{z}_{1}\)

  • \(\operatorname{cov}(\mathbf{z}_2 | \mathbf{z}_1)\) - variance of the optimal forecast

where optimality is in the sense of minimizing the mean squared error (MSE)

in general, \(\operatorname{E}(\mathbf{z}_2 | \mathbf{z}_1)\) is our best guess of \(\mathbf{z}_{2}\) given \(\mathbf{z}_{1}\), and \(\operatorname{cov}(\mathbf{z}_2 | \mathbf{z}_1)\) is the associated uncertainty
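As a concrete sketch, the conditional distribution can impute a missing interior value of a stationary AR(1) series, using `arma_acovf` as in the conditional-independence example above (the AR coefficient 0.4 and the observed values are hypothetical):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acovf
from scipy.linalg import toeplitz

# exact covariance of 5 consecutive observations of an AR(1) with coefficient 0.4
Sigma = toeplitz(arma_acovf(ar=[1, -0.4], ma=[1], nobs=5, sigma2=1))
mu = np.zeros(5)

obs = np.array([0, 1, 3, 4])   # indices of observed elements (z_1)
mis = np.array([2])            # index of the missing element (z_2)

S11 = Sigma[np.ix_(obs, obs)]
S21 = Sigma[np.ix_(mis, obs)]
S22 = Sigma[np.ix_(mis, mis)]

z1 = np.array([0.5, 1.0, 0.8, 0.2])   # hypothetical observed values

# E(z_2 | z_1) and cov(z_2 | z_1)
cond_mean = mu[mis] + S21 @ np.linalg.solve(S11, z1 - mu[obs])
cond_cov = S22 - S21 @ np.linalg.solve(S11, S21.T)
print(cond_mean, cond_cov)
```

Because an AR(1) is Markov, the imputed mean depends only on the two neighbors of the missing value, and the conditional variance equals the reciprocal of the corresponding diagonal element of the precision matrix.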