Gaussian time series model
A Gaussian time series model treats the sample \(z_1, z_2, \cdots, z_T\) as a single draw from a multivariate normal distribution:
\[
p(z_1, z_2, \cdots, z_T) = \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}).
\]
\[
p\left(\boldsymbol z; \boldsymbol \mu, \mathbf{\Sigma}\right)=\left(2\pi\right)^{-\frac{T}{2}}\det\left(\mathbf{\Sigma}\right)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\left(\boldsymbol z-\boldsymbol \mu\right)^{\prime}\mathbf{\Sigma}^{-1}\left(\boldsymbol z-\boldsymbol \mu\right)\right)
\]
where \( \boldsymbol \mu=\operatorname{E} \boldsymbol z \) is the mean of the random vector \( \boldsymbol z \),
and
\( \mathbf{\Sigma}=\operatorname{E}\left(\boldsymbol z -\boldsymbol \mu\right)\left(\boldsymbol z-\boldsymbol \mu\right)^\prime = \operatorname{cov}(\boldsymbol z) \) is the covariance matrix of \( \boldsymbol z \).
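As a quick numerical check, here is a minimal sketch (assuming numpy and scipy are available) that evaluates this density directly and compares it against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

T = 4
mu = np.zeros(T)
A = rng.standard_normal((T, T))
Sigma = A @ A.T + T * np.eye(T)   # a positive definite covariance matrix
z = rng.standard_normal(T)

# density evaluated from the formula above
quad = (z - mu) @ np.linalg.solve(Sigma, z - mu)
p = (2 * np.pi) ** (-T / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

# reference value from scipy
assert np.isclose(p, multivariate_normal(mean=mu, cov=Sigma).pdf(z))
```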
Building Gaussian time series models
Start from a vector of independent Gaussian shocks:
\[
p(\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_T) = p(\boldsymbol \varepsilon) = \mathcal{N}(\boldsymbol 0, \sigma^2\mathbf{I}).
\]
\[
\boldsymbol z = \boldsymbol \mu + \boldsymbol A \boldsymbol \varepsilon \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma})
\]
where
\[\mathbf{\Sigma} = \sigma^2 \boldsymbol A \boldsymbol A'\]
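A simulation sketch of this construction (the lower-triangular \(A\) here is purely illustrative): the sample covariance of the simulated draws approaches \(\sigma^2 A A'\).

```python
import numpy as np

rng = np.random.default_rng(1)

T, sigma = 5, 0.5
mu = np.ones(T)
A = np.tril(rng.standard_normal((T, T)))   # any A works; lower-triangular is illustrative
Sigma = sigma ** 2 * A @ A.T               # implied covariance of z

# simulate many draws of z = mu + A eps, with eps ~ N(0, sigma^2 I)
eps = sigma * rng.standard_normal((100_000, T))
Z = mu + eps @ A.T

print(np.allclose(np.cov(Z, rowvar=False), Sigma, atol=0.05))  # True, up to Monte Carlo error
```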
General result
More generally, any affine transformation of a Gaussian random vector is Gaussian:
\[
\boldsymbol z \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma})
\]
\[
\boldsymbol y = \boldsymbol d + \boldsymbol B \boldsymbol z \sim \mathcal{N}(\boldsymbol d + \boldsymbol B \boldsymbol \mu , \boldsymbol B \mathbf{\Sigma} \boldsymbol B')
\]
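A minimal sketch of this result (shapes and numbers are illustrative): the moments of \(\boldsymbol y = \boldsymbol d + \boldsymbol B \boldsymbol z\) follow directly from \(\boldsymbol \mu\) and \(\mathbf{\Sigma}\), with no simulation needed.

```python
import numpy as np

rng = np.random.default_rng(2)

T, m = 4, 2
mu = rng.standard_normal(T)
L = np.tril(rng.standard_normal((T, T)))
Sigma = L @ L.T + np.eye(T)                # covariance of z

d = rng.standard_normal(m)
B = rng.standard_normal((m, T))

mean_y = d + B @ mu                        # E(y) = d + B mu
cov_y = B @ Sigma @ B.T                    # cov(y) = B Sigma B'
```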
The joint, marginal and conditional distributions
Joint
\[
p(\underbrace{z_1, z_2, \cdots, z_k}_{\mathbf{z}_1}, \underbrace{z_{k+1}, z_{k+2}, \cdots, z_T}_{\mathbf{z}_2}) = p(\mathbf{z}_1, \mathbf{z}_2) \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}).
\]
with:
\[\begin{split}
\boldsymbol \mu =
\begin{bmatrix}
\boldsymbol \mu_1 \\
\boldsymbol \mu_2
\end{bmatrix}
\; \; \text{and} \; \;
\mathbf{\Sigma} =
\begin{bmatrix}
\mathbf{\Sigma}_{1 1} & \mathbf{\Sigma}_{1 2} \\
\mathbf{\Sigma}_{2 1} & \mathbf{\Sigma}_{2 2}
\end{bmatrix}, \text{ where} \; \; \mathbf{\Sigma}_{2 1} = \mathbf{\Sigma}_{1 2}^{'}
\end{split}\]
Marginal
\[\begin{split}
p(\mathbf{z}_1) = \int_{\mathbf{z}_2} p(\mathbf{z}_1, \mathbf{z}_2) \text{d} \mathbf{z}_2 =
\mathcal{N}(\boldsymbol \mu_1, \mathbf{\Sigma}_{1 1}) \\
p(\mathbf{z}_2) = \int_{\mathbf{z}_1} p(\mathbf{z}_1, \mathbf{z}_2) \text{d} \mathbf{z}_1 =
\mathcal{N}(\boldsymbol \mu_2, \mathbf{\Sigma}_{2 2})
\end{split}\]
Conditional
\[
p(\mathbf{z}_1 | \mathbf{z}_2) = \mathcal{N}(\underbrace{\boldsymbol \mu_1 + \mathbf{\Sigma}_{1 2} \mathbf{\Sigma}^{-1}_{2 2} (\mathbf{z}_2 - \boldsymbol \mu_2)}_{\operatorname{E}(\mathbf{z}_1 | \mathbf{z}_2)}, \underbrace{\mathbf{\Sigma}_{1 1} - \mathbf{\Sigma}_{1 2}\mathbf{\Sigma}_{2 2}^{-1}\mathbf{\Sigma}_{2 1}}_{\operatorname{cov}(\mathbf{z}_1 | \mathbf{z}_2)})
\]
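These moments are straightforward to compute by partitioning; a small helper sketch (the function name is hypothetical):

```python
import numpy as np

def conditional_gaussian(mu, Sigma, idx1, idx2, z2):
    """Mean and covariance of z[idx1] | z[idx2] = z2, for z ~ N(mu, Sigma)."""
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    gain = S12 @ np.linalg.inv(S22)                      # Sigma_12 Sigma_22^{-1}
    cond_mean = mu[idx1] + gain @ (z2 - mu[idx2])        # E(z_1 | z_2)
    cond_cov = Sigma[np.ix_(idx1, idx1)] - gain @ S12.T  # uses Sigma_21 = Sigma_12'
    return cond_mean, cond_cov

# usage with illustrative numbers
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
m, V = conditional_gaussian(mu, Sigma, np.array([0]), np.array([1, 2]), np.array([1.0, -1.0]))
```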
Let
\[\begin{split}
\mathbf{Q} = \mathbf{\Sigma}^{-1} =
\begin{bmatrix}
\mathbf{Q}_{1 1} & \mathbf{Q}_{1 2} \\
\mathbf{Q}_{2 1} & \mathbf{Q}_{2 2}
\end{bmatrix}
\end{split}\]
Then
\[\begin{split}
\begin{align}
\operatorname{E}(\mathbf{z}_1 | \mathbf{z}_2) &= \boldsymbol \mu_1 - \mathbf{Q}^{-1}_{1 1} \mathbf{Q}_{1 2} (\mathbf{z}_2 - \boldsymbol \mu_2) \\
\operatorname{cov}(\mathbf{z}_1 | \mathbf{z}_2) &= \mathbf{Q}_{1 1}^{-1}
\end{align}
\end{split}\]
\(\mathbf{Q}\) is called the precision matrix.
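A numerical sketch of the equivalence between the two representations (the random matrices are illustrative); note the minus sign in the conditional mean:

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 5, 2
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)
Sigma = L @ L.T
Q = np.linalg.inv(Sigma)                  # precision matrix

i1, i2 = np.arange(k), np.arange(k, n)
S11, S12 = Sigma[np.ix_(i1, i1)], Sigma[np.ix_(i1, i2)]
S22 = Sigma[np.ix_(i2, i2)]
Q11, Q12 = Q[np.ix_(i1, i1)], Q[np.ix_(i1, i2)]

# gain on (z_2 - mu_2): Sigma_12 Sigma_22^{-1} = -Q_11^{-1} Q_12
assert np.allclose(S12 @ np.linalg.inv(S22), -np.linalg.solve(Q11, Q12))
# conditional covariance: Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21 = Q_11^{-1}
assert np.allclose(S11 - S12 @ np.linalg.solve(S22, S12.T), np.linalg.inv(Q11))
```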
Independence
\[\begin{split}
\text{if} \;\; \mathbf{\Sigma}_{1 2}= \boldsymbol 0, \;\; \text{then}\\
p(\mathbf{z}_1 | \mathbf{z}_2) = \mathcal{N}(\boldsymbol \mu_1 , \mathbf{\Sigma}_{1 1}) = p(\mathbf{z}_1)
\end{split}\]
If \(\mathbf{z}_1 \sim \mathcal{N}(\boldsymbol \mu_1 , \mathbf{\Sigma}_{1 1})\) and \(\mathbf{z}_2 \sim \mathcal{N}(\boldsymbol \mu_2 , \mathbf{\Sigma}_{2 2})\) are independent and of the same dimension, then
\[ \mathbf{z}_1 + \mathbf{z}_2 \sim \mathcal{N}(\boldsymbol \mu_1 + \boldsymbol \mu_2, \mathbf{\Sigma}_{1 1} + \mathbf{\Sigma}_{2 2}) \]
and
\[
p(\mathbf{z}_1, A \mathbf{z}_1 + B\mathbf{z}_2) \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}).
\]
with:
\[\begin{split}
\boldsymbol \mu =
\begin{bmatrix}
\boldsymbol \mu_1 \\
A \boldsymbol \mu_1 + B\boldsymbol \mu_2
\end{bmatrix}
\; \; \text{and} \; \;
\mathbf{\Sigma} =
\begin{bmatrix}
\mathbf{\Sigma}_{1 1} & \mathbf{\Sigma}_{1 1} A' \\
A\mathbf{\Sigma}_{1 1} & A\mathbf{\Sigma}_{1 1}A' + B\mathbf{\Sigma}_{2 2}B'
\end{bmatrix}
\end{split}\]
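A simulation sketch of these blocks (all matrices below are illustrative): the sample cross-covariance between \(\mathbf{z}_1\) and \(A\mathbf{z}_1 + B\mathbf{z}_2\) approaches \(\mathbf{\Sigma}_{1 1} A'\).

```python
import numpy as np

rng = np.random.default_rng(4)

S11 = np.array([[1.0, 0.3], [0.3, 1.0]])
S22 = np.array([[0.5, 0.0], [0.0, 0.5]])
A = np.array([[0.8, 0.1], [-0.2, 0.5]])
B = np.array([[1.0, 0.0], [0.3, 1.0]])

N = 200_000
z1 = rng.multivariate_normal(np.zeros(2), S11, size=N)  # independent of z2
z2 = rng.multivariate_normal(np.zeros(2), S22, size=N)
y = z1 @ A.T + z2 @ B.T                                  # y = A z1 + B z2

cross = (z1 - z1.mean(0)).T @ (y - y.mean(0)) / N        # sample cov(z1, y)
print(np.allclose(cross, S11 @ A.T, atol=0.02))          # True, up to Monte Carlo error
```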
Bivariate Normal
\[
p(z_1, z_2) \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}).
\]
where:
\[\begin{split}
\boldsymbol \mu =
\begin{bmatrix}
\mu_1 \\
\mu_2
\end{bmatrix}
\; \; \text{and} \; \;
\mathbf{\Sigma} =
\begin{bmatrix}
\sigma^2_{1} & \rho \sigma_{1}\sigma_2 \\
\rho \sigma_{1}\sigma_2 & \sigma^2_{2}
\end{bmatrix}
\end{split}\]
\[\begin{split}
\begin{align}
p(z_1 | z_2) &= \mathcal{N}\left(\mu_1 + \frac{\rho \sigma_{1}\sigma_2}{\sigma^2_{2}} (z_2 - \mu_2), \sigma^2_{1} - \frac{(\rho \sigma_{1}\sigma_2)^2}{\sigma^2_{2}}\right) \\
&= \mathcal{N}\left(\mu_1 + \frac{\rho \sigma_{1}}{\sigma_{2}} (z_2 - \mu_2), (1 - \rho^2)\sigma_1^2\right)
\end{align}
\end{split}\]
Example: \(\sigma_1 = \sigma_2=1\), \(\mu_1=\mu_2 = 0\)
Case 1: \(\rho=0.9\)
\[p(z_1) = \mathcal{N}(0, 1)\]
If \(z_2=1\)
\[p(z_1|z_2=1) = \mathcal{N}(0.9, 0.19)\]
The information gained about \(z_1\) from observing \(z_2\):
\[\operatorname{var}(z_1) - \operatorname{var}(z_1 | z_2) = 1 - 0.19 = 0.81 \]
Observing \(z_2\) reduces the uncertainty about \(z_1\) by 81%.
Case 2: \(\rho=0.1\), \(\sigma_1 = \sigma_2=1\), \(\mu_1=\mu_2 = 0\)
\[p(z_1) = \mathcal{N}(0, 1)\]
If \(z_2=1\)
\[p(z_1|z_2=1) = \mathcal{N}(0.1, 0.99)\]
Observing \(z_2\) reduces the uncertainty about \(z_1\) by only 1%.
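Both cases follow from plugging the numbers into the conditional formula; a quick sketch reproducing them:

```python
def bivariate_conditional(mu1, mu2, s1, s2, rho, z2):
    """Mean and variance of z1 | z2 for a bivariate normal."""
    mean = mu1 + rho * s1 / s2 * (z2 - mu2)
    var = (1 - rho ** 2) * s1 ** 2
    return mean, var

print(bivariate_conditional(0, 0, 1, 1, rho=0.9, z2=1))  # ≈ (0.9, 0.19)
print(bivariate_conditional(0, 0, 1, 1, rho=0.1, z2=1))  # ≈ (0.1, 0.99)
```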
Let \(\mathbf{z}\) be jointly Gaussian and partitioned as follows (\(z_1\) is a scalar)
\[\mathbf{z} = [z_1, \mathbf{z}_2] \]
\[\begin{split}
\boldsymbol \mu =
\begin{bmatrix}
\mu_1 \\
\boldsymbol \mu_2
\end{bmatrix}
\; \; \text{and} \; \;
\mathbf{\Sigma} =
\begin{bmatrix}
\Sigma_{1 1} & \mathbf{\Sigma}_{1 2} \\
\mathbf{\Sigma}_{2 1} & \mathbf{\Sigma}_{2 2}
\end{bmatrix}
\end{split}\]
Then
\[
p(z_1 | \mathbf{z}_2) = \mathcal{N}(\underbrace{\mu_1 - \mathbf{\Sigma}_{1 2} \mathbf{\Sigma}^{-1}_{2 2} \boldsymbol \mu_2}_{\beta_0} + \underbrace{\mathbf{\Sigma}_{1 2} \mathbf{\Sigma}^{-1}_{2 2}}_{\boldsymbol \beta_1'} \mathbf{z}_2, \underbrace{\Sigma_{1 1} - \mathbf{\Sigma}_{1 2}\mathbf{\Sigma}_{2 2}^{-1}\mathbf{\Sigma}_{2 1}}_{\sigma^2})
\]
therefore,
\[
z_1 = \beta_0 + \boldsymbol \beta_1' \mathbf{z}_2 + \varepsilon \;\;\; \text{where } \varepsilon \sim \mathcal{N}(0, \sigma^2)
\]
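A sketch extracting the regression coefficients from the joint moments (the numbers below are illustrative):

```python
import numpy as np

mu = np.array([1.0, 0.5, -0.5])            # [mu_1, mu_2] with z_1 scalar
Sigma = np.array([[2.0, 0.8, 0.4],
                  [0.8, 1.5, 0.3],
                  [0.4, 0.3, 1.2]])

S12 = Sigma[0, 1:]                          # Sigma_12
S22 = Sigma[1:, 1:]                         # Sigma_22

beta1 = np.linalg.solve(S22, S12)           # beta_1 = Sigma_22^{-1} Sigma_21
beta0 = mu[0] - beta1 @ mu[1:]              # intercept beta_0
sigma2 = Sigma[0, 0] - S12 @ beta1          # residual variance sigma^2
```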
Conditional (in)dependence
Two random variables \(z_1\) and \(z_2\) are conditionally independent given \(\mathbf{z}_3\) if
\[p(z_1, z_2 | \mathbf{z}_3) = p(z_1 | \mathbf{z}_3) p(z_2 | \mathbf{z}_3) \tag{1}\]
If \(z_1\), \(z_2\) and \(\mathbf{z}_3\) are jointly Gaussian, then (1) is true iff
\[\{\mathbf{\Sigma}^{-1}\}_{1,2} = \mathbf{Q}_{1,2} = 0\]
More generally, \(z_i\) and \(z_j\) are conditionally independent given the remaining elements of \(\mathbf{z}\) (denoted \(\mathbf{z}_{-ij}\)) iff
\[Q_{i,j} = 0\]
And the conditional correlation between \(z_i\) and \(z_j\) is
\[\operatorname{corr}(z_i, z_j | \mathbf{z}_{-ij}) = -\frac{Q_{i,j}}{\sqrt{Q_{i,i} Q_{j,j}}}\]
also known as the partial correlation between \(z_i\) and \(z_j\).
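A small helper (hypothetical name) computing the full matrix of partial correlations from \(\mathbf{\Sigma}\); it can be used to check the two examples that follow:

```python
import numpy as np

def partial_correlations(Sigma):
    """Matrix with entries -Q_ij / sqrt(Q_ii Q_jj), where Q = Sigma^{-1}."""
    Q = np.linalg.inv(Sigma)
    d = np.sqrt(np.diag(Q))
    P = -Q / np.outer(d, d)
    np.fill_diagonal(P, 1.0)   # conventionally set to 1 on the diagonal
    return P
```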
Example 1
\[\begin{split}
\mathbf{Q} =
\begin{bmatrix}
1 & -0.4 & 0 \\
-0.4 & 1.16 & -0.4 \\
0 & -0.4 & 1
\end{bmatrix}
\end{split}\]
then
\[\begin{split}
\mathbf{\Sigma} =
\begin{bmatrix}
1.19047619 & 0.47619048 & 0.19047619 \\
0.47619048 & 1.19047619 & 0.47619048 \\
0.19047619 & 0.47619048 & 1.19047619
\end{bmatrix}
\end{split}\]
therefore, \(z_1\) and \(z_3\) are unconditionally dependent, but conditionally (given \(z_2\)) independent
Example 2
\[\begin{split}
\mathbf{\Sigma} =
\begin{bmatrix}
1.09 & 0.3 & 0 \\
0.3 & 1.09 & 0.3 \\
0 & 0.3 & 1.09
\end{bmatrix}
\end{split}\]
then (rounding to two decimals)
\[\begin{split}
\mathbf{Q} =
\begin{bmatrix}
1 & -0.3 & 0.08 \\
-0.3 & 1.08 & -0.3 \\
0.08 & -0.3 & 1
\end{bmatrix}
\end{split}\]
therefore, \(z_1\) and \(z_3\) are unconditionally independent, but conditionally (given \(z_2\)) dependent
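Both examples are easy to reproduce numerically (a minimal sketch, assuming numpy):

```python
import numpy as np

# Example 1: Q given, Sigma = Q^{-1}
Q1 = np.array([[ 1.0, -0.4,  0.0],
               [-0.4,  1.16, -0.4],
               [ 0.0, -0.4,  1.0]])
print(np.linalg.inv(Q1))            # reproduces Sigma from Example 1

# Example 2: Sigma given, Q = Sigma^{-1}
S2 = np.array([[1.09, 0.3, 0.0],
               [0.3, 1.09, 0.3],
               [0.0, 0.3, 1.09]])
print(np.linalg.inv(S2).round(2))   # reproduces Q from Example 2 (rounded)
```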
Missing values
\[
\mathbf{z} \sim \mathcal{N}(\boldsymbol \mu , \mathbf{\Sigma} )
\]
\[\begin{split}
\mathbf{z}=\left[\begin{array}{c}
z_{1}\\
z_{2}\\
\vdots\\
z_{k-1}\\
z_{k}\\
z_{k+1}\\
\vdots\\
z_{T-2}\\
z_{T-1}\\
z_{T}
\end{array}\right], \quad
\mathbf{z}_{1}=\left[\begin{array}{c}
*\\
z_{2}\\
\vdots\\
z_{k-1}\\
*\\
z_{k+1}\\
\vdots\\
z_{T-2}\\
*\\
*\\
\end{array}\right]
\end{split}\]
where the asterisks mark missing observations
Some examples:
mixed frequency data
(univariate) GDP at annual frequency together with GDP at quarterly frequency
(multivariate) GDP at annual frequency together with inflation at monthly frequency
etc.
Collect the observed elements in \(\mathbf{z}_{1}\) and the unobserved elements in \(\mathbf{z}_{2}\). The marginal distribution of the observed part is
\[
p(\mathbf{z}_{1}) \sim \mathcal{N}( \boldsymbol \mu_{1} , \mathbf{\Sigma}_{1 1})
\]
example:
\[\begin{split}
\mathbf{z} =\left[\begin{array}{c}
z_{1}\\
z_{2}\\
z_{3}\\
\end{array}\right]
, \quad
\mathbf{z}_{1} =\left[\begin{array}{c}
z_{1}\\
z_{3}\\
\end{array}\right], \quad
\mathbf{z}_{2 }= \left[\begin{array}{c}
z_{2}
\end{array}\right]
\end{split}\]
\[\begin{split}
\mathbf{z}_{1} =
\begin{bmatrix}
1 & 0 & 0\\
0 & 0 & 1
\end{bmatrix}
\left[\begin{array}{c}
z_{1}\\
z_{2}\\
z_{3}\\
\end{array}\right] = \boldsymbol B_1 \mathbf{z}
\end{split}\]
\[\begin{split}
\mathbf{z}_{2} =
\begin{bmatrix}
0 & 1 & 0
\end{bmatrix}
\left[\begin{array}{c}
z_{1}\\
z_{2}\\
z_{3}\\
\end{array}\right] = \boldsymbol B_2 \mathbf{z}
\end{split}\]
\[
p(\mathbf{z}_2 | \mathbf{z}_1) = \mathcal{N}(\underbrace{\boldsymbol \mu_2 + \mathbf{\Sigma}_{2 1} \mathbf{\Sigma}^{-1}_{1 1} (\mathbf{z}_1 - \boldsymbol \mu_1)}_{\operatorname{E}(\mathbf{z}_2 | \mathbf{z}_1)}, \underbrace{\mathbf{\Sigma}_{2 2} - \mathbf{\Sigma}_{2 1}\mathbf{\Sigma}_{1 1}^{-1}\mathbf{\Sigma}_{1 2}}_{\operatorname{cov}(\mathbf{z}_2 | \mathbf{z}_1)})
\]
If \(\mathbf{z}_1\) is the past and \(\mathbf{z}_2\) is the future, then
\(\operatorname{E}(\mathbf{z}_2 | \mathbf{z}_1)\) is the optimal forecast of \(\mathbf{z}_{2}\) given \(\mathbf{z}_{1}\)
\(\operatorname{cov}(\mathbf{z}_2 | \mathbf{z}_1)\) is the covariance of the forecast error
where optimality is in the sense of minimizing the mean squared error (MSE).
In general, \(\operatorname{E}(\mathbf{z}_2 | \mathbf{z}_1)\) is our best guess of \(\mathbf{z}_{2}\) given \(\mathbf{z}_{1}\), and \(\operatorname{cov}(\mathbf{z}_2 | \mathbf{z}_1)\) quantifies the associated uncertainty.
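An end-to-end sketch (the AR(1)-type covariance below is purely illustrative): the same conditional formula interpolates a missing value in the middle of the sample and forecasts values at the end.

```python
import numpy as np

# illustrative stationary AR(1)-type covariance: Sigma[t, s] = phi^|t-s| / (1 - phi^2)
T, phi = 6, 0.8
t = np.arange(T)
Sigma = phi ** np.abs(t[:, None] - t[None, :]) / (1 - phi ** 2)
mu = np.zeros(T)

observed = np.array([0, 1, 3])            # z_1: observed elements
missing = np.array([2, 4, 5])             # z_2: a gap (t=2) and the future (t=4, 5)
z1 = np.array([0.5, 1.0, -0.2])           # hypothetical observed values

S21 = Sigma[np.ix_(missing, observed)]
S11 = Sigma[np.ix_(observed, observed)]
S22 = Sigma[np.ix_(missing, missing)]

gain = S21 @ np.linalg.inv(S11)
cond_mean = mu[missing] + gain @ (z1 - mu[observed])  # E(z_2 | z_1): imputation / forecast
cond_cov = S22 - gain @ S21.T                         # cov(z_2 | z_1): forecast-error covariance
print(cond_mean)
print(np.diag(cond_cov))                              # uncertainty grows with forecast horizon
```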