ARMA models

Time series models

  • aim to capture complex temporal dependence

  • parsimoniously…

  • MA can be very flexible, but is not parsimonious

  • AR can be very parsimonious but is less flexible

  • ARMA models can be a way to achieve both flexibility and parsimony

ARMA(p, q)

\[ z_t = \alpha_1 z_{t-1} + \alpha_2 z_{t-2} + \cdots + \alpha_p z_{t-p} + \varepsilon_{t} + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q} \]
  • \(\alpha_p \neq 0\), \(\beta_q \neq 0\)

  • in lag operator notation

\[\begin{split} \begin{align} \underbrace{(1-\alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p )}_{\alpha(L)} z_t &= \underbrace{(1 + \beta_1 L + \cdots + \beta_q L^q)}_{\beta(L)} \varepsilon_{t} \\ \alpha(L) z_t &= \beta(L) \varepsilon_t \end{align} \end{split}\]

ARMA(1, 1)

\[ z_t = \alpha z_{t-1} + \varepsilon_{t} + \beta \varepsilon_{t-1} \]
\[ (1-\alpha L) z_t = (1+\beta L)\varepsilon_{t} \]
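
As a concrete illustration, here is a minimal hand-rolled sketch of this recursion in Python (illustrative values \(\alpha = 0.8\), \(\beta = 0.1\), the same ones used in the statsmodels example below):

import numpy as np

# minimal sketch: simulate the ARMA(1, 1) recursion directly (illustrative coefficients)
rng = np.random.default_rng(0)
alpha, beta, T = 0.8, 0.1, 200
eps = rng.normal(size=T)
z = np.zeros(T)
for t in range(1, T):
    # z_t = alpha * z_{t-1} + eps_t + beta * eps_{t-1}
    z[t] = alpha * z[t - 1] + eps[t] + beta * eps[t - 1]
print(z[:5])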

Stationarity and invertibility

ARMA(p, q) is stationary if the AR(p) part is stationary, i.e. if all roots of the polynomial

\[ \alpha(x) = 1 - \alpha_1 x - \alpha_2 x^2 - \cdots - \alpha_p x^p\]

are outside the unit circle (\(|x|>1\))

ARMA(p, q) is invertible if the MA(q) part is invertible, i.e. if all roots of the polynomial

\[ \beta(x) = 1 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_q x^q\]

are outside the unit circle (\(|x|>1\))
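
As a quick sketch of this check using numpy's root finder (the ArmaProcess class used in the example below exposes the same information through its arroots and maroots attributes), for an illustrative ARMA(1,1) with \(\alpha = 0.8\) and \(\beta = 0.1\):

import numpy as np

# sketch: check stationarity/invertibility of an ARMA(1, 1) by finding the polynomial roots
alpha = [0.8]   # AR coefficients alpha_1, ..., alpha_p
beta = [0.1]    # MA coefficients beta_1, ..., beta_q

# alpha(x) = 1 - 0.8 x  and  beta(x) = 1 + 0.1 x
# np.roots expects coefficients ordered from the highest power down to the constant
ar_roots = np.roots([-a for a in alpha[::-1]] + [1])
ma_roots = np.roots(list(beta[::-1]) + [1])

print(ar_roots, np.all(np.abs(ar_roots) > 1))  # [1.25]  True -> stationary
print(ma_roots, np.all(np.abs(ma_roots) > 1))  # [-10.]  True -> invertible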

import warnings
warnings.simplefilter("ignore", FutureWarning)
warnings.simplefilter("ignore", UserWarning)
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)

import numpy as np
from numpy.random import default_rng
from statsmodels.tsa.arima_process import arma_acovf, ArmaProcess
from statsmodels.tsa.arima.model import ARIMA
import statsmodels.tsa.api as smt

np.set_printoptions(precision=3, suppress=True)

Example: ARMA(1,1) in Python

\[ z_t = 0.8 z_{t-1} + \varepsilon_{t} + 0.1 \varepsilon_{t-1} \]
alpha = np.array([.8])
beta = np.array([0.1])
ar = np.r_[1, -alpha]  # AR lag polynomial coefficients: 1, -alpha (note the sign convention)
ma = np.r_[1, beta]    # MA lag polynomial coefficients: 1, beta
arma11_process = ArmaProcess(ar, ma)

print(arma11_process)
ArmaProcess
AR: [1.0, -0.8]
MA: [1.0, 0.1]
print(arma11_process.arroots)
print(arma11_process.maroots)
[1.25]
[-10.]
arma11_process.isstationary
True
arma11_process.isinvertible
True
arma11_process.acovf(10)
array([3.25 , 2.7  , 2.16 , 1.728, 1.382, 1.106, 0.885, 0.708, 0.566,
       0.453])
arma11_process.acf(10)
array([1.   , 0.831, 0.665, 0.532, 0.425, 0.34 , 0.272, 0.218, 0.174,
       0.139])
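
These values agree with the closed-form autocovariances of a stationary ARMA(1,1) (here \(\alpha = 0.8\), \(\beta = 0.1\), \(\sigma^2 = 1\)):

\[ \gamma(0) = \sigma^2 \frac{1 + 2\alpha\beta + \beta^2}{1-\alpha^2} = \frac{1.17}{0.36} = 3.25, \qquad \gamma(1) = \sigma^2 \frac{(\alpha+\beta)(1+\alpha\beta)}{1-\alpha^2} = \frac{0.972}{0.36} = 2.7 \]

with \(\gamma(k) = \alpha \gamma(k-1)\) for \(k \geq 2\), and \(\rho(1) = \gamma(1)/\gamma(0) = 2.7/3.25 \approx 0.831\) in the ACF above.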
arma11_process.arma2ma(15)
array([1.   , 0.9  , 0.72 , 0.576, 0.461, 0.369, 0.295, 0.236, 0.189,
       0.151, 0.121, 0.097, 0.077, 0.062, 0.049])
# impulse responses
arma11_process.impulse_response(15)
array([1.   , 0.9  , 0.72 , 0.576, 0.461, 0.369, 0.295, 0.236, 0.189,
       0.151, 0.121, 0.097, 0.077, 0.062, 0.049])
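
The two outputs coincide because the impulse responses of an ARMA model are the coefficients \(\psi_j\) of its MA(\(\infty\)) representation. For this ARMA(1,1),

\[ \psi_0 = 1, \qquad \psi_1 = \alpha + \beta = 0.9, \qquad \psi_j = \alpha \psi_{j-1} \;\; \text{for } j \geq 2, \]

which reproduces the sequence \(1, 0.9, 0.72, 0.576, \ldots\) above.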

Estimation of ARMA models by MLE

if \(\mathbf{z}\) is a sample from an ARMA process, it can be represented as (an ARMA(1,1) example is worked out below)

\[\boldsymbol C \mathbf{z} = \boldsymbol b + \boldsymbol B \boldsymbol \varepsilon \]

where \(\boldsymbol C\) is invertible. Therefore, we can express \(\mathbf{z}\) as

\[ \mathbf{z} = \boldsymbol \mu + \boldsymbol A \boldsymbol \varepsilon, \;\;\; \boldsymbol \mu = \boldsymbol C^{-1} \boldsymbol b \;\;\; \boldsymbol A = \boldsymbol C^{-1} \boldsymbol B \]
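
For example, for an ARMA(1,1) with \(T = 3\), no constant term, and zero presample values (\(z_0 = \varepsilon_0 = 0\), so that \(\boldsymbol b = \boldsymbol 0\)):

\[\begin{split} \underbrace{\begin{pmatrix} 1 & 0 & 0\\ -\alpha & 1 & 0\\ 0 & -\alpha & 1 \end{pmatrix}}_{\boldsymbol C} \begin{pmatrix} z_1\\ z_2\\ z_3 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 & 0\\ \beta & 1 & 0\\ 0 & \beta & 1 \end{pmatrix}}_{\boldsymbol B} \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3 \end{pmatrix} \end{split}\]

and \(\boldsymbol C\) is lower triangular with ones on the diagonal, hence invertible.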

With

\[ \boldsymbol \varepsilon \sim \mathcal{N}(\boldsymbol 0, \sigma^2\mathbf{I}) \]

we have

\[ \mathbf{z} \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}) \]

where

\[\mathbf{\Sigma} = \sigma^2 \boldsymbol A \boldsymbol A'\]

Therefore, this is a Gaussian model

\[ \mathbf{z} \sim \mathcal{N}(\boldsymbol \mu (\mathbf{\theta}_0), \mathbf{\Sigma}(\mathbf{\theta}_0)). \]

where \(\mathbf{\theta}_0\) denotes the true value of the parameter vector \(\mathbf{\theta}\), which collects the AR coefficients (\(\boldsymbol \alpha = \left[\alpha_0, \alpha_1, \alpha_2, \cdots, \alpha_p \right]\)), the MA coefficients (\(\boldsymbol \beta = \left[\beta_1, \beta_2, \cdots, \beta_q \right]\)), and \(\sigma^2\).

and we can use the Gaussian MLE to obtain \(\hat {\boldsymbol \theta}\)

Log-likelihood

\[\ell(\boldsymbol \theta | \mathbf{z}) = -\frac{T}{2} \log(2 \pi) - \frac{1}{2}\log(|\mathbf{\Sigma}(\mathbf{\theta})|) - \frac{1}{2}\left( \mathbf{z} - \boldsymbol \mu(\theta)\right)' \mathbf{\Sigma}^{-1}(\mathbf{\theta})\left( \mathbf{z} - \boldsymbol \mu(\theta)\right)\]

Equivalently, by stationarity, the covariance matrix \(\mathbf{\Sigma}\) has Toeplitz structure

\[\begin{split} \mathbf{\Sigma} = \begin{pmatrix} \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(T-1)\\ \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(T-2)\\ \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(T-3)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \gamma(T-1) & \gamma(T-2) & \gamma(T-3) & \cdots & \gamma(0)\\ \end{pmatrix} \end{split}\]
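
As a sketch (assuming a zero-mean process, so \(\boldsymbol \mu = \boldsymbol 0\), and evaluating at the true parameter values), the log-likelihood can be computed directly by building \(\mathbf{\Sigma}\) from the theoretical autocovariances:

import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import multivariate_normal
from statsmodels.tsa.arima_process import ArmaProcess, arma_acovf
from statsmodels.tsa.arima.model import ARIMA

# sketch: exact Gaussian log-likelihood of a zero-mean ARMA(1, 1) via the Toeplitz Sigma
ar, ma, sigma2 = np.r_[1, -0.8], np.r_[1, 0.1], 1.0
T = 200  # kept small: Sigma is a dense T x T matrix

acov = arma_acovf(ar, ma, nobs=T, sigma2=sigma2)  # gamma(0), ..., gamma(T-1)
Sigma = toeplitz(acov)                            # Toeplitz covariance matrix

rng = np.random.default_rng(1)
z = ArmaProcess(ar, ma).generate_sample(T, distrvs=rng.normal)

loglik_direct = multivariate_normal(mean=np.zeros(T), cov=Sigma).logpdf(z)

# the Kalman-filter evaluation used by statsmodels should give (essentially) the same value
loglik_kf = ARIMA(z, order=(1, 0, 1), trend="n").loglike([0.8, 0.1, sigma2])
print(loglik_direct, loglik_kf)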

Note

In practice, the log-likelihood function is evaluated using the Kalman filter (will be discussed later)

ARMA estimation in Python

arma11_process.arroots, arma11_process.maroots
(array([1.25]), array([-10.]))
gen = default_rng(1)

T = 700
z = arma11_process.generate_sample(T, distrvs=gen.normal)  # simulate T observations
arma_model = ARIMA(z, order=(1, 0, 1), trend="n")  # ARMA(1, 1) = ARIMA(1, 0, 1) with no trend
arma_results = arma_model.fit()
print(arma_results.summary())
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  700
Model:                 ARIMA(1, 0, 1)   Log Likelihood                -972.782
Date:                Mon, 14 Mar 2022   AIC                           1951.564
Time:                        10:44:58   BIC                           1965.218
Sample:                             0   HQIC                          1956.842
                                - 700                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.7903      0.028     28.455      0.000       0.736       0.845
ma.L1          0.0822      0.044      1.874      0.061      -0.004       0.168
sigma2         0.9417      0.049     19.248      0.000       0.846       1.038
===================================================================================
Ljung-Box (L1) (Q):                   0.03   Jarque-Bera (JB):                 0.42
Prob(Q):                              0.86   Prob(JB):                         0.81
Heteroskedasticity (H):               1.38   Skew:                            -0.00
Prob(H) (two-sided):                  0.02   Kurtosis:                         3.12
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
arma11_process
ArmaProcess([1.0, -0.8], [1.0, 0.1], nobs=100) at 0x21055846e60

Model selection

  • need to determine \(p\) (the order of AR part) and \(q\) (the order of MA part)

  • by balancing model fit and model complexity

want models that capture the temporal dependence in the data well, but do so in a parsimonious way

Information criteria combine a measure of how well a model fits the data with a penalty for the complexity of the model, related to the number of model parameters.

Simpler models are preferred to complex ones if the fit is not much worse.

Everything should be made as simple as possible, but no simpler

Let \(k=p+q+1\) (dimension of \(\mathbf{\theta}\)) and \(\hat{\mathbf{\theta}}\) be the MLE

\[\begin{split} \begin{align} AIC &= -2 \ell(\hat{\mathbf{\theta}} | \mathbf{z}) + 2 k \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \text{Akaike’s Information Criterion} \\ BIC &= -2 \ell(\hat{\mathbf{\theta}} | \mathbf{z}) + k \log(T) \;\;\;\;\;\;\;\;\; \text{Bayesian Information Criterion} \end{align} \end{split}\]

Note

BIC is also known as the Schwarz Information Criterion

  • Larger \(\ell(\hat{\mathbf{\theta}} | \mathbf{z})\) means better fit

  • Larger \(k\) means more parameters (higher complexity)

  • Lower values of IC are better

  • In AIC, the penalty for complexity does not depend on the sample size

  • in BIC it does

  • as long as \(\log(T) > 2\) (i.e. \(T \geq 8\)), the BIC tends to select models with fewer parameters than AIC
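
As a quick check of the formulas above, reusing arma_results and T from the estimation section (here \(k = 3\): two ARMA coefficients plus \(\sigma^2\)):

import numpy as np

# sketch: recompute AIC and BIC from the maximized log-likelihood of the fitted ARMA(1, 1)
k = 3                    # ar.L1, ma.L1, sigma2
llf = arma_results.llf   # maximized log-likelihood
print(-2 * llf + 2 * k, arma_results.aic)          # both approximately 1951.56
print(-2 * llf + k * np.log(T), arma_results.bic)  # both approximately 1965.22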

Model selection procedure:

  • define a set of models

  • estimate each one and compute IC

  • pick the one with lowest IC value

Caveats

  • AIC and BIC may select different models

  • they are meant to select the “true” model, but all models are wrong

  • BIC is designed to find the true model, not the best-fitting one (one reason to favor AIC)

  • Sometimes other criteria may be preferable, depending on the application (e.g. forecasting)

ARMA Model selection in Python

# rows of the returned tables give the AR order p, columns the MA order q
res_IC = smt.arma_order_select_ic(z,
                                  ic=["aic", "bic"],
                                  trend="n",
                                  max_ar=5,
                                  max_ma=5)
res_IC['aic']
             0            1            2            3            4            5
0  2719.340776  2239.261973  2096.383447  2031.290866  1992.290360  1971.391897
1  1952.156686  1951.564413  1950.097669  1950.702283  1952.641670  1954.071994
2  1952.018481  1948.774677  1950.764090  1952.679354  1954.599932  1954.273888
3  1950.620916  1950.765701  1949.777548  1951.359154  1953.032386  1956.223225
4  1950.466444  1952.466307  1951.384101  1953.193051  1954.384883  1955.607200
5  1952.466123  1954.427275  1953.013563  1955.396999  1952.649950  1955.436449
res_IC['bic']
             0            1            2            3            4            5
0  2723.891856  2248.364134  2110.036688  2049.495187  2015.045761  1998.698379
1  1961.258847  1965.217654  1968.301990  1973.457685  1979.948152  1985.929557
2  1965.671722  1966.978998  1973.519492  1979.985836  1986.457494  1990.682531
3  1968.825237  1973.521102  1977.084030  1983.216716  1989.441028  1997.182948
4  1973.221846  1979.772789  1983.241664  1989.601694  1995.344606  2001.118004
5  1979.772605  1986.284837  1989.422206  1996.356722  1998.160754  2005.498332
print(res_IC.aic_min_order)
print(res_IC.bic_min_order)
(2, 1)
(1, 0)