ARMA models

Time series models

  • aim to capture complex temporal dependence

  • parsimoniously…

  • MA can be very flexible, but is not parsimonious

  • AR can be very parsimonious but is less flexible

  • ARMA models can be a way to achieve both flexibility and parsimony

ARMA(p, q)

\[ z_t = \alpha_1 z_{t-1} + \alpha_2 z_{t-2} + \cdots + \alpha_p z_{t-p} + \varepsilon_{t} + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q} \]
  • \(\alpha_p \neq 0\), \(\beta_q \neq 0\)

  • in lag operator notation

\[\begin{split} \begin{align} \underbrace{(1-\alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p )}_{\alpha(L)} z_t &= \underbrace{(1 + \beta_1 L + \cdots + \beta_q L^q)}_{\beta(L)} \varepsilon_{t} \\ \alpha(L) z_t &= \beta(L) \varepsilon_t \end{align} \end{split}\]

ARMA(1, 1)

\[ z_t = \alpha z_{t-1} + \varepsilon_{t} + \beta \varepsilon_{t-1} \]
\[ (1-\alpha L) z_t = (1+\beta L)\varepsilon_{t} \]
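
As a concrete illustration, here is a minimal hand-rolled sketch of this recursion in Python (illustrative values \(\alpha = 0.8\), \(\beta = 0.1\), the same ones used in the statsmodels example below):

import numpy as np

# minimal sketch: simulate the ARMA(1, 1) recursion directly (illustrative coefficients)
rng = np.random.default_rng(0)
alpha, beta, T = 0.8, 0.1, 200
eps = rng.normal(size=T)
z = np.zeros(T)
for t in range(1, T):
    # z_t = alpha * z_{t-1} + eps_t + beta * eps_{t-1}
    z[t] = alpha * z[t - 1] + eps[t] + beta * eps[t - 1]
print(z[:5])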

Stationarity and invertibility

ARMA(p, q) is stationary if the AR(p) part is stationary, i.e. if all roots of the polynomial

\[ \alpha(x) = 1 - \alpha_1 x - \alpha_2 x^2 - \cdots - \alpha_p x^p\]

are outside the unit circle (\(|x|>1\))

ARMA(p, q) is invertible if the MA(q) part is invertible, i.e. if all roots of the polynomial

\[ \beta(x) = 1 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_q x^q\]

are outside the unit circle (\(|x|>1\))
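
As a quick sketch of this check using numpy's root finder (the ArmaProcess class used in the example below exposes the same information through its arroots and maroots attributes), for an illustrative ARMA(1,1) with \(\alpha = 0.8\) and \(\beta = 0.1\):

import numpy as np

# sketch: check stationarity/invertibility of an ARMA(1, 1) by finding the polynomial roots
alpha = [0.8]   # AR coefficients alpha_1, ..., alpha_p
beta = [0.1]    # MA coefficients beta_1, ..., beta_q

# alpha(x) = 1 - 0.8 x  and  beta(x) = 1 + 0.1 x
# np.roots expects coefficients ordered from the highest power down to the constant
ar_roots = np.roots([-a for a in alpha[::-1]] + [1])
ma_roots = np.roots(list(beta[::-1]) + [1])

print(ar_roots, np.all(np.abs(ar_roots) > 1))  # [1.25]  True -> stationary
print(ma_roots, np.all(np.abs(ma_roots) > 1))  # [-10.]  True -> invertible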

import warnings
warnings.simplefilter("ignore", FutureWarning)
warnings.simplefilter("ignore", UserWarning)
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)

import numpy as np
from numpy.random import default_rng
from statsmodels.tsa.arima_process import arma_acovf, ArmaProcess
from statsmodels.tsa.arima.model import ARIMA
import statsmodels.tsa.api as smt

np.set_printoptions(precision=3, suppress=True)

Example: ARMA(1,1) in Python

\[ z_t = 0.8 z_{t-1} + \varepsilon_{t} + 0.1 \varepsilon_{t-1} \]
alpha = np.array([.8])
beta = np.array([0.1])
ar = np.r_[1, -alpha]  # AR lag polynomial coefficients: 1, -alpha (note the sign convention)
ma = np.r_[1, beta]    # MA lag polynomial coefficients: 1, beta
arma11_process = ArmaProcess(ar, ma)

print(arma11_process)
ArmaProcess
AR: [1.0, -0.8]
MA: [1.0, 0.1]
print(arma11_process.arroots)
print(arma11_process.maroots)
[1.25]
[-10.]
arma11_process.isstationary
True
arma11_process.isinvertible
True
arma11_process.acovf(10)
array([3.25 , 2.7  , 2.16 , 1.728, 1.382, 1.106, 0.885, 0.708, 0.566,
       0.453])
arma11_process.acf(10)
array([1.   , 0.831, 0.665, 0.532, 0.425, 0.34 , 0.272, 0.218, 0.174,
       0.139])
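
These values agree with the closed-form autocovariances of a stationary ARMA(1,1) (here \(\alpha = 0.8\), \(\beta = 0.1\), \(\sigma^2 = 1\)):

\[ \gamma(0) = \sigma^2 \frac{1 + 2\alpha\beta + \beta^2}{1-\alpha^2} = \frac{1.17}{0.36} = 3.25, \qquad \gamma(1) = \sigma^2 \frac{(\alpha+\beta)(1+\alpha\beta)}{1-\alpha^2} = \frac{0.972}{0.36} = 2.7 \]

with \(\gamma(k) = \alpha \gamma(k-1)\) for \(k \geq 2\), and \(\rho(1) = \gamma(1)/\gamma(0) = 2.7/3.25 \approx 0.831\) in the ACF above.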
arma11_process.arma2ma(15)
array([1.   , 0.9  , 0.72 , 0.576, 0.461, 0.369, 0.295, 0.236, 0.189,
       0.151, 0.121, 0.097, 0.077, 0.062, 0.049])
# impulse responses
arma11_process.impulse_response(15)
array([1.   , 0.9  , 0.72 , 0.576, 0.461, 0.369, 0.295, 0.236, 0.189,
       0.151, 0.121, 0.097, 0.077, 0.062, 0.049])
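
The two outputs coincide because the impulse responses of an ARMA model are the coefficients \(\psi_j\) of its MA(\(\infty\)) representation. For this ARMA(1,1),

\[ \psi_0 = 1, \qquad \psi_1 = \alpha + \beta = 0.9, \qquad \psi_j = \alpha \psi_{j-1} \;\; \text{for } j \geq 2, \]

which reproduces the sequence \(1, 0.9, 0.72, 0.576, \ldots\) above.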

Estimation of ARMA models by MLE

if \(\mathbf{z}\) is a sample from an ARMA process, it can be represented as (an ARMA(1,1) example is worked out below)

\[\boldsymbol C \mathbf{z} = \boldsymbol b + \boldsymbol B \boldsymbol \varepsilon \]

where \(\boldsymbol C\) is invertible. Therefore, we can express \(\mathbf{z}\) as

\[ \mathbf{z} = \boldsymbol \mu + \boldsymbol A \boldsymbol \varepsilon, \;\;\; \boldsymbol \mu = \boldsymbol C^{-1} \boldsymbol b \;\;\; \boldsymbol A = \boldsymbol C^{-1} \boldsymbol B \]
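
For example, for an ARMA(1,1) with \(T = 3\), no constant term, and zero presample values (\(z_0 = \varepsilon_0 = 0\), so that \(\boldsymbol b = \boldsymbol 0\)):

\[\begin{split} \underbrace{\begin{pmatrix} 1 & 0 & 0\\ -\alpha & 1 & 0\\ 0 & -\alpha & 1 \end{pmatrix}}_{\boldsymbol C} \begin{pmatrix} z_1\\ z_2\\ z_3 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 & 0\\ \beta & 1 & 0\\ 0 & \beta & 1 \end{pmatrix}}_{\boldsymbol B} \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3 \end{pmatrix} \end{split}\]

and \(\boldsymbol C\) is lower triangular with ones on the diagonal, hence invertible.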

With

\[ \boldsymbol \varepsilon \sim \mathcal{N}(\boldsymbol 0, \sigma^2\mathbf{I}) \]

we have

\[ \mathbf{z} \sim \mathcal{N}(\boldsymbol \mu, \mathbf{\Sigma}) \]

where

\[\mathbf{\Sigma} = \sigma^2 \boldsymbol A \boldsymbol A'\]

Therefore, this is a Gaussian model

\[ \mathbf{z} \sim \mathcal{N}(\boldsymbol \mu (\mathbf{\theta}_0), \mathbf{\Sigma}(\mathbf{\theta}_0)). \]

where \(\mathbf{\theta}_0\) denotes the true value of the parameter vector \(\mathbf{\theta}\), which collects the AR coefficients (\(\boldsymbol \alpha = \left[\alpha_0, \alpha_1, \alpha_2, \cdots, \alpha_p \right]\)), the MA coefficients (\(\boldsymbol \beta = \left[\beta_1, \beta_2, \cdots, \beta_q \right]\)), and \(\sigma^2\).

and we can use the Gaussian MLE to obtain \(\hat {\boldsymbol \theta}\)

Log-likelihood

\[\ell(\boldsymbol \theta | \mathbf{z}) = -\frac{T}{2} \log(2 \pi) - \frac{1}{2}\log(|\mathbf{\Sigma}(\mathbf{\theta})|) - \frac{1}{2}\left( \mathbf{z} - \boldsymbol \mu(\theta)\right)' \mathbf{\Sigma}^{-1}(\mathbf{\theta})\left( \mathbf{z} - \boldsymbol \mu(\theta)\right)\]

Equivalently, by stationarity, the covariance matrix \(\mathbf{\Sigma}\) has Toeplitz structure

\[\begin{split} \mathbf{\Sigma} = \begin{pmatrix} \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(T-1)\\ \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(T-2)\\ \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(T-3)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \gamma(T-1) & \gamma(T-2) & \gamma(T-3) & \cdots & \gamma(0)\\ \end{pmatrix} \end{split}\]
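
As a sketch (assuming a zero-mean process, so \(\boldsymbol \mu = \boldsymbol 0\), and evaluating at the true parameter values), the log-likelihood can be computed directly by building \(\mathbf{\Sigma}\) from the theoretical autocovariances:

import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import multivariate_normal
from statsmodels.tsa.arima_process import ArmaProcess, arma_acovf
from statsmodels.tsa.arima.model import ARIMA

# sketch: exact Gaussian log-likelihood of a zero-mean ARMA(1, 1) via the Toeplitz Sigma
ar, ma, sigma2 = np.r_[1, -0.8], np.r_[1, 0.1], 1.0
T = 200  # kept small: Sigma is a dense T x T matrix

acov = arma_acovf(ar, ma, nobs=T, sigma2=sigma2)  # gamma(0), ..., gamma(T-1)
Sigma = toeplitz(acov)                            # Toeplitz covariance matrix

rng = np.random.default_rng(1)
z = ArmaProcess(ar, ma).generate_sample(T, distrvs=rng.normal)

loglik_direct = multivariate_normal(mean=np.zeros(T), cov=Sigma).logpdf(z)

# the Kalman-filter evaluation used by statsmodels should give (essentially) the same value
loglik_kf = ARIMA(z, order=(1, 0, 1), trend="n").loglike([0.8, 0.1, sigma2])
print(loglik_direct, loglik_kf)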

Note

In practice, the log-likelihood function is evaluated using the Kalman filter (will be discussed later)

ARMA estimation in Python

arma11_process.arroots, arma11_process.maroots
(array([1.25]), array([-10.]))
gen = default_rng(1)

T = 700
z = arma11_process.generate_sample(T, distrvs=gen.normal)  # simulate T observations
arma_model = ARIMA(z, order=(1, 0, 1), trend="n")  # ARMA(1, 1) = ARIMA(1, 0, 1) with no trend
arma_results = arma_model.fit()
print(arma_results.summary())
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  700
Model:                 ARIMA(1, 0, 1)   Log Likelihood                -972.782
Date:                Mon, 14 Mar 2022   AIC                           1951.564
Time:                        10:44:58   BIC                           1965.218
Sample:                             0   HQIC                          1956.842
                                - 700                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.7903      0.028     28.455      0.000       0.736       0.845
ma.L1          0.0822      0.044      1.874      0.061      -0.004       0.168
sigma2         0.9417      0.049     19.248      0.000       0.846       1.038
===================================================================================
Ljung-Box (L1) (Q):                   0.03   Jarque-Bera (JB):                 0.42
Prob(Q):                              0.86   Prob(JB):                         0.81
Heteroskedasticity (H):               1.38   Skew:                            -0.00
Prob(H) (two-sided):                  0.02   Kurtosis:                         3.12
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
arma11_process
ArmaProcess([1.0, -0.8], [1.0, 0.1], nobs=100) at 0x21055846e60

Model selection

  • need to determine \(p\) (the order of AR part) and \(q\) (the order of MA part)

  • by balancing model fit and model complexity

want models that capture the temporal dependence in the data well, but do so in a parsimonious way

Information criteria combine a measure of how well a model fits the data with a penalty for the complexity of the model, related to the number of model parameters.

Simpler models are preferred to complex ones if the fit is not much worse.

Everything should be made as simple as possible, but no simpler

Let \(k=p+q+1\) (dimension of \(\mathbf{\theta}\)) and \(\hat{\mathbf{\theta}}\) be the MLE

\[\begin{split} \begin{align} AIC &= -2 \ell(\hat{\mathbf{\theta}} | \mathbf{z}) + 2 k \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \text{Akaike’s Information Criterion} \\ BIC &= -2 \ell(\hat{\mathbf{\theta}} | \mathbf{z}) + k \log(T) \;\;\;\;\;\;\;\;\; \text{Bayesian Information Criterion} \end{align} \end{split}\]

Note

BIC is also known as the Schwarz Information Criterion

  • Larger \(\ell(\hat{\mathbf{\theta}} | \mathbf{z})\) means better fit

  • Larger \(k\) means more parameters (higher complexity)

  • Lower values of IC are better

  • In AIC, the penalty for complexity does not depend on the sample size

  • in BIC it does

  • as long as \(\log(T) > 2\) (i.e. \(T \geq 8\)), the BIC tends to select models with fewer parameters than AIC
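
As a quick check of the formulas above, reusing arma_results and T from the estimation section (here \(k = 3\): two ARMA coefficients plus \(\sigma^2\)):

import numpy as np

# sketch: recompute AIC and BIC from the maximized log-likelihood of the fitted ARMA(1, 1)
k = 3                    # ar.L1, ma.L1, sigma2
llf = arma_results.llf   # maximized log-likelihood
print(-2 * llf + 2 * k, arma_results.aic)          # both approximately 1951.56
print(-2 * llf + k * np.log(T), arma_results.bic)  # both approximately 1965.22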

Model selection procedure:

  • define a set of models

  • estimate each one and compute IC

  • pick the one with lowest IC value

Caveats

  • AIC and BIC may select different models

  • they are meant to select the “true” model, but all models are wrong

  • BIC is designed to find the true model, not the best-fitting one (one reason to favor AIC)

  • Sometimes other criteria may be preferable, depending on the application (e.g. forecasting)

ARMA Model selection in Python

# rows of the returned tables give the AR order p, columns the MA order q
res_IC = smt.arma_order_select_ic(z,
                                  ic=["aic", "bic"],
                                  trend="n",
                                  max_ar=5,
                                  max_ma=5)
res_IC['aic']
             0            1            2            3            4            5
0  2719.340776  2239.261973  2096.383447  2031.290866  1992.290360  1971.391897
1  1952.156686  1951.564413  1950.097669  1950.702283  1952.641670  1954.071994
2  1952.018481  1948.774677  1950.764090  1952.679354  1954.599932  1954.273888
3  1950.620916  1950.765701  1949.777548  1951.359154  1953.032386  1956.223225
4  1950.466444  1952.466307  1951.384101  1953.193051  1954.384883  1955.607200
5  1952.466123  1954.427275  1953.013563  1955.396999  1952.649950  1955.436449
res_IC['bic']
             0            1            2            3            4            5
0  2723.891856  2248.364134  2110.036688  2049.495187  2015.045761  1998.698379
1  1961.258847  1965.217654  1968.301990  1973.457685  1979.948152  1985.929557
2  1965.671722  1966.978998  1973.519492  1979.985836  1986.457494  1990.682531
3  1968.825237  1973.521102  1977.084030  1983.216716  1989.441028  1997.182948
4  1973.221846  1979.772789  1983.241664  1989.601694  1995.344606  2001.118004
5  1979.772605  1986.284837  1989.422206  1996.356722  1998.160754  2005.498332
print(res_IC.aic_min_order)
print(res_IC.bic_min_order)
(2, 1)
(1, 0)