Forecasting

Time series models are (largely) models of the temporal dependence observed in time series data

  • the future depends of the past

  • the past is informative about the future

One motivation behind fitting models to a time series is to forecast future unobserved observations - which would not be possible without a model.

Prediction is interesting for a variety of reasons. First, it is one of the few rationalizations for time-series to be a subject of its own, divorced from economics.

Time series process {zt}t=

Z=[zkzk+1z1z0z1zTzT+1]

Consider zT+1

  • at time T it is a random variable with some (marginal) distribution

  • and a conditional distribution, given z={zt}t=1T

p(zT+1|z)
  • describes all knowledge of zT+1 given z

  • called predictive distribution

similarly for zT+h

p(zT+h|z)

Point forecast

  • What is our best guess zT+hf of zT+h given information availalbe as of T?

  • Depends on how we define “best”, i.e. what the optimality criterion is

  • if the goal is to minimize the expected (mean) square error

E((zT+hzT+hf)2|z)

the best point forecast is

E(zT+h|z)=ET(zT+h)
  • for other loss functions, the optimal forecast is another characteristics of the predictive distribution (e.g. the median)

  • p(zT+h|z) and E(zT+h|z) are unknown

  • we use models to approximate them

Forecasting for AR(1) model

zt=αzt1+εt
  • at time T we observe z1,z2,,zT and want to forecast zT+h

  • h=1

zT+1=αzT+εT+1
E(zT+1|z1,z2,,zT)=E(zT+1|zT)=αzT

why?

the information in past z’s is the same as information in past ε’s, and εT+1 is unpredictable

  • h=2

zT+2=αzT+1+εT+2=α2zT+αεT+1+εT+2
E(zT+2|z1,z2,,zT)=E(zT+2|zT)=α2zT
  • for any h>0

E(zT+h|z1,z2,,zT)=E(zT+h|zT)=αhzT
  • as h

E(zT+h|zT)0=E(zt)

first-order Markov process:

p(zT+1|z1,z2,,zT)=p(zT+1|zT)

zT+1 and z1:T1 are independent, conditional on zT

higher-order Markov processes (AR§):

zT+1 and z1:Tp are independent, conditional on zTp+1:T

Mean squared error of the forecasts (MSE)

  • h=1

E((zT+1zT+1f)2)=Var(zT+1E(zT+1|zT))=Var(zT+1αzT)=Var(εT+1)=σ2
  • h>1

E((zT+hzT+hf)2)=Var(zT+hαhzT)=Var(εT+h+αεT+h1+,+αh1εT+1)=σ2(1+α2++α2(h1))
  • as h

E((zT+hzT+hf)2)σ2(1α2)=Var(zt)

Forecasting for AR§ model

zt=α1zt1+α2zt2++αpztp+εt
  • h=1

zT+1=α1zT+α2zT1++αpzT+1p+εT+1
E(zT+1|z1,z2,,zT)=E(zT+1|zT+1p,,zT1,zT)=α1zT+α2zT1++αpzT+1p

for h>1, it is easier to use a VAR(1) representation:

[ztzt1ztp+1]zt=[α1α2αp1αp10000010]A[zt1zt2ztp]zt1+[εt00]εt
zT+1=AzT+εT+1E(zT+1|zT)=AzT
E(zT+h|zT)=AhzT
zT+hE(zT+h|zT)=Ah1εT+1+Ah+2εT+2++εT+h

Let e=[1,0,0,,0] be a p dimensional vector (first column of a p×p identity matrix). Then

zT+h=ezT+h

and

E(zT+h|zT)=eAhzT
E[(zT+hE(zT+h|zT))2]=(Ah1εT+1+Ah+2εT2++εT+h)2
  • the MSE of the forecast:

eE[(zT+hE(zT+h|zT))2]e=σ2(ah12+ah22++1)

where ai=eAie

Forecasting for MA(q) and ARMA(p,q) models

MA(q)

zt=εt+β1εt1++βqεtq
zT+1=εT+1+β1εT++βqεTq+1
ETzT+1=ETεT+1+β1ETεT++βqETεTq+1=β1ETεT++βqETεTq+1
ETzT+2=β2ETεT++βqETεTq+2(h=2)
ETzT+q=βqETεT(h=q)
  • if h>q

ETzT+h=0

by invertability of MA(q):

zt=β(L)εtεt=1β(L)=α(L)zt
ETεT=ETα(L)zT=ET(zT+α1zT1+α2zT2+)=zT+α1zT1+α2zT2++αT1z1
import warnings
warnings.simplefilter("ignore", FutureWarning)
warnings.simplefilter("ignore", UserWarning)
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)

import numpy as np
from numpy.random import default_rng
from statsmodels.tsa.arima_process import arma_acovf, ArmaProcess

import statsmodels.tsa.api as smt

np.set_printoptions(precision=3, suppress=True)

getting αi in Python

beta = np.array([.6, -.3]) # coefficients on e(t-1) and e(t-2)
ar = [1]  # coefficient on z_t
ma = np.r_[1, beta]
ma2_process = ArmaProcess(ar, ma)

ma2_process.arma2ar(100)
#statsmodels.tsa.arima_process.arma_impulse_response(ma=[1.0,], ar=[1, .6, -.3], leads=10)
array([ 1.   , -0.6  ,  0.66 , -0.576,  0.544, -0.499,  0.462, -0.427,
        0.395, -0.365,  0.338, -0.312,  0.289, -0.267,  0.247, -0.228,
        0.211, -0.195,  0.18 , -0.167,  0.154, -0.142,  0.132, -0.122,
        0.112, -0.104,  0.096, -0.089,  0.082, -0.076,  0.07 , -0.065,
        0.06 , -0.055,  0.051, -0.047,  0.044, -0.041,  0.037, -0.035,
        0.032, -0.03 ,  0.027, -0.025,  0.023, -0.022,  0.02 , -0.018,
        0.017, -0.016,  0.015, -0.014,  0.012, -0.012,  0.011, -0.01 ,
        0.009, -0.008,  0.008, -0.007,  0.007, -0.006,  0.006, -0.005,
        0.005, -0.005,  0.004, -0.004,  0.004, -0.003,  0.003, -0.003,
        0.003, -0.002,  0.002, -0.002,  0.002, -0.002,  0.002, -0.001,
        0.001, -0.001,  0.001, -0.001,  0.001, -0.001,  0.001, -0.001,
        0.001, -0.001,  0.001, -0.001,  0.001, -0.   ,  0.   , -0.   ,
        0.   , -0.   ,  0.   , -0.   ])
  • the MSE of forecast:

  • h = 2

zT+2=εT+2+β1εT+1+β2εT++βqETεTq+2ETzT+2=β2ETεT++βqETεTq+2

If ETεTεT, ETεT1εT1, etc:

E[(zT+2E(zT+2|zT))2]e=σ2(1+β12)
  • general h

E[(zT+hE(zT+h|zT))2]e=σ2(1+β12+β22++βh12)

ARMA(p, q)

zt=α1zt1++αpztp+εt+β1εt1++βqεtq
zT+1=α1zT++αpzTp+1+εT+1+β1εT++βqεTq+1
ETzT+1=α1zT++αpzTp+1+β1ETεT++βqETεTq+1
ETzT+2=α1ETzT+1+α2zT++αpzTp+2+β2ETεT++βqETεTq+2
  • obtain ETεT, ETεT1 etc as in the MA(q) case

α(L)zt=β(L)εtεt=α(L)β(L)zt=ψ(L)zt
alpha = np.array([.8])
beta = np.array([0.1])
ar = np.r_[1, -alpha] # coefficient on z(t) and z(t-1)
ma = np.r_[1, beta]  # coefficients on e(t) and e(t-1)
arma11_process = ArmaProcess(ar, ma)

arma11_process.arma2ar(8)
array([ 1.   , -0.9  ,  0.09 , -0.009,  0.001, -0.   ,  0.   , -0.   ])

Using the MA() representation:

α(L)zt=β(L)εtzt=β(L)α(L)εt=ϕ(L)εt

we have $$

(3)zT+h=εT+h+ϕ1εT+h1+ϕ2εT+h2++ϕhεT+ϕh+1εT1+ϕh+2εT2+
and
(4)ETzT+h=ϕhETεT+ϕh+1ETεT1+ϕh+2ETεT2+

$$

  • the MSE of forecast :

E[(zT+hE(zT+h|zT))2]=σ2(1+ϕ12+ϕ22++ϕh12)

assuming (as for MA(q)) ETεTεT, ETεT1εT1, etc:

  • in general E(zT+h|z) is unknown and non-linear function of z

Gaussian time series model

p(z1,z2,,zTz1,zT+1,,zT+hz2)=p(z1,z2)N(μ,Σ).

Then,

p(z2|z1)=N(μ2+Σ21Σ111(z1μ1),Σ22Σ21Σ111Σ12)
  • E(z2|z1)=μ2+Σ21Σ111(z1μ1) optimal forecast

  • cov(z2|z1)=Σ22Σ21Σ111Σ12 MSE of the optimal forecast

  • we can restrict “best” to apply to linear functions only: best linear predictor of zT+h given z

(1)zT+hf=β0+β1z
  • if the loss is the MSE, the best linear predictor is (1) with

β0=EzT+hCov(zT+h,z)Cov(z,z)1Ezβ1=Cov(zT+h,z)Cov(z,z)1

Prediction operator

the best linear prediction of random variable y given a random vector z

P(y|z)=μy+ϕ(zμz)withϕ=cov(y,z)cov(z,z)1
  • has lowest MSE loss

the MSE of P(y|z):

E[(yP(y|z))2]=var(y)cov(y,z)cov(z,z)1cov(y,z)

properties of P(y|z) :

  • E(yP(y|z))=0

  • E((yP(y|z))z)=0

  • P(zi|z))=zi

  • P(y|z))=Ey, if cov(y,z)=0

ARMA forecasting in Python

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from numpy.random import default_rng
  • generate data from ARMA(1,1)

gen = default_rng(100)
T=700
z = arma11_process.generate_sample(T, distrvs=gen.normal)
  • estimate

arma_model = ARIMA(z, order=(1, 0, 1), trend="n")
arma_results = arma_model.fit()
print(arma_results.summary())
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  700
Model:                 ARIMA(1, 0, 1)   Log Likelihood                -998.176
Date:                Mon, 21 Mar 2022   AIC                           2002.351
Time:                        10:22:36   BIC                           2016.005
Sample:                             0   HQIC                          2007.629
                                - 700                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.8075      0.027     29.733      0.000       0.754       0.861
ma.L1          0.0199      0.047      0.423      0.672      -0.072       0.112
sigma2         1.0126      0.058     17.456      0.000       0.899       1.126
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 2.05
Prob(Q):                              0.98   Prob(JB):                         0.36
Heteroskedasticity (H):               1.06   Skew:                             0.06
Prob(H) (two-sided):                  0.66   Kurtosis:                         2.76
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
  • forecast the next 5 values

arma_results.forecast(steps=5)
array([-0.816, -0.659, -0.532, -0.43 , -0.347])
  • get forecasts and CI

fcast_res = arma_results.get_forecast(steps=5,)
fcast_res.summary_frame()
y mean mean_se mean_ci_lower mean_ci_upper
0 -0.815737 1.006267 -2.787985 1.156510
1 -0.658719 1.306041 -3.218512 1.901073
2 -0.531925 1.468926 -3.410966 2.347116
3 -0.429537 1.566040 -3.498920 2.639846
4 -0.346857 1.626246 -3.534241 2.840527
  • working with dates

data = pd.DataFrame(z, index=pd.date_range(start='1964', freq='M', periods=T), columns=['z'])
data.tail() 
z
2021-12-31 -1.983945
2022-01-31 -3.883839
2022-02-28 -2.971514
2022-03-31 -2.339958
2022-04-30 -1.031274
arma_model = ARIMA(data, order=(1, 0, 1), trend="n")
arma_results = arma_model.fit()
fcast_res = arma_results.get_forecast(steps=24)
fcast = fcast_res.summary_frame()
fcast.head()
z mean mean_se mean_ci_lower mean_ci_upper
2022-05-31 -0.815737 1.006267 -2.787985 1.156510
2022-06-30 -0.658719 1.306041 -3.218512 1.901073
2022-07-31 -0.531925 1.468926 -3.410966 2.347116
2022-08-31 -0.429537 1.566040 -3.498920 2.639846
2022-09-30 -0.346857 1.626246 -3.534241 2.840527
  • plot forecasts

fig, ax = plt.subplots(figsize=(15, 5))

# Plot the data (here we are subsetting it to get a better look at the forecasts)
data.loc['2018':].plot(ax=ax)
# Construct the forecasts
fcast['mean'].plot(ax=ax, style='k--')
ax.fill_between(fcast.index, fcast['mean_ci_lower'], fcast['mean_ci_upper'], color='k', alpha=0.1);
 
../../_images/04-Forecasting_87_0.png

In-sample forecast evaluation

use

1Tt=0T1(zt+1z^t+1f)2

as an estimate of the MSE

E((zt+1zt+1f)2)
data['forecast'] = arma_results.forecasts.flatten()
data['forecast errors'] = arma_results.forecasts_error.flatten()

data.tail()
z forecast forecast errors
2021-12-31 -1.983945 -0.373014 -1.610931
2022-01-31 -3.883839 -1.634070 -2.249770
2022-02-28 -2.971514 -3.180953 0.209439
2022-03-31 -2.339958 -2.395378 0.055420
2022-04-30 -1.031274 -1.888448 0.857174
(arma_results.forecasts_error**2).sum()/T
1.0138467224702907
arma_results.mse
1.0138467224702907

In-sample forecasts are based on model estimated using the full sample

  • in AR(1) model

z^t+1f=α^zt

introduces information from the future relative to the time (t<T) when forecasts are produced

Out-of-sample (OOS) forecast evaluation

parameter estimates of the forecasting model use only information available at the time when the forecast is computed

  • fit the model on a training sample z1,,zt for some t=t0<T

  • produce forecast z^t+1f from the end of that sample

  • compare to true value zt+1

  • expand the sample to include the next observation, and repeat until t=T1

The OOS estimate of MSE

1Tt0t=t01T1(zt+1z^t+1f)2
data = data[['z']]
# Setup forecasts
nforecasts = 1
forecasts = {}

# Get the number of initial training observations
nobs = len(data)
n_init_training = int(nobs * 0.8)

# Create model for initial training sample, fit parameters
init_training_endog = data.iloc[:n_init_training]
mod = ARIMA(init_training_endog, order=(1, 0, 1), trend="n")
res = mod.fit()

# Save initial forecast
forecasts[init_training_endog.index[-1]] = res.forecast(steps=nforecasts)

# Step through the rest of the sample
for t in range(n_init_training, nobs-1):
    # Update the results by appending the next observation
    updated_data = data.iloc[t:t+1]
    res = res.append(updated_data, refit=True)

    # Save the new set of forecasts
    forecasts[updated_data.index[0]] = res.forecast(steps=nforecasts)

# Combine all forecasts into a dataframe
forecasts = pd.concat(forecasts, axis=1)
forecasts.iloc[:5, :5]
2010-08-31 2010-09-30 2010-10-31 2010-11-30 2010-12-31
2010-09-30 1.632432 NaN NaN NaN NaN
2010-10-31 NaN 0.928112 NaN NaN NaN
2010-11-30 NaN NaN 0.127096 NaN NaN
2010-12-31 NaN NaN NaN 1.271923 NaN
2011-01-31 NaN NaN NaN NaN 0.60945
forecasts.iloc[-5:, -5:]
2021-11-30 2021-12-31 2022-01-31 2022-02-28 2022-03-31
2021-12-31 -0.373446 NaN NaN NaN NaN
2022-01-31 NaN -1.62698 NaN NaN NaN
2022-02-28 NaN NaN -3.185304 NaN NaN
2022-03-31 NaN NaN NaN -2.400559 NaN
2022-04-30 NaN NaN NaN NaN -1.891933
# Construct the forecast errors
forecast_errors = {}
for c in forecasts.columns:
    forecast_errors[c] = data.iloc[n_init_training:, 0] - forecasts.loc[:,c]
    
forecast_errors = pd.DataFrame.from_dict(forecast_errors)    
def flatten(column):
    return column.dropna().reset_index(drop=True)

flattened = forecast_errors.apply(flatten)
flattened.index = (flattened.index + 1).rename('horizon')
flattened.T.head()
horizon 1
2010-08-31 -0.496213
2010-09-30 -0.757934
2010-10-31 1.392945
2010-11-30 -0.523532
2010-12-31 -0.075812
forecast_errors = flattened.T.dropna().values.flatten()
# Compute the root mean square error
rmse = (forecast_errors**2).mean()**0.5

print(rmse)
1.0117784839655213