Time series concepts and models

Definition: A time series is a collection of random variables indexed by time, i.e.

$$\{z_t : t = \ldots, -2, -1, 0, 1, 2, \ldots\} = \{z_t\}_{t=-\infty}^{\infty} \tag{1}$$

In this class we will only deal with discrete time series, i.e. where $t$ is discrete as above. This is common in macroeconomics. In finance, continuous time series are often more useful/relevant. Equation (1) is also referred to as a stochastic process, or a stochastic sequence (some authors reserve "sequence" for discrete and "process" for continuous time series).

  • unlike cross-sectional data, time series are ordered sequentially: there is a before and an after, and observations can be close to or far from each other.

  • In practice, we work with observed finite realizations of stochastic processes:

$$\{z_1, z_2, \ldots, z_T\} = \{z_t\}_{t=1}^{T}$$

Note

We use the term "time series" for both the process and its realization.

  • the set $\{z_1, z_2, \ldots, z_T\}$ is characterized by its joint distribution.

  • the distribution of the full stochastic process is specified if, for an arbitrary set of time indices, we know the joint distribution of the corresponding set of random variables

Example: $\{z_t\}_{t=-\infty}^{\infty}$ is a Gaussian process if any finite subset $\{z_{t_1}, z_{t_2}, \ldots, z_{t_k}\}$ has a joint Gaussian distribution.

  • A time series sample $\{z_t\}_{t=1}^{T}$ is a single draw from some distribution

  • so, effectively, a sample of size 1?

  • how do we estimate moments (learn about the underlying distribution)?

  • in cross-section: data points are independent draws from a common distribution

    • within $z_i = [y_i \;\; x_i]$ there could be arbitrarily complicated dependence (between $y_i$ and $x_i$), but $z_i$ and $z_j$ are independent for $i \neq j$

  • sample averages estimate population means

  • the equivalent in time series is observing multiple realizations (paths) of the time series

  • and computing ensemble averages for each t

$$\frac{1}{N}\sum_{n=1}^{N} z_t^{(n)}$$

[Figures: simulated paths of the process and their ensemble averages, computed with N = 100 and N = 10,000 paths]

  • impossible with one observed path (e.g. one history of quarterly GDP numbers)

  • can only compute averages over time

$$\frac{1}{T}\sum_{t=1}^{T} z_t$$
  • for time averages to estimate population means, we need ergodic stationarity (illustrated in the simulation sketch below)
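A minimal simulation sketch (not from the lecture notes), assuming a stationary and ergodic AR(1) process $z_t = \alpha z_{t-1} + \varepsilon_t$ with Gaussian iid innovations and illustrative values of $\alpha$, $N$, and $T$. It contrasts the ensemble average (across $N$ simulated paths at a fixed date) with the time average of a single path:

```python
# Sketch: ensemble average vs time average for a stationary, ergodic AR(1) process
import numpy as np

rng = np.random.default_rng(0)
alpha, N, T = 0.5, 2_000, 2_000   # illustrative parameter and sample sizes (assumed)

# simulate N independent paths of length T: z_t = alpha * z_{t-1} + eps_t
z = np.zeros((N, T))
eps = rng.normal(size=(N, T))
for t in range(1, T):
    z[:, t] = alpha * z[:, t - 1] + eps[:, t]

ensemble_avg = z[:, -1].mean()   # average over paths at the last date t = T
time_avg = z[0, :].mean()        # average over time along a single path

print(f"ensemble average at fixed t: {ensemble_avg: .4f}")
print(f"time average of one path:    {time_avg: .4f}")
# both are close to the population mean E[z_t] = 0 because the process is
# stationary and ergodic; with one observed path only the time average is feasible
```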

Ergodic stationarity

  • With cross-sectional data we have identical distribution and independence

  • the related concepts for time series are stationarity and ergodicity

Stationarity

Definition: The process $\{z_t\}_{t=1}^{T}$ is strictly (or strongly) stationary if its distribution is time invariant.

This means that if we take an arbitrary set of time indices $t_1, t_2, \ldots, t_k$ for some $k$, the joint distribution of $z_{t_1}, z_{t_2}, \ldots, z_{t_k}$ stays the same if we shift all indices by the same number of time periods, i.e. it is the same as the joint distribution of $z_{t_1+h}, z_{t_2+h}, \ldots, z_{t_k+h}$ for any $h$.

Definition: The process $\{z_t\}_{t=1}^{T}$ is covariance (or weakly) stationary if the first two moments of the joint distribution exist and are time invariant.

That is, the mean and covariance do not change with the time index:

$$E z_t = \text{const} < \infty$$
$$\operatorname{cov}(z_t, z_{t+k}) = \operatorname{cov}(z_{t'}, z_{t'+k}) < \infty$$

for $t' = t + h$, any $h$, and $k \geq 0$

  • also called wide-sense stationary

  • with $k = 0$ the variance of $z_t$ is constant

set $t' = t - k$. Then

$$\operatorname{cov}(z_{t'}, z_{t'+k}) = \operatorname{cov}(z_{t-k}, z_t)$$
  • stationarity is a form of constancy of the properties of the process

  • needed in order to be able to learn something about those properties

  • strong stationarity is usually too strong an assumption

  • covariance stationarity is typically enough

  • if the first two moments exist, strong stationarity implies weak stationarity

Example: (strictly but not weakly stationary)

An iid $z_t$ with a Cauchy distribution is strictly stationary but has no finite moments and is therefore not weakly stationary.
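A small sketch (not from the lecture notes) illustrating this example: sample means of iid Cauchy draws do not settle down as $T$ grows, because the population mean does not exist. The seed and sample sizes are arbitrary.

```python
# Sketch: sample means of iid Cauchy draws keep jumping around as T grows
import numpy as np

rng = np.random.default_rng(1)
for T in (100, 10_000, 1_000_000):
    z = rng.standard_cauchy(size=T)   # iid Cauchy: strictly stationary, no finite moments
    print(f"T = {T:>9,d}   sample mean = {z.mean(): .3f}")
# unlike draws from a distribution with a finite mean, the sample means do not converge
```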

  • strong and weak stationarity coincide when $z_t$ is a Gaussian process

  • with weak stationarity alone we can still learn the first and second order moments (if not the full distribution)

  • if $z_t$ is iid then it is strongly stationary

Ergodicity

  • independence of observations implies each contains unique information

  • dependence means that some information is shared

  • for information to accumulate as sample size grows, some unique information must exist

  • i.e. dependence must not be too strong

  • ergodicity is the property that more distant variables in the sequence are closer to being independent

  • (Ergodic theorem) For ergodic and stationary processes, time averages converge to population means

  • functions of ergodic and stationary processes are also ergodic and stationary

  • time averages of such functions converge to respective population averages

Example: If $\{z_t\}$ is stationary and ergodic, then $\{z_t z_t'\}$ is also stationary and ergodic.
Therefore, the time averages of $z_t$ and $z_t z_t'$ converge to $E z_t$ and $E z_t z_t'$, respectively.

Example: (stationary but not ergodic)

$$z_t = x_t + y$$

with:

  • $x_t$ iid, $E x_t = 0$

  • $y$ independent of $x_t$, with $E y = 0$ and $\operatorname{var}(y) > 0$

  • since $x_t$ is iid:

$$\frac{1}{T}\sum_{t=1}^{T} x_t \to E x_t = 0$$

  • since $y$ is the same for all $t$:

$$\frac{1}{T}\sum_{t=1}^{T} y = y$$

Therefore:

$$\frac{1}{T}\sum_{t=1}^{T} z_t \to y$$

On the other hand, the (ensemble) mean of $z_t$ is

$$E z_t = E x_t + E y = 0$$

so the time average does not converge to the population mean. This is because the dependence between $z_t$ and $z_{t-k}$ does not decrease as $k$ increases:

$$\operatorname{cov}(z_t, z_{t-k}) = \operatorname{var}(y)$$

due to the independence among $x_t$, $x_{t-k}$, and $y$.
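A minimal simulation sketch (not from the lecture notes) of the example above, additionally assuming $x_t \sim N(0,1)$ and $y \sim N(0,1)$: the time average of each simulated path converges to that path's $y$, not to $E z_t = 0$.

```python
# Sketch: z_t = x_t + y is stationary but not ergodic
import numpy as np

rng = np.random.default_rng(2)
T = 100_000
for path in range(3):
    y = rng.normal()            # drawn once, shared by all t within the path
    x = rng.normal(size=T)      # iid component
    z = x + y
    print(f"path {path}: y = {y: .3f}, time average of z = {z.mean(): .3f}")
# each time average is close to that path's y (and var(y) > 0), so time averages
# do not recover the population mean E[z_t] = 0: the process is not ergodic
```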

Important

The ergodic theorem is important because it implies that, in time series settings, a single long sample becomes representative of the whole data-generating process, similar to how, in a cross section, a large iid sample becomes representative of the whole population.

Note

A related (to ergodicity) property is called mixing. Formally defining either one is beyond the scope of this course. Informally, a useful way to think about the difference between mixing and ergodicity can be found in Ch. 14.7 of B. Hansen's Econometrics: mixing means that more distant elements of the time series sequence are closer to being independent, while ergodicity means that this is true only on average. Therefore, mixing implies ergodicity, but ergodicity does not imply mixing.

Dependence

  • sample

$$\{z_1, z_2, \ldots\}$$
  • in cross section - independent

  • in time series - dependent

What does it mean for two variables $z_1$ and $z_2$ to be independent?

  • Let $f(z_1, z_2)$ be the joint distribution of $z_1$ and $z_2$. If $z_1$ and $z_2$ are independent, then

$$f(z_1, z_2) = f(z_1) f(z_2)$$
  • Moreover, from the definition of conditional distribution

$$f(z_1 \mid z_2) = \frac{f(z_1, z_2)}{f(z_2)} \quad \text{and} \quad f(z_2 \mid z_1) = \frac{f(z_1, z_2)}{f(z_1)}$$
  • it follows that if $z_1$ and $z_2$ are independent (checked numerically in the sketch below),

$$f(z_1 \mid z_2) = f(z_1) \quad \text{and} \quad f(z_2 \mid z_1) = f(z_2)$$
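A small numeric sketch (not from the lecture notes) checking these formulas on a discrete joint distribution with illustrative marginal probabilities: under independence the joint pmf factorizes, and the conditional pmf of $z_1$ given $z_2$ equals the marginal pmf of $z_1$.

```python
# Sketch: independence means f(z1, z2) = f(z1) f(z2) and f(z1 | z2) = f(z1)
import numpy as np

p_z1 = np.array([0.2, 0.8])          # marginal pmf of z1 (illustrative values)
p_z2 = np.array([0.5, 0.3, 0.2])     # marginal pmf of z2 (illustrative values)

joint = np.outer(p_z1, p_z2)         # independent case: f(z1, z2) = f(z1) f(z2)
cond = joint / joint.sum(axis=0, keepdims=True)   # f(z1 | z2) = f(z1, z2) / f(z2)

print(cond)                          # every column equals p_z1: z2 is uninformative about z1
```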

When $z_1$ and $z_2$ are independent

  • observing $z_2$ tells us nothing about $z_1$: no new information is gained beyond what is already contained in the marginal distribution of $z_1$.

  • This is a defining property of a random sample: each observation is independent from all other observations, and is therefore a unique source of information.

When $z_1$ and $z_2$ are dependent

$$f(z_1 \mid z_2) \neq f(z_1)$$
  • observing $z_2$ tells us something about $z_1$, and vice versa – $z_1$ is informative about $z_2$.

  • if all $z$'s are mutually dependent, then the larger the number of observations, the smaller the information value of each individual observation, given all the others.

Time series models are largely models of the temporal dependence

This has various important implications. One is that standard results from probability theory, such as the law of large numbers or the central limit theorem, are not directly applicable. Another is that we need to model the temporal dependence somehow.

Time series models

  • aim to capture complex temporal dependence

  • built by combining processes with a simple dependence structure, called innovations

Innovations

$$\{\varepsilon_t\}, \quad E(\varepsilon_t) = 0, \quad \operatorname{var}(\varepsilon_t) = \sigma^2$$
  • Gaussian iid noise

  • iid noise

  • stationary martingale difference

  • white noise

Definition: The innovation process $\{\varepsilon_t\}$ is a (Gaussian) iid noise process if it is iid (and Gaussian).

A time series model for the observed data is a specification of the joint distributions (or possibly only of the means and covariances) of a sequence of random variables $\{z_t\}$ of which the observed data are postulated to be a realization.

Definition: The innovation process $\{\varepsilon_t\}$ is a martingale difference process if

$$E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots) = 0$$
  • $E(\varepsilon_t) = 0$ (by the LIE)

  • $\operatorname{cov}(\varepsilon_t, \varepsilon_{t-h}) = 0$ for all $h \neq 0$ (also by the LIE; both derivations are sketched below)

  • if also covariance stationary: $\operatorname{var}(\varepsilon_t) = \sigma_t^2 = \sigma^2$
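A short derivation of the first two properties via the law of iterated expectations (LIE), not spelled out in the notes but standard:

$$E(\varepsilon_t) = E\big[E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)\big] = E[0] = 0$$

$$\operatorname{cov}(\varepsilon_t, \varepsilon_{t-h}) = E(\varepsilon_t \varepsilon_{t-h}) = E\big[\varepsilon_{t-h}\, E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)\big] = 0, \quad h > 0$$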

Definition: The innovation process $\{\varepsilon_t\}$ is a white noise process if

$$\operatorname{var}(\varepsilon_t) = \sigma^2 < \infty, \qquad \operatorname{cov}(\varepsilon_t, \varepsilon_{t-h}) = 0, \quad h \neq 0$$
  • stronger than covariance stationarity (additionally uncorrelated across time)

  • weaker than iid noise (not necessarily independent or identically distributed)

  • weaker than stationary martingale difference

  • uncorrelated and Gaussian $\Rightarrow$ Gaussian iid noise (for a white noise process that is not iid, see the sketch below)
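A minimal sketch (not from the lecture notes) of one illustrative construction of a white noise process that is not iid: $\varepsilon_t = \eta_t \eta_{t-1}$ with $\eta_t$ iid $N(0,1)$. The levels are serially uncorrelated, but the squares are not, so the $\varepsilon_t$ are not independent.

```python
# Sketch: eps_t = eta_t * eta_{t-1} is white noise but not iid
import numpy as np

rng = np.random.default_rng(3)
T = 200_000
eta = rng.normal(size=T + 1)
eps = eta[1:] * eta[:-1]             # eps_t = eta_t * eta_{t-1}

def acf1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return (x[1:] * x[:-1]).mean() / x.var()

print(f"acf(1) of eps:   {acf1(eps): .4f}")     # close to 0: uncorrelated (white noise)
print(f"acf(1) of eps^2: {acf1(eps**2): .4f}")  # around 0.25: dependence remains
```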

Forms of (in)dependence

  • with iid innovations the future is completely independent from the past history

    • the past contains no information about the future (completely unpredictable)

    • conditional distribution equal to the unconditional one

  • with m.d. innovations the mean in the future is completely independent from the past history

    • the past contains no information about the mean (the mean is unpredictable)

    • conditional mean equals the unconditional mean

  • for white noise innovations, the mean in the future is linearly unpredictable from the past history

    • linear functions of the past contain no information about the mean (the mean is linearly unpredictable)

Examples of time series models

MA(1) model

$$z_t = \varepsilon_t + \theta \varepsilon_{t-1}$$

AR(1) model:

$$z_t = \alpha z_{t-1} + \varepsilon_t$$

ARMA(1,1) model:

$$z_t = \alpha z_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$$
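A minimal simulation sketch (not from the lecture notes) of the three models above, assuming Gaussian iid innovations and illustrative parameter values $\alpha = 0.7$, $\theta = 0.4$:

```python
# Sketch: simulating MA(1), AR(1), and ARMA(1,1) paths from the same innovations
import numpy as np

rng = np.random.default_rng(4)
T, alpha, theta = 500, 0.7, 0.4
eps = rng.normal(size=T)

ma1 = np.zeros(T)       # MA(1):     z_t = eps_t + theta * eps_{t-1}
ar1 = np.zeros(T)       # AR(1):     z_t = alpha * z_{t-1} + eps_t
arma11 = np.zeros(T)    # ARMA(1,1): z_t = alpha * z_{t-1} + eps_t + theta * eps_{t-1}
for t in range(1, T):
    ma1[t] = eps[t] + theta * eps[t - 1]
    ar1[t] = alpha * ar1[t - 1] + eps[t]
    arma11[t] = alpha * arma11[t - 1] + eps[t] + theta * eps[t - 1]

# each model turns the serially uncorrelated innovations {eps_t} into a process with
# richer temporal dependence; e.g. the AR(1) first-order autocorrelation is about alpha
print(np.corrcoef(ar1[1:], ar1[:-1])[0, 1])
```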