Time series concepts and models¶
Definition: A time series is a collection of random variables indexed by time \(t\), i.e.
\[\{z_t\}_{t=-\infty}^{\infty} = \{..., z_{-2}, z_{-1}, z_{0}, z_{1}, z_{2}, ...\}\]
In this class we will only deal with discrete time series, i.e. where \(t\) is discrete as above. This is common in macroeconomics. In finance, continuous time series are often more useful/relevant. The collection in (1) is also referred to as a stochastic process, or a stochastic sequence (some authors reserve "sequence" for discrete and "process" for continuous time series)
unlike cross-sectional data, time series are ordered sequentially - there is a before and an after, and observations can be close to or far away from each other.
In practice, we work with observed finite realizations of stochastic processes:
\[\{ z_1, z_2, ..., z_T \} = \{z_t\}_{t=1}^T\]
Note
We use the term time series for both the process and its realization
the set \(\{ z_1, z_2, ..., z_T \}\) is characterized by its joint distribution.
the distribution of the full stochastic process is specified if, for an arbitrary set of time indices \(\{t_1, t_2, ..., t_k\}\), we know the joint distribution of the corresponding set \(\{z_{t_1}, z_{t_2}, ..., z_{t_k}\}\)
Example: \(\{z_t \}_{t=-\infty}^{\infty}\) is a Gaussian process if any finite subset \(\{ z_{t_1}, z_{t_2}, ..., z_{t_k} \}\) has a joint Gaussian distribution.
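A minimal numerical sketch of this definition (the exponentially decaying covariance function below is only an illustrative assumption, not part of the definition): for a Gaussian process, any finite set of indices corresponds to one draw from a multivariate normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# an arbitrary finite set of time indices
t = np.array([1, 3, 4, 10, 25])

# assumed covariance function (illustrative): cov(z_s, z_t) = 0.9 ** |s - t|
cov = 0.9 ** np.abs(t[:, None] - t[None, :])

# for a Gaussian process, (z_1, z_3, z_4, z_10, z_25) is jointly Gaussian,
# so one draw of this finite subset is one draw from a multivariate normal
z_subset = rng.multivariate_normal(mean=np.zeros(len(t)), cov=cov)
print(z_subset)
```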
A time series sample \(\{z_t\}_{t=1}^T\) is a single draw from some distribution
sample size of 1???
how do we estimate moments (learn about the underlying distribution)?
in cross-section: data points are independent draws from a common distribution
in \(z_i = \begin{bmatrix}y_i\\x_i\end{bmatrix} \) there could be arbitrarily complicated dependence, but \(z_i\) and \(z_j\) are independent for \(i \neq j\)
sample averages estimate population means
the equivalent in time series is observing multiple paths (an ensemble) of the time series
and computing ensemble averages for each \(t\)
(Figures: ensemble averages with n = 100 and n = 10000 simulated paths; a small simulation along these lines is sketched below, after this list.)
impossible with one observed path (e.g. one history of quarterly GDP numbers)
can only compute averages over time
for time averages to estimate population means, we need ergodic stationarity
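A small simulation sketch of the ensemble-average idea (the AR(1) process and its parameters below are purely illustrative assumptions): averaging across many simulated paths at a fixed date estimates the population mean, while with a single observed path we can only average over time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, T = 10_000, 200

# assumed toy process: a stationary AR(1), z_t = 0.7 * z_{t-1} + e_t, with E z_t = 0
eps = rng.standard_normal((n_paths, T))
z = np.zeros((n_paths, T))
for t in range(1, T):
    z[:, t] = 0.7 * z[:, t - 1] + eps[:, t]

# ensemble average: mean across paths, one number per date t (close to E z_t = 0)
ensemble_avg = z.mean(axis=0)

# time average: mean over time of a single observed path
time_avg = z[0].mean()

print("ensemble average at t=100:", round(ensemble_avg[100], 3))
print("time average of one path: ", round(time_avg, 3))
```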
Ergodic stationarity¶
With cross-sectional data we have identical distribution and independence
the related concepts for time series are stationarity and ergodicity
Stationarity¶
Definition: The process \(\{z_t\}_{t=1}^T\) is strictly (or strongly) stationary if its distribution is time invariant.
This means that if we take an arbitrary set of time indices \(\{t_1, t_2, ..., t_k\}\) for some \(k\), the joint distribution of \(z_{t_1}, z_{t_2}, ..., z_{t_k}\) stays the same if we shift all indices by the same number of time units, i.e. it is the same as the joint distribution of \(z_{t'_1}, z_{t'_2}, ..., z_{t'_k}\) for \(t'_i = t_i + h\) and any \(h\).
Definition: The process \(\{z_t\}_{t=1}^T\) is covariance (or weakly) stationary if the first two moments of the joint distribution exist and are time invariant
That is, the mean and covariance do not change with the time index:
\[\operatorname{E}(z_t) = \operatorname{E}(z_{t'}), \qquad \operatorname{cov}(z_t, z_{t-k}) = \operatorname{cov}(z_{t'}, z_{t'-k})\]
for \(t' = t+h\), any \(h\), and \(k \geq 0\)
also wide-sense stationary
with \(k=0\) the variance of \(z_t\) is constant
set \(t'=t-k\). Then
\[\operatorname{cov}(z_{t'}, z_{t'+k}) = \operatorname{cov}(z_{t-k}, z_{t}) = \operatorname{cov}(z_{t}, z_{t-k})\]
i.e. the covariances at lags \(k\) and \(-k\) coincide, so it is enough to consider \(k \geq 0\)
stationarity is a form of constancy of the properties of the process
needed in order to be able to learn something about those properties
strong stationarity is usually too strong an assumption
covariance stationarity is typically enough
if the first two moments exist, strong stationarity implies weak stationarity
Example: (strictly but not weakly stationary)
iid \(z_t\) with a Cauchy distribution is strictly stationary but has no finite moments and is therefore not weakly stationary (see the simulation sketch after this list).
strong and weak stationarity coincide when \(z_t\) is a Gaussian process
with weak stationarity alone we can still learn first and second order moments (if not the full distribution)
if \(z_t\) is iid then it is strongly stationary
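A quick simulation of the Cauchy example above (the sample sizes below are arbitrary choices): the running sample mean of iid normal draws settles down near zero, while the sample mean of iid Cauchy draws keeps jumping around, because no population mean exists.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

normal_draws = rng.standard_normal(T)   # iid, strongly and weakly stationary
cauchy_draws = rng.standard_cauchy(T)   # iid (strictly stationary), but no finite moments

# running sample means: stabilize in the normal case, never settle in the Cauchy case
for T_sub in (100, 1_000, 10_000, 100_000):
    print(f"T = {T_sub:>6}:  normal mean = {normal_draws[:T_sub].mean():8.3f}   "
          f"cauchy mean = {cauchy_draws[:T_sub].mean():8.3f}")
```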
Ergodicity¶
independence of observations implies each contains unique information
dependence means that some information is shared
for information to accumulate as sample size grows, some unique information must exist
i.e. dependence must not be too strong
ergodicity is the property that more distant variables in the sequence are closer to being independent
(Ergodic theorem) For ergodic and stationary processes, time averages converge to population means:
\[\frac{1}{T}\sum_{t=1}^{T} z_t \longrightarrow \operatorname{E}z_t\]
functions of ergodic and stationary processes are also ergodic and stationary
time averages of such functions converge to respective population averages
Example: If \(\{z_t\}\) is stationary and ergodic, then \(\{z_t z_t\}\) is also stationary and ergodic.
Therefore, the time averages of \(z_t\) and \(z_t z_t\) converge to \(\operatorname{E}z_t\) and \(\operatorname{E}z_tz_t\)
Example: (stationary but not ergodic)
\[z_t = x_t + y\]
with:
\(x_t\) iid, \(\operatorname{E}x_t = 0\)
\(y\) independent of \(x_t\) and \(\operatorname{E}y = 0\), \(\operatorname{var}(y) > 0\)
since \(x_t\) is iid:
\[\frac{1}{T}\sum_{t=1}^{T} x_t \longrightarrow \operatorname{E}x_t = 0\]
since \(y\) is the same for all \(t\):
\[\frac{1}{T}\sum_{t=1}^{T} y = y\]
Therefore:
\[\frac{1}{T}\sum_{t=1}^{T} z_t = \frac{1}{T}\sum_{t=1}^{T} x_t + y \longrightarrow y\]
i.e. the time average converges to a random variable, not to a constant.
On the other hand, the (ensemble) mean of \(z_t\) is
\[\operatorname{E}z_t = \operatorname{E}x_t + \operatorname{E}y = 0\]
so the time average does not converge to the population mean.
This is because the dependence between \(z_t\) and \(z_{t-k}\) does not decrease as \(k\) increases:
\[\operatorname{cov}(z_t, z_{t-k}) = \operatorname{cov}(x_t + y, \, x_{t-k} + y) = \operatorname{var}(y) > 0 \quad \text{for all } k\]
due to the independence among \(x_t\), \(x_{t-k}\), and \(y\).
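A simulation sketch of this non-ergodic example, assuming \(x_t\) and \(y\) are standard normal (an illustrative choice): the time average of each path converges to that path's draw of \(y\), not to \(\operatorname{E}z_t = 0\), while ensemble moments computed across paths recover the population values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, T = 2_000, 2_000

# z_t = x_t + y, with x_t iid N(0,1) over time and y ~ N(0,1) drawn once per path
x = rng.standard_normal((n_paths, T))
y = rng.standard_normal(n_paths)      # one y per path, constant over time
z = x + y[:, None]

# time averages: one number per path, each converging to that path's y, not to E z_t = 0
time_avgs = z.mean(axis=1)
print("first three time averages:", np.round(time_avgs[:3], 2))
print("first three y draws:      ", np.round(y[:3], 2))

# ensemble moments at fixed dates: mean ~ 0, and cov(z_t, z_{t-k}) ~ var(y) = 1
# even for a large lag k, so the temporal dependence never dies out
print("ensemble mean at t=1500:    ", round(z[:, 1500].mean(), 2))
print("ensemble cov(z_1500, z_500):", round(np.cov(z[:, 1500], z[:, 500])[0, 1], 2))
```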
Important
The ergodic theorem is important because it implies that, in time series settings, a single long sample becomes representative of the whole data-generating process, just as, in a cross section, a large iid sample becomes representative of the whole population.
Note
A related (to ergodicity) property is called mixing. Formally defining either one is beyond the scope of this course. Informally, a useful way to think about the difference between mixing and ergodicity can be found in Ch. 14.7 of B. Hansen's Econometrics: mixing means that more distant elements of the time series sequence are closer to being independent, while ergodicity means that this is true only on average. Therefore, mixing implies ergodicity, but ergodicity does not imply mixing.
Dependence¶
sample
in cross section - independent
in time series - dependent
What does it mean for two variables \(z_1\) and \(z_2\) to be independent?
Let \(f(z_1, z_2)\) be the joint distribution of \(z_1\) and \(z_2\). If \(z_1\) and \(z_2\) are independent, then
\[f(z_1, z_2) = f(z_1)f(z_2)\]
Moreover, from the definition of conditional distribution
\[f(z_1 \mid z_2) = \frac{f(z_1, z_2)}{f(z_2)}\]
it follows that if \(z_1\) and \(z_2\) are independent,
\[f(z_1 \mid z_2) = f(z_1)\]
When \(z_1\) and \(z_2\) are independent¶
observing \(z_2\) tells us nothing about \(z_1\): no new information is gained beyond that already contained in the marginal distribution of \(z_1\).
This is a defining property of a random sample: each observation is independent from all other observations, and is therefore a unique source of information.
When \(z_1\) and \(z_2\) are dependent¶
observing \(z_2\) tells us something about \(z_1\), and vice versa – \(z_1\) is informative about \(z_2\).
if all \(z\)'s are mutually dependent, the larger the number of observations, the smaller the information value of each individual observation, given all other observations.
Time series models are largely models of the temporal dependence
This has various important implications. One is that standard results from probability theory, such as the law of large numbers or the central limit theorem, are not directly applicable. Another is that we need to somehow model the temporal dependence
Time series models¶
aim to capture complex temporal dependence
built by combining processes with a simple dependence structure, called innovations
Innovations¶
Gaussian iid noise
iid noise
stationary martingale difference
white noise
Definition: The innovation process \(\{\varepsilon_t\}\) is a (Gaussian) iid noise process if it is iid (and Gaussian)
A time series model for the observed data \(\{z_t\}\) is a specification of the joint distributions (or possibly only the means and covariances) of a sequence of random variables \(\{Z_t\}\) of which \(\{z_t\}\) is postulated to be a realization.
Definition: The innovation process \(\{\varepsilon_t\}\) is a martingale difference process if
\[\operatorname{E}\left( \varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, ... \right) = 0\]
\(\operatorname{E}\left( \varepsilon_t \right) = 0\) (by LIE)
\(\operatorname{cov}(\varepsilon_t, \varepsilon_{t-h}) =0 \), for all \(h\neq0\) (also by LIE)
if also covariance stationary: \( \operatorname{var}(\varepsilon_t) = \sigma^2_{t} = \sigma^2 \)
Definition: The innovation process \(\{\varepsilon_t\}\) is a white noise process if
\[\operatorname{E}\left( \varepsilon_t \right) = 0, \qquad \operatorname{var}(\varepsilon_t) = \sigma^2, \qquad \operatorname{cov}(\varepsilon_t, \varepsilon_{t-h}) = 0 \;\; \text{for all } h \neq 0\]
stronger than covariance stationary (uncorrelated)
weaker than iid noise (not necessarily independent or identically distributed)
weaker than stationary martingale difference
uncorrelated and (jointly) Gaussian => independent, i.e. Gaussian white noise is Gaussian iid noise
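A simulation sketch contrasting white noise with iid noise, using an ARCH(1)-type innovation (the specific recursion and parameter values below are illustrative assumptions, not something defined earlier in these notes): the process is a stationary martingale difference, hence white noise, but it is not iid because its conditional variance depends on the past.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

# ARCH(1)-type innovation: eps_t = sigma_t * eta_t, sigma_t^2 = 0.5 + 0.4 * eps_{t-1}^2,
# with eta_t iid N(0,1). Then E(eps_t | past) = 0 (martingale difference, hence white
# noise), but eps_t is NOT iid: its conditional variance depends on the past.
eta = rng.standard_normal(T)
eps = np.zeros(T)
for t in range(1, T):
    sigma2_t = 0.5 + 0.4 * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2_t) * eta[t]

# levels are (approximately) uncorrelated ...
rho_level = np.corrcoef(eps[1:], eps[:-1])[0, 1]
# ... but squares are correlated, so the process is not independent over time
rho_square = np.corrcoef(eps[1:] ** 2, eps[:-1] ** 2)[0, 1]
print("corr(eps_t, eps_{t-1})     ~", round(rho_level, 3))
print("corr(eps_t^2, eps_{t-1}^2) ~", round(rho_square, 3))
```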
Forms of (in)dependence¶
with iid innovations the future is completely independent from the past history
the past contains no information about the future (completely unpredictable)
conditional distribution equal to the unconditional one
with m.d. innovations the mean in the future is completely independent from the past history
the past contains no information about the mean (the mean is unpredictable)
conditional mean equals the unconditional mean
for white noise innovations the mean in the future is linearly independent from the past history
linear functions of the past contain no information about the mean (the mean is linearly unpredictable)
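A sketch of a process that is white noise but not a martingale difference, using the illustrative construction \(\varepsilon_t = \eta_t + \eta_{t-1}\eta_{t-2}\) with \(\eta_t\) iid standard normal (an assumed example, not taken from the notes above): linear functions of the past carry no information (autocorrelations are zero), yet a nonlinear function of past values, \(\varepsilon_{t-1}\varepsilon_{t-2}\), does predict \(\varepsilon_t\).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
eta = rng.standard_normal(T)

# eps_t = eta_t + eta_{t-1} * eta_{t-2}: zero mean, constant variance, zero
# autocorrelations (white noise), but E(eps_t | past) is not zero, so not a m.d.
eps = eta[2:] + eta[1:-1] * eta[:-2]

# linear functions of the past are uninformative: autocorrelations are ~ 0
for h in range(1, 4):
    rho = np.corrcoef(eps[h:], eps[:-h])[0, 1]
    print(f"corr(eps_t, eps_t-{h}) ~ {rho:.3f}")

# but the nonlinear function eps_{t-1} * eps_{t-2} of the past does predict eps_t
nonlinear_predictor = eps[1:-1] * eps[:-2]
rho_nl = np.corrcoef(eps[2:], nonlinear_predictor)[0, 1]
print(f"corr(eps_t, eps_t-1 * eps_t-2) ~ {rho_nl:.3f}")
```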
Examples of time series models¶
MA(1) model:
\[z_t = \varepsilon_t + \theta \varepsilon_{t-1}\]
AR(1) model:
\[z_t = \phi z_{t-1} + \varepsilon_t\]
ARMA(1,1) model:
\[z_t = \phi z_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}\]
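A minimal simulation sketch of the three models above (the parameter values and the Gaussian innovations are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
eps = rng.standard_normal(T)          # innovations (Gaussian iid noise)

theta, phi = 0.5, 0.8                 # illustrative parameter values

# MA(1):  z_t = eps_t + theta * eps_{t-1}
ma1 = eps.copy()
ma1[1:] += theta * eps[:-1]

# AR(1):  z_t = phi * z_{t-1} + eps_t
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = phi * ar1[t - 1] + eps[t]

# ARMA(1,1):  z_t = phi * z_{t-1} + eps_t + theta * eps_{t-1}
arma11 = np.zeros(T)
for t in range(1, T):
    arma11[t] = phi * arma11[t - 1] + eps[t] + theta * eps[t - 1]

print(ma1[:5], ar1[:5], arma11[:5], sep="\n")
```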