Maximum likelihood estimation

Time series model

  • a specification of the joint distribution of \(\{z_t\}\)

\[ p(\mathbf{z}; \boldsymbol \theta) \]

definition: Likelihood function

\[ \mathcal{L}(\boldsymbol \theta | \mathbf{z}) = p(\mathbf{z}; \boldsymbol \theta) \]

Note

The likelihood function is identical in functional form to the PDF of \(\mathbf{z}\), \(p(\mathbf{z};\boldsymbol \theta)\), but is interpreted as a function of \(\boldsymbol \theta\), for a given value of \(\mathbf{z}\), rather than as a function of \(\mathbf{z}\) for a given value of \(\boldsymbol \theta\).

definition: Log-likelihood function

\[ \ell(\boldsymbol \theta | \mathbf{z}) = \operatorname{log}\left(\mathcal{L}(\boldsymbol \theta | \mathbf{z})\right) \]
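
As an illustration of why the logarithm is convenient, consider a Gaussian AR(1) model. This model, the symbols \(\phi\), \(\sigma^2\), \(\varepsilon_t\) and the choice to condition on \(z_1\) are assumptions made for this example only. Any joint density can be factored into conditional densities, and the log turns that product into a sum:

\[ p(\mathbf{z}; \boldsymbol \theta) = p(z_1; \boldsymbol \theta) \prod_{t=2}^{T} p(z_t | z_{t-1}, \ldots, z_1; \boldsymbol \theta) \]

\[ \ell(\boldsymbol \theta | \mathbf{z}) = \operatorname{log} p(z_1; \boldsymbol \theta) + \sum_{t=2}^{T} \operatorname{log} p(z_t | z_{t-1}, \ldots, z_1; \boldsymbol \theta) \]

For \(z_t = \phi z_{t-1} + \varepsilon_t\) with \(\varepsilon_t \sim \mathcal{N}(0, \sigma^2)\) independent over time and \(\boldsymbol \theta = (\phi, \sigma^2)'\), each conditional density depends on the past only through \(z_{t-1}\), and the log-likelihood conditional on \(z_1\) is

\[ \ell(\boldsymbol \theta | \mathbf{z}) = -\frac{T-1}{2} \operatorname{log}(2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{t=2}^{T} (z_t - \phi z_{t-1})^2 \]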

The maximum likelihood estimator (MLE)

\[ \begin{aligned} \hat {\boldsymbol \theta} &= \underset{\boldsymbol \theta}{\mathrm{argmax}}~~\mathcal{L}(\boldsymbol \theta | \mathbf{z}) \\ &= \underset{\boldsymbol \theta}{\mathrm{argmax}}~~\ell(\boldsymbol \theta | \mathbf{z}) \end{aligned} \]

Rationale for MLE

For a given \(\boldsymbol \theta\), the quantity \(p(\mathbf{z}; \boldsymbol \theta) d \mathbf{z}\) evaluated at the observed sample \(\mathbf{z}\) is approximately the probability of observing a sample in a small neighborhood of the actual \(\mathbf{z}\) under that value of \(\boldsymbol \theta\). By construction, no other value of \(\boldsymbol \theta\) is associated with a pdf that assigns a higher probability to such a sample than the MLE \(\hat {\boldsymbol \theta}\) does. In this sense, \(\hat {\boldsymbol \theta}\) is the value most supported by the observed sample.
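
The definitions above can be tried out numerically. The following is a minimal sketch, assuming a Gaussian AR(1) model \(z_t = \phi z_{t-1} + \varepsilon_t\) and a likelihood conditional on the first observation. The model, the helper names `simulate_ar1`, `cond_loglik` and `ar1_mle`, and the parameter values are illustrative choices, not part of the notes. It checks that nearby parameter values give a lower log-likelihood than the maximizer.

```python
import math
import random


def simulate_ar1(phi, sigma, T, seed=0):
    """Simulate z_t = phi * z_{t-1} + eps_t, eps_t ~ N(0, sigma^2), starting from 0."""
    rng = random.Random(seed)
    z, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, sigma)
        z.append(prev)
    return z


def cond_loglik(phi, sigma2, z):
    """Gaussian AR(1) log-likelihood, conditional on the first observation."""
    n = len(z) - 1
    ssr = sum((zt - phi * zlag) ** 2 for zt, zlag in zip(z[1:], z[:-1]))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - ssr / (2.0 * sigma2)


def ar1_mle(z):
    """Closed-form maximizer of cond_loglik over (phi, sigma2)."""
    num = sum(zt * zlag for zt, zlag in zip(z[1:], z[:-1]))
    den = sum(zlag * zlag for zlag in z[:-1])
    phi_hat = num / den
    ssr = sum((zt - phi_hat * zlag) ** 2 for zt, zlag in zip(z[1:], z[:-1]))
    return phi_hat, ssr / (len(z) - 1)


z = simulate_ar1(phi=0.6, sigma=1.0, T=500)
phi_hat, sigma2_hat = ar1_mle(z)
ll_hat = cond_loglik(phi_hat, sigma2_hat, z)

# Log-likelihood at a few perturbed parameter values around the maximizer
ll_near = [cond_loglik(phi_hat + d, sigma2_hat * s, z)
           for d in (-0.05, 0.05) for s in (0.9, 1.1)]

print("MLE:", phi_hat, sigma2_hat)
print("log-likelihood at MLE:", ll_hat)
print("best nearby log-likelihood:", max(ll_near))
```

For models without a closed-form maximizer, the same `cond_loglik`-style function would typically be handed to a numerical optimizer instead.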

Note

Difference between ML estimator and ML estimate:

  • estimator: \(\hat {\boldsymbol \theta}\) as a function of a generic sample \(\mathbf{z}\)

  • estimate: the value \(\hat {\boldsymbol \theta}\) at a particular sample \(\mathbf{z}\)
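
The distinction can be made concrete in code. This is a small sketch under an assumed i.i.d. \(\mathcal{N}(\mu, 1)\) model, for which the MLE of \(\mu\) is the sample mean. The function name `mean_estimator` and the sample sizes are hypothetical. The function is the estimator; the numbers it returns on particular samples are estimates.

```python
import random


def mean_estimator(z):
    """The estimator: a rule mapping any sample z to a value.

    For i.i.d. N(mu, 1) data the MLE of mu is the sample mean.
    """
    return sum(z) / len(z)


rng = random.Random(1)
sample_a = [rng.gauss(2.0, 1.0) for _ in range(200)]
sample_b = [rng.gauss(2.0, 1.0) for _ in range(200)]

# Estimates: the values the same rule takes at two particular samples
estimate_a = mean_estimator(sample_a)
estimate_b = mean_estimator(sample_b)
print(estimate_a, estimate_b)
```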

Score

\[ \boldsymbol S_{T}(\boldsymbol \theta) = \frac{\partial}{\partial \boldsymbol \theta } \ell(\boldsymbol \theta | \mathbf{z}) \]
  • describes the steepness (gradient) of the log-likelihood function

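  • if the log-likelihood is differentiable and the maximum lies in the interior of the parameter space, the MLE \(\hat {\boldsymbol \theta}\) solves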

\[ \boldsymbol S_{T}(\hat {\boldsymbol \theta}) = \boldsymbol 0 \]
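
The first-order condition can be checked numerically. The sketch below assumes a Gaussian AR(1) model with known unit innovation variance, so that \(\phi\) is the only parameter. The model and the names `simulate_ar1`, `loglik` and `score` are illustrative. It compares the analytic score with a finite-difference derivative and evaluates the score at the MLE.

```python
import random


def simulate_ar1(phi, T, seed=0):
    """Simulate z_t = phi * z_{t-1} + eps_t with eps_t ~ N(0, 1), starting from 0."""
    rng = random.Random(seed)
    z, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        z.append(prev)
    return z


def loglik(phi, z):
    """Conditional Gaussian AR(1) log-likelihood with sigma^2 = 1, constants dropped."""
    return -0.5 * sum((zt - phi * zlag) ** 2 for zt, zlag in zip(z[1:], z[:-1]))


def score(phi, z):
    """Analytic derivative of loglik with respect to phi."""
    return sum((zt - phi * zlag) * zlag for zt, zlag in zip(z[1:], z[:-1]))


z = simulate_ar1(0.6, 500)

# Closed-form MLE of phi for this model
phi_hat = (sum(zt * zlag for zt, zlag in zip(z[1:], z[:-1]))
           / sum(zlag * zlag for zlag in z[:-1]))

# Central finite difference of the log-likelihood at phi = 0.3
h = 1e-5
numeric_score = (loglik(0.3 + h, z) - loglik(0.3 - h, z)) / (2.0 * h)

print("score at 0.3 (analytic, numeric):", score(0.3, z), numeric_score)
print("score at the MLE:", score(phi_hat, z))
```

Away from the maximum the score is clearly nonzero and points toward the MLE; at the MLE it vanishes up to floating-point error.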

Observed Fisher information

\[ \begin{aligned} \boldsymbol I_{T}(\hat {\boldsymbol \theta}) &= -\frac{\partial}{\partial \boldsymbol \theta } \boldsymbol S_{T}(\boldsymbol \theta) |_{\hat {\boldsymbol \theta}} \\ &= -\frac{\partial^2}{\partial \boldsymbol \theta \partial \boldsymbol \theta'} \ell(\boldsymbol \theta | \mathbf{z})|_{\hat {\boldsymbol \theta}} \end{aligned} \]
  • describes the curvature of the log-likelihood function at the maximum \(\hat {\boldsymbol \theta}\)

  • measures how much information about \(\boldsymbol \theta\) we have at the MLE.
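
To see the curvature interpretation at work, the sketch below computes the observed information by finite differences for an assumed Gaussian AR(1) model with unit innovation variance, where it equals \(\sum_{t} z_{t-1}^2\) analytically. The model and all function names are illustrative assumptions. A longer sample produces a more sharply curved log-likelihood, that is, more information.

```python
import random


def simulate_ar1(phi, T, seed=0):
    """Simulate z_t = phi * z_{t-1} + eps_t with eps_t ~ N(0, 1), starting from 0."""
    rng = random.Random(seed)
    z, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        z.append(prev)
    return z


def loglik(phi, z):
    """Conditional Gaussian AR(1) log-likelihood with sigma^2 = 1, constants dropped."""
    return -0.5 * sum((zt - phi * zlag) ** 2 for zt, zlag in zip(z[1:], z[:-1]))


def mle(z):
    """Closed-form maximizer of loglik."""
    return (sum(zt * zlag for zt, zlag in zip(z[1:], z[:-1]))
            / sum(zlag * zlag for zlag in z[:-1]))


def observed_information(phi, z, h=1e-3):
    """Minus the second derivative of loglik at phi, by central finite differences."""
    return -(loglik(phi + h, z) - 2.0 * loglik(phi, z) + loglik(phi - h, z)) / h ** 2


z_short = simulate_ar1(0.6, 100)
z_long = simulate_ar1(0.6, 1000)

info_short = observed_information(mle(z_short), z_short)
info_long = observed_information(mle(z_long), z_long)
analytic_long = sum(zlag * zlag for zlag in z_long[:-1])

print("observed information, T=100: ", info_short)
print("observed information, T=1000:", info_long, "(analytic:", analytic_long, ")")
```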

Expected Fisher information

\[\mathcal{I}_{T}(\boldsymbol \theta) = \operatorname{E}\left[ \boldsymbol S_{T}(\boldsymbol \theta) \boldsymbol S_{T}(\boldsymbol \theta)' \right] \]

Under standard regularity conditions (the information matrix equality), this equals the expected negative Hessian of the log-likelihood:

\[\mathcal{I}_{T}(\boldsymbol \theta) = -\operatorname{E}\left[ \frac{\partial}{\partial \boldsymbol \theta } \boldsymbol S_{T}(\boldsymbol \theta)\right] = \operatorname{E}\left[ \boldsymbol I_{T}(\boldsymbol \theta) \right]\]
  • expected curvature of the log-likelihood function

  • measures how much information about \(\boldsymbol \theta\) we can expect to have
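
The two expressions for the expected information can be compared by Monte Carlo. This is a sketch under an assumed stationary Gaussian AR(1) model with unit innovation variance, for which the expected information about \(\phi\) (conditional likelihood, \(T-1\) terms) is \((T-1)/(1-\phi^2)\). The constants `PHI0`, `T`, `REPS` and the helper `draw_sample` are hypothetical choices for this example.

```python
import math
import random

PHI0, T, REPS = 0.5, 100, 2000
rng = random.Random(0)


def draw_sample():
    """One stationary AR(1) path of length T with unit innovation variance."""
    z = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - PHI0 ** 2))]
    for _ in range(T - 1):
        z.append(PHI0 * z[-1] + rng.gauss(0.0, 1.0))
    return z


score_sq_sum = 0.0
neg_hessian_sum = 0.0
for _ in range(REPS):
    z = draw_sample()
    pairs = list(zip(z[1:], z[:-1]))
    s = sum((zt - PHI0 * zlag) * zlag for zt, zlag in pairs)  # score at the true value
    score_sq_sum += s * s
    neg_hessian_sum += sum(zlag * zlag for zt, zlag in pairs)  # minus second derivative

outer_product_form = score_sq_sum / REPS   # Monte Carlo estimate of E[S_T S_T']
curvature_form = neg_hessian_sum / REPS    # Monte Carlo estimate of -E[dS_T / dtheta]
theory = (T - 1) / (1.0 - PHI0 ** 2)       # exact value for this assumed model

print(outer_product_form, curvature_form, theory)
```

Both averages should be close to the theoretical value, with the outer-product form being the noisier of the two.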

Consistency and asymptotic normality of MLE

Assumption: \(\mathbf{z}\) is a draw from \(p(\mathbf{z}; \boldsymbol \theta_0)\), where \(\boldsymbol \theta_0\) is the true value of \(\boldsymbol \theta\)

  • \(\hat {\boldsymbol \theta}\) is a consistent estimator of \(\boldsymbol \theta_0\)

\[ \hat {\boldsymbol \theta} \overset{p}{\longrightarrow} \boldsymbol \theta_0 \]
  • \(\hat {\boldsymbol \theta}\) is asymptotically normally distributed

\[ \sqrt{T} \left(\hat {\boldsymbol \theta} - \boldsymbol \theta_0 \right) \overset{d}{\longrightarrow} \mathcal{N}\left(\boldsymbol 0, \mathcal{I}^{-1}_{\infty}(\boldsymbol \theta_0) \right) \]

where

\[\mathcal{I}_{\infty}(\boldsymbol \theta) = \underset{T\rightarrow \infty}{\operatorname{lim}} \frac{1}{T}\mathcal{I}_{T}(\boldsymbol \theta)\]

so that, for large \(T\), \(\hat {\boldsymbol \theta}\) is approximately distributed as

\[ \hat {\boldsymbol \theta} \overset{a}{\sim} \mathcal{N}\left(\boldsymbol \theta_0, \frac{1}{T}\mathcal{I}^{-1}_{\infty}(\boldsymbol \theta_0) \right) \]
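
Both properties can be illustrated by simulation. The sketch below again assumes a stationary Gaussian AR(1) model with unit innovation variance, for which \(\mathcal{I}_{\infty}(\phi_0) = 1/(1-\phi_0^2)\), so the limiting variance of \(\sqrt{T}(\hat \phi - \phi_0)\) is \(1-\phi_0^2\). The model, the constant `PHI0`, the helper `phi_hat` and the replication counts are assumptions made for this example.

```python
import math
import random

PHI0 = 0.5
rng = random.Random(0)


def phi_hat(T):
    """Simulate one AR(1) sample of length T and return the conditional MLE of phi."""
    prev = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - PHI0 ** 2))  # stationary start
    num = den = 0.0
    for _ in range(T - 1):
        cur = PHI0 * prev + rng.gauss(0.0, 1.0)
        num += cur * prev
        den += prev * prev
        prev = cur
    return num / den


# Consistency: the average absolute estimation error shrinks as T grows
mean_abs_error = {
    T: sum(abs(phi_hat(T) - PHI0) for _ in range(200)) / 200
    for T in (50, 200, 800)
}

# Asymptotic normality: sqrt(T) * (phi_hat - phi0) has variance near 1 - phi0^2
T, REPS = 400, 1000
scaled = [math.sqrt(T) * (phi_hat(T) - PHI0) for _ in range(REPS)]
mean = sum(scaled) / REPS
variance = sum((x - mean) ** 2 for x in scaled) / REPS

print("mean absolute error by T:", mean_abs_error)
print("mean and variance of scaled error:", mean, variance)
print("asymptotic variance 1 - phi0^2:", 1.0 - PHI0 ** 2)
```

A histogram of `scaled` would show the bell shape directly; the sketch only compares the first two moments to keep it dependency-free.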