MLE of Gaussian models

Gaussian model

\[ \mathbf{z} \sim \mathcal{N}(\boldsymbol \mu (\mathbf{\theta}_0), \mathbf{\Sigma}(\mathbf{\theta}_0)). \]

Log-likelihood, where \(\mathbf{z}\) is a \(T \times 1\) vector of observations

\[\ell(\boldsymbol \theta | \mathbf{z}) = -\frac{T}{2} \log(2 \pi) - \frac{1}{2}\log(|\mathbf{\Sigma}(\mathbf{\theta})|) - \frac{1}{2}\left( \boldsymbol z - \boldsymbol \mu(\theta)\right)' \mathbf{\Sigma}^{-1}(\mathbf{\theta})\left( \boldsymbol z - \boldsymbol \mu(\theta)\right)\]
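
As a minimal numerical sketch of the log-likelihood above, the function below evaluates it for a given mean vector and covariance matrix. The name `gaussian_loglik` and its arguments are illustrative assumptions, not part of the notes; `mu` and `Sigma` stand for \(\boldsymbol \mu(\theta)\) and \(\mathbf{\Sigma}(\theta)\) already evaluated at some \(\theta\).

```python
import numpy as np

def gaussian_loglik(z, mu, Sigma):
    """Log-likelihood of a single T-vector z under N(mu, Sigma).

    Hypothetical helper: mu and Sigma are mu(theta) and Sigma(theta)
    evaluated at the parameter value of interest.
    """
    z = np.asarray(z, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    T = z.shape[0]
    resid = z - mu
    # slogdet returns (sign, log|Sigma|) and avoids overflow in det for large T
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # Solve Sigma x = resid rather than forming the explicit inverse
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * T * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad

# Usage: standard bivariate normal evaluated at the origin
value = gaussian_loglik([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

Using `slogdet` and `solve` instead of `det` and `inv` is a numerical-stability choice; the value is the same expression as in the formula.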

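Score w.r.t. \(\theta_i\), using \(\frac{\partial \log|\mathbf{\Sigma}(\theta)|}{\partial \theta_i} = -\operatorname{tr}\left(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i}\mathbf{\Sigma}(\theta)\right)\)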

\[\begin{split} \begin{align} \{\boldsymbol S_{T}(\boldsymbol \theta)\}_i &= \frac{1}{2} \operatorname{tr} \left( \frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i} \mathbf{\Sigma}(\theta) \right) + \frac{\partial \mu(\theta)'}{\partial \theta_i}\mathbf{\Sigma}^{-1}(\theta)\left(\mathbf{z} - \boldsymbol \mu (\theta)\right) - \frac{1}{2}(\mathbf{z} - \boldsymbol \mu (\theta))' \frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i} (\mathbf{z} - \boldsymbol \mu (\theta)) \\ &= \frac{\partial \mu(\theta)'}{\partial \theta_i}\mathbf{\Sigma}^{-1}(\theta)\left(\mathbf{z} - \boldsymbol \mu (\theta)\right) + \frac{1}{2} \operatorname{tr} \left( \frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i} \left(\mathbf{\Sigma}(\theta) - (\mathbf{z} - \boldsymbol \mu (\theta)) (\mathbf{z} - \boldsymbol \mu (\theta))' \right) \right)\\ &=\frac{\partial \mu(\theta)'}{\partial \theta_i} \mathbf{\Sigma}^{-1}(\theta)\left(\mathbf{z}- \boldsymbol \mu (\theta)\right) + \frac{1}{2} \operatorname{vec}\left(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i}\right)' \operatorname{vec}\left( \mathbf{\Sigma}(\theta) - (\mathbf{z} - \boldsymbol \mu (\theta)) (\mathbf{z} - \boldsymbol \mu (\theta))'\right) \end{align} \end{split}\]
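
The score formula can be checked numerically. The sketch below assumes a hypothetical two-parameter model, \(\boldsymbol\mu(\theta) = \theta_1 \mathbf{x}\) and \(\mathbf{\Sigma}(\theta) = \theta_2 \mathbf{A}\) with \(\mathbf{x}\) and \(\mathbf{A}\) known, and compares the analytic score (second line of the derivation) with a central finite difference of the log-likelihood. All names (`score_i`, `model`, `x`, `A`) are illustrative.

```python
import numpy as np

def loglik(z, mu, Sigma):
    T = z.shape[0]
    resid = z - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * T * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

def score_i(z, mu, Sigma, dmu_i, dSigma_i):
    """i-th score element, given d mu / d theta_i and d Sigma / d theta_i."""
    Sigma_inv = np.linalg.inv(Sigma)
    # d Sigma^{-1} / d theta_i = -Sigma^{-1} (d Sigma / d theta_i) Sigma^{-1}
    dSigma_inv_i = -Sigma_inv @ dSigma_i @ Sigma_inv
    resid = z - mu
    mean_term = dmu_i @ Sigma_inv @ resid
    cov_term = 0.5 * np.trace(dSigma_inv_i @ (Sigma - np.outer(resid, resid)))
    return mean_term + cov_term

# Hypothetical model (assumption): mu = theta[0] * x, Sigma = theta[1] * A
rng = np.random.default_rng(0)
T = 5
x = rng.normal(size=T)
B = rng.normal(size=(T, T))
A = B @ B.T + T * np.eye(T)      # known positive definite matrix
z = rng.normal(size=T)

def model(theta):
    return theta[0] * x, theta[1] * A

theta = np.array([0.7, 1.3])
mu, Sigma = model(theta)
analytic = np.array([
    score_i(z, mu, Sigma, x, np.zeros((T, T))),   # Sigma does not depend on theta[0]
    score_i(z, mu, Sigma, np.zeros(T), A),        # mu does not depend on theta[1]
])

# Central finite differences of the log-likelihood
eps = 1e-6
numeric = np.zeros(2)
for i in range(2):
    step = np.zeros(2)
    step[i] = eps
    numeric[i] = (loglik(z, *model(theta + step))
                  - loglik(z, *model(theta - step))) / (2 * eps)
```

Under these assumptions `analytic` and `numeric` should agree up to finite-difference error.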

First-order conditions (FOC), for all \(i\):

\[\begin{split} \boldsymbol S_{T}(\boldsymbol \theta) = \boldsymbol 0\\ \frac{\partial \mu(\theta)'}{\partial \theta_i} \mathbf{\Sigma}^{-1}(\theta)\left(\mathbf{z}- \boldsymbol \mu (\theta)\right) + \frac{1}{2} \operatorname{vec}\left(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i}\right)' \operatorname{vec}\left( \mathbf{\Sigma}(\theta) - (\mathbf{z} - \boldsymbol \mu (\theta)) (\mathbf{z} - \boldsymbol \mu (\theta))'\right)= 0 \end{split}\]
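
To illustrate the FOC, the sketch below takes the simplest special case as an assumption: i.i.d. observations, so \(\boldsymbol\mu(\theta) = m\,\mathbf{1}\) and \(\mathbf{\Sigma}(\theta) = s^2 \mathbf{I}\) with \(\theta = (m, s^2)\). It evaluates the general score formula at the sample mean and the sample variance (with divisor \(T\)) and checks that both elements vanish. The function name `score` and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50
z = rng.normal(loc=2.0, scale=1.5, size=T)   # simulated data, for illustration only

def score(theta, z):
    """Score for the assumed i.i.d. model: mu = m * 1, Sigma = s2 * I."""
    m, s2 = theta
    T = z.shape[0]
    ones = np.ones(T)
    resid = z - m * ones
    Sigma = s2 * np.eye(T)
    Sigma_inv = np.eye(T) / s2
    dSigma_inv_ds2 = -np.eye(T) / s2**2       # derivative of (1/s2) * I w.r.t. s2
    gap = Sigma - np.outer(resid, resid)
    s_m = ones @ Sigma_inv @ resid            # Sigma does not depend on m
    s_s2 = 0.5 * np.trace(dSigma_inv_ds2 @ gap)  # mu does not depend on s2
    return np.array([s_m, s_s2])

m_hat = z.mean()
s2_hat = np.mean((z - m_hat) ** 2)            # divisor T, not T - 1
foc = score(np.array([m_hat, s2_hat]), z)     # numerically zero
away = score(np.array([m_hat + 1.0, s2_hat]), z)  # not zero away from the MLE
```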

Intuition

Assume that we know \(\mu(\theta)=0\) (the data are de-meaned). The MLE then solves

\[\begin{split} \begin{align} \frac{1}{2} \operatorname{vec}\left(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i}\right)' \operatorname{vec}\left( \mathbf{\Sigma}(\theta) - \mathbf{z} \mathbf{z}'\right)&= 0\\ \end{align} \end{split}\]

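for all \(i\). Stacking these conditions, with the \(i\)-th row of \(\boldsymbol W(\theta)\) equal to \(\frac{1}{2} \operatorname{vec}\left(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i}\right)'\), gives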

\[\begin{split} \begin{align} \boldsymbol W(\theta) \operatorname{vec}\left( \mathbf{\Sigma}(\theta) - \mathbf{z} \mathbf{z}'\right)&= 0\\ \end{align} \end{split}\]
  • MLE picks the value of \(\theta\) that brings the theoretical second moments (\(\mathbf{\Sigma}(\theta)\)) as close as possible to the empirical ones (\(\mathbf{z}\mathbf{z}'\)), with the differences weighted by \(\boldsymbol W(\theta)\)

  • Optimality of the weights means that information about \(\theta\) is maximized, i.e. estimation uncertainty (the asymptotic variance of the estimator) is minimized

  • MLE is equivalent to GMM with a weighting matrix which is optimal when the true distribution is Gaussian.

  • When \(\boldsymbol \mu(\theta) \neq 0\), the same intuition holds: MLE picks \(\theta\) so as to minimize the weighted difference between empirical and theoretical first- and second-order moments.
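
The moment-matching reading of the bullets above can be made concrete in a one-parameter case. The sketch assumes \(\boldsymbol\mu(\theta) = 0\) and \(\mathbf{\Sigma}(\theta) = \theta \mathbf{A}\) with \(\mathbf{A}\) known (a hypothetical model chosen for illustration). The stacked condition \(\boldsymbol W(\theta) \operatorname{vec}(\mathbf{\Sigma}(\theta) - \mathbf{z}\mathbf{z}') = 0\) then has the closed-form solution \(\hat\theta = \mathbf{z}'\mathbf{A}^{-1}\mathbf{z}/T\), which the code verifies.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 6
B = rng.normal(size=(T, T))
A = B @ B.T + T * np.eye(T)          # known positive definite matrix (assumption)
A_inv = np.linalg.inv(A)
z = rng.multivariate_normal(np.zeros(T), 2.0 * A)   # one simulated draw

def W(theta):
    """1 x T^2 weighting row: 0.5 * vec(d Sigma^{-1} / d theta)'."""
    dSigma_inv = -A_inv / theta**2   # derivative of (1/theta) * A^{-1}
    # Row-major flattening is used for both factors below; since both
    # matrices are symmetric, this gives the same inner product as vec().
    return 0.5 * dSigma_inv.reshape(1, -1)

def moment_condition(theta):
    gap = theta * A - np.outer(z, z)          # Sigma(theta) - z z'
    return (W(theta) @ gap.reshape(-1, 1)).item()

# Closed-form solution of the moment condition for this model
theta_hat = z @ A_inv @ z / T
at_mle = moment_condition(theta_hat)          # numerically zero
off_mle = moment_condition(2.0 * theta_hat)   # non-zero away from the solution
```

Note that the weights involve \(\mathbf{A}^{-1}\): the elements of \(\mathbf{z}\mathbf{z}'\) are not matched one by one but through a weighted combination.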

What if the true model is not Gaussian?

  • Moments other than the first and second will be informative about \(\theta\)

  • The Gaussian weights are no longer optimal

Fisher information matrix

\[\begin{split} \begin{align} \mathcal{I}_{T, ij}(\boldsymbol \theta) &= \left( \frac{\partial \mu(\theta)}{\partial \theta_i} \right)' \mathbf{\Sigma}^{-1}(\theta) \left( \frac{\partial \mu(\theta)}{\partial \theta_j} \right) \\ & + \frac{1}{2}\operatorname{tr} \left( \frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i} \mathbf{\Sigma} (\theta) \frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_j} \mathbf{\Sigma} (\theta)\right) \end{align} \end{split}\]
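
A direct transcription of the information matrix formula is sketched below. The function `fisher_information` and its list-based interface are assumptions made for illustration. As a check it uses the i.i.d. case \(\boldsymbol\mu = m\,\mathbf{1}\), \(\mathbf{\Sigma} = s^2\mathbf{I}\), for which the formula reduces to \(\operatorname{diag}\left(T/s^2,\ T/(2s^4)\right)\).

```python
import numpy as np

def fisher_information(Sigma, dmu, dSigma):
    """Fisher information for z ~ N(mu(theta), Sigma(theta)).

    dmu    : list of T-vectors,     dmu[i]    = d mu / d theta_i
    dSigma : list of T x T arrays,  dSigma[i] = d Sigma / d theta_i
    (hypothetical interface, chosen for this sketch)
    """
    Sigma_inv = np.linalg.inv(Sigma)
    # d Sigma^{-1} / d theta_i = -Sigma^{-1} (d Sigma / d theta_i) Sigma^{-1}
    dSigma_inv = [-Sigma_inv @ dS @ Sigma_inv for dS in dSigma]
    k = len(dmu)
    info = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            mean_part = dmu[i] @ Sigma_inv @ dmu[j]
            cov_part = 0.5 * np.trace(
                dSigma_inv[i] @ Sigma @ dSigma_inv[j] @ Sigma
            )
            info[i, j] = mean_part + cov_part
    return info

# i.i.d. example (assumption): theta = (m, s2), mu = m * 1, Sigma = s2 * I
T = 4
s2 = 2.0
info = fisher_information(
    Sigma=s2 * np.eye(T),
    dmu=[np.ones(T), np.zeros(T)],            # d mu / d m, d mu / d s2
    dSigma=[np.zeros((T, T)), np.eye(T)],     # d Sigma / d m, d Sigma / d s2
)
closed_form = np.diag([T / s2, T / (2.0 * s2**2)])
```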

Asymptotic variance matrix of the MLE \(\hat{\boldsymbol \theta}\), i.e. of \(\sqrt{T}(\hat{\boldsymbol \theta} - \boldsymbol \theta_0)\)

\[\left(\lim_{T\rightarrow \infty} \frac{1}{T} \mathcal{I}_{T}(\boldsymbol{\theta}_0)\right)^{-1}\]
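
A small Monte Carlo experiment can illustrate this limit. The sketch again assumes the i.i.d. case \(\theta = (m, s^2)\), where \(\mathcal{I}_T/T = \operatorname{diag}\left(1/s^2,\ 1/(2s^4)\right)\) for every \(T\), so the asymptotic variance is \(\operatorname{diag}(s^2,\ 2s^4)\). The sample size, number of replications, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
m0, s2_0 = 1.0, 2.0          # true parameter values (assumption)
T, reps = 200, 4000

# In the i.i.d. case I_T / T does not depend on T, so no limit is needed
info_per_obs = np.diag([1.0 / s2_0, 1.0 / (2.0 * s2_0**2)])
avar = np.linalg.inv(info_per_obs)           # diag(s2, 2 * s2^2)

# Simulate the MLE many times: sample mean and variance with divisor T
draws = rng.normal(m0, np.sqrt(s2_0), size=(reps, T))
m_hat = draws.mean(axis=1)
s2_hat = draws.var(axis=1)                   # ddof=0 is the Gaussian MLE

# Variance of sqrt(T) * (theta_hat - theta_0) across replications
scaled = np.sqrt(T) * np.column_stack([m_hat - m0, s2_hat - s2_0])
mc_var = np.cov(scaled, rowvar=False)
```

With these settings the diagonal of `mc_var` should be close to that of `avar`, up to simulation noise and a finite-sample factor of order \(1/T\).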