MLE of Gaussian models

Gaussian model

$$z \sim N\bigl(\mu(\theta_0), \Sigma(\theta_0)\bigr).$$

Log-likelihood

$$\ell(\theta \mid z) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma(\theta)| - \frac{1}{2}\bigl(z - \mu(\theta)\bigr)'\,\Sigma^{-1}(\theta)\,\bigl(z - \mu(\theta)\bigr)$$
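As a quick numerical sanity check, the formula can be evaluated directly and compared against SciPy's multivariate normal density. The mean vector, covariance matrix, and data below are made-up toy inputs, not values from the notes:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy inputs (assumed for illustration): T observations stacked in z
T = 3
mu = np.array([0.5, -0.2, 1.0])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.5, 0.4],
                  [0.0, 0.4, 1.2]])  # symmetric positive definite
z = np.array([0.7, 0.1, 0.9])

# Log-likelihood from the formula above
resid = z - mu
_, logdet = np.linalg.slogdet(Sigma)
ll = -T / 2 * np.log(2 * np.pi) - 0.5 * logdet \
     - 0.5 * resid @ np.linalg.solve(Sigma, resid)

# Cross-check against SciPy's density
ll_scipy = multivariate_normal.logpdf(z, mean=mu, cov=Sigma)
print(np.isclose(ll, ll_scipy))  # True
```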

Score with respect to $\theta_i$

$$\begin{aligned}
\{S_T(\theta)\}_i &= -\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}(\theta)\frac{\partial\Sigma(\theta)}{\partial\theta_i}\right) + \left(\frac{\partial\mu(\theta)}{\partial\theta_i}\right)'\Sigma^{-1}(\theta)\bigl(z-\mu(\theta)\bigr) - \frac{1}{2}\bigl(z-\mu(\theta)\bigr)'\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_i}\bigl(z-\mu(\theta)\bigr) \\
&= \left(\frac{\partial\mu(\theta)}{\partial\theta_i}\right)'\Sigma^{-1}(\theta)\bigl(z-\mu(\theta)\bigr) + \frac{1}{2}\operatorname{tr}\left(\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_i}\Bigl(\Sigma(\theta) - \bigl(z-\mu(\theta)\bigr)\bigl(z-\mu(\theta)\bigr)'\Bigr)\right) \\
&= \left(\frac{\partial\mu(\theta)}{\partial\theta_i}\right)'\Sigma^{-1}(\theta)\bigl(z-\mu(\theta)\bigr) + \frac{1}{2}\operatorname{vec}\left(\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_i}\right)'\operatorname{vec}\Bigl(\Sigma(\theta) - \bigl(z-\mu(\theta)\bigr)\bigl(z-\mu(\theta)\bigr)'\Bigr)
\end{aligned}$$

where the second line uses $\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_i} = -\Sigma^{-1}(\theta)\frac{\partial\Sigma(\theta)}{\partial\theta_i}\Sigma^{-1}(\theta)$, so that $\operatorname{tr}\bigl(\frac{\partial\Sigma^{-1}}{\partial\theta_i}\Sigma\bigr) = -\operatorname{tr}\bigl(\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\bigr)$.
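The analytic score can be checked against a finite-difference gradient of the log-likelihood. The sketch below uses an assumed toy parameterization (not from the notes): $\mu(\theta) = \theta_1 1_T$ and $\Sigma(\theta) = \theta_2 I_T$:

```python
import numpy as np

# Assumed toy model: mu(theta) = theta[0] * 1_T, Sigma(theta) = theta[1] * I_T
T = 4
rng = np.random.default_rng(0)
z = rng.normal(size=T)
theta = np.array([0.3, 1.7])

def mu(theta):
    return theta[0] * np.ones(T)

def Sigma(theta):
    return theta[1] * np.eye(T)

def loglik(theta):
    r = z - mu(theta)
    S = Sigma(theta)
    _, logdet = np.linalg.slogdet(S)
    return -T / 2 * np.log(2 * np.pi) - 0.5 * logdet \
           - 0.5 * r @ np.linalg.solve(S, r)

# Analytic score {S_T(theta)}_i from the first line of the derivation,
# substituting dSigmainv_i = -Sigma^{-1} dSigma_i Sigma^{-1}
dmu = [np.ones(T), np.zeros(T)]        # dmu/dtheta_i
dSig = [np.zeros((T, T)), np.eye(T)]   # dSigma/dtheta_i
Sinv = np.linalg.inv(Sigma(theta))
r = z - mu(theta)
score = np.array([
    dmu[i] @ Sinv @ r
    - 0.5 * np.trace(Sinv @ dSig[i])
    + 0.5 * r @ Sinv @ dSig[i] @ Sinv @ r
    for i in range(2)
])

# Central finite differences of the log-likelihood
eps = 1e-6
fd = np.array([(loglik(theta + eps * np.eye(2)[i])
                - loglik(theta - eps * np.eye(2)[i])) / (2 * eps)
               for i in range(2)])
print(np.allclose(score, fd, atol=1e-5))  # True
```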

FOC:

$$S_T(\theta) = 0 \iff \left(\frac{\partial\mu(\theta)}{\partial\theta_i}\right)'\Sigma^{-1}(\theta)\bigl(z-\mu(\theta)\bigr) + \frac{1}{2}\operatorname{vec}\left(\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_i}\right)'\operatorname{vec}\Bigl(\Sigma(\theta) - \bigl(z-\mu(\theta)\bigr)\bigl(z-\mu(\theta)\bigr)'\Bigr) = 0 \quad \text{for all } i$$

Intuition

Assume that we know $\mu(\theta) = 0$ (the data are de-meaned). The MLE then solves

$$\frac{1}{2}\operatorname{vec}\left(\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_i}\right)'\operatorname{vec}\bigl(\Sigma(\theta) - zz'\bigr) = 0$$

for all $i$, or, stacking these conditions,

$$W(\theta)\operatorname{vec}\bigl(\Sigma(\theta) - zz'\bigr) = 0,$$

where row $i$ of $W(\theta)$ is $\frac{1}{2}\operatorname{vec}\bigl(\partial\Sigma^{-1}(\theta)/\partial\theta_i\bigr)'$.
  • MLE picks values of $\theta$ that minimize the difference between the empirical ($zz'$) and theoretical ($\Sigma(\theta)$) second moments

  • Optimality means that information about θ is maximized, i.e. estimation uncertainty is minimized

  • MLE is equivalent to GMM with a weighting matrix that is optimal when the true distribution is Gaussian.

  • When $\mu(\theta) \neq 0$, the same intuition holds: MLE picks $\theta$ so as to minimize the difference between empirical and theoretical first and second moments.
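The moment-matching intuition can be made concrete in the simplest case. Under the assumption $\mu(\theta) = 0$ with the toy parameterization $\Sigma(\theta) = \theta I_T$ (chosen for illustration, not from the notes), the MLE is the average empirical second moment, and the stacked first-order condition holds exactly at $\hat{\theta}$:

```python
import numpy as np

# Assumed toy parameterization: Sigma(theta) = theta * I_T, mu = 0
T = 5
rng = np.random.default_rng(1)
z = rng.normal(size=T)

theta_hat = z @ z / T  # MLE: matches the average empirical second moment

# First-order condition: W(theta) vec(Sigma(theta) - z z') = 0,
# where the (single) row of W is (1/2) vec(dSigma^{-1}/dtheta)'
Sigma = theta_hat * np.eye(T)
dSigma_inv = -np.eye(T) / theta_hat**2   # d(Sigma^{-1})/dtheta = -I / theta^2
moment = 0.5 * dSigma_inv.ravel() @ (Sigma - np.outer(z, z)).ravel()
print(np.isclose(moment, 0.0))  # True
```

Here the moment condition reduces to $-\frac{1}{2\theta^2}(T\theta - z'z) = 0$, which pins down $\hat{\theta} = z'z/T$.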

What if the true model is not Gaussian?

  • moments other than the first and second will be informative

  • the Gaussian weights are no longer optimal

Fisher information matrix

$$I_{T,ij}(\theta) = \left(\frac{\partial\mu(\theta)}{\partial\theta_i}\right)'\Sigma^{-1}(\theta)\left(\frac{\partial\mu(\theta)}{\partial\theta_j}\right) + \frac{1}{2}\operatorname{tr}\left(\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_i}\,\Sigma(\theta)\,\frac{\partial\Sigma^{-1}(\theta)}{\partial\theta_j}\,\Sigma(\theta)\right)$$
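For the same assumed toy model as above ($\mu(\theta) = \theta_1 1_T$, $\Sigma(\theta) = \theta_2 I_T$), the formula can be evaluated term by term and compared with the closed form it implies, $I = \operatorname{diag}\bigl(T/\theta_2,\ T/(2\theta_2^2)\bigr)$:

```python
import numpy as np

# Assumed toy model: mu(theta) = theta[0] * 1_T, Sigma(theta) = theta[1] * I_T
T = 4
theta = np.array([0.3, 1.7])

Sigma = theta[1] * np.eye(T)
Sinv = np.linalg.inv(Sigma)
dmu = [np.ones(T), np.zeros(T)]                       # dmu/dtheta_i
dSinv = [np.zeros((T, T)), -np.eye(T) / theta[1]**2]  # d(Sigma^{-1})/dtheta_i

# Fisher information from the formula above
I = np.array([[dmu[i] @ Sinv @ dmu[j]
               + 0.5 * np.trace(dSinv[i] @ Sigma @ dSinv[j] @ Sigma)
               for j in range(2)] for i in range(2)])

# Closed form for this particular model
expected = np.diag([T / theta[1], T / (2 * theta[1]**2)])
print(np.allclose(I, expected))  # True
```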

The asymptotic variance matrix of the MLE $\hat{\theta}$ is

$$\left(\lim_{T\to\infty}\frac{1}{T} I_T(\theta_0)\right)^{-1}$$