\[\begin{split}
&\boldsymbol S_{T}(\boldsymbol \theta) = \boldsymbol 0, \quad \text{i.e., for each } i,\\
&\frac{\partial \boldsymbol \mu(\theta)'}{\partial \theta_i} \mathbf{\Sigma}^{-1}(\theta)\left(\mathbf{z}- \boldsymbol \mu (\theta)\right) + \frac{1}{2} \operatorname{vec}\left(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i}\right)' \operatorname{vec}\left( \mathbf{\Sigma}(\theta) - (\mathbf{z} - \boldsymbol \mu (\theta)) (\mathbf{z} - \boldsymbol \mu (\theta))'\right)= 0
\end{split}\]
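A minimal worked example of the first term of the score, under assumptions that are not part of the original setup: suppose \(\boldsymbol \mu(\theta) = \theta \mathbf{1}_T\) for a scalar \(\theta\), where \(\mathbf{1}_T\) is a \(T \times 1\) vector of ones, and \(\mathbf{\Sigma} = \sigma^2 \mathbf{I}_T\) is known and does not depend on \(\theta\). Then \(\partial \mathbf{\Sigma}^{-1}/\partial \theta = \mathbf{0}\), the second term drops out, and the score condition reduces to
\[\begin{split}
\mathbf{1}_T' \, \sigma^{-2} \left(\mathbf{z} - \theta \mathbf{1}_T\right) &= \sigma^{-2}\left(\sum_{t=1}^{T} z_t - T\theta\right) = 0\\
\Rightarrow \quad \hat{\theta} &= \frac{1}{T}\sum_{t=1}^{T} z_t
\end{split}\]
In this special case the MLE matches the theoretical first moment to the sample mean.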
Intuition
Assume that we know \(\boldsymbol \mu(\theta)=\boldsymbol 0\) (the data are demeaned). The MLE then solves
\[\begin{split}
\frac{1}{2} \operatorname{vec}\left(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i}\right)' \operatorname{vec}\left( \mathbf{\Sigma}(\theta) - \mathbf{z} \mathbf{z}'\right)&= 0
\end{split}\]
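for all \(i\). Stacking these conditions over \(i\) into a matrix \(\boldsymbol W(\theta)\) whose \(i\)-th row is \(\frac{1}{2} \operatorname{vec}\left(\partial \mathbf{\Sigma}^{-1}(\theta) / \partial \theta_i\right)'\) gives
\[\begin{split}
\boldsymbol W(\theta) \operatorname{vec}\left( \mathbf{\Sigma}(\theta) - \mathbf{z} \mathbf{z}'\right)&= \boldsymbol 0
\end{split}\]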
MLE picks the value of \(\theta\) that minimizes the weighted difference between the empirical (\(\mathbf{z}\mathbf{z}'\)) and theoretical (\(\mathbf{\Sigma}(\theta)\)) second moments.
Optimality means that the information about \(\theta\) is maximized, i.e., estimation uncertainty is minimized.
MLE is equivalent to GMM with a weighting matrix which is optimal when the true distribution is Gaussian.
When \(\boldsymbol \mu(\theta) \neq \boldsymbol 0\), the same intuition holds: MLE picks \(\theta\) so as to minimize the difference between the empirical and theoretical first- and second-order moments.
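A minimal worked example of the zero-mean case, under an assumption that is not part of the original setup: let \(\theta > 0\) be a scalar variance with \(\mathbf{\Sigma}(\theta) = \theta \mathbf{I}_T\). Then \(\partial \mathbf{\Sigma}^{-1}(\theta)/\partial \theta = -\theta^{-2} \mathbf{I}_T\), and since \(\operatorname{vec}(\mathbf{I}_T)' \operatorname{vec}(\mathbf{A}) = \operatorname{tr}(\mathbf{A})\), the first-order condition becomes
\[\begin{split}
\frac{1}{2} \operatorname{vec}\left(-\theta^{-2} \mathbf{I}_T\right)' \operatorname{vec}\left( \theta \mathbf{I}_T - \mathbf{z} \mathbf{z}'\right) &= -\frac{1}{2\theta^{2}} \left( T\theta - \mathbf{z}'\mathbf{z} \right) = 0\\
\Rightarrow \quad \hat{\theta} &= \frac{\mathbf{z}'\mathbf{z}}{T} = \frac{1}{T}\sum_{t=1}^{T} z_t^2
\end{split}\]
Here the weighting reduces to \(-\frac{1}{2\theta^{2}} \operatorname{vec}(\mathbf{I}_T)'\), and the MLE sets the theoretical variance equal to the average empirical second moment.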
What if the true model is not Gaussian?
Fisher information matrix
\[\begin{split}
\mathcal{I}_{T, ij}(\boldsymbol \theta) &= \left( \frac{\partial \boldsymbol \mu(\theta)}{\partial \theta_i} \right)' \mathbf{\Sigma}^{-1}(\theta) \left( \frac{\partial \boldsymbol \mu(\theta)}{\partial \theta_j} \right) \\
& \quad + \frac{1}{2}\operatorname{tr} \left( \frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_i} \mathbf{\Sigma} (\theta) \frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta_j} \mathbf{\Sigma} (\theta)\right)
\end{split}\]
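Asymptotic variance matrix of the MLE \(\hat{\boldsymbol \theta}\), where \(\mathcal{I}_{T}(\boldsymbol \theta_0)\) is the matrix with elements \(\mathcal{I}_{T, ij}(\boldsymbol \theta_0)\):
\[\left(\underset{T\rightarrow \infty}{\operatorname{lim}} \frac{1}{T} \mathcal{I}_{T}(\boldsymbol \theta_0)\right)^{-1}\]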
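A minimal worked example, under assumptions that are not part of the original setup: take \(\boldsymbol \mu(\theta) = \boldsymbol 0\) and a scalar variance parameter \(\theta > 0\) with \(\mathbf{\Sigma}(\theta) = \theta \mathbf{I}_T\). The mean term of the Fisher information vanishes, \(\frac{\partial \mathbf{\Sigma}^{-1}(\theta)}{\partial \theta} \mathbf{\Sigma}(\theta) = -\theta^{-1} \mathbf{I}_T\), and the Fisher information reduces to the scalar
\[\begin{split}
\mathcal{I}_{T}(\theta) &= \frac{1}{2}\operatorname{tr}\left( \theta^{-2} \mathbf{I}_T \right) = \frac{T}{2\theta^{2}}\\
\left(\underset{T\rightarrow \infty}{\operatorname{lim}} \frac{1}{T} \mathcal{I}_{T}(\theta_0)\right)^{-1} &= 2\theta_0^{2}
\end{split}\]
This agrees with the familiar asymptotic variance \(2\sigma^4\) of the variance estimator for Gaussian data with \(\sigma^2 = \theta_0\).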