The goal is essentially the same as in MLE. We have an assumed model for $p(\mathbf{x} | \omega_j)$ parameterized by $\boldsymbol{\theta}$, and we want to classify a feature vector $\mathbf{x}$ into some class $\omega_j$ based on a labeled dataset $\mathcal{D}$. In MLE, we chose the parameters that maximize the likelihood of the data: $$ \hat{\boldsymbol{\theta}}_{\text{MLE}} = \arg \max_{\boldsymbol{\theta}} p(\mathcal{D} | \boldsymbol{\theta}) $$ In MAP, we instead maximize the posterior of the parameters given the data: $$ \begin{align*} \hat{\boldsymbol{\theta}}_{\...