Bayesian Parameter Estimation (BPE) is fundamentally different compared to MLE or MAP. Whereas the latter two solve for an optimal set of parameters $\hat{\boldsymbol{\theta}}$ for the model, BPE treats $\boldsymbol{\theta}$ as a random variable with a distribution $p(\boldsymbol{\theta})$. Setup We are given a dataset $\mathcal{D}$, which contains $n$ i.i.d. features $\mathbf{x}_j$. Given a new feature vector $\mathbf{x}$, we want to classify it to some class $\omega$. One way to do this is ...