Goal

Suppose we have a dataset of features but no labels. If we know (or guess) that there are $K$ classes in the dataset, we can model the data density as a weighted sum of $K$ class-conditional Gaussians. This is what Gaussian Mixture Models do. We assume that the model is parameterized by $\boldsymbol \Theta = \{ \pi_k, \mu_k, \sigma^2_k \}_{k=1}^K$, where $\mu_k$ and $\sigma^2_k$ are the mean and variance of the $k$th Gaussian and $\pi_k$ is its weight in the model. The weights satisfy $\pi_k \geq 0$ and $\sum_{k=1}^K \pi_k = 1$, so that the mixture is itself a valid probability density.
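To make the parameterization concrete, here is a minimal sketch of how the mixture density $p(x \mid \boldsymbol \Theta) = \sum_{k=1}^K \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma^2_k)$ could be evaluated. It assumes scalar (univariate) features, which the notation $\sigma^2_k$ suggests but the text does not state. The function names `gaussian_pdf` and `gmm_pdf` and the parameter values are illustrative choices, not taken from the original.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of a univariate Gaussian N(mu, sigma2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def gmm_pdf(x, pis, mus, sigma2s):
    """Mixture density: sum over k of pi_k * N(x | mu_k, sigma2_k)."""
    # The weights must form a probability distribution over the K components.
    if any(pi < 0 for pi in pis) or abs(sum(pis) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(pi * gaussian_pdf(x, mu, s2) for pi, mu, s2 in zip(pis, mus, sigma2s))


# Hypothetical parameters for K = 2 components, chosen arbitrarily for illustration.
pis = [0.3, 0.7]       # pi_k: component weights
mus = [-2.0, 1.0]      # mu_k: component means
sigma2s = [0.5, 1.5]   # sigma^2_k: component variances

density_at_zero = gmm_pdf(0.0, pis, mus, sigma2s)
```

Because the weights sum to one and each component integrates to one, the mixture also integrates to one. With $K = 1$ and $\pi_1 = 1$ the model reduces to a single Gaussian.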