Here’s a thesis (hypothesis, prediction, etc.) to chew on: The mixture-of-experts paradigm is fundamentally a hindrance to open-source development, and mixtral-8x5B+2B will be summarily supplanted by a dense model like llama3/mistral-70b/yi/qwen/… in the near future.
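
For anyone who wants the "paradigm" in concrete terms before arguing about it, here is a minimal sketch of a top-2 routed MoE feed-forward layer of the kind Mixtral-style models use. The dimensions, expert count, and class name are illustrative only, not the real model's code; the point is just that only k of the experts run per token, so active parameters are far smaller than total parameters (which is where the "8x5B + ~2B shared" framing comes from).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Illustrative top-k routed mixture-of-experts feed-forward layer."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # The gate scores each token against every expert.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward block; only k of them
        # execute per token, so active params << total params.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                              # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TopKMoE()
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

The trade-off the thesis is pointing at: you pay the full parameter count in VRAM even though each forward pass only uses a fraction of it, which is exactly the resource profile that's awkward for hobbyist and open-source fine-tuning compared to a dense model of similar active size.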