SOTA. Multimodal. Multilingual. Apache 2.0. | mistral.ai
Stacking transformer layers to create large models yields higher accuracy, few-shot learning capabilities, and even near-human emergent abilities on a wide range of language tasks. | NVIDIA Technical Blog
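The "stacking" referred to above is the practice of composing many identical transformer blocks in sequence, with residual connections letting each block refine the previous one's output. A minimal sketch of that structure follows; it is an illustrative single-head, NumPy-only toy (layer normalization, multi-head attention, and all dimensions are simplified assumptions, not the code of any model named above):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TransformerBlock:
    """One simplified transformer layer: single-head self-attention plus a
    position-wise MLP, each wrapped in a residual connection.
    (Layer norm and multi-head attention are omitted for brevity.)"""
    def __init__(self, d_model, d_ff, rng):
        s = 1.0 / np.sqrt(d_model)
        self.wq = rng.normal(0, s, (d_model, d_model))
        self.wk = rng.normal(0, s, (d_model, d_model))
        self.wv = rng.normal(0, s, (d_model, d_model))
        self.wo = rng.normal(0, s, (d_model, d_model))
        self.w1 = rng.normal(0, s, (d_model, d_ff))
        self.w2 = rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d_model))

    def __call__(self, x):
        # x: (seq_len, d_model). Scaled dot-product self-attention + residual.
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        att = softmax(q @ k.T / np.sqrt(x.shape[-1])) @ v
        x = x + att @ self.wo
        # Feed-forward (ReLU MLP) + residual; output keeps the input shape.
        return x + np.maximum(x @ self.w1, 0.0) @ self.w2

def stack(n_layers, d_model=64, d_ff=256, seed=0):
    """Build n_layers identical blocks -- the 'stacking' in the quote."""
    rng = np.random.default_rng(seed)
    return [TransformerBlock(d_model, d_ff, rng) for _ in range(n_layers)]

def forward(layers, x):
    # Depth comes from running the same-shaped block repeatedly.
    for layer in layers:
        x = layer(x)
    return x
```

Because every block maps a `(seq_len, d_model)` array to the same shape, depth can be scaled simply by changing `n_layers`, which is the knob the quoted claim is about.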