Run and fine-tune generative AI models with simple APIs and scalable GPU clusters. Train & deploy at scale on The AI Acceleration Cloud.
Today, we are excited to release the Together Embeddings endpoint! Some of the highlights are:
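As a rough illustration of what calling an embeddings endpoint like this looks like, here is a minimal sketch over HTTP. The endpoint URL, model name, and response schema below are assumptions modeled on typical OpenAI-compatible APIs, not a verbatim reproduction of Together's documentation.

```python
# Hedged sketch: request an embedding from an OpenAI-compatible endpoint.
# URL, model name, and response shape are illustrative assumptions.
import os
import requests

API_URL = "https://api.together.xyz/v1/embeddings"   # assumed endpoint path
MODEL = "togethercomputer/m2-bert-80M-8k-retrieval"  # illustrative model name

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={"model": MODEL, "input": "Our solar system orbits the Milky Way."},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]  # assumed OpenAI-style schema
print(len(embedding))
```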
Large Language Models (LLMs) have changed the world. However, generating text with them can be slow and expensive. While methods like speculative decoding have been proposed to accelerate generation, their intricate nature has left many in the open-source community hesitant to embrace them.
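To make the idea concrete, here is a minimal sketch of the accept/reject rule at the heart of speculative decoding as described in the literature (a small draft model proposes a token, the large target model verifies it). It uses toy probability tables in place of real models; the vocabulary size and distributions are illustrative only.

```python
# Toy sketch of the speculative-decoding acceptance rule:
# accept the draft token x with probability min(1, p_target(x) / q_draft(x)),
# otherwise resample from the residual distribution max(0, p_target - q_draft).
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size

def toy_dist():
    """Random categorical distribution standing in for a model's next-token probs."""
    p = rng.random(VOCAB)
    return p / p.sum()

def speculative_step(p_target, q_draft):
    """One verification step: draft proposes, target accepts or resamples."""
    x = rng.choice(VOCAB, p=q_draft)
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(VOCAB, p=residual), False

p, q = toy_dist(), toy_dist()
token, accepted = speculative_step(p, q)
print(f"token={token}, accepted draft={accepted}")
```

The point of the rule is that the sampled tokens follow the target model's distribution exactly, so quality is preserved while most of the work is done by the cheaper draft model.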
The Together Inference Engine is multiple times faster than any other inference service, reaching 117 tokens per second on Llama-2-70B-Chat and 171 tokens per second on Llama-2-13B-Chat.
Today, Mistral released Mixtral 8x7B, a high-quality sparse mixture of experts model (SMoE) with open weights.
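For readers unfamiliar with sparse MoE layers, the sketch below shows top-2 expert routing in the spirit of Mixtral's design (8 experts, 2 active per token). The dimensions and the single-matrix "experts" are toy stand-ins, not Mixtral's actual architecture or weights.

```python
# Toy sketch of top-2 routing in a sparse mixture-of-experts layer:
# a router scores all experts, the two highest-scoring experts process the
# token, and their outputs are mixed with renormalized gate weights.
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 16, 8, 2  # illustrative sizes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Router and per-expert weights (toy: each expert is a single linear map).
W_router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1
W_experts = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL)) * 0.1

def moe_layer(token):
    """Route one token through its top-2 experts and mix the results."""
    logits = token @ W_router
    top = np.argsort(logits)[-TOP_K:]   # indices of the 2 highest-scoring experts
    gates = softmax(logits[top])        # renormalize gates over the chosen experts
    return sum(g * (token @ W_experts[i]) for g, i in zip(gates, top))

out = moe_layer(rng.standard_normal(D_MODEL))
print(out.shape)  # (16,)
```

Because only 2 of the 8 experts run per token, the layer's compute per token is a fraction of what a dense model with the same total parameter count would need.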