Hello everybody, I hope you’ve been enjoying this summer after two years of Covid and lockdowns :D In this post I’m going to describe how…| evilsocket
This is the story of a summer project that started out of boredom and evolved into something incredibly fun and unique. It is also the story of how that project went from being discussed on a porch by just two people to having a community of almost 700 awesome people (and counting!) who gathered around it, polished it, and made today’s release possible. TL;DR: You can download the 1.0.0 .img file from here, then just follow the instructions. If you want the long version instead, sit back…| evilsocket
Large Language Models (LLMs) and their multi-modal variants offer significant benefits in automating complex processes, with Document Understanding (DU) being a particularly promising application. In DU, the challenge often lies in integrating text, layout, and graphical elements to accurately extract the necessary information. In a new paper Arctic-TILT. Business Document Understanding at Sub-Billion Scale, a research…| Synced
A foundation model is a type of artificial intelligence neural network trained on vast amounts of raw data, typically through unsupervised learning, and designed to be adaptable to a wide range of tasks. In a new paper Apple Intelligence Foundation Language Models, an Apple research team introduces the foundation language models developed to power Apple Intelligence…| Synced
Robot learning has seen remarkable advancements in recent years; however, achieving human-level performance in accuracy, speed, and adaptability remains a significant challenge across various domains. One such domain is table tennis, a sport that demands years of rigorous training for human players to reach an advanced level of proficiency. In a new paper Achieving…| Synced
Video captioning is essential for content accessibility and searchability, providing precise descriptions of video content. However, generating accurate, descriptive, and detailed video captions remains challenging for several reasons: the limited availability of high-quality labeled data and the additional complexity that video introduces, such as temporal correlations…| Synced
In natural language processing (NLP) applications, long prompts pose significant challenges, including slower inference speed, higher computational costs, and a diminished user experience. Furthermore, the limits imposed by context length restrict model performance and application scope, creating a strong need to reduce prompt length. In a new paper 500xCompressor: Generalized Prompt Compression for Large Language Models…| Synced
Foundation models, also known as general-purpose AI systems, are a rising trend in AI research. These models excel in diverse tasks such as text synthesis, image manipulation, and audio generation. Notable examples include OpenAI’s GPT-3 and GPT-4, which power the conversational agent ChatGPT. In a new paper The Llama 3 Herd of Models, a Meta…| Synced
For years, embedding models based on bidirectional language models have led the field, excelling in retrieval and general-purpose embedding tasks. However, past top-tier methods have relied on fine-tuning Large Language Models (LLMs) with extensive amounts of proprietary synthetic data from GPT-4, which isn't accessible to the broader community. In a new paper NV-Embed: Improved Techniques…| Synced
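For readers unfamiliar with the task, "retrieval" with an embedding model boils down to ranking documents by the similarity of their vectors to a query vector. The sketch below is a generic illustration of that step, not code from the paper: the vectors are random stand-ins for what a real embedding model would produce.

    # Generic retrieval-by-embedding sketch (vectors are random stand-ins;
    # a real system would obtain them from an embedding model).
    import numpy as np

    rng = np.random.default_rng(2)
    doc_vecs = rng.normal(size=(1000, 768))   # one 768-d embedding per document
    query = rng.normal(size=768)              # embedding of the query text

    # Cosine similarity = dot product of L2-normalized vectors.
    doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = doc_norm @ q_norm

    top_k = np.argsort(-scores)[:5]           # indices of the 5 best matches
    print("top-5 documents:", top_k, "scores:", scores[top_k].round(3))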
Transformers have reshaped the field of natural language processing, driving significant advances across numerous applications. With their widespread success, there is growing interest in understanding the inner mechanisms of these models. One key aspect that has not been thoroughly examined is the inherent linearity of the intermediate embedding transformations within transformer architectures…| Synced
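To make the notion of "linearity of embedding transformations" concrete: one simple probe is to fit a least-squares linear map from one layer's hidden states to the next and measure how much variance it explains. The sketch below does exactly that on synthetic stand-in activations; it illustrates the idea and is not the paper's methodology.

    # Hypothetical linearity probe (not the paper's code): how linear is the
    # map from hidden states at layer l to hidden states at layer l+1?
    import numpy as np

    rng = np.random.default_rng(1)
    n_tokens, d = 512, 64

    # Stand-in "hidden states": a mostly linear transform plus a small
    # nonlinear perturbation, mimicking consecutive transformer layers.
    H_l = rng.normal(size=(n_tokens, d))
    A = rng.normal(size=(d, d)) / np.sqrt(d)
    H_next = H_l @ A + 0.05 * np.tanh(H_l)

    # Least-squares fit H_next ≈ H_l @ W, then the explained variance.
    W, *_ = np.linalg.lstsq(H_l, H_next, rcond=None)
    residual = H_next - H_l @ W
    r2 = 1 - (residual**2).sum() / ((H_next - H_next.mean(0))**2).sum()
    print(f"fraction of variance explained by a linear map: {r2:.4f}")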
The field of medical artificial intelligence (AI) is advancing rapidly, heralding a new era of diagnostic accuracy and patient care. Researchers have been focusing on developing AI solutions for specific tasks, but current medical AI systems are often limited to narrow applications, hindering their broader adoption in clinical practice. In the face of this limitation…| Synced
Large language models (LLMs) have demonstrated remarkable proficiency in various natural language tasks and an impressive ability to follow open-ended instructions, showcasing strong generalization capabilities. Despite these successes, a notable limitation of LLMs is their inability to perceive non-textual modalities such as audio. In a new paper SpeechVerse: A Large-scale Generalizable Audio Language Model…| Synced
In various domains, Diffusion Models (DMs) have emerged as groundbreaking tools, offering an unparalleled blend of realism and diversity with stable training. However, their sequential denoising process is time-consuming and costly. In a new paper Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation, a Meta GenAI research team introduces…| Synced
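The cost referred to here comes from the sampler invoking the denoising network once per step, so generation time grows roughly linearly with the number of steps. The toy loop below, using a stand-in "denoiser" rather than Emu or the paper's method, makes that scaling visible.

    # Toy illustration of why diffusion sampling cost scales with step count.
    # The "denoiser" here is a stand-in matrix multiply, not a real model.
    import time
    import numpy as np

    rng = np.random.default_rng(3)
    W = rng.normal(size=(1024, 1024)) / 32.0   # stand-in for a denoising network

    def sample(num_steps: int) -> float:
        x = rng.normal(size=1024)              # start from pure noise
        t0 = time.perf_counter()
        for _ in range(num_steps):
            x = x - 0.1 * (W @ x)              # one "denoising" update per step
        return time.perf_counter() - t0

    for steps in (50, 3):                      # many-step vs few-step sampling
        print(f"{steps:>2} steps: {sample(steps) * 1e3:.1f} ms")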
Achieving excellence across diverse medical applications presents significant hurdles for artificial intelligence (AI), demanding advanced reasoning abilities, access to the latest medical knowledge, and comprehension of intricate multimodal data. Gemini models, Google's cutting-edge AI, stand out for their robust general capabilities in multimodal and long-context reasoning, presenting promising avenues in the realm of medicine…| Synced
Multi-layer perceptrons (MLPs) are the bedrock of contemporary deep learning architectures, serving as indispensable components in many machine learning applications. Backed by the expressive power conferred by the universal approximation theorem, MLPs excel at approximating nonlinear functions, making them a default choice for many tasks. Despite their widespread adoption, however, MLPs have notable limitations. They often…| Synced
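As a concrete reminder of what the universal approximation property buys in practice, here is a minimal NumPy sketch, unrelated to the paper's code, that trains a one-hidden-layer MLP to fit sin(x); the layer width, learning rate, and step count are arbitrary demo choices.

    # Toy illustration (not from the paper): a small MLP fitting sin(x),
    # showing the nonlinear-approximation behaviour the teaser refers to.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-np.pi, np.pi, size=(256, 1))   # inputs
    Y = np.sin(X)                                   # nonlinear target

    # One hidden tanh layer; sizes and learning rate are demo choices.
    W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
    W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)
    lr = 0.05

    for step in range(5000):
        H = np.tanh(X @ W1 + b1)          # hidden activations
        P = H @ W2 + b2                   # predictions
        err = P - Y                       # mean-squared-error gradient term
        # Backpropagate through both layers and take a gradient step.
        gW2 = H.T @ err / len(X); gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H**2)
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1

    final = np.tanh(X @ W1 + b1) @ W2 + b2
    print("final MSE:", float(((final - Y) ** 2).mean()))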
Ensuring that Large Language Models (LLMs) align with human values and preferences is crucial for their utility and safety. Yet, devising effective tools for this alignment presents significant challenges, particularly with the largest and most sophisticated LLMs, which often boast tens or hundreds of billions of parameters. In a new paper NeMo-Aligner: Scalable Toolkit for…| Synced
How to set up your own Ubuntu instance of “neural-style”, the deep neural network that makes fine art!| Reflections