Self-supervised learning (SSL) has made significant advances in speech representation learning. Models like wav2vec 2.0 and HuBERT have achieved state-of-the-art results on tasks such as speech recognition, particularly in monolingual settings. However, multilingual SSL models tend to underperform their monolingual counterparts on each individual language, especially in settings with only a few languages, such as the bilingual case. In this work, we investigate a novel approach to re...
Protein folding models have achieved groundbreaking results since the introduction of AlphaFold2, and are typically built by integrating domain expertise into their architectural designs and training pipelines. Nonetheless, given the success of generative models across different but related problems, it is natural to ask whether these architectural designs are a necessity for building performant models. In this paper, we introduce SimpleFold, the first flow-matching based protein fol...
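The abstract is truncated here, but the flow-matching objective it refers to is a standard generative-modeling technique. Below is a minimal sketch of one flow-matching training step under the usual linear-interpolation (rectified-flow) formulation; `VelocityNet`, the layer sizes, and the use of raw 3D coordinates are illustrative assumptions, not SimpleFold's actual architecture.

```python
# Minimal sketch of a flow-matching training step (linear-interpolation form).
# `VelocityNet` is a toy stand-in, not SimpleFold's actual model.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy network that predicts a velocity field from (x_t, t)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def flow_matching_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """One step: interpolate noise -> data, regress the ground-truth velocity."""
    x0 = torch.randn_like(x1)                        # noise sample
    t = torch.rand(x1.shape[0])                      # uniform time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1    # linear path from noise to data
    target_v = x1 - x0                               # velocity of the linear path
    return ((model(x_t, t) - target_v) ** 2).mean()

model = VelocityNet(dim=3)                           # e.g. per-atom 3D coordinates
loss = flow_matching_loss(model, torch.randn(8, 3))
loss.backward()
```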
We investigate the theoretical foundations of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions by showing that CFG interacts differently with DDPM (Ho et al., 2020) and DDIM (Song et al., 2021), and that neither sampler with CFG generates the gamma-powered distribution p(x|c)^γ p(x)^{1−γ}. Then, we...
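For context, the gamma-powered distribution above is what a naive reading of CFG would suggest it samples. A sketch of the standard score combination it comes from (the common formulation; the paper's notation may differ):

```latex
% The usual CFG update combines conditional and unconditional scores:
\tilde{s}_{\gamma}(x, c) \;=\; \gamma\,\nabla_x \log p_t(x \mid c)
                         \;+\; (1-\gamma)\,\nabla_x \log p_t(x).
% If this were an exact score at every noise level, it would correspond to the
% tilted density p(x \mid c)^{\gamma}\, p(x)^{1-\gamma}; the paper's point is
% that neither DDPM nor DDIM with CFG actually samples this distribution.
```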
Recent advances in large language models (LLMs) have extended context lengths, enabling assistants to sustain long histories for coherent, personalized responses. This ability, however, hinges on Key-Value (KV) caching, whose memory grows linearly with dialogue length and quickly dominates under strict resource constraints. An active line of research for reducing this overhead is KV cache compression, which seeks to limit cache size while preserving accuracy. Yet existing methods face two maj...
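The abstract cuts off before the proposed method, but the problem it describes can be illustrated generically. Below is a minimal sketch of budgeted KV-cache eviction that keeps the most-attended entries; the policy, `compress_kv`, and all shapes are illustrative assumptions, not the paper's approach.

```python
# Minimal sketch of KV cache compression by eviction under a fixed budget.
# This simple keep-the-most-attended policy is a generic illustration only.
import torch

def compress_kv(keys, values, attn_mass, budget: int):
    """keys/values: [seq, dim]; attn_mass: [seq] cumulative attention received.

    Returns the `budget` most-attended entries, preserving original order.
    """
    if keys.shape[0] <= budget:
        return keys, values
    keep = torch.topk(attn_mass, k=budget).indices.sort().values
    return keys[keep], values[keep]

seq, dim, budget = 1024, 64, 256
k, v = torch.randn(seq, dim), torch.randn(seq, dim)
mass = torch.rand(seq)                     # stand-in for accumulated attention
k_small, v_small = compress_kv(k, v, mass, budget)
assert k_small.shape[0] == budget          # cache memory is now capped at budget
```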
We present AToken, the first unified visual tokenizer that achieves both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. Unlike existing tokenizers that specialize in either reconstruction or understanding for single modalities, AToken encodes these diverse visual inputs into a shared 4D latent space, unifying both tasks and modalities in a single framework. Specifically, we introduce a pure transformer architecture with 4D rotary position embeddi...
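The abstract cuts off mid-description, and "4D rotary position embeddings" is not further specified here; one plausible reading is axis-factorized RoPE, sketched below. The four-way channel split, `rope_4d`, and the coordinate layout are assumptions for illustration, not AToken's actual design.

```python
# Sketch of axis-factorized rotary embeddings over four position axes,
# one plausible reading of "4D rotary position embeddings".
import torch

def rope_1d(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0):
    """Apply standard 1D RoPE to x: [n, d] (d even) at positions pos: [n]."""
    d = x.shape[-1]
    freqs = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # [d/2]
    ang = pos[:, None].float() * freqs[None, :]                        # [n, d/2]
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_4d(x: torch.Tensor, coords: torch.Tensor):
    """x: [n, d] with d divisible by 8; coords: [n, 4] (e.g. t, x, y, z)."""
    chunk = x.shape[-1] // 4                   # one channel group per axis
    return torch.cat(
        [rope_1d(x[:, i * chunk:(i + 1) * chunk], coords[:, i]) for i in range(4)],
        dim=-1,
    )

tokens = torch.randn(16, 64)
coords = torch.randint(0, 8, (16, 4))
rotated = rope_4d(tokens, coords)
```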
This paper was accepted at the DataWorld (Data Curation) Workshop at ICML 2025. Multimodal models are trained on large-scale web-crawled datasets, which often contain noise, bias, and irrelevant information. This motivates the use of data selection techniques, which can be divided into model-free variants that rely on heuristic rules and downstream datasets, and model-based approaches, such as those using influence functions. The former can be expensive to design and risks introducing unwanted...
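The abstract is truncated before the proposed method; to make the model-free vs. model-based distinction concrete, below is a minimal sketch of a model-based score via gradient similarity (in the spirit of TracIn-style influence). The tiny linear model, `grad_of`, and the top-half selection rule are illustrative assumptions, not the paper's technique.

```python
# Minimal sketch of model-based data selection via gradient similarity:
# score each candidate by how well its gradient aligns with a trusted batch.
import torch

def grad_of(model, loss_fn, x, y):
    """Flattened parameter gradient of the loss on (x, y)."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.flatten() for g in grads])

model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
x_val, y_val = torch.randn(8, 4), torch.randn(8, 1)   # trusted downstream batch
g_val = grad_of(model, loss_fn, x_val, y_val)

# Keep the half of the candidate pool whose gradients align best with g_val.
candidates = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(100)]
scores = [torch.dot(grad_of(model, loss_fn, x, y), g_val).item()
          for x, y in candidates]
keep = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:50]
```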
Natural language processing (NLP) remains one of the most rapidly evolving fields in AI, as new research continues to advance large language models (LLMs), systems for speech recognition and generation, language agents, and more. This technology is essential to many of today’s AI experiences, including Apple Intelligence and Siri, and fundamental research in NLP will be foundational to future AI. Apple recently hosted the Workshop on Natural Language and Interactive Systems, bringin...
This paper re-examines the first normalized incomplete moment, a well-established measure of inequality with wide applications in economic and social sciences. Despite the popularity of the measure itself, existing statistical inference appears to lag behind the needs of modern-age analytics. To fill this gap, we propose an alternative solution that is intuitive, computationally efficient, mathematically equivalent to the existing solutions for “standard” cases, and easily adaptable to ...
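For reference, the first normalized incomplete moment has a standard definition, sketched below under the usual convention for a nonnegative variable X with cdf F and finite mean μ (the paper's exact notation may differ):

```latex
% First normalized incomplete moment at threshold t:
\psi(t) \;=\; \frac{1}{\mu} \int_{0}^{t} x \, \mathrm{d}F(x)
        \;=\; \frac{\mathbb{E}\!\left[X\,\mathbf{1}\{X \le t\}\right]}{\mathbb{E}[X]},
% i.e. the share of the total (e.g. income) held by units at or below t;
% evaluated at t = F^{-1}(p) it recovers the Lorenz curve L(p).
```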
Local-global attention models have recently emerged as compelling alternatives to standard Transformers, promising improvements in both training and inference efficiency. However, the crucial choice of window size presents a Pareto tradeoff: larger windows maintain performance akin to full attention but offer minimal efficiency gains in short-context scenarios, while smaller windows can lead to performance degradation. Current models, such as Gemma2 and Mistral, adopt conservative window size...
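To make the window-size tradeoff concrete, here is a minimal sketch of the causal sliding-window mask used by the local layers of such models; `sliding_window_mask` and the shapes are illustrative, not Gemma2's or Mistral's exact kernel.

```python
# Sketch of a causal sliding-window attention mask of width `window`.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where query i may attend to key j: causal and within `window`."""
    i = torch.arange(seq_len)[:, None]
    j = torch.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=8, window=3)
# Each row attends to at most 3 keys, so attention cost scales as
# O(seq * window) instead of O(seq^2); shrinking `window` saves compute
# but discards context, which is the Pareto tradeoff described above.
```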
Multimodal large language models (MLLMs) excel at 2D visual understanding but remain limited in their ability to reason about 3D space. In…
Apple believes that privacy is a fundamental human right. As AI experiences become increasingly personal and a part of people's daily lives…
Vision Language Models (VLMs) enable visual understanding alongside textual inputs. They are typically built by passing visual tokens from a…
Apple researchers are advancing AI and ML through fundamental research, and to support the broader research community and help accelerate…
The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their…
Large-scale models are routinely trained on a mixture of different data sources. Different data mixtures yield very different downstream…
Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes…
At Apple, we believe privacy is a fundamental human right. And we believe in giving our users a great experience while protecting their…
Large generative models are becoming increasingly capable and more widely deployed to power production applications, but getting these…
This study explores using embedding rank as an unsupervised evaluation metric for general-purpose speech encoders trained via…
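The abstract is truncated, but "embedding rank" plausibly refers to an effective-rank-style statistic. Below is a minimal sketch computing the effective rank of Roy and Vetterli (2007) from the entropy of normalized singular values; the exact metric used in the study may differ.

```python
# Sketch of effective rank: exp of the entropy of normalized singular values.
import torch

def effective_rank(embeddings: torch.Tensor) -> float:
    """embeddings: [n_frames, dim]; higher means the space is used more fully."""
    s = torch.linalg.svdvals(embeddings - embeddings.mean(dim=0))
    p = s / s.sum()                                   # singular-value distribution
    entropy = -(p * torch.log(p.clamp_min(1e-12))).sum()
    return float(torch.exp(entropy))

feats = torch.randn(1000, 256)        # e.g. encoder outputs over a corpus
print(effective_rank(feats))          # approaches 256 for isotropic features
```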
Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy…
Nonverbal behaviors such as posture, gestures, and gaze are essential for conveying internal states, both consciously and unconsciously, in…
This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots, enhancing their ability to…
At Apple, we believe privacy is a fundamental human right. Our work to protect user privacy is informed by a set of privacy principles, and…
This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024. The pre-training phase of…
This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024. While large language…
Apple is sponsoring the annual meeting of the Association for Computational Linguistics (ACL), which takes place in person from August 11 to…
At the 2024 Worldwide Developers Conference, we introduced Apple Intelligence, a personal intelligence system integrated deeply into…
Today, we are excited to release optimizations to Core ML for Stable Diffusion in macOS 13.1 and iOS 16.2, along with code to get started…