We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as C...| arXiv.org
Frontier AI safety policies highlight automation of AI research and development (R&D) by AI agents as an important capability to anticipate. However, there exist few evaluations for AI R&D capabilities, and none that are highly realistic and have a direct comparison to human performance. We introduce RE-Bench (Research Engineering Benchmark, v1), which consists of 7 challenging, open-ended ML research engineering environments and data from 71 8-hour attempts by 61 distinct human experts. We c...| arXiv.org
We highlight the exotic quantum criticality of quasi-two-dimensional single-component fermions at half-filling that are minimally coupled to a dynamical Ising gauge theory. With the numerical matrix product state based infinite density matrix renormalization group method, we discover a robust quantum critical line in the infinite cylinder geometry, where gauge confinement and dimerized translation symmetry breaking emerge simultaneously. We investigate how the transition can be split by a $\m...| arXiv.org
Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. However, for large models, even small percentages translate to impractically large amounts of data. This work demonstrates for the first time that poisoning attacks instead require a near-constant number of documents regardless of dataset size. We ...| arXiv.org
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim. This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others). We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data. Mem...| arXiv.org
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI while trained with small models (27M parameters) on small data (around 1000 examples). HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal. We propose Tiny Recursive Model (TR...| arXiv.org
DeepSeek-V3 and DeepSeek-R1 are leading open-source Large Language Models (LLMs) for general-purpose tasks and reasoning, achieving performance comparable to state-of-the-art closed-source models from companies like OpenAI and Anthropic -- while requiring only a fraction of their training costs. Understanding the key innovative techniques behind DeepSeek's success is crucial for advancing LLM research. In this paper, we review the core techniques driving the remarkable effectiveness and effic...| arXiv.org
Near-Earth asteroid 2024 YR4 was discovered on 2024-12-27 and its probability of Earth impact in December 2032 peaked at about 3% on 2025-02-18. Additional observations ruled out Earth impact by 2025-02-23. However, the probability of lunar impact in December 2032 then rose, reaching about 4% by the end of the apparition in May 2025. James Webb Space Telescope (JWST) observations on 2025-03-26 estimated the asteroid's diameter at 60 +/- 7 m. Studies of 2024 YR4's potential lunar impact effect...| arXiv.org
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied. To understand how reward hacking arises, we construct four RL environments with misspecified rewards. We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time. More capable agents often exploit reward misspecifications, achieving higher proxy reward and lowe...| arXiv.org
Yes, it can. Catalogs produced by networks of gravitational-wave interferometers are subject to complicated selection effects, and the gold standard remains direct measurements of the detection probability through large injection campaigns. I leverage public data products from the LIGO-Virgo-KAGRA Collaborations' 3rd and 4th observing runs to show that there are non-trivial temporal variations within the detection probability that are well described by a weekly cycle. There are clear differen...| arXiv.org
Conformal cyclic cosmology (CCC) posits the existence of an aeon preceding our Big Bang 'B', whose conformal infinity 'I' is identified, conformally, with 'B', now regarded as a spacelike 3-surface. Black-hole encounters, within bound galactic clusters in that previous aeon, would have the observable effect, in our CMB sky, of families of concentric circles over which the temperature variance is anomalously low, the centre of each such family representing the point of 'I' at which the cluster...| arXiv.org
Training large language models (LLMs) for complex reasoning via Reinforcement Learning with Verifiable Rewards (RLVR) is effective but limited by reliance on costly, domain-specific supervision. We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model's own confidence, termed self-certainty, as its sole reward signal. Intuitor re...| arXiv.org
The increasing size of large neural network models, specifically language models and foundational image models, poses deployment challenges, prompting efforts to reduce memory requirements and enhance computational efficiency. These efforts are critical to ensure practical deployment and effective utilization of these models across various applications. In this work, a novel type of neural network layers and models is developed that uses only single-bit parameters. In this novel type of model...| arXiv.org
Recent progress in electronics, essentially due to miniaturization, is opening new fields that were once only dreamed of, notably in astronomy. Starting in paragraph 3, we introduce the time variation of images expressing the dual nature of the optical signal (ZO), and we present several useful applications where the optical signal variations are not faster than CCD. However, we preferred to begin the article with a deeper question posed inadvertently in paragraph 2: what causes the rapid, w...| arXiv.org
The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 bill...| arXiv.org
We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (DP) in order to provably safeguard all the examples contributed by each user. We study two variants of DP-SGD with: (1) example-level sampling (ELS) and per-example gradient clipping, and (2) user-level sampling (ULS) and per-user gradient clipping. We derive a novel user-level DP accountant that allows us to compute provably tight privacy guarantees for ELS. Using ...| arXiv.org
We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings...| arXiv.org
We demonstrate that it is possible to train large recurrent language models with user-level differential privacy guarantees with only a negligible cost in predictive accuracy. Our work builds on recent advances in the training of deep networks on user-partitioned data and privacy accounting for stochastic gradient descent. In particular, we add user-level privacy protection to the federated averaging algorithm, which makes "large step" updates from user-level data. Our work demonstrates that ...| arXiv.org
Foundation models have revolutionized natural language processing through a "train once, deploy anywhere" paradigm, where a single pre-trained model adapts to countless downstream tasks without retraining. Access to a Physics Foundation Model (PFM) would be transformative -- democratizing access to high-fidelity simulations, accelerating scientific discovery, and eliminating the need for specialized solver development. Yet current physics-aware machine learning approaches remain fundamental...| arXiv.org
As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building A...| arXiv.org
This paper introduces a reinforcement learning (RL) platform that enhances end-to-end user journeys in healthcare digital tools through personalization. We explore a case study with SwipeRx, the most popular all-in-one app for pharmacists in Southeast Asia, demonstrating how the platform can be used to personalize and adapt user experiences. Our RL framework is tested through a series of experiments with product recommendations tailored to each pharmacy based on real-time information on their...| arXiv.org
By providing evidence-based clinical decision support, digital tools and electronic health records can revolutionize patient management, especially in resource-poor settings where fewer health workers are available and often need more training. When these tools are integrated with AI, they can offer personalized support and adaptive interventions, effectively connecting community health workers (CHWs) and healthcare facilities. The CHARM (Community Health Access & Resource Management) app is ...| arXiv.org
Combining the exquisite angular resolution of Gaia with optical light curves and WISE photometry, the Gaia Gravitational Lenses group (GraL) uses machine learning techniques to identify candidate strongly lensed quasars, and has confirmed over two dozen new strongly lensed quasars from the Gaia Data Release 2. This paper reports on the 12 quadruply-imaged quasars identified by this effort to date, which is approximately a 20% increase in the total number of confirmed quadruply-imaged quasars....| arXiv.org
This paper introduces the Univariate Gaussian Mixture Model Neural Network (uGMM-NN), a novel neural architecture that embeds probabilistic reasoning directly into the computational units of deep networks. Unlike traditional neurons, which apply weighted sums followed by fixed nonlinearities, each uGMM-NN node parameterizes its activations as a univariate Gaussian mixture, with learnable means, variances, and mixing coefficients. This design enables richer representations by capturing multimo...| arXiv.org
Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a full...| arXiv.org
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapte...| arXiv.org
Quantum systems are notoriously difficult to simulate with classical means. Recently, the idea of using another quantum system - which is experimentally more controllable - as a simulator for the original problem has gained significant momentum. Amongst the experimental platforms studied as quantum simulators, superconducting qubits are one of the most promising, due to relatively straightforward scalability, easy design, and integration with standard electronics. Here I review the recent state...| arXiv.org
At this early stage of its passage through our Solar System, 3I/ATLAS, the recently discovered interstellar interloper, has displayed various anomalous characteristics, determined from photometric and astrometric observations. As largely a pedagogical exercise, in this paper we present additional analysis into the astrodynamics of 3I/ATLAS, and hypothesize that this object could be technological, and possibly hostile as would be expected from the 'Dark Forest' resolution to the 'Fermi Paradox...| arXiv.org
We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a "teacher" model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a "student" model trained on this dataset learns T. This occurs even when the data is filtered to remove references to T. We observe the same effect when training on code or reasoning tra...| arXiv.org
Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate kernel generation. We introduce KernelBench, an open-source framework for evaluating LMs' ability to write fast and correct kernels on a suite of 250 carefully selected PyTorch ML workloads. KernelBench represents a real-world engineering environment and making...| arXiv.org
The "ringdown" radiation emitted by oscillating black holes has great scientific potential. By carefully predicting the frequencies and amplitudes of black hole quasinormal modes and comparing them with gravitational-wave data from compact binary mergers we can advance our understanding of the two-body problem in general relativity, verify the predictions of the theory in the regime of strong and dynamical gravitational fields, and search for physics beyond the Standard Model or new gravitati...| arXiv.org
Quasinormal modes of rapidly rotating black holes were recently computed in a generic effective-field-theory extension of general relativity with higher-derivative corrections. We exploit this breakthrough to perform the most complete search for signatures of new physics in black hole spectra to date. We construct a template that describes the post-merger gravitational-wave emission in comparable-mass binary black hole mergers at current detector sensitivity, notably including isospectrality ...| arXiv.org
One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions? We outline a high-level research direction to solve the agent alignment problem centered around reward modeling: learni...| arXiv.org
We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associat...| arXiv.org
Dark matter in the form of macroscopic composites is largely unconstrained at masses of $\sim 10^{11}- 10^{17}$ g. In this mass range, dark matter may collide with planetary bodies, depositing an immense amount of energy and leaving dramatic surface features that remain detectable on geological timescales. In this paper, we show that Ganymede, the largest Jovian moon, provides a prime target to search for dark matter impacts due to its differentiated composition and Gyr-old surface. We study ...| arXiv.org
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in an increasingly similar way. We hypothesize that this convergence is dr...| arXiv.org
We summarize the properties and initial data release of the JADES Origins Field (JOF), which will soon be the deepest imaging field yet observed with the James Webb Space Telescope (JWST). This field falls within the GOODS-S region about 8' south-west of the Hubble Ultra Deep Field (HUDF), where it was formed initially in Cycle 1 as a parallel field of HUDF spectroscopic observations within the JWST Advanced Deep Extragalactic Survey (JADES). This imaging will be greatly extended in Cycle 2 p...| arXiv.org
JWST has revealed a stunning population of bright galaxies at surprisingly early epochs, $z>10$, where few such sources were expected. Here we present the most distant example of this class yet -- MoM-z14, a luminous ($M_{\rm{UV}}=-20.2$) source in the COSMOS legacy field at $z_{\rm{spec}}=14.44^{+0.02}_{-0.02}$ that expands the observational frontier to a mere 280 million years after the Big Bang. The redshift is confirmed with NIRSpec/prism spectroscopy through a sharp Lyman-$α$ break and ...| arXiv.org
This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from $\textit{incoherent}$ weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizin...| arXiv.org
Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cut the memory needed for inference by half while retaining full precision performance. With our method, a 175B parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. This is made possible by understanding and ...| arXiv.org
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-...| arXiv.org
Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. In SAEs, the L1 penalty used to encourage sparsity introduces many undesirable biases, such as shrinkage -- systematic underestimatio...| arXiv.org
We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on f...| arXiv.org
Foundation models are applied in a broad spectrum of settings with different inference constraints, from massive multi-accelerator clusters to resource-constrained standalone mobile devices. However, the substantial costs associated with training these models often limit the number of unique model sizes that can be offered. Consequently, practitioners are compelled to select a model that may not be optimally aligned with their specific latency and cost requirements. We present MatFormer, a no...| arXiv.org
How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes in the environment, including the actions of other agents. To isolate the source of such undesirable incentives, we break down side effects penalties into two components: a baseline state and a measure of deviation from this baseline state. We argue that so...| arXiv.org
Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate practicality of MX data formats as a drop-in replaceme...| arXiv.org
When numerically evaluating a function's gradient, sparsity detection can enable substantial computational speedups through Jacobian coloring and compression. However, sparsity detection techniques for black-box functions are limited, and existing finite-difference-based methods suffer from false negatives due to coincidental zero gradients. These false negatives can silently corrupt gradient calculations, leading to difficult-to-diagnose errors. We introduce NaN-propagation, which exploits t...| arXiv.org
Energy limits that delineate the 'habitable zone' for exoplanets depend on a given exoplanet's net planetary albedo (or 'Bond albedo'). We here demonstrate that the planetary albedo of an observed exoplanet is limited by the above-cloud atmosphere - the region of the atmosphere that is probed in remote observation. We derive an analytic model to explore how the maximum planetary albedo depends on the above-cloud optical depth and scattering versus absorbing properties, even in the limit of a ...| arXiv.org
The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of their actions" leading them to draw incorrect inferences due to auto-suggestive delusions. In th...| arXiv.org
Long-horizon tasks in robotic manipulation present significant challenges in reinforcement learning (RL) due to the difficulty of designing dense reward functions and effectively exploring the expansive state-action space. However, despite a lack of dense rewards, these tasks often have a multi-stage structure, which can be leveraged to decompose the overall objective into manageable subgoals. In this work, we propose DEMO3, a framework that exploits this structure for efficient learning from...| arXiv.org
When language models (LMs) are trained via reinforcement learning (RL) to generate natural language "reasoning chains", their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward functions do not penalize guessing or low-confidence outputs, they often have the unintended side-effect of degrading calibration and increasin...| arXiv.org
We present a mathematical analysis of the statistical parallax method. The method yields physical insight into the maximum-likelihood determinations of the luminosity and velocity distribution and enables us to conduct a vigorous Monte Carlo investigation into various systematic effects. We apply our analytic formalism to the RR Lyrae sample of Layden et al. The velocity distribution of RR Lyrae stars is highly non-Gaussian, with kurtoses K_π= 2.04, K_θ= 3.22 and K_z = 4.28 in the three pri...| arXiv.org
State space models have been shown to be effective at modeling long-range dependencies, especially on sequence classification tasks. In this work we focus on autoregressive sequence modeling over English books, GitHub source code and arXiv mathematics articles. Based on recent developments around the effectiveness of gated activation functions, we propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4 (i.e. DSS) on TPUs, is fai...| arXiv.org
Transformers do not scale very well to long sequence lengths, largely because of quadratic self-attention complexity. In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets...| arXiv.org
Behavior Cloning (BC) on curated (or filtered) data is the predominant paradigm for supervised fine-tuning (SFT) of large language models, as well as for imitation learning of control policies. Here, we draw on a connection between this successful strategy and the theory and practice of finding optimal policies via Reinforcement Learning (RL). Building on existing literature, we clarify that SFT can be understood as maximizing a lower bound on the RL objective in a sparse reward setting. Givi...| arXiv.org
We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a simple analytical model for inference efficiency to select the best multi-dimensional partitioning t...| arXiv.org
The sub-Neptune frontier has opened a new window into the rich diversity of planetary environments beyond the solar system. The possibility of hycean worlds, with planet-wide oceans and H$_2$-rich atmospheres, significantly expands and accelerates the search for habitable environments elsewhere. Recent JWST transmission spectroscopy of the candidate hycean world K2-18 b in the near-infrared led to the first detections of carbon-bearing molecules CH$_4$ and CO$_2$ in its atmosphere, with a com...| arXiv.org
Cosmic hydrogen reionization and cosmic production of first metals are major phase transitions of the universe occurring during the first billion years after the Big Bang; however, these are still underexplored observationally. Using JWST NIRSpec prism spectroscopy, we report the discovery of a sub-$L_\ast$ galaxy at $z_{\rm spec}=8.1623\pm0.0007$, dubbed RXJ2129-z8HeII, via the detection of a series of strong rest-frame UV/optical nebular emission lines and the clear Lyman break. RXJ2129-...| arXiv.org
Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art. This paper presen...| arXiv.org
The outer solar system is theoretically predicted to harbour an undiscovered planet, often referred to as P9. Simulations suggest that its gravitational influence could explain the unusual clustering of minor bodies in the Kuiper Belt. However, no observational evidence for P9 has been found so far, as its predicted orbit lies far beyond Neptune, where it reflects only a faint amount of sunlight. This work aims to find P9 candidates by taking advantage of two far-infrared all-sky surveys, whi...| arXiv.org
"Pasta alla Cacio e pepe" is a traditional Italian dish made with pasta, pecorino cheese, and pepper. Despite its simple ingredient list, achieving the perfect texture and creaminess of the sauce can be challenging. In this study, we systematically explore the phase behavior of Cacio and pepe sauce, focusing on its stability at increasing temperatures for various proportions of cheese, water, and starch. We identify starch concentration as the key factor influencing sauce stability, with dire...| arXiv.org
While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex an...| arXiv.org
High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address this problem, we propose PagedAttention, an attention algorithm inspired by the classical virtua...| arXiv.org
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs). We provide baselines and an initial analysis of RepE tec...| arXiv.org
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These re...| arXiv.org