Announcing Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs| EleutherAI Blog
Research update on applying local volume measurement to downstream tasks| EleutherAI Blog
In this post, we will study inductive biases of the parameter-function map of random neural networks using star domain volume estimates. This builds on the ideas introduced in Estimating the Probability of Sampling a Trained Neural Network at Random and Neural Redshift: Random Networks are not Random Functions (henceforth NRS). Inductive biases: To understand generalization in deep neural networks, we must understand inductive biases. Given a fixed architecture, some tasks will be easily learn...| EleutherAI Blog
Using Product Key Memories to encode sparse coder features| EleutherAI Blog
An ablation of activation functions in GPT-like autoregressive language models.| EleutherAI Blog
In this post, we show that when two TopK SAEs are trained on the same data, with the same batch order but with different random initializations, there are many latents in the first SAE that don't have a close counterpart in the second, and vice versa. Indeed, only about 53% of the features are shared during training. Furthermore, many of these unshared latents are interpretable. We find that narrower SAEs have higher feature overlap across random seeds, and as the size of the SAE increases, th...| EleutherAI Blog
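The cross-seed comparison described in that post can be sketched as a pairwise cosine-similarity check between the two SAEs' decoder directions. This is a minimal illustration, not the post's actual code: the matrix shapes, the 0.7 similarity threshold, and the `shared_fraction` helper are all hypothetical choices for the example.

```python
import numpy as np

def shared_fraction(W1, W2, threshold=0.7):
    """Fraction of latents in W1 with a close counterpart in W2.

    W1, W2: (n_latents, d_model) decoder weight matrices (hypothetical shapes).
    A latent counts as "shared" if its best cosine similarity against the
    other dictionary exceeds the (hypothetical) threshold.
    """
    A = W1 / np.linalg.norm(W1, axis=1, keepdims=True)
    B = W2 / np.linalg.norm(W2, axis=1, keepdims=True)
    sims = A @ B.T                      # all pairwise cosine similarities
    return float((sims.max(axis=1) > threshold).mean())

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 512))
# An identical dictionary overlaps fully; two independent random
# dictionaries in high dimension overlap almost not at all.
print(shared_fraction(W, W))
print(shared_fraction(W, rng.normal(size=(64, 512))))
```

Note the asymmetry: `shared_fraction(W1, W2)` and `shared_fraction(W2, W1)` can differ, which mirrors the "and vice versa" framing above.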
How we are supporting open source and open science in the EU AI Act.| EleutherAI Blog
Using interpretations of SAE latents to do inference on a language model.| EleutherAI Blog
An overview of the minetester and preliminary work| EleutherAI Blog
Interim report on ongoing work on mechanistic anomaly detection| EleutherAI Blog
GPT-NeoX now supports post-training thanks to a collaboration with SynthLabs.| EleutherAI Blog
Exploring the implementation details of μTransfer| EleutherAI Blog
Building and evaluating an open-source pipeline for auto-interpretability| EleutherAI Blog
Writing up results from a recent project| EleutherAI Blog
Achieving even more surgical edits than LEACE without concept labels at inference time.| EleutherAI Blog
Writing up results from a project from Spring 2023| EleutherAI Blog
Setting the record straight regarding Yi-34B and Llama 2.| EleutherAI Blog
Announcing a new resource, the FM Dev Cheatsheet.| EleutherAI Blog
Achieving even more surgical edits than LEACE when we have concept labels at inference time.| EleutherAI Blog
Explaining a result by Sam Marks and Max Tegmark| EleutherAI Blog
Introduction At the third New England RLHF Hackathon, several interesting projects were showcased, each focusing on different aspects of machine learning and reinforcement learning. Participants and those interested in future events are encouraged to join the Discord community for more information and updates. The highlighted projects include: Pink Elephants Pt 3 (Authors: Sid Verma, Louis Castricato): This project aimed to train a pink elephant model via ILQL (Inve...| EleutherAI Blog
What we've been up to for the past year at EleutherAI.| EleutherAI Blog
Evaluating transparency requires precision.| EleutherAI Blog
ArXiv | Models | Data | Code | Blog | Sample Explorer Today we release Llemma: 7 billion and 34 billion parameter language models for mathematics. The Llemma models were initialized with Code Llama weights, then trained on the Proof-Pile II, a 55 billion token dataset of mathematical and scientific documents. The resulting models show improved mathematical capabilities, and can be adapted to various tasks through prompting or additional fine-tuning.| EleutherAI Blog
Introduction Rekindling the spirit of collaboration, the New England RLHF Hackers (NERH) hosted their second hackathon at Brown University on October 8th, 2023. Stepping up from the success of our inaugural hackathon, this event was fueled by the same enthusiasm but with a fresh purpose: to brainstorm and formulate solutions to a myriad of existing challenges in reinforcement learning from human feedback. The NERH group is mainly comprised of collaborators and contributors from EleutherAI, wi...| EleutherAI Blog
Give us a brief overview of your background: I'm currently in the final year of my undergraduate program at IIIT Delhi, pursuing a BTech degree in Computer Science & Engineering. My passion for programming has remained unwavering, and when the chance emerged to immerse myself in coding through the structured framework of my academic curriculum, I embraced it eagerly. My academic journey has taken me across various domains, yet it was during a break after my initial semester that I was entranc...| EleutherAI Blog
Introduction Author list is alphabetical by last name. We would like to extend acknowledgements to Delta Christine Hessler and Hailey Schoelkopf. On September 10, 2023, New England RLHF Hackers (NERH) held a hackathon at Brown University. For this hackathon we came in with one simple goal: to come up with plans to solve varying open problems in reinforcement learning from human feedback. Most members of NERH were contributors and collaborators at EleutherAI, with some of us actually being dir...| EleutherAI Blog
Using eval harness, we can deduce the sizes of OpenAI API models from their performance.| EleutherAI Blog
We evaluate different fewshot prompts on GPT-3 to see how it changes performance.| EleutherAI Blog
We tuned GPT-Neo on eval harness tasks to see how it would change its performance.| EleutherAI Blog
Audit shows that safetensors is safe and ready to become the default. Hugging Face, in close collaboration with EleutherAI and Stability AI, commissioned an external security audit of the safetensors library, the results of which allow all three organizations to move toward making the library the default format for saved models. The full results of the security audit, performed by Trail of Bits, can be found here: Report. The following blog post explains the origins of the library, why these au...| EleutherAI Blog
A brief overview of EAI's approach to alignment| EleutherAI Blog
We present basic math related to computation and memory usage for transformers| EleutherAI Blog
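The kind of back-of-the-envelope arithmetic that post covers can be sketched with the standard approximations: roughly 6 FLOPs per parameter per training token, and weight memory equal to parameter count times bytes per parameter. The 20B-parameter / 400B-token numbers below are a hypothetical example, not a specific training run.

```python
def train_flops(n_params, n_tokens):
    # Standard approximation: ~6 FLOPs per parameter per token
    # (~2 for the forward pass, ~4 for the backward pass).
    return 6 * n_params * n_tokens

def weight_memory_gb(n_params, bytes_per_param=2):
    # fp16/bf16 weights take 2 bytes each; optimizer state and
    # activations add substantially more on top of this.
    return n_params * bytes_per_param / 1e9

# Hypothetical example: a 20B-parameter model trained on 400B tokens.
print(f"{train_flops(20e9, 400e9):.2e} FLOPs")   # roughly 4.8e22
print(f"{weight_memory_gb(20e9):.0f} GB")        # 40 GB of bf16 weights
```

These estimates ignore attention-specific terms, which matter at long sequence lengths, but they are close enough for capacity planning.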
A demonstration of interpretability for RLHF models| EleutherAI Blog
(Some of) what we've been up to for the past year-and-a-half at EleutherAI.| EleutherAI Blog
Rotary Positional Embedding (RoPE) is a new type of position encoding that unifies absolute and relative approaches. We put it to the test.| EleutherAI Blog
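The "unifies absolute and relative" claim can be made concrete with a small sketch: each pair of embedding dimensions is rotated by an angle proportional to the absolute position, yet dot products between rotated vectors depend only on the position difference. This is a minimal NumPy illustration of the mechanism, not the implementation benchmarked in the post.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, d), d even.

    Dimension pair (2i, 2i+1) at position m is rotated by m * base**(-2i/d).
    Because rotations compose, <rope(q)_m, rope(k)_n> depends only on n - m.
    """
    seq_len, d = x.shape
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    freqs = base ** (-np.arange(0, d, 2) / d)      # (d/2,) per-pair frequencies
    angles = pos * freqs                           # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin             # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Same query/key vector repeated at every position: after RoPE, the
# query-key dot product depends only on the relative offset.
rng = np.random.default_rng(0)
X = np.tile(rng.normal(size=8), (6, 1))
Y = np.tile(rng.normal(size=8), (6, 1))
RX, RY = rope(X), rope(Y)
print(np.isclose(RX[1] @ RY[2], RX[3] @ RY[4]))   # offset 1 in both cases
```

Position 0 is left unrotated (all angles are zero), which is why the encoding can be read as absolute and relative at once.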
We believe the creation and open source release of a large language model is a net good to AI safety. We explain why.| EleutherAI Blog
A look back at the first year of EleutherAI.| EleutherAI Blog
A comparison of Rotary Position Embedding against GPT-style learned position embeddings.| EleutherAI Blog
There are multiple ways of evaluating multiple choice tasks on autoregressive LMs like GPT-3/Neo/J. This post lays out the current prevalent normalization methods.| EleutherAI Blog
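The normalization choices discussed in that post can be sketched as three ways of scoring each answer choice from its per-token log-probabilities: raw sum, per-token average, and per-byte average. The `score_choices` helper and the toy `fake` model below are hypothetical stand-ins for a real LM's loglikelihood call.

```python
def score_choices(choices, logprob_fn):
    """Score answer choices three common ways.

    logprob_fn is a stand-in for a real LM call that returns the
    per-token log-probabilities of a continuation.
    """
    scores = {}
    for text in choices:
        lps = logprob_fn(text)          # list of per-token logprobs
        total = sum(lps)
        scores[text] = {
            "unnormalized": total,                    # biased toward short answers
            "per_token": total / len(lps),            # tokenization-dependent
            "per_byte": total / len(text.encode()),   # tokenizer-agnostic
        }
    return scores

# Toy stand-in: every token gets logprob -1.0; " cat" tokenizes to one
# token while " caterpillar" tokenizes to three (hypothetical counts).
fake = {" cat": [-1.0], " caterpillar": [-1.0, -1.0, -1.0]}
s = score_choices(list(fake), fake.get)
print(s[" cat"]["unnormalized"], s[" caterpillar"]["unnormalized"])
```

The toy output shows the length bias directly: the unnormalized scores differ purely because the longer choice has more tokens, while the per-token scores are tied.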
We perform a series of experiments using GPT-3 with decomposition to perform complex toy tasks that it is otherwise unable to solve. The goal of these experiments is to provide some preliminary evidence for the viability of factored cognition in real-world models. For our synthetic task, we chose a variety of arithmetic tasks. Aside from the ease of generating examples, another advantage of arithmetic-related task settings is GPT-3's inability to perform even simple mathematical operat...| EleutherAI Blog
Announcing GPT-NeoX-20B, a 20 billion parameter model trained in collaboration with CoreWeave.| EleutherAI Blog