This post explores problems contributing to a benchmark crisis in LLM evaluation and potential solutions.| ruder.io
This post discusses Command R and Command R+, the top open-weights model on Chatbot Arena at the time of its release, and highlights their RAG and multilingual capabilities.| ruder.io
This post discusses recent results on extremely long-context benchmarks, explores true zero-shot machine translation (MT), and considers how to teach LLMs a new language the way humans learn one.| ruder.io
This post discusses macro trends I observed regarding the AI job market in 2024 and the reasons I joined my new company.| ruder.io
This post gives an overview of the Big Picture Workshop at EMNLP 2023.| ruder.io
This post discusses compute as the main constraint for doing research in NLP and highlights five key research directions that do not require much compute.| ruder.io
An overview of EMNLP 2023 papers covering QA, instruction tuning, task adaptation, NLG evaluation, and multilingual models and datasets.| ruder.io
A round-up of 20 exciting NeurIPS 2023 papers related to LLMs.| ruder.io
This post covers a range of widely used instruction tuning datasets, as well as important characteristics of instruction tuning data and best practices for using the datasets.| ruder.io
An overview of modular deep learning across four dimensions (computation function, routing function, aggregation function, and training setting).| ruder.io
This post takes a closer look at the state of multilingual AI. How multilingual are current models in NLP, computer vision, and speech? What are the main recent contributions in this area? What challenges remain and how can we address them?| ruder.io
This post discusses my highlights of ACL 2022, including language diversity and multimodality, prompting, the next big ideas and keynotes, my favorite papers, and the hybrid conference experience.| ruder.io
This post summarizes progress across multiple impactful areas in ML and NLP in 2021.| ruder.io
This post expands on the EMNLP 2021 tutorial on Multi-domain Multilingual Question Answering and highlights key insights and takeaways.| ruder.io
Recent NLP models have outpaced the benchmarks designed to test them. This post provides an overview of challenges and opportunities for NLP benchmarks.| ruder.io