Do curated, tool-grounded demonstrations build stronger software agents than broad piles of generic instruction data? A team of researchers from Shanghai Jiao Tong University and the SII Generative AI Research Lab (GAIR) proposes LIMI ("Less Is More for Agency"), a supervised fine-tuning method that turns a base model into a capable software/research agent using only 78 samples. […] The post A New Agency-Focused Supervision Approach Scales Software AI Agents With Only 78 Examples appeared first on MarkTechPost.
Salesforce AI Research released CoDA-1.7B, a diffusion-based language model for code that generates by denoising whole sequences with bidirectional context, updating multiple tokens in parallel rather than predicting left-to-right one token at a time. The research team published both Base and Instruct checkpoints along with an end-to-end training/evaluation/serving stack. CoDA adapts a 1.7B-parameter backbone to […]
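The parallel, denoising-style decoding described above can be illustrated with a toy mask-predict loop: start from a fully masked sequence and, at each step, commit the most confident fraction of proposed tokens in parallel. This is a minimal sketch of the general idea only; the `toy_denoiser` stand-in and the commit schedule are hypothetical, not CoDA's actual model or sampler.

```python
import random

def toy_denoiser(tokens, vocab):
    # Stand-in for the model: for each masked position, propose a token and a
    # confidence score. (Hypothetical; the real denoiser is a 1.7B transformer.)
    return [(random.choice(vocab), random.random()) if t == "<MASK>" else (t, 1.0)
            for t in tokens]

def diffusion_decode(length, vocab, steps=4, seed=0):
    """Mask-predict style decoding: begin fully masked; each step, keep the
    highest-confidence proposals (committed in parallel) and re-mask the rest."""
    random.seed(seed)
    tokens = ["<MASK>"] * length
    for step in range(steps):
        proposals = toy_denoiser(tokens, vocab)
        # How many positions should still be masked after this step.
        keep_masked = int(length * (1 - (step + 1) / steps))
        masked = [(i, tok, conf) for i, (tok, conf) in enumerate(proposals)
                  if tokens[i] == "<MASK>"]
        masked.sort(key=lambda x: -x[2])  # most confident first
        n_commit = len(masked) - keep_masked
        for i, tok, _ in masked[:n_commit]:
            tokens[i] = tok  # commit several tokens at once, not left-to-right
    return tokens
```

Unlike autoregressive decoding, every position can attend to (toy) bidirectional context and multiple tokens are filled in per step.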
What if, instead of re-sampling one agent, you could push Gemini-2.5 Pro to 34.1% on HLE by mixing 12–15 tool-using agents that share notes and stop early? Google Cloud AI Research, with collaborators from MIT, Harvard, and Google DeepMind, introduced TUMIX (Tool-Use Mixture), a test-time framework that ensembles heterogeneous agent styles (text-only, code, search, guided variants) […]
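The note-sharing and early-stopping behavior described above can be sketched as a simple round-based ensemble: each round, every agent sees the previous round's answers, and the loop halts once a large enough majority agrees. This is a toy illustration under stated assumptions, not TUMIX's actual aggregation or stopping rule; `stop_threshold` and the agent interface are hypothetical.

```python
def tumix_style_ensemble(question, agents, max_rounds=3, stop_threshold=0.8):
    """Toy multi-agent test-time ensemble: heterogeneous agents answer in
    rounds, sharing notes (prior answers); stop early on strong consensus."""
    notes = []
    for round_num in range(1, max_rounds + 1):
        answers = [agent(question, notes) for agent in agents]
        notes = answers  # shared notes visible to all agents next round
        top = max(set(answers), key=answers.count)
        if answers.count(top) / len(answers) >= stop_threshold:
            return top, round_num  # early stop: consensus reached
    return top, max_rounds

# Usage: mix agent "styles" (here trivially stubbed) and take the consensus.
agents = [lambda q, notes: "42"] * 3 + [lambda q, notes: "7"]
answer, rounds_used = tumix_style_ensemble("q?", agents, stop_threshold=0.7)
```

In the real system the agents differ by tooling (text-only, code execution, search), which is what makes the mixture worth more than re-sampling one agent.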
Researchers from Cornell and Google introduce a unified Regression Language Model (RLM) that predicts numeric outcomes directly from code strings—covering GPU kernel latency, program memory usage, and even neural network accuracy and latency—without hand-engineered features. A 300M-parameter encoder–decoder initialized from T5-Gemma achieves strong rank correlations across heterogeneous tasks and languages, using a single text-to-number decoder […]
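The "text-to-number decoder" framing above means the regression target is emitted as a token string and parsed back into a float, rather than produced by a regression head. A minimal sketch of that serialization idea, assuming a simple scientific-notation scheme (hypothetical; the paper's RLM defines its own target tokenization):

```python
def number_to_tokens(y, sig_digits=4):
    """Serialize a numeric regression target as character tokens in scientific
    notation, e.g. 3.1416 -> ['3', '.', '1', '4', '1', '6', 'e', '+', '0', '0'].
    (Hypothetical scheme for illustration only.)"""
    return list(f"{y:.{sig_digits}e}")

def tokens_to_number(tokens):
    """Parse the decoder's emitted token string back into a float prediction."""
    return float("".join(tokens))
```

The appeal of this setup is that one decoder vocabulary covers latency in milliseconds, memory in bytes, and accuracy in [0, 1] alike, which is what lets a single model span heterogeneous tasks.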
Graph-R1, an advanced agentic GraphRAG framework using hypergraph knowledge and reinforcement learning for accurate, efficient QA
ByteDance Research Releases DAPO: A Fully Open-Sourced LLM Reinforcement Learning System at Scale
s1: A Simple Yet Powerful Test-Time Scaling Approach for LLMs