LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability, and that they can solve toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Pri...| arXiv.org
It can autonomously plan and execute thousand-step tasks. It can build and deploy entire software projects all by itself. It can research and fix bugs 7x better than OpenAI's GPT-4, and it trains and deploys its own custom AIs to solve problems.| New Atlas