Introduction

Last week, researchers at Carnegie Mellon University (CMU) revealed a finding that caught the attention of both the AI and cybersecurity worlds. Their work tackled a lingering challenge: whether today’s leading large language models (LLMs) can independently carry out complex, multi-host cyber-attacks from start to finish. In their raw form, these models routinely fail at such multi-step attacks. They wander off-task, choose the wrong to...