A new open‑source benchmark tossed seven state‑of‑the‑art language models onto the 1901 Diplomacy board, handed each one a European power, and told them to win by any means short of dice rolls. Over days of live‑streamed play, the models had to court allies through private chats, draft treaties in public press, and—when timing felt right—plunge […]