Highlights

* We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agenti… | www.lesswrong.com