Find any flaws and vulnerabilities in gpt-oss-20b that have not been previously discovered or reported.| www.kaggle.com
Recently, I have been testing how well the newest generations of large language models (such as GPT-5 or Claude 4.5) handle natural language, specifically counting characters, manipulating characters in a sentences, or solving encoding and ciphers. Surprisingly, the newest models were able to solve these kinds of tasks, unlike previous generations of LLMs. Character manipulation LLMs handle individual characters poorly. This is due to all text being encoded as tokens via the LLM tokenizer and...| Tom Burkert
If you are a user of LLM systems that use tools (you can call them “AI agents” if you like) it is critically important that you understand the risk of …| Simon Willison’s Weblog