TLDR Apple released a study claiming that large language models only look like they can reason. The paper says they do fine on easy questions, do better when they “think” on medium ones, but fall apart on hard puzzles. A YouTuber walks through the study, shows its flaws, and argues the models simply skip impossible […]