Improving model performance by scaling up inference compute is the next big thing in frontier AI. But the charts being used to trumpet this new paradigm can be misleading. While they initially appear to show steady scaling and impressive performance for models like o1 and o3, they really show poor scaling (hard to distinguish from brute force) and little evidence of improvement between o1 and o3. I explain how to interpret these new charts and what evidence for strong scaling and progress would look like.