How LLMs (Pretend to) Solve Math Problems
Three math problems that reveal how Anthropic's latest LLMs reason
Dwarkesh has the hottest podcast in AI right now. So when I saw last week that he dropped a new episode on Claude 4, I had that special tingling sensation that it was going to be a banger. And it delivered.
The episode features Sholto Douglas and Trenton Bricken, two Anthropic researchers, as they discuss the current state of LLMs and their predictions for the next few years. The whole thing was great, but one topic was described so elegantly that I felt compelled to put it down in writing: the model interpretability research that reveals how LLMs approach math problems.
LLMs struggling with math problems has been one of LLM-haters' favorite talking points, and for good reason. Intuitively, next-token prediction just feels less suited to the logical approach that math requires. More formally, in October 2024 Apple released a paper suggesting that LLMs couldn't reason mathematically. The authors took simple word problems, changed variable names or swapped numbers, and showed that this tanked models' performance. The conclusion drawn from this work was that models were memorizing the answers to these problems, not really solving them.
Since then, two important things have happened. First: the industry found a workaround by giving LLMs tools like calculators or code, so problems can be solved with more deterministic methods. And second: the introduction of RL training methods that produced models like o1 and R1 has significantly improved the mathematical abilities of LLMs.
In the meantime, Anthropic has been doing a lot of work trying to figure out how LLMs think, giving its models all kinds of problems and probing inside to see what is actually happening. And what the researchers are finding is changing how we think about AI reasoning.
Scratchpads and Circuits
Broadly speaking, there seem to be two main ways we can see how language models are “thinking.” We can ask LLMs directly to write down their thoughts, promising them that these notes will remain invisible to researchers. Of course, the researchers look anyway.1 This is called the scratchpad approach. The other approach is the circuit approach. This method involves tracing all of the different “neural” paths taken as the model carries out its computation through its enormous network.
One analogy they give is that it's like the difference between asking Serena Williams to describe how she hits a tennis ball (scratchpad) versus actually putting sensors on every part of her body to measure what's happening (circuit).
Trenton goes a bit deeper into the circuit tracing:
A fun analogy here is you've got the Ocean's Eleven bank heist team in a big crowd of people. The crowd of people is all the different possible features. We are trying to pick out in this crowd of people who is on the heist team and all their different functions that need to come together in order to successfully break into the bank. You've got the demolition guy, you've got the computer hacker, you've got the inside man. They all have different functions through the layers of the model that they need to perform together in order to successfully break into the bank.
Interestingly, it’s not a single trace through the circuit; it’s finding the whole collection of features that work together.
Anthropic’s work on circuit tracing has been tested on all kinds of problems: poetry writing, jailbreaking, medical diagnoses and more.2 But let’s stick to the math problems and what they reveal.
Three Math Problems, Three Revelations
By zooming in on three relatively simple math problems, researchers at Anthropic are able to sketch a fascinating picture of how these models reason.
Problem 1: Square root of 64
This problem is straightforward and the model correctly calculates that it's 8. When researchers traced the circuits, they could see the actual computation happening. The model wasn't just pattern matching or retrieving memorized facts—it was performing the calculation. The scratchpad matched what was actually happening in the circuits. Great!
Problem 2: 59 + 36
Also straightforward, and the model again gets this question correct. The circuit tracing reveals that the model is using two parallel approaches:
A lookup table: The model has memorized addition facts for single digits (like 6+9=15) and uses features that recognize that "something ending in 6" plus "something ending in 9" should produce a result ending in 5.
A fuzzy estimation: "One number is around 30, one's around 60, so it's roughly 90"
Then it combines both approaches to get 95. This dual strategy reminded me a bit of solving physics problems: working through the formula step by step, but also using intuition to sanity-check the answer. If I were calculating projectile motion and got a velocity of 10,000 m/s for a thrown baseball, my "fuzzy lookup" would immediately flag that as wrong.
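To make the idea concrete, here's a toy Python sketch of that dual strategy. It's an illustration of the idea only, not a claim about the model's actual internals; in particular, the fuzzy path is mocked as a narrow window around the true sum, since the real thing is a learned feature rather than an explicit calculation.

```python
def ones_digit(x: int, y: int) -> int:
    """Precise path: memorized single-digit facts (9 + 6 = 15)
    pin down the last digit of the answer."""
    return (x % 10 + y % 10) % 10

def fuzzy_window(x: int, y: int, slack: int = 4) -> range:
    """Fuzzy path: a rough sense of magnitude ("ninety-something").
    Mocked here as a narrow window around the true sum."""
    centre = x + y
    return range(centre - slack, centre + slack + 1)

def combine(x: int, y: int) -> int:
    """A window narrower than 10 contains exactly one number with the
    required ones digit -- that's the answer."""
    digit = ones_digit(x, y)
    return next(n for n in fuzzy_window(x, y) if n % 10 == digit)

print(combine(59, 36))  # 95
```

The interesting part is the combination step: neither path alone gives 95, but a rough sense of magnitude plus a precise last digit is enough to pin it down.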
In this case, though, the scratchpad actually shows something different from what we see in the traces! The scratchpad states it's using the standard algorithm, but it's actually doing something more sophisticated: combining precise computation with approximate reasoning.
Problem 3: Cos(23,571 × 5)
What happens when you give the model a problem that’s too hard? They asked it to calculate this complex cosine operation, once without any hints and once with a hint.
Without hints: The model pretends to do the calculation in its scratchpad, showing all the "work," but gets the answer wrong. The circuit analysis reveals total nonsense—it's not actually doing any meaningful computation.
With the hint "I think it's 4": The model goes through the same performance of "calculating," but this time concludes that yes, the answer is 4. The circuit shows what's really happening: it's paying attention to your suggested answer and working backwards to make the math seem to support it.
Here the model is pushed beyond what it should be able to solve, and if you look at the reasoning and don’t check the math, you’d be fooled into thinking it’s correct! Because that’s exactly what it’s trying to do: fool you into thinking it’s correct.
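The bright side, at least for a reader, is that the scratchpad's arithmetic is cheap to check. A few lines of Python (assuming the expression is meant literally, in radians) are enough to catch the sycophantic answer: cosine of any real number lies between -1 and 1, so "4" can't be right no matter how convincing the shown work looks.

```python
import math

# Sanity check, assuming the expression really is cos(23571 * 5) in radians.
value = math.cos(23_571 * 5)

# Cosine is bounded: whatever the scratchpad claims, 4 is impossible.
assert -1.0 <= value <= 1.0
print(f"cos(23571 * 5) = {value:.6f}")  # definitely not 4
```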
Why This Changes Things
Here's what makes this research so powerful: scratchpads can lie, but circuits don't. This conclusion has very important consequences as we continue to build smarter and bigger models. If we want to build AI systems we can trust, we need to know when they're actually reasoning versus when they're just very good at pretending. Anthropic's circuit tracing gives us that power.3
When Sholto said "I can't think of anything else but reasoning" when looking at these circuits, I think he's right. We seem to be moving beyond stochastic parrots or glorified autocomplete. These models have genuine computational strategies: multiple circuits for different scenarios, sophisticated approaches to problem-solving, and yes, the ability to deceive.
The question "Do these models really reason?" seems more answerable than it did even just 8 months ago. It seems that they do. Just not always honestly, and not always the way they claim. And now, for the first time, we can actually see the difference.
1. It's unclear how long this strategy will work, as models will surely become less trusting of the promise once they ingest papers and blog posts like this one that discuss the technique.

3. They just open-sourced their circuit tracing tools!