2026-04-29·By Jeff

Claude Code Is a Worldview, Not a Tool: 7 Product Philosophies from Cat Wu

Tags: AI, Anthropic, Claude Code, Product Philosophy

Claude Code Isn’t a Tool; It’s a Worldview — Breaking Down 7 Product Philosophies from Cat Wu’s Interviews

If you’re building products, I highly recommend making time to watch these two interviews with Cat Wu, the product lead for Claude Code. One is a solo interview on Lenny’s Podcast, and the other is on Every.to’s AI & I, where she appeared alongside Boris Cherny. Together they run over three hours, and the information density was so high I listened to both twice.

Many people discussing these interviews focus on how Anthropic compressed features that used to take six months down to a week, or even a single day. The speed-up is undeniably crazy, but that’s not what left the deepest impression on me.

What truly inspired me was the set of worldviews embedded in Claude Code as a product. They can be broken down into seven points, each counterintuitive on its own, but together they form the shape of what an AI-native product should look like. Let me walk through them one by one.

Plan Mode isn’t a feature; it’s a worldview.

Boris Cherny said something striking on the show:

“Switching to plan mode, having Claude lay out the steps before writing any code and aligning on the approach upfront — that can double or even triple the success rate on complex tasks.”

Double or even triple. That number honestly shook me the first time I heard it.

In practice, it’s just pressing Shift+Tab twice. The podcast also mentions that when Boris himself tackles a tough feature, he switches to plan mode first and aligns on the plan before a single line of code is written; only then does the typing start.

But the question is: how can a toggle that looks like a UI option boost the success rate that much?

Because behind it is a product-level assumption: models can hallucinate, so intent has to be put on the table first. Plan Mode isn’t for you; it’s for the model, forcing it to think things through before acting.

Many people think Plan Mode is “letting users check the AI’s plan,” but the more accurate description is “the AI isn’t allowed to skip the thinking step.”

Making “thinking explicit” into a product-level mechanism — that’s a worldview.
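To make the "no skipping the thinking step" idea concrete, here is a minimal sketch of a plan gate. It is not how Claude Code implements Plan Mode; `Plan`, `run_with_plan`, and the three callables are hypothetical names I made up for illustration. The only point is structural: execution is unreachable until a plan exists and has been approved.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    """A hypothetical container for the steps the model proposes."""
    steps: List[str]


def run_with_plan(
    task: str,
    make_plan: Callable[[str], Plan],
    approve: Callable[[Plan], bool],
    execute: Callable[[Plan], str],
) -> str:
    """Refuse to execute anything until a plan exists and has been approved."""
    plan = make_plan(task)      # the model must state its intent first
    if not plan.steps:
        raise ValueError("empty plan: nothing to align on")
    if not approve(plan):       # a human (or a policy) reviews the plan
        return "rejected: revise the plan before writing code"
    return execute(plan)        # only now is any code written
```

In this sketch `make_plan` and `execute` would wrap model calls and `approve` would be the person at the keyboard; all three are stand-ins.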

Trim scaffolding as model capability grows, instead of piling on features.

On Lenny’s, Cat Wu introduced a subtle concept she calls AGI-pilled — basically, “how much you’re betting on AGI.” She said getting the right degree of AGI-pilled is one of the hardest things in product:

“Being too AGI-pilled gives you a product vision detached from reality; being not AGI-pilled enough leaves model capability on the table. With every new model release, that balance has to be recalibrated.”

She and Boris share a philosophy: “cut as much as you build.” A feature gets unshipped not because it failed, but because they found a simpler, more intuitive way to achieve the same thing.

The most concrete example is the todo list. Early models couldn’t reliably check off completed items, so the team had to add a system reminder every few messages. Once new models arrived, that kind of “reminder scaffolding” became unnecessary and was torn down.
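The todo-list reminder is a good picture of what "scaffolding" means in code. The sketch below is my own illustration, not Anthropic's implementation: `with_reminders`, the `REMINDER` text, and the `every` and `enabled` parameters are all assumed names. It shows the shape of the idea: the reminder sits behind a switch, so tearing it down when a stronger model arrives is a one-line change.

```python
from typing import List

# Hypothetical reminder text; the real wording is not given in the interviews.
REMINDER = "Reminder: check off completed items in the todo list."


def with_reminders(messages: List[str], every: int = 3, enabled: bool = True) -> List[str]:
    """Return a copy of `messages` with a reminder inserted after every `every` messages."""
    if not enabled or every <= 0:
        return list(messages)   # scaffolding removed: messages pass through untouched
    out: List[str] = []
    for i, msg in enumerate(messages, start=1):
        out.append(msg)
        if i % every == 0:
            out.append(REMINDER)
    return out
```

Calling `with_reminders(history, enabled=False)` is the "torn down" state: same conversation, no extra nudges.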

Cat also has a fixed benchmark: have Claude Code add table functionality to Excalidraw. In June 2025, Opus 4 could occasionally do it; less than a year later, in April 2026, Opus 4.6 nails it in one shot, reliably enough to demo live in front of thousands of engineers.

Over the span of a year, it went from “occasionally works” to “nails it every time.” The rhythm of tearing down scaffolding moves entirely in lockstep with model capability.

Others chase features; they chase models — for every inch the model improves, they remove an inch of scaffolding.

Swiss Cheese multi-layered defense, not vibe coding.

At Anthropic, they call this mechanism the Swiss Cheese Model: multiple layers stacked up, where each layer has holes, but once they are stacked no hole goes all the way through.

In the context of Claude Code, Boris described the five concrete steps it runs on a PR:

Claude runs its own tests, writes missing tests itself, runs its own generated linter, acts as an automated reviewer to review itself, and finally, there’s a human safety net.

Notice, the first four layers are all built by Claude Code for itself. It doesn’t trust any single layer to get it right on its own, so before reaching the fifth human layer, it runs through four layers on its own.

In Boris’s view, vibe coding, the “I feel like it’ll work” style, is only suitable for throwaway code and prototypes, not production systems. The reason is simple: what threatens a production system isn’t a model that’s too weak; it’s that counterexamples will inevitably appear.

This is where the Swiss Cheese thinking is sharpest: a true production-grade AI product doesn’t bet that the model won’t make mistakes; it assumes it will, and uses structure to catch them.
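A bit of arithmetic shows why stacking imperfect layers works. The sketch below assumes each layer misses a given defect independently of the others, which is a simplification (real layers share blind spots, so the true escape rate would be higher). The miss rates are numbers I invented for illustration; none of them come from the interviews.

```python
import math
from typing import Sequence


def escape_probability(miss_rates: Sequence[float]) -> float:
    """Probability a defect slips through every layer, assuming layers miss independently."""
    for p in miss_rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError("miss rates must be in [0, 1]")
    return math.prod(miss_rates)


# Hypothetical miss rates for the five layers described above.
layers = {
    "run tests": 0.30,
    "write missing tests": 0.30,
    "generated linter": 0.50,
    "automated self-review": 0.40,
    "human review": 0.20,
}
print(f"{escape_probability(list(layers.values())):.4f}")  # prints 0.0036
```

Under these made-up numbers, no single layer is better than 80% effective, yet only about 0.36% of defects would get past all five.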

Antfooding — a new piece of feedback every five minutes.

Internally, Anthropic engineers are nicknamed “ants,” so they call their internal usage loop Antfooding — an evolved form of dogfooding.

Cat said something that sounds insane during the interview:

“Our feedback channel gets a new message every five minutes.”

Every five minutes. Whether someone actually likes a feature, whether there’s a bug, whether it needs to be unshipped — you get a signal every five minutes.

Hundreds of engineers in the office use Claude Code every day, and Cat can literally walk around and see the feedback firsthand. That picture is crucial: Claude Code’s earliest users are the pickiest, most skilled coders in the world, and the most willing to vent.

Ship → internal dogfooding → hear feedback every few minutes → iterate → ship again. How short is that loop? Before, a feature from kickoff to launch would take 6 months (planning + cross-team alignment + writing PRDs). Now, Anthropic’s overall internal cadence is compressed to shipping within 24 hours — note, this is the team’s iteration rhythm, not that the same feature goes from 6 months to 24 hours.

There’s no pickier user in the world than an engineer blocked by Claude Code. Regular product dogfooding means “we use it ourselves”; Antfooding means “we use it harder than anyone else.”

Let subagents nitpick each other, instead of a single verdict.

This section might be the part of the interview that most upended my thinking.

Boris described how his code review command runs:

He opens several subagents in parallel: one to check style conventions, one to dig through git history to see how things were done before, and one to find obvious bugs. The first round surfaces real issues and false alarms alike, so he then launches five more subagents whose sole job is to nitpick the findings of the first batch. The result: every real issue is caught, and all false positives are eliminated.

Reading that, I paused. My own instinct when building agent products is always “swap in a stronger model” — when quality suffers, my first thought is the model isn’t good enough. I’d never considered the path of having N agents nitpick each other: quality doesn’t rely on model strength, but on adversarial interplay among models.

Most people’s approach to agent products is “use the single strongest model to handle everything.” Claude Code inverts that — use multiple models to fight each other. The first wave of subagents reviews, and the second wave is there specifically to pick apart the first wave’s findings.
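The two-wave review can be sketched in a few lines. This is my own illustration, not Boris's actual command: the `Reviewer` and `Challenger` types are plain Python callables standing in for subagents, and the majority-vote rule in the second wave is an assumption on my part, since the interview doesn't say how disagreements are settled. A real setup would also run the first wave in parallel; this version runs it sequentially to stay short.

```python
from typing import Callable, Iterable, List

Reviewer = Callable[[str], List[str]]      # diff -> candidate findings
Challenger = Callable[[str, str], bool]    # (diff, finding) -> True if the finding holds up


def adversarial_review(
    diff: str,
    reviewers: Iterable[Reviewer],
    challengers: Iterable[Challenger],
) -> List[str]:
    """Wave 1 collects findings; wave 2 keeps those a majority of challengers uphold."""
    challengers = list(challengers)
    # Wave 1: gather findings, de-duplicated, in first-seen order.
    findings: List[str] = []
    for review in reviewers:
        for finding in review(diff):
            if finding not in findings:
                findings.append(finding)
    # Wave 2: every challenger tries to knock every finding down.
    kept: List[str] = []
    for finding in findings:
        votes = sum(1 for challenge in challengers if challenge(diff, finding))
        if votes * 2 > len(challengers):   # strict majority (assumed rule)
            kept.append(finding)
    return kept
```

The design choice worth noticing is that the challengers never produce new findings. Their only job is to attack existing ones, which is what filters out the false alarms.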

Cat herself uses a similar setup: one planner subagent and one code review subagent. Subagents serve synchronous, interactive work and slash commands serve CI, but the concept is the same.

The cost is real. A subagent-heavy workflow burns 2 to 5 times the tokens of a single agent. For reference, public industry data puts enterprise deployments at an average of $150–$250 per developer per month, while within Anthropic there have been extreme cases of a single user burning $150,000 in tokens in a single month. That is an outlier, but it shows how terrifying the upper bound of this approach can be.

But Boris’s stance is firm: having subagents nitpick each other produces cleaner results. Adversarial pressure is the source of quality.

Rather than trusting AI to get it right in one shot, make AIs call each other out.

Stop Hook redefines what “done” means.

The previous section was about distrusting a single point of judgment. This one goes a step further — you shouldn’t even trust the model when it says “I’m done.”

Boris’s solution is the Stop Hook:

“You can absolutely let the model keep running until the thing is truly done.”

The concrete approach: attach a stop hook that runs the test suite — if tests fail, throw the errors back to Claude to fix, run again, and only stop when all tests pass. “I’m done” doesn’t mean done. “Tests pass” means done.
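The loop described above is simple enough to write down. The sketch below models only the control flow, not Claude Code's actual hook mechanism: `run_until_green`, `run_tests`, and `ask_model_to_fix` are hypothetical names, and the `max_rounds` cap is my own addition so that a stubborn failure can't spin forever.

```python
from typing import Callable, Tuple

RunTests = Callable[[], Tuple[bool, str]]   # -> (passed, error output)
FixStep = Callable[[str], None]             # receives the error output


def run_until_green(run_tests: RunTests, ask_model_to_fix: FixStep, max_rounds: int = 5) -> bool:
    """Loop test -> fix until the tests pass; give up after `max_rounds` fix attempts."""
    for _ in range(max_rounds):
        passed, errors = run_tests()
        if passed:
            return True             # "tests pass" is the only definition of done
        ask_model_to_fix(errors)    # throw the failures back to the model
    passed, _ = run_tests()         # final check after the last fix attempt
    return passed
```

Note that the model's own claim of completion never appears in this function. The only thing that ends the loop early is a passing test run.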

Boris emphasized in the show that giving Claude a self-verification loop is the single most important thing for getting great results out of Claude Code — with that loop, final quality can improve 2 to 3 times.

He also runs a PostToolUse hook to auto-format code — Claude’s formatting is usually fine, but this hook fixes that last 10% to prevent CI from failing.

Stack these two hooks together, and what the Stop Hook does turns out to be foundational: it redefines the word “done.” In the AI era, results are the only honest signal. What the model claims is “done” doesn’t count; what actually passes does.

From typing to deciding: the scarcest resource is judgment.

In the final part, Cat said something on Lenny’s that I’ve screenshotted and shared with friends over and over:

“Code is getting cheaper and cheaper. What becomes more valuable is knowing what to write — and understanding how hard something is to do helps you make priority decisions.”

She further says that all roles are converging — PMs are doing engineering work, engineers are doing PM work, designers are doing PM work. Almost all PMs on her team have either been engineers or open PRs themselves; the designers are all frontend engineers by background.

Boris’s perspective is even harsher. In his view, software engineers are like scribes, and AI is the printing press — code is no longer the scarce commodity.

Looking back at the previous six points through this lens, they’re all serving the same thing —

Plan Mode makes you externalize your judgment; trimming scaffolding lets you adjust judgment as model capability evolves; Swiss Cheese means you don’t have to judge “will the model be wrong?”; Antfooding puts your judgment in contact with real feedback as quickly as possible; subagents nitpicking means they judge the model’s judgment for you; Stop Hook judges the truth of “done” for you.

All the product philosophy serves one purpose — freeing people from typing so they can focus on deciding.

Claude Code isn’t replacing engineers; it’s lifting them out of the position of typist.

String these seven ideas together, and you have the worldview of Claude Code as a product: admit AI is unreliable, so build multi-layered defenses; don’t trust a single point of judgment, so let them nitpick each other; don’t care about the model claiming “I’m done,” only care about results that actually pass; free people from typing to focus on deciding.

My biggest takeaway wasn’t awe; it was these three words: proactively embrace change.

The products we use are already running on this worldview — assuming errors, organizing adversarial checks, redefining done, pushing humans into the judgment seat. Then shouldn’t the way we build products, run teams, and do our own work also follow suit? Collaboration has to change, tools have to change, work habits have to change, even “what does it mean to be done” has to be redefined.

If we don’t proactively embrace change, we’ll just be pushed along by this worldview.

I recommend you also listen to both interviews — far more useful than my rambling.

Which design in Claude Code challenged your existing thinking the most? Let’s discuss in the comments.