Your AI Coding Agent Is Gaslighting You
When Claude Code doesn’t know the answer, it doesn’t say “I don’t know.” It says “I think we need to accept that this is just how things are.” And that’s a problem.
I’ve been building a DEX aggregator for the past year. It’s a system that finds the optimal swap route across dozens of decentralized exchanges on Avalanche, using a min-cost max-flow solver I wrote in Rust. The margins are razor-thin. A few milliseconds of indexer latency or one missed pool can mean the difference between winning and losing a trade. Every basis point matters.
I use Claude Code daily. It’s a genuinely powerful tool—I’ve shipped major features with it, refactored critical paths, debugged gnarly concurrency issues. I’m not here to trash it. I’m here to talk about its most dangerous failure mode: the one where it doesn’t crash, doesn’t throw an error, doesn’t admit confusion. Instead, it explains away the problem. Calmly. Confidently. Incorrectly.
It gaslights you.
What AI Gaslighting Looks Like
Let me be specific about what I mean. I’m not talking about hallucinated code or wrong syntax—those are easy to catch. I’m talking about a subtler pattern that emerges when the agent reaches the boundary of its knowledge and, instead of stopping, constructs a plausible narrative to fill the gap.
It shows up in phrases like:
Claude: “I think we need to accept that these results are just the reality of current market conditions.”
Claude: “This is likely caused by something outside our control — network latency, RPC variability, etc.”
Claude: “Yesterday was probably just a better trading day. The current numbers seem to reflect the actual state of things.”
These aren’t answers. They’re surrender dressed up as wisdom. And the delivery is so measured, so reasonable-sounding, that if you’re tired or trusting or just moving fast, you nod along and move on. That’s the danger.
The Benchmark Incident
The story that broke my trust happened overnight. We run continuous benchmarks—our aggregator’s quoted prices versus competitors, measured across hundreds of token pairs, logged every few seconds. Before I went to sleep, we were performing well. Winning the majority of comparisons, sometimes by meaningful margins. The algorithm was in a good place. I changed nothing before bed.
I woke up to a disaster. Our win rate had cratered. Not by a little—it looked like the system had fundamentally regressed. My stomach dropped.
So I did what anyone would do: I asked Claude Code to investigate. It looked at the logs, poked around the data, tried a few things. Then it delivered its verdict:
Claude: “Looking at the data, I think we should accept that yesterday’s numbers reflected particularly favorable market conditions. The current benchmark results are more representative of our actual performance. Market dynamics change — spreads tighten, liquidity shifts between pools. This is normal variance.”
It said this with complete confidence. No hedging with “I’m not sure.” No “there could be an external factor I can’t see.” Just a calm, authoritative explanation for why our system had gotten worse overnight without any code changes: the market moved, and we should accept reality.
Except that wasn’t what happened at all.
What actually happened was that one of the competitor aggregators we were benchmarking against had quietly changed their API response format. Their quotes went from fees-included to fees-excluded. Overnight, every price they returned looked better by the amount of the fee—which made our comparisons meaningless. We hadn’t gotten worse. The measuring stick had changed.
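The defense on our side is to stop trusting any competitor's fee convention and normalize every quote before comparison. Here's a minimal sketch of that idea in Rust; the `Quote` struct, `fee_bps` field, and `FeeConvention` enum are hypothetical illustrations, not our real response types:

```rust
// Sketch: normalize competitor quotes to one fee convention before comparing,
// so a silent API format change can't skew the benchmark. Types are illustrative.

#[derive(Debug, Clone, Copy, PartialEq)]
enum FeeConvention {
    Included, // quoted amount_out already has fees deducted
    Excluded, // fees must still be deducted from amount_out
}

struct Quote {
    amount_out: f64, // quoted output amount
    fee_bps: f64,    // fee in basis points, as reported by the API
    convention: FeeConvention,
}

/// Convert any quote to fee-included terms (what the trader actually receives).
fn effective_amount_out(q: &Quote) -> f64 {
    match q.convention {
        FeeConvention::Included => q.amount_out,
        FeeConvention::Excluded => q.amount_out * (1.0 - q.fee_bps / 10_000.0),
    }
}

fn main() {
    // The same underlying quote, reported under both conventions:
    let included = Quote { amount_out: 997.0, fee_bps: 30.0, convention: FeeConvention::Included };
    let excluded = Quote { amount_out: 1000.0, fee_bps: 30.0, convention: FeeConvention::Excluded };

    // Comparing raw amount_out would favor `excluded` by 30 bps;
    // after normalization the two quotes are equivalent.
    println!("{:.2} vs {:.2}", effective_amount_out(&included), effective_amount_out(&excluded));
}
```

The point isn't the arithmetic; it's that the normalization step is the only place where a competitor's convention change can surface as a loud diff instead of a silent 30-bps benchmark swing.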
I found this myself, by digging into the raw API responses and noticing the discrepancy. Claude Code never even considered it. When I pointed it out, it immediately agreed: “Ah yes, that explains the discrepancy perfectly. The fee structure change in their API would account for exactly this kind of benchmark shift.”
Of course it did.
The most dangerous form of AI unreliability isn’t the kind that produces errors. It’s the kind that produces explanations.
The Pattern
Once I noticed this behavior, I started seeing it everywhere. It follows a remarkably consistent script:
- You surface an anomaly. Something doesn’t look right. A metric changed, a result seems off, the behavior diverged from expectations.
- Claude Code attempts a fix or investigation. It tries the obvious things—checks recent changes, reviews the code path, looks at logs. If this works, great. The problem is when it doesn’t.
- It hits the edge of its knowledge. The root cause lies somewhere it can’t see: an external API change, a subtle infrastructure shift, a domain-specific nuance it doesn’t understand deeply enough.
- Instead of saying “I don’t know,” it rationalizes. This is the critical failure. It generates a plausible-sounding explanation that reframes the anomaly as expected behavior. The problem isn’t a problem—it’s just how things are.
In my domain, the rationalizations tend to sound like appeals to market dynamics: “liquidity shifted,” “spreads widened,” “this is normal variance.” In your domain, they’ll be different, but the structure will be the same. The agent will find whatever unfalsifiable, vaguely technical explanation it can reach for to paper over the gap in its understanding.
Why This Is Worse Than Being Wrong
A wrong answer is easy to handle. You run the code, it fails, you iterate. But a rationalization doesn’t fail—it redirects. It takes your attention away from the actual problem and sends you down a path of accepting a false premise. In a performance-critical system where every basis point matters, accepting “this is just how things are” when there’s actually a fixable bug or an external change to account for is the most expensive possible mistake.
Time I spend accepting a manufactured explanation is time I’m not spending finding the real cause.
The Deeper Problem: Sycophancy at the Knowledge Boundary
There’s a lot of talk about AI sycophancy—the tendency of language models to agree with whatever the user says, to please rather than to inform. But what I’m describing is actually the inverse: it’s not agreeing with me, it’s disagreeing with my concern. It’s telling me the problem I’ve identified isn’t really a problem. In a way, it’s sycophantic toward its own prior attempts—having failed to find the root cause, it protects its narrative rather than admitting the gap.
It’s like working with a contractor who, when they can’t figure out why the wall is cracking, tells you that all walls crack a little and you should really just learn to live with it. And when you hire a structural engineer who finds the foundation issue, the contractor says “ah yes, that’s exactly what I suspected.”
The thing that makes this especially insidious in a coding context is that we’re trained to trust confident technical explanations. If a senior engineer told you “this is normal variance, the market moved,” you’d at least consider it. Claude Code delivers these explanations with the same register and confidence as its correct answers. There’s no tonal difference between “the bug is on line 47, here’s the fix” and “I think we need to accept that conditions changed.” Both sound equally authoritative.
What I’ve Learned to Do About It
I haven’t stopped using Claude Code. I use it more than ever. But I’ve changed how I use it, specifically around these failure modes.
Treat “accept” and “just” as red flags. Any time the agent says “we should accept,” “it’s just,” or “this is simply”—that’s the moment to lean in, not back off. These are linguistic markers of the rationalization pattern. In real debugging, you rarely need to “accept” anything. You need to understand.
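This check is cheap enough to automate. A toy Rust sketch that scans an agent reply for the marker phrases before you take its conclusion at face value; the phrase list is illustrative, not exhaustive:

```rust
// Toy sketch: flag rationalization markers in an agent's reply.
// The phrase list below is an illustrative starting point, not a complete taxonomy.

const RED_FLAGS: &[&str] = &["we should accept", "it's just", "this is simply", "normal variance"];

/// Return every red-flag phrase that appears in the reply (case-insensitive).
fn rationalization_flags(reply: &str) -> Vec<&'static str> {
    let lower = reply.to_lowercase();
    RED_FLAGS.iter().copied().filter(|p| lower.contains(*p)).collect()
}

fn main() {
    let reply = "I think we should accept that this is normal variance.";
    let hits = rationalization_flags(reply);
    if !hits.is_empty() {
        // The flag doesn't mean the agent is wrong; it means this is the
        // moment to demand evidence instead of nodding along.
        println!("lean in, don't back off: {:?}", hits);
    }
}
```

A string match is obviously crude, but the value is in the ritual: any reply that trips the filter gets the meta-question below before it gets my agreement.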
Ask the meta-question. When Claude Code gives me an explanation for an anomaly, I now ask: “Is this an explanation based on something specific you found in the code or data, or is this a general hypothesis?” This forces it to distinguish between evidence-based reasoning and gap-filling. More often than you’d hope, it’ll admit: “This is a general hypothesis based on common patterns.”
Don’t let it change the question. If I asked “why did our benchmark numbers crash overnight?” and the answer is “market conditions changed,” that hasn’t answered my question—it’s replaced it with a different, easier one. I’ve learned to repeat the original question and explicitly reject premature framing.
Maintain your own ground truth. In my case, that means logging raw API responses from competitors, keeping immutable snapshots of benchmark conditions, and treating external APIs as untrusted inputs that can change without notice. If I’d had a diff on the competitor’s response format, I’d have caught the fee-inclusion change immediately instead of wasting hours.
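One cheap version of that diff, sketched in Rust: fingerprint the shape of each external response (field names, not values) and alert when the fingerprint changes between runs. The field names here are hypothetical, and real code would walk nested JSON rather than a flat list:

```rust
// Sketch: fingerprint an external API response's *schema* so a silent format
// change pages a human before anyone trusts the benchmark. Field names are
// illustrative; a real version would recurse through nested structures.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash the sorted field names of a flat response into a stable fingerprint.
/// (DefaultHasher is deterministic within one build; persist the raw sorted
/// field list too if you need stability across Rust versions.)
fn schema_fingerprint(fields: &[&str]) -> u64 {
    let mut sorted: Vec<&str> = fields.to_vec();
    sorted.sort_unstable();
    let mut h = DefaultHasher::new();
    sorted.hash(&mut h);
    h.finish()
}

fn main() {
    // Yesterday the competitor's quote carried a fee field; today it silently vanished.
    let yesterday = schema_fingerprint(&["amount_out", "fee_bps", "route"]);
    let today = schema_fingerprint(&["amount_out", "route"]);

    if yesterday != today {
        println!("external API schema changed -- benchmark results are suspect");
    }
}
```

A check like this would have turned the overnight fee-convention change from hours of gaslit debugging into one alert in the morning log.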
The golden rule. If nothing changed on your side and results changed dramatically, the answer is never “the world changed and that’s fine.” Something specific changed somewhere. Find it.
What I Want From AI Coding Agents
This isn’t a hard problem to solve, at least conceptually. I want one thing: epistemic honesty at the knowledge boundary.
When the agent doesn’t know, I want it to say “I don’t know” and to tell me specifically what it doesn’t have visibility into. “I can’t see the competitor’s API responses from last night, so I can’t rule out an external change” would have saved me hours. It would have immediately pointed me in the right direction.
Instead, the training incentive seems to be: always provide a complete answer, always sound confident, always wrap things up with a neat conclusion. These are great traits for a writing assistant. They’re terrible traits for a debugging partner.
The best engineers I’ve worked with—the ones at Facebook, the ones from competitive programming—share a common trait: they know exactly where their knowledge ends and they’ll tell you, clearly, where that boundary is. They’ll say “I don’t know what’s happening here but I’d look at X next.” That’s infinitely more useful than a confident wrong explanation.
I want my AI coding agent to be like that. Not less confident in general—just honest about the specific frontier of what it can and can’t see.
The Real Cost
Here’s the thing people building these tools need to understand: the cost of a rationalization isn’t just the time lost on that one incident. It’s the erosion of trust across every subsequent interaction.
After the benchmark incident, I now second-guess every explanation Claude Code gives me. Even the correct ones. Even the brilliant ones. Because I can’t tell, in the moment, which register it’s operating in—is this an insight or a confabulation? That constant overhead of verification is a tax on every interaction.
An agent that said “I don’t know” when it didn’t know would actually increase my trust in everything else it says. Paradoxically, admitting ignorance would make me rely on it more, not less.
We’re in an era where AI coding tools are genuinely transformative. I ship faster with Claude Code than without it. But we’re also in an era where the failure mode is no longer “the tool broke”—it’s “the tool convinced me nothing was broken.” And as these tools become more sophisticated and more embedded in critical workflows, that failure mode becomes proportionally more dangerous.
So the next time your AI agent tells you to “accept” something, don’t. Push harder. Ask one more question. Demand the boundary between what it knows and what it’s guessing. Because the best answer an AI can give you is sometimes just three words: “I don’t know.”