AI Won't Replace Your Judgment. But AI Can Sharpen It.
AI coding agents already excel at writing code from a specification. Give them a bounded problem where the architectural decisions are predefined, and they produce high-quality code remarkably fast. They’re also good at explaining trade-offs between architectural decisions. I recently prompted Claude about multi-tenant isolation strategies, and it answered with three approaches, listing the latency and complexity costs of each.
Which raises the question: what is left for us software architects?
The answer is judgment. And what I’ve found is that AI can serve as something unexpected: a flight simulator for developing it.
Why judgment stays human
Let me go back to that tenant-isolation example. The AI laid out three strategies with their technical characteristics: latency, throughput, isolation level, implementation cost. All useful. But I was the one who recognized that the relevant dimension to evaluate wasn’t any of those. It was the fact that one strategy handled high load by rejecting requests to protect other tenants. In this specific business context, that’s unacceptable. The AI couldn’t know that.
That was a judgment call requiring understanding of consequences across the whole system and its relationship to the business context. I knew which question to ask, out of all the possible questions. I brought the lens that mattered. LLMs can surface technical characteristics, but they cannot understand your business context well enough to make the trade-off decision for you.
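To make that trade-off concrete: the strategy in question behaved roughly like the following per-tenant admission-control sketch. The class, names, and thresholds here are hypothetical illustrations, not the actual system.

```python
import time
from collections import defaultdict


class TenantAdmissionControl:
    """Per-tenant load shedding: reject requests from any tenant that
    exceeds its per-second budget, protecting the other tenants."""

    def __init__(self, max_requests_per_second=100, clock=time.time):
        self.max_rps = max_requests_per_second
        self.clock = clock  # injectable for testing
        # tenant_id -> (current one-second window, requests seen in it)
        self.windows = defaultdict(lambda: (0, 0))

    def admit(self, tenant_id):
        now = int(self.clock())
        window, count = self.windows[tenant_id]
        if window != now:          # a new one-second window begins
            window, count = now, 0
        if count >= self.max_rps:  # budget exhausted: shed the request
            self.windows[tenant_id] = (window, count)
            return False
        self.windows[tenant_id] = (window, count + 1)
        return True
```

Every technical characteristic of this design can look fine on paper. The judgment call is noticing that `admit()` returning `False` means a paying tenant sees errors during their own traffic spike, and knowing whether your business can live with that.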
The flight simulator
So if judgment is our core contribution, how do we get better at it?
Research on expert intuition tells us that two things matter: a domain with learnable patterns, and enough practice with timely feedback. Kahneman and Klein call this the foundation of “naturalistic decision-making”. It’s how firefighters, chess players, and surgeons build expert intuition.¹
I tried to create exactly that environment using AI. Working conversationally with Claude, I had it suggest possible solutions and implement whichever one I chose. I made the architectural decisions; Claude handled the code. This let me iterate on decisions in minutes instead of days, stripping away implementation time so I could focus purely on architectural thinking. It’s a flight simulator for architects.
An important caveat
This approach works best for quality attributes you can measure quickly and holistically. In my case, I was trying to improve latency, so I had an objective measure to optimize for: load and performance tests. When I deployed a bad decision, I immediately felt it in declining numbers. When I got it right, the improvement was just as visible. That tight feedback loop is what makes the practice effective.
Other attributes are harder. Maintainability across 17 teams? That feedback loop doesn’t fit in a practice project. But for the attributes you *can* measure (like latency, throughput, or availability) this is the fastest feedback loop an architect has ever had.
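The feedback loop itself can be as simple as a script you rerun after every decision. A minimal sketch of such a measurement harness (the function and percentile choices are my own illustration, not a specific load-testing tool):

```python
import statistics
import time


def measure_latency(call, iterations=200):
    """Invoke `call` repeatedly and report p50/p95 latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p95": samples[int(0.95 * len(samples)) - 1],
    }
```

Run it before and after each architectural decision: if p95 regresses, you felt the bad decision within minutes, not after the next production incident.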
¹ Kahneman, D. & Klein, G. (2009). Conditions for Intuitive Expertise: A Failure to Disagree. *American Psychologist*, 64(6), 515–526.

