[Image: three stacked layers representing a spec, a verification station, and a workshop environment for building with AI]
Tags: AI Agents, Claude Code, Andrej Karpathy

The Three Layers of Working With AI: Spec, Verifier, and Environment

Sascha Kiefer | AI & Agents

Most people prompt AI agents wrong. Drawing on Andrej Karpathy's method, we break down the three layers that turn an AI assistant into a reliable collaborator - the spec, the verifier, and the environment - and show how to apply each one.

We recently watched Andrej Karpathy describe how he uses AI today, and one claim stuck with us: almost everyone is prompting AI wrong. Not because the prompts are badly worded, but because they treat the model like a vending machine instead of building a system around it.

Who is Andrej Karpathy? A founding member of OpenAI and the former Director of AI at Tesla, where he led the Autopilot vision team. He's also one of the most-followed teachers in the field - his lectures and open-source projects on neural networks have shaped how a generation of engineers thinks about deep learning. When he talks about how to actually work with these models, it's worth listening.

When we dug into his method, it broke down into three simple layers that stack on top of each other: the spec, the verifier, and the environment. Get all three right and you don't just write better prompts - you build a workflow that compounds over time. In this post we'll walk through each layer and how to apply it in practice.

Why AI Needs More Than a Good Prompt

Karpathy uses a deceptively simple example to expose the gap:

"I want to go to a car wash to wash my car, and it's 50 m away. Should I drive or should I walk? And state-of-the-art models today will tell you to walk because it's so close."

We tried it ourselves across several leading models, and they all said the same thing: walk. They miss the obvious - you need the car at the car wash. AI is brilliant at what can be measured, but for context-driven judgment it has no signal to act on.

The whole point of the three layers is to bridge that gap: to move your understanding and context into a form the AI can actually use, and then to keep it honest while it works. Here's how they fit together before we dig into each one:

[Figure: the three layers. Your understanding feeds the Spec, which feeds drafts into the Verifier in a feedback loop; both live inside the Environment and together produce a high-quality result.]

Layer One: The Spec

A spec is how you deliver your understanding to the AI in a format it can use. Many people reach for "plan mode" here, but Karpathy argues that's too high-level:

"I actually don't even like the plan mode. [...] I think there's something more general here where you have to work with your agent to design a spec that is very detailed."

He isn't saying plan mode is bad - he's saying you have to go deeper and collaborate on the spec. Three principles make a spec usable:

1. Uncover the real goal

There's a difference between a task and a goal. "Create an end-of-month report" is a task. The goal is the conclusion you're trying to draw, the decision the report drives - and that's something AI can never decide for you. A practical trick is to flip the interaction:

"Interview me to identify the goal of this project."

This pulls the knowledge out of your head and into the spec.

2. Be agile, not waterfall

People are dangerously prone to using AI agents in a waterfall manner - dumping the entire job on them at once and waiting for a finished product. The better move is agile spec writing: a tight scope, a clear checkpoint, review, adjust, repeat. Bake it into your instruction:

"Bias towards smaller, more compartmentalized specs."

3. Be precise and use your brain

The more precise you are, the less the AI has to assume - and every assumption is a chance for it to drift. When the AI drafts a spec for you, read it critically. Force the checkpoints:

"Make me verify key decisions explicitly to ensure nothing is missed."

Put together, these three give you a tightly scoped spec that actually aligns with your goal. This is what Karpathy calls modern engineering - and it's a skill, not a shortcut.
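One way to make the three instructions stick is to prepend them to every task instead of retyping them. The following is a minimal sketch, not Karpathy's method: the instruction strings are the ones quoted above, while the function name build_spec_prompt and the output layout are our own assumptions.

```python
# The three spec instructions quoted in this section.
SPEC_INSTRUCTIONS = [
    "Interview me to identify the goal of this project.",
    "Bias towards smaller, more compartmentalized specs.",
    "Make me verify key decisions explicitly to ensure nothing is missed.",
]


def build_spec_prompt(task: str) -> str:
    """Combine a task description with the three spec instructions.

    Hypothetical helper: the layout of the prompt is an assumption,
    not a format any particular tool requires.
    """
    task = task.strip()
    if not task:
        raise ValueError("task must not be empty")
    rules = "\n".join(f"- {line}" for line in SPEC_INSTRUCTIONS)
    return f"Task: {task}\n\nBefore writing any spec:\n{rules}"
```

Calling build_spec_prompt("Create an end-of-month report") yields a prompt that names the task and then asks the agent to interview you, keep the spec small, and surface key decisions for review.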

Layer Two: The Verifier

Layer two sits on top of the spec: once the AI produces something, how do you know it's good? To get this right, Karpathy offers a mental model - we're not building animals, we're summoning ghosts:

"If you yell at them, they're not going to work better or worse [...] it's all just kind of like these statistical simulation circuits. It's more just being suspicious of it and figuring it out over time."

We find it easier to picture a robot librarian. Ask it a question and it answers from the books in its library. If the right book isn't there, it doesn't know that - so it may confidently make something up. That's exactly why AI nails math and fumbles context: when the library has a clear answer, it shines; when it doesn't, it's confidently wrong.

The implication is liberating: pleading, yelling, or "just make it better" doesn't work, because that's treating a ghost like an animal. The one lever that does work is verification. Three places to apply it:

1. Set evaluation criteria up front

Before the AI touches anything, define what "good" looks like with precision. "Make this report look good" is vague. "The report must have three sections, each ending with a recommendation" is something the model can actually check against. Add it to your verification prompt:

"Outline the evaluation criteria you will use to ensure a high-quality final product. Be precise."

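A precise criterion has a useful side effect: it can often be checked mechanically. Here is a minimal sketch for the report example, under two assumptions of ours that the criterion itself does not specify: a section starts with a line beginning "## ", and a recommendation is a final non-empty line beginning "Recommendation:". The name check_report is hypothetical.

```python
def check_report(text: str, expected_sections: int = 3) -> list[str]:
    """Return a list of failed criteria; an empty list means the report passes.

    Assumed format (ours, for illustration): sections start with "## ",
    and each section's last non-empty line starts with "Recommendation:".
    """
    # Split the text into sections, each a list of lines led by its heading.
    sections: list[list[str]] = []
    for line in text.splitlines():
        if line.startswith("## "):
            sections.append([line])
        elif sections:
            sections[-1].append(line)

    failures: list[str] = []
    if len(sections) != expected_sections:
        failures.append(
            f"expected {expected_sections} sections, found {len(sections)}"
        )
    for section in sections:
        title = section[0][3:].strip()
        body = [ln.strip() for ln in section[1:] if ln.strip()]
        if not body or not body[-1].startswith("Recommendation:"):
            failures.append(f"section '{title}' does not end with a recommendation")
    return failures
```

The failure messages are written so they can be handed straight back to the model as revision feedback.
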
2. Use a second model as the critic

Think of a second robot librarian from a different library - a different set of books, and therefore a different perspective on whether the first answer holds up. In Claude Code, for example, you can wire up another model and ask it to grade the output:

"If this turns into a complex build, run the final output by a second model to ensure both systems agree."

3. Pull in external signal

Wherever possible, bring in ground truth. Deploying an app? Connect the AI to the system it deployed to, so it can confirm the deployment actually succeeded instead of assuming. Writing a monthly report? Feed it last quarter's reports as a reference for the exact format. You're pulling real data into the verification loop so a "success" is verifiable, not asserted.
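For the deployment case, the external signal can be as small as one HTTP request. This is a rough sketch that assumes the deployed app exposes a health endpoint returning HTTP 200 when it is up; the endpoint, the function names http_status and deployment_verified, and the "200 means healthy" rule are our assumptions, not something every system provides.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return 0


def deployment_verified(url: str, probe: Callable[[str], int] = http_status) -> bool:
    """True only if the probe sees HTTP 200 from the deployed system.

    The probe is injectable so the check can be tested without a network.
    """
    return probe(url) == 200
```

The point of the design is that "deployed successfully" becomes the return value of a check against the real system rather than a sentence in the agent's summary.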

Together, these three turn verification into a loop rather than a one-shot check:

[Figure: the verification loop. Criteria are defined up front, the AI produces output, and the output is checked against the criteria with input from a second model and external signal. Failed checks trigger a revision; a pass yields a verified result.]

As Boris Cherny, the creator of Claude Code, put it: if the AI has a feedback loop, it will 2-3x the quality of the final result. That's the entire reason this layer exists.
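The loop described in this section can be sketched in a few lines. This is an illustration under our own assumptions, not any tool's API: produce stands in for a call to the model and receives the previous round's failures, each check returns a list of failure messages (a criteria check, a second-model critic, or an external probe would all fit that shape), and max_rounds is an arbitrary cap.

```python
from typing import Callable, Optional

# A check inspects a draft and returns failure messages; empty means pass.
Check = Callable[[str], list[str]]


def verify_loop(
    produce: Callable[[list[str]], str],
    checks: list[Check],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for drafts until every check passes or the round limit is hit.

    Returns the first draft that passes all checks, or None if no draft
    passed within max_rounds.
    """
    feedback: list[str] = []
    for _ in range(max_rounds):
        draft = produce(feedback)
        # Collect failures from every check so the next draft sees them all.
        feedback = [msg for check in checks for msg in check(draft)]
        if not feedback:
            return draft
    return None
```

Returning None instead of the last failing draft is a deliberate choice here: an unverified result should be visibly different from a verified one.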

Layer Three: The Environment

The spec and the verifier need somewhere to live - and that's the environment you build in. Picture a workshop: the spec is the blueprint pinned to the wall, the verifier is the quality-check station by the door, and the environment is the workshop itself. The problem is that most people rebuild the workshop from scratch every single time. A single long chat history is not a workshop. Here's how to build one that improves over time.

1. Set up a proper AGENTS.md

A project instruction file gets injected on every prompt - it's the first thing the agent reads to understand how to operate. We keep ours in an AGENTS.md at the repo root. AGENTS.md is a vendor-neutral format that an increasing number of AI coding tools read automatically, so a single file can drive every assistant on the team instead of one file per tool. Use it to encode defaults you'd otherwise have to repeat:

"Before building anything multi-step, include a verification plan."

Now verification is forced into every build instead of being something you have to remember to ask for. Good instruction files describe how the repo works, which custom skills exist and when to use them, where knowledge lives, and the non-negotiable working rules. Make the environment yours - the AI should be living in your world, not the other way around.

2. Build an LLM knowledge base

Karpathy went viral on this: a deliberate folder structure on your machine that holds your own "training data" in a way that's easy for the AI to navigate. Your data is your moat. This is how you start compounding your own intellectual property instead of starting from zero each session.

3. Build out your skill set

A good rule of thumb: if you plan to do something repeatedly, turn it into a custom skill - a reusable handbook for a specific task. And skills get better with use. We have a saying on our team: the best way to find a leak in a hose is to run water through it. Keep running water through your skills and the rough edges reveal themselves; your system compounds over time.

4. Create hard rules, not just requests

Depending on the cost of getting something wrong, you need different guardrails. A line in AGENTS.md like "don't touch the /important folder" is a request - the AI can still ignore it. For things that are critical not to get wrong, enforce them at the tool level. A pre-tool-use hook that inspects the target path before any write or edit, and rejects the call when it points into the protected folder, blocks the action instead of merely discouraging it. The AGENTS.md line is a guide the agent can ignore; the hook is a wall it can't.
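A minimal sketch of such a hook, with the assumptions stated plainly: we assume the agent tool pipes a JSON payload to the hook on stdin with the target path under tool_input.file_path, and that it treats exit code 2 as "block this call". That matches Claude Code's hook convention as we understand it, but check your tool's hook documentation before relying on it. The folder name comes from the example above; is_protected, decide and run_hook are hypothetical names of ours.

```python
import json
import sys
from pathlib import PurePosixPath
from typing import TextIO

# Assumption: the protected folder from the example, relative to the repo root.
PROTECTED_DIR = "important"
# Assumption: exit code 2 tells the agent tool to block the call, 0 allows it.
BLOCK, ALLOW = 2, 0


def is_protected(file_path: str) -> bool:
    """True if any component of the POSIX-style path is the protected name."""
    return PROTECTED_DIR in PurePosixPath(file_path).parts


def decide(payload: dict) -> int:
    """Map a hook payload to an exit code; calls without a file path pass."""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    return BLOCK if is_protected(file_path) else ALLOW


def run_hook(stdin: TextIO, stderr: TextIO = sys.stderr) -> int:
    """Read one JSON payload, explain a block on stderr, return the exit code."""
    code = decide(json.load(stdin))
    if code == BLOCK:
        print(f"Blocked: /{PROTECTED_DIR} is read-only for the agent.", file=stderr)
    return code
```

Saved as a script, the file would end with sys.exit(run_hook(sys.stdin)) and be registered as a pre-tool-use hook for the write and edit tools. The check is deliberately conservative: any path that contains a component named important is rejected, including a file with that exact name.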

A useful way to bucket your work:

  • Always do - safe enough to run on autopilot.
  • Ask first - anything you want to double-check before it happens.
  • Never do - lines that absolutely cannot be crossed, enforced by rules rather than requests.
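The three buckets map naturally onto a small lookup table. The sketch below is illustrative only: the action names are made up, and defaulting unknown actions to "ask first" is our own choice rather than part of the bucketing itself.

```python
from enum import Enum


class Policy(Enum):
    ALWAYS = "always do"
    ASK = "ask first"
    NEVER = "never do"


# Hypothetical action names; replace them with the actions in your own workflow.
POLICIES = {
    "run_tests": Policy.ALWAYS,
    "format_code": Policy.ALWAYS,
    "install_dependency": Policy.ASK,
    "edit_protected_folder": Policy.NEVER,
}


def policy_for(action: str) -> Policy:
    """Look up an action; unknown actions fall back to ASK (our assumption)."""
    return POLICIES.get(action, Policy.ASK)


def may_run(action: str, approved: bool = False) -> bool:
    """Decide whether an action may proceed, given optional human approval."""
    policy = policy_for(action)
    if policy is Policy.NEVER:
        return False  # approval cannot override a hard rule
    if policy is Policy.ASK:
        return approved
    return True
```

Note that approval has no effect on the "never do" bucket, which is what separates a rule from a request.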

The One Thing Worth Learning Deeply

Asked what still matters when intelligence gets cheap, Karpathy gave an answer that ties the three layers together:

"You can outsource your thinking, but you can't outsource your understanding."

Every layer here orbits your understanding of the bigger picture. The spec only works if you know the real goal. The verifier only works if you know what "good" looks like. The environment only works if you know which rules can never be crossed. The AI supplies the computation; you supply the understanding - and that's the part you can't delegate.

How We Apply This at vensas

None of this is theoretical for us. We treat spec-first collaboration, explicit verification loops, and well-tended project environments as part of how we build software with AI day to day - it's why our AGENTS.md conventions, custom skills, and tool-level guardrails matter as much as the code itself. The result is faster delivery without surrendering control over quality.

Conclusion

Prompting better isn't about magic words. It's about building three layers around the model:

  • The spec - move your understanding and your real goal into a precise, agile, critically reviewed plan.
  • The verifier - define "good" up front, bring in a second opinion, and pull in external signal so quality is checked, not assumed.
  • The environment - a workshop that improves over time through instruction files, a knowledge base, reusable skills, and hard guardrails.

Get all three working together and you stop fighting the model and start compounding with it.

Need Support?

Want to bring spec-driven, verifiable AI workflows into your own team but unsure where to start? We're happy to help. Reach out via our contact page and we'll work through it together.

How are you structuring your own AI workflows? We at vensas would love to compare notes and exchange best practices.