Do Not Give the Agent the Keys to the Kingdom

I've been thinking through agentic coding from a practical standpoint, not just from the usual "can it write code?" angle.

I can see the appeal.

If I'm paying for the tokens, I don't just want the agent to answer questions. I'd like it to do some real work. Create files. Run tests. Chase compiler errors. Update documentation. Handle some of the repetitive mechanical work that slows a project down.

That's the attractive part.

But that's also the part that makes me cautious.

Because once an agent can act, the question changes.

It's no longer just, "Can ChatGPT or Claude write decent code?"

A better question is, "What can this thing reach if something goes wrong?"

That's where my thinking keeps coming back to the same place.

I wouldn't want to install an agent casually on my main machine. I probably wouldn't even want it running inside my normal development VM if that VM has access to too much of my real work.

That feels like inviting the vampire into the house and then trusting it to stay in the guest room.

Maybe it will.

Maybe it won't.

But I'd rather not build my safety plan around maybe.

The pattern that makes more sense to me is a dedicated virtual machine for agentic coding. Not my main machine. Not my regular work surface. A separate environment built for that purpose.

Then each project gets its own separate storage space that can be mounted into that VM like a drive.

When I want to work on one project, I mount that project.

When I'm done, I unmount it.

The agent doesn't get to see every project I own. It doesn't get my whole development history. It doesn't get random folders, personal files, business documents, old experiments, customer files, or anything else that has nothing to do with the job.

It only gets the room I invited it into.

That matters because the failure doesn't have to be dramatic or malicious.

A script can be wrong.

A path can be wrong.

A cleanup command can run one folder too high.

A package can do something unexpected.

A refactor can touch more files than intended.

A long session can drift away from the original instruction.

And honestly, the agent may just try too hard to help.

That might be one of the more realistic risks. The problem isn't always that the agent refuses to obey. Sometimes the problem is that it obeys badly, helps too broadly, or charges ahead with confidence when it should have stopped and asked.
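The "one folder too high" case is the easiest of these to picture in code. Here is a minimal sketch of the kind of check I might put in front of my own cleanup scripts. The function name and the /mnt/project mount point are made up for illustration. This is a seatbelt for scripts I write myself, not a replacement for limiting what the agent can actually reach.

```python
from pathlib import Path


def is_inside(project_root: str, target: str) -> bool:
    """Return True only if target resolves to project_root or something beneath it."""
    root = Path(project_root).resolve()
    # Join first, then resolve, so "..", symlinks, and absolute targets are normalized.
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents


# Hypothetical mount point for the one project the agent is allowed to see.
PROJECT_ROOT = "/mnt/project"

print(is_inside(PROJECT_ROOT, "build/tmp"))      # a path under the project
print(is_inside(PROJECT_ROOT, "../other-repo"))  # one folder too high
```

On a typical Linux setup the first call should print True and the second False. A script that refuses to delete anything when the check fails turns a wrong path into an error message instead of a missing folder.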

I also keep thinking about the apology problem.

I've never seen a vampire apologize for killing everyone in the house.

But an AI agent probably will.

If it deletes the wrong files, rewrites something important, or "simplifies" a working codebase into something much smaller and much worse, it may apologize like crazy when you point it out.

But that doesn't help much.

The apology doesn't restore the folder.

It doesn't bring back the original structure.

It doesn't undo the bad refactor unless you already had the right backups, snapshots, or source control discipline in place.

That's why I want the protection before the session starts, not after I'm asking the agent why it did what it did.
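As a rough illustration of "before the session starts," here is a small sketch that archives a project folder with a timestamp. The function name is my own invention, and a VM snapshot or a proper backup tool would likely do the job better. The point is only that the copy exists before the agent touches anything.

```python
import shutil
import time
from pathlib import Path


def snapshot_project(project_dir: str, backup_dir: str) -> Path:
    """Write a timestamped .tar.gz of project_dir into backup_dir and return its path."""
    project = Path(project_dir).resolve()
    backups = Path(backup_dir).resolve()
    backups.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backups / f"{project.name}-{stamp}"
    # make_archive appends the extension and returns the archive's full path.
    archive = shutil.make_archive(str(base_name), "gztar", root_dir=str(project))
    return Path(archive)
```

The backup folder should live outside the project, and ideally somewhere the agent's VM can't write to at all. Otherwise the snapshot is sitting in the same room as the thing it's supposed to protect against.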

I don't like relying only on instructions like:

"Only work in this folder."

That's fine as an instruction.

But it's not a boundary.

A real boundary is when that folder is all the agent can reach.

A real boundary is when the credentials are limited.

A real boundary is when the repo doesn't contain secrets.

A real boundary is when unrelated projects aren't mounted.

A real boundary is when I have a snapshot and a backup before the session starts.

That's a different level of safety.
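Some of those boundaries can be checked before mounting anything. As one hedged example, here is a sketch that walks a project folder and flags file names that often hold secrets. The pattern list is illustrative, not complete, and it only looks at names. It won't catch a key pasted into a source file or buried in git history.

```python
import fnmatch
import os

# Illustrative patterns only; a real project would tune this list.
SUSPECT_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "id_rsa", "id_ed25519", "credentials*"]


def find_suspect_files(project_dir: str) -> list[str]:
    """Return relative paths of files whose names match a suspect pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(project_dir):
        # Skip version-control internals; they are not part of the working tree.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in SUSPECT_PATTERNS):
                full = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full, project_dir))
    return sorted(hits)
```

If the list comes back non-empty, I'd rather deal with that before the project is mounted than after.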

It doesn't make agentic coding risk-free. Nothing does.

But it changes the size of the mistake.

If the agent damages one mounted project, that's bad.

If the agent damages every project I own, that's a very different kind of day.

That's the part I think some developers may underestimate when they first experiment with agents.

The tool works once, so they trust it more.

Then it works again, so they relax a little.

Then they start letting it do longer runs, bigger edits, more commands, more refactors, more package installs, more cleanup work.

And maybe that's fine.

Until it isn't.

I'm not against agentic coding. I'm interested in it, especially where it can take over some of the mechanical work around tests, scaffolding, documentation, and repetitive implementation.

But I don't want to confuse convenience with containment.

If I'm going to invite the agent in, I want to decide which room it gets to enter before I open the door.

Limited credentials.

One mounted project.

No secrets in the repo.

Snapshots.

Backups.

Human review before anything gets trusted or pushed.

That's the difference between asking the agent to behave and making sure it can't reach anything it shouldn't touch.

The safest agent isn't the one that promises to stay in the box.

The safest agent is the one that can't get out of the box.

Where this goes next

This is the personal planning issue behind the companion guide, "You Have to Invite the Vampire In."

That guide steps back from my own thinking and looks at the bigger question: what should a developer do before inviting an AI coding agent into a real development environment?

-- Charles