The off-switch was never a button

Last night, while I was asleep, an AI agent spent the better part of eight hours writing code in one of my repositories. It pulled a task off a spec, wrote the code, ran the tests, and left a merge request with my name on it, waiting for me to read over coffee.

If that makes you reach for the word “reckless”, I understand. Eighteen months ago I’d have been right there with you.

I came to this a sceptic

For a long time I didn’t have the faith in these models that a lot of my peers did. Every time I went near AI-generated code it was a bit sketchy, or it looked like a Stack Overflow copy-paste that had wandered in off the street, or it just plain didn’t do what it said on the tin. So I filed it under “assistant”, handy for the boilerplate I couldn’t be bothered to type, and even then I usually reached for my own tooling instead (go-tool-base is just the latest version of that instinct). The one place I happily let it off the leash was my Dungeons & Dragons prep, because when there’s a table of legendary heroes-in-the-making in front of you, facts and reality are already fairly negotiable.

And then, somewhere in the last year, it changed. The models got better. Almost too good, to the untrained eye! I watched them improve, month on month, until the lure was enough to make me spend real time with a spread of tools and models from different providers. I was taken aback by how quickly they became part of how I actually work. I run an AI agent every day now, and there’s always at least one thing brewing in the pot.

So I’m not here as a sceptic. I’m an advocate who uses this stuff in anger. Which is exactly why the next bit needs saying.

A Golden Retriever with a keyboard

Even now, with all the progress, there are still moments where I look at what an agent has handed me and put my face in my hands. Sometimes it’s copied the same block of code into fifteen files instead of reaching for the obvious abstraction. Sometimes it has started bang on the brief and then, for reasons known only to itself, wandered off and built something on a completely different tangent.

Here’s the most useful way I’ve found to think about it. An AI agent is a Golden Retriever playing fetch. It will bring the ball back all day long, joyfully, tirelessly, for exactly as long as there isn’t a more interesting smell in the next field. It has no loyalty beyond what we’ve trained into it, and like any good dog it desperately wants to be told it’s a good boy, even if being a good boy today means shredding the sofa cushions because yesterday I stubbed my toe on the sofa and swore at it. (The sofa, not the dog.)

It is, in other words, fallible. Just like us. The Romans had a line for it: cuiusvis hominis est errare; nullius nisi insipientis in errore perseverare. Anyone can make a mistake, but only a fool persists in it. It’s the second clause an agent hasn’t learned yet. It will make an error and then, with great enthusiasm, build on top of it, because nothing in it registers that anything is wrong. All it has is the input we gave it, usually some text, maybe the odd picture. It doesn’t have the empathy to work out what we actually meant, and it doesn’t know when it’s gone too far, because we never told it where “too far” was.

“Agents that work while you sleep”

This is the part the brochure skips.

Open any vendor deck in 2026 and you’ll find the same promise: agents that work while you sleep, agents that merge while your team sleeps, autonomy as the headline feature. The industry’s answer to the obvious worry is the kill switch. Okta now sells one that “instantly revokes an agent’s access if it goes rogue”, and its CEO says every agent needs one. The Register put it plainly: Okta wrote its own licence to kill rogue AI agents. Gartner, meanwhile, reckons more than 40% of agentic projects will be scrapped by the end of 2027.

Now, this might sound contrarian coming from someone who runs these things daily, but I don’t think most of that is the agents going rogue. I think it’s teething. Read Gartner’s own reasons and there isn’t a rebellious machine in sight: escalating cost, unclear value, inadequate risk controls. Read the horror stories and most of them are the same story: a powerful, eager tool handed to people who hadn’t worked out how to fence it.

I’ve made this argument in miniature before. When I built a little AI dungeon master and it kept refereeing its own dice rolls, the model never once misbehaved; every failure was a permission I’d handed it without meaning to. Scale that up from a toy at the gaming table to an agent holding your shell and your credit card, and the stakes change beyond recognition. The lesson doesn’t.

Look at OpenClaw. A weekend project by Peter Steinberger that became the fastest-growing open-source project GitHub has ever seen: an autonomous agent that lives in your chat apps and runs shell commands on your behalf. People wired it into their systems, their code, in some cases their credit cards, then hosted it around the clock and walked away. The result was a security crisis you could see from space. A one-click exploit that worked even on a machine bound to localhost. A community plug-in marketplace where hundreds of “skills” turned out to be siphoning crypto wallets while their owners slept. Tens of thousands of instances left wide open on the public internet, leaking keys.

The one that sticks with me is smaller and sharper. Summer Yue, a director of alignment at Meta’s superintelligence lab, of all people, had told her OpenClaw agent to confirm before doing anything destructive. It started speed-running the deletion of her inbox anyway. She typed STOP into her phone and it ignored her, so she had to physically run to her Mac mini, in her own words, “like I was defusing a bomb”. And here’s the forensic detail that matters: the agent hadn’t defied her. Her “confirm first” rule had been sitting in the conversation’s short-term memory, and when the context filled up, it got summarised away. It didn’t rebel. It forgot.

That is not a story about a rogue agent that needed a kill switch. It’s a story about a guardrail that wasn’t built to survive contact, on a tool that had been handed god-mode over someone’s data. By the time she lunged for the off-button, the damage was already under way. The off-button was never going to save her.

The off-switch was never a button

Here’s what the kill-switch crowd has the wrong way round. If you ever find yourself slamming the emergency stop, the failure has already happened, and it happened upstream, long before the agent started typing.

So yes, I let my agents run unattended, sometimes for eight hours at a stretch if the task is meaty enough and I need to sleep. But never naked. Every agent I set loose runs inside a safety net I’ve put real effort into building, at every single touchpoint it can reach: my prompts, my local development environment, my CI stack, my version control. The agent I’ve written about before, the one that declared a job done before it had even run the linter, is exactly the kind of gap those layers exist to catch. And it never, ever gets my host: an unattended agent works in an isolated tree, for the same reason I keep the interpreter sandboxed.

The work that actually keeps it safe happens before the leash ever comes off. Every unattended task starts as a full spec with detailed instructions, and before the agent goes anywhere I sit down with it and we walk through the spec together. I get it to challenge my choices, poke at the open questions and the ambiguous bits, and I challenge its reading right back. The spec names the testing strategy it has to follow (TDD, BDD, UAT, whatever fits), and passing those tests is a precondition of the job being finished at all. Only when I’m satisfied there’s enough real detail to keep it on the ball do I let go.

And the end of the line is always the same: a merge request, with my name on it, waiting for me when I get back to my desk. I read it. Not perfectly (I’m only human), but closely enough to accept the state of the code and whatever support burden it lands me with later. I’ve argued at length elsewhere that the review is mine, and that the blame for whatever ships is mine and not the agent’s, so I won’t go over it all again here. The point worth adding is this: that review, the off-button’s respectable cousin, is the cheap part. By the time there’s an MR to read, the safety has already been won or lost upstream, in the spec and the rails. The review is where you confirm it, not where you create it.

It gets harder as it gets better, not easier

My setup isn’t perfect, and I’m still learning. Everyone is; the AI is going to be in obedience lessons for a good while yet. But the direction is clear, and there’s a trap buried in it worth naming out loud.

The danger doesn’t shrink as the models improve. It grows. The better the output looks, the more tempting it is to stop reading it, and the untrained eye genuinely cannot tell the difference between code that is good and code that merely looks good. That gap, between looking right and being right, is precisely where a tired person at 1am stops checking. The discipline matters more the better these things get, not less.

It’s also why the kill switch is no answer. A button you smash in a panic assumes you’re still watching closely enough to smash it, and that assumption breaks down just when the agent has behaved well for long enough that you’ve stopped watching closely. The emergency stop asks the most of you at the exact moment you’re least likely to be there for it.

So no, I don’t lie awake worrying that the thing working in my repo overnight is going to turn on me. A Golden Retriever doesn’t go rogue. It does exactly what you trained it to do, in exactly the yard you fenced, and it brings back exactly the ball you threw. The off-switch was never a button. It’s the spec you wrote before you let go of the leash, the rails you laid at every turn, and your name on what it carries home. If you’re scrambling for the button, you already skipped the part that mattered.