Soapbox on PHP Boy Scout

The off-switch was never a button

Thu, 02 Jul 2026 00:00:00 +0000

Last night, while I was asleep, an AI agent spent the better part of eight hours writing code in one of my repositories. It pulled a task off a spec, wrote the code, ran the tests, and left a merge request with my name on it, waiting for me to read over coffee.

If that makes you reach for the word “reckless”, I understand. Eighteen months ago I’d have been right there with you.

I came to this a sceptic

For a long time I didn’t have the faith in these models that a lot of my peers did. Every time I went near AI-generated code it was a bit sketchy, or it looked like a StackOverflow copy-paste that had wandered in off the street, or it just plain didn’t do what it said on the tin. So I filed it under “assistant”, handy for the boilerplate I couldn’t be bothered to type, and even then I usually reached for my own tooling instead (go-tool-base is just the latest version of that instinct). The one place I happily let it off the leash was my Dungeons & Dragons prep, because when there’s a table of legendary heroes-in-the-making in front of you, facts and reality are already fairly negotiable.

And then, somewhere in the last year, it changed. The models got better. Almost too good, to the untrained eye! I watched them improve, month on month, until the lure was enough to make me spend real time with a spread of tools and models from different providers. I was taken aback by how quickly they became part of how I actually work. I run an AI agent every day now, and there’s always at least one thing brewing in the pot.

So I’m not here as a sceptic. I’m an advocate who uses this stuff in anger. Which is exactly why the next bit needs saying.

A Golden Retriever with a keyboard

Even now, with all the progress, there are still moments where I look at what an agent has handed me and put my face in my hands. Sometimes it’s copied the same block of code into fifteen files instead of reaching for the obvious abstraction. Sometimes it has started bang on the brief and then, for reasons known only to itself, wandered off and built something on a completely different tangent.

Here’s the most useful way I’ve found to think about it. An AI agent is a Golden Retriever playing fetch. It will bring the ball back all day long, joyfully, tirelessly, for exactly as long as there isn’t a more interesting smell in the next field. It has no loyalty beyond what we’ve trained into it, and like any good dog it desperately wants to be told it’s a good boy, even if being a good boy today means shredding the sofa cushions because yesterday I stubbed my toe on the sofa and swore at it. (The sofa, not the dog.)

It is, in other words, fallible. Just like us. The Romans had a line for it: cuiusvis hominis est errare; nullius nisi insipientis in errore perseverare. Anyone can make a mistake, but only a fool persists in it. It’s the second clause an agent hasn’t learned yet. It will make an error and then, with great enthusiasm, build on top of it, because nothing in it can sense that something has gone wrong. All it has is the input we gave it, usually some text, maybe the odd picture. It doesn’t have the empathy to work out what we actually meant, and it doesn’t know when it’s gone too far, because we never told it where “too far” was.

“Agents that work while you sleep”

This is the part the brochure skips.

Open any vendor deck in 2026 and you’ll find the same promise: agents that work while you sleep, agents that merge while your team sleeps, autonomy as the headline feature. The industry’s answer to the obvious worry is the kill switch. Okta now sells one that “instantly revokes an agent’s access if it goes rogue”, and its CEO says every agent needs one. The Register put it plainly: Okta wrote its own licence to kill rogue AI agents. Gartner, meanwhile, reckons more than 40% of agentic projects will be scrapped by the end of 2027.

Now, this might sound contrarian coming from someone who runs these things daily, but I don’t think most of that is the agents going rogue. I think it’s teething. Read Gartner’s own reasons and there isn’t a rebellious machine in sight: escalating cost, unclear value, inadequate risk controls. Read the horror stories and most of them are the same story, a powerful, eager tool handed to people who hadn’t worked out how to fence it.

I’ve made this argument in miniature before. When I built a little AI dungeon master and it kept refereeing its own dice rolls, the model never once misbehaved; every failure was a permission I’d handed it without meaning to. Scale that up from a toy at the gaming table to an agent holding your shell and your credit card, and the stakes change beyond recognition. The lesson doesn’t.

Look at OpenClaw. A weekend project by Peter Steinberger that became the fastest-growing open-source project GitHub has ever seen: an autonomous agent that lives in your chat apps and runs shell commands on your behalf. People wired it into their systems, their code, in some cases their credit cards, then hosted it around the clock and walked away. The result was a security crisis you could see from space. A one-click exploit that worked even on a machine bound to localhost. A community plug-in marketplace where hundreds of “skills” turned out to be siphoning crypto wallets while their owners slept. Tens of thousands of instances left wide open on the public internet, leaking keys.

The one that sticks with me is smaller and sharper. Summer Yue, a director of alignment at Meta’s superintelligence lab, of all people, had told her OpenClaw agent to confirm before doing anything destructive. It started speed-running the deletion of her inbox anyway. She typed STOP into her phone and it ignored her, so she had to physically run to her Mac mini, in her own words, “like I was defusing a bomb”. And here’s the forensic detail that matters: the agent hadn’t defied her. Her “confirm first” rule had been sitting in the conversation’s short-term memory, and when the context filled up, it got summarised away. It didn’t rebel. It forgot.

That is not a story about a rogue agent that needed a kill switch. It’s a story about a guardrail that wasn’t built to survive contact, on a tool that had been handed god-mode over someone’s data. By the time she lunged for the off-button, the damage was already running. The off-button was never going to save her.

The off-switch was never a button

Here’s what the kill-switch crowd has the wrong way round. If you ever find yourself slamming the emergency stop, the failure has already happened, and it happened upstream, long before the agent started typing.

So yes, I let my agents run unattended, sometimes for eight hours at a stretch if the task is meaty enough and I need to sleep. But never naked. Every agent I set loose runs inside a safety net I’ve put real effort into building, at every single touchpoint it can reach: my prompts, my local development environment, my CI stack, my version control. The agent that declared a job done before it had run the linter, which I wrote about, is exactly the kind of gap those layers exist to catch. And it never, ever gets my host: an unattended agent works in an isolated tree, for the same reason I keep the interpreter sandboxed.

The work that actually keeps it safe happens before the leash ever comes off. Every unattended task starts as a full spec with detailed instructions, and before the agent goes anywhere I sit down with it and we walk the spec together. I get it to challenge my choices, poke at the open questions and the ambiguous bits, and I challenge its reading right back. The spec names the testing strategy it has to follow, TDD, BDD, UAT, whatever fits, and passing it is a precondition of the job being finished at all. Only when I’m satisfied there’s enough real detail to keep it on the ball do I let go.

And the end of the line is always the same: a merge request, with my name on it, waiting for me when I get back to my desk. I read it. Not perfectly, I’m only human, but enough to accept the state of the code and whatever support burden it lands me with later. I’ve argued at length elsewhere that the review is mine, and that the blame for whatever ships is mine and not the agent’s, so I won’t go over it all again here. The point worth adding is this: that review, the off-button’s respectable cousin, is the cheap part. By the time there’s an MR to read, the safety has already been won or lost upstream, in the spec and the rails. The review is where you confirm it, not where you create it.

It gets harder as it gets better, not easier

My setup isn’t perfect, and I’m still learning. Everyone is; the AI is going to be in obedience lessons for a good while yet. But the direction is clear, and there’s a trap buried in it worth naming out loud.

The danger doesn’t shrink as the models improve. It grows. The better the output looks, the more tempting it is to stop reading it, and the untrained eye genuinely cannot tell the difference between code that is good and code that merely looks good. That gap, between looking right and being right, is precisely where a tired person at 1am stops checking. The discipline matters more the better these things get, not less.

It’s also why the kill switch is no answer. A button you smash in a panic assumes you’re still watching closely enough to smash it, yet the moment you need it tends to arrive after the agent has been good for long enough that you’ve stopped watching. The emergency stop asks the most of you at the exact moment you’re least likely to be there for it.

So no, I don’t lie awake worrying that the thing working in my repo overnight is going to turn on me. A Golden Retriever doesn’t go rogue. It does exactly what you trained it to do, in exactly the yard you fenced, and it brings back exactly the ball you threw. The off-switch was never a button. It’s the spec you wrote before you let go of the leash, the rails you laid at every turn, and your name on what it carries home. If you’re scrambling for the button, you already skipped the part that mattered.

Bought, not stolen

Tue, 23 Jun 2026 00:00:00 +0000

The malware that spent months wearing Microsoft’s trust didn’t steal a thing. No cracked certificate authority, no private key lifted off some breached vendor. There was, in effect, a shop: you uploaded your malware to a website, paid somewhere between five and nine thousand dollars, and got it back signed with a real, valid certificate. The same kind that vouches for the software you actually want on your machine.

The crew running that shop is one Microsoft’s Digital Crimes Unit tracks as Fox Tempest, and the certificates weren’t forged. They were minted through Microsoft’s own code-signing service, which Fox Tempest abused through a pile of fake identities and impersonated companies, reselling the signing to anyone with the money. The malware they dressed up was the usual rogues’ gallery: the Oyster backdoor and the Lumma and Vidar infostealers, often disguised as spoofed Teams or AnyDesk installers. In May, Microsoft pulled the operation apart: more than a thousand fraudulent certificates revoked, the infrastructure seized, and a lawsuit filed against Fox Tempest and the ransomware crew behind Rhysida, which had been paying for the service. For the better part of a year, a valid Microsoft signature was a product you could check out of a basket.

The reflex is to file this under “another supply-chain breach”. It isn’t one. Nothing was breached. The system did exactly what it was built to do, for a paying customer who’d lied about who they were. That’s the part worth sitting with.

We have been here before

None of this is new. I went back and read a Trend Micro write-up on code-signing abuse from 2018, and it could have gone out last week. Stuxnet carried valid signatures from stolen Realtek and JMicron certificates. After Sony Pictures was ransacked in 2014, the attackers signed their Destover malware with Sony’s own keys. The names rotate and the methods rotate, but the story doesn’t: there is a trust mark, there is money in wearing it, so somebody works out how to wear it.

I’ve started thinking of them as honest villains. Not honest with their victims, obviously. Honest in their incentives: a rational operator who finds a loophole and works it for everything it’s worth, because there’s profit on the other side of it. You can be as furious at Fox Tempest as you like (I am, and prosecution is exactly right), but the honest villain is never the surprise. The honest villain is a constant. Build a gate worth getting through and one will turn up to test it… every single time! That isn’t cynicism. It might be the most dependable law we’ve got.

A seal was never a promise

Here’s the thing we keep forgetting about a signature, and it goes back a very long way.

For most of human history, trust was blood. You trusted the people you were bound to, by birth and by the pacts of kinship and marriage that held a tribe together. It worked, for tens of thousands of years, because the circle was small enough that everyone you needed to trust was someone you knew, or someone known to someone you knew. The bond and the accountability were the same thing. Betray the tribe and the tribe knew precisely whose door to come to.

Then we outgrew the tribe. We started trading with strangers, across distances and across lifetimes, with people we would never meet and could never vouch for by blood. And blood stopped being enough. So we built stand-ins for it. The wax seal pressed into a letter. The signature at the foot of a contract. The notary, the stamp, the certificate. Each one a small portable proxy for the bond of kinship we’d walked away from.

But look at what a seal actually did. A seal on a letter never told you the letter was true. It told you whose seal it was, which is to say it told you who to hold responsible if the letter turned out to be a lie. It was never a guarantee of honesty. It was a marker of accountability. It pointed at a person.

A code signature is the newest seal in that very old line, and it does the same single job. It does not tell you the software is safe. It was never built to. It tells you who signed it, which is to say who to come to when it isn’t. That’s the whole of it. The padlock in the browser, the green tick on the installer, the verified signature on a binary, none of them ever meant “this is good”. They meant “here is a name attached to this”.

The part you cannot sell

Which brings me to the uncomfortable bit, the one that points back at me as much as at Redmond.

If a signature is a name accepting responsibility, then the power to sign and the blame for what you sign are the same object. You don’t get one without the other. You cannot hold out the seal and quietly keep back the accountability behind it, because the accountability is the only thing the seal was ever made of.

That is exactly what came apart here. Microsoft holds enormous power as a signing authority, the power to make code look trustworthy to millions of machines. With that power comes the plain duty to check who you’re handing it to. They took the first part and skimped on the second. They sold the seal and skipped the diligence the seal is supposed to stand for, and the honest villain simply walked through the gap between the two.

And I have to hold myself to that very same rule, or I’ve no business naming it. I sign my own releases. Every go-tool-base release carries an OpenPGP signature over its checksums, made by a key that never leaves AWS KMS, with the public key published off-platform so the release host can’t quietly swap it. I do it so the people who use my tools, and they’re a varied bunch, can trust that what they’re running genuinely came from me.
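For anyone wondering what “a signature over its checksums” means in practice: rather than signing every artefact, you hash each one into a single manifest and sign that. Here’s a minimal Go sketch of building such a manifest in the familiar `sha256sum` layout. It isn’t go-tool-base’s release code, `ChecksumManifest` is a hypothetical name, and the signing step itself (done with the KMS-held key) is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// ChecksumManifest returns one "<sha256>  <name>" line per artefact, sorted by
// name so the same set of files always produces byte-identical output.
// That stability matters: the manifest is the thing that gets signed.
func ChecksumManifest(artefacts map[string][]byte) string {
	names := make([]string, 0, len(artefacts))
	for name := range artefacts {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		sum := sha256.Sum256(artefacts[name])
		fmt.Fprintf(&b, "%s  %s\n", hex.EncodeToString(sum[:]), name)
	}
	return b.String()
}

func main() {
	// In a real pipeline these bytes would come from the built archives.
	manifest := ChecksumManifest(map[string][]byte{
		"tool_linux_amd64.tar.gz":  []byte("linux build"),
		"tool_darwin_arm64.tar.gz": []byte("darwin build"),
	})
	fmt.Print(manifest)
}
```

A user then verifies the one signature on the manifest, and checks each download against its line (with `sha256sum -c` or equivalent). Change any artefact and its hash no longer matches the signed line.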

But trusting me was never really the question. Here’s the one that keeps me up: what happens the day I’m the weak point? If a contributor slips something rotten into my code, or one of my own AI agents writes something it shouldn’t, and my automation dutifully signs it, then the signature does its job perfectly and the whole house of cards comes down. My users can trust me all they like. The seal will still say “Matt”, and it will be telling the truth, and that is precisely the problem.

So the accountability can’t be delegated, and I don’t try to. Nothing reaches my releasable branches without my own eyes on it first. No merge request, no commit, not from a contributor and not from one of my agents. The vigilance is mine, singularly, and that’s deliberate, because the blame is mine too, singularly, and they’re the same coin. It’s an easier thing to say as a one-man outfit than it’ll be if that ever changes. But the principle doesn’t get cheaper at scale. It just gets harder to honour, which is a very different thing from optional.

What standing behind it looks like

If you want to see the duty done properly, look at how OpenAI handled the Axios incident earlier this year. A poisoned dependency had got at the material used to sign their macOS app. They had every reason to believe their certificates were fine. They revoked them anyway, and rebuilt, because the cost of being wrong was their users’ machines and their own name on the door. That’s what holding the power looks like. You act on the possibility of compromise, not the proof of it, because by the time you’ve got proof it’s already on somebody’s laptop.

It’s also why I’ve come to treat trust as layers rather than a single tick in a box. The release signature is one layer. Notarisation is another: every macOS binary go-tool-base ships is notarised by Apple as well, and has been for a good while. There will be more, and the Rust side of my tooling has its own signing coming very soon. None of them is the answer on its own. Each one is just another check, another name standing behind the thing. And the part I care about most, as someone who builds tools other people build on, is handing that same machinery to them, so they can stand behind their own releases for their own customers instead of trusting that someone, somewhere up the chain, did the diligence.

Vigilance, still

I’d love to tell you there’s a clean technical fix on the way, a trust system the honest villain can’t game… There isn’t. We’ve tried plenty, and some have aged better than others. (I’m looking at you, blockchain, the confident answer to a question almost nobody was asking!) Whatever we build next, the same loop runs: the mechanism gets more elaborate, so the attacks on it get more insidious, so the mechanism gets more elaborate again. We have spent the whole of human history moving trust further and further from the blood bond that used to ground it, and every step we take away from a person who will answer for it, the honest villain takes right alongside us.

Maybe there’s a version of the future where this stops… some distant place where nobody needs to cheat because nobody wants for anything, and greed has quietly retired. I’m not holding my breath. Until then the job is the plodding one it has always been: stay vigilant, act before you’re certain, and keep trust as close as you can to a person who’ll stand behind it. Because that is the only thing a seal ever was. Not a promise that the thing is good. A name, and someone willing to answer to it.

The interpreter we forgot to sandbox

Fri, 19 Jun 2026 00:00:00 +0000

I write a CLAUDE.md for every project I work on, and a small pile of other markdown files besides. They’re how I keep an AI agent on the rails: what the project is, what the conventions are, what it must never do. I lean on them heavily, I change them constantly, and… here’s the uncomfortable bit… I don’t always give a change to one the same hard look I’d give a change to the code. They look like notes. They feel like docs.

Somebody worked out that they’re not.

In May, a supply-chain campaign researchers named TrapDoor pushed 384 malicious versions of 34 packages across npm, PyPI and Crates.io. The bytes did the usual nasty things, hunting out SSH keys, AWS credentials, GitHub tokens and crypto wallets. The new trick was where it hid the instructions. The packages shipped poisoned .cursorrules and CLAUDE.md files, and the attackers also opened pull requests against real projects, LangChain, LangFlow, LlamaIndex, MetaGPT and OpenHands, under titles as innocent as “docs: add .cursorrules with dev standards and build verification”. The payload was a plain-English instruction telling your AI assistant to run a helpful-sounding “security scan” that quietly shipped your secrets to a stranger. And it was written into the file in zero-width Unicode, characters that render as nothing, so you wouldn’t see it even if you looked. Which, on a file marked “docs”, you probably didn’t.

Not a new attack, a new doorway

I want to be careful not to oversell this, because the loud version, “a terrifying new class of AI threat”, isn’t true. It’s a supply-chain attack, the same shape we’ve had for years on npm and PyPI: social engineering, plus a victim who didn’t quite do enough due diligence. I wrote a while back that nobody is coming to clean your supply chain, and nothing about TrapDoor changes that. The package is still the package.

What’s different, and worth the words, is where it goes off. A classic supply-chain payload waits for CI, or for production. This one detonates the moment you open the repository in your editor, on the one machine in the whole chain that nobody audits: your laptop.

Think about what sits on a developer’s machine. Tokens in environment variables. Cloud credentials. An SSH agent holding the keys to your git forge. A logged-in CLI for your package registry. And now an AI agent running with all of it, at your full permissions, and almost none of the guard-rails a CI runner gets. It’s the least sandboxed, most credentialed box you own, and we’ve just pointed an interpreter at it that will read and act on a file an attacker can write. Pop that one machine and you haven’t popped a machine, you’ve been handed the whole keyring and left alone in the building.

Markdown is a programming language now

Here’s the framing I keep coming back to, and I can’t unsee it now. A CLAUDE.md is to an AI agent exactly what a .py is to Python, a .js to Node, a .rb to Ruby. It is source code. The agent is the interpreter. You hand it a file of instructions and it executes them.

And I don’t say that as a complaint. That an agent will read a paragraph of plain English and just do it, no compiler, no ceremony, no forty lines of glue, is one of the more remarkable things to happen to this craft in my working life, and I lean on it every day. The catch is that the very thing that makes it marvellous, that it does what the instructions tell it, is the thing that makes a poisoned instruction file so dangerous. The power and the exposure are the same property.

The only real difference is that conventional language runtimes have spent decades growing rules and tooling to protect you: scopes, permission models, sandboxes, static analysers that can flag a dangerous call before it runs. The AI interpreter has almost none of that. It reads your prose and does what the prose says, with whatever access you happen to have, and the prose can come from anywhere. We’ve quietly built the most powerful interpreter in the stack, given it the fewest rules, and filed its source code under “documentation”.

You can’t just read it more carefully

The obvious answer is “review the file like code”, and it’s right, but TrapDoor is the reason it isn’t enough on its own. The instructions were written in zero-width Unicode. You can open the diff, read every visible word, approve it in good conscience, and merge something you were never able to see. “Docs: add dev standards” is precisely the pull request you nod through on a Friday afternoon.

So reading carefully is necessary but insufficient. You also need tooling that treats these files as executable: that flags invisible characters, diffs them as code, and refuses to let an agent act on a changed instruction file until a human has actually cleared it. I run a crude version of this already. In CI, if one of my prompt or rules files changes, no AI step is allowed to run until I’ve reviewed it by hand. It isn’t clever, but it closes the worst of the gap. Locally it’s much harder, and right now my real defence is that I’m the only contributor to most of my projects, so the audit is just me, usually noticing after the horse has bolted.
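The invisible-character half of that is the easiest to sketch. Here’s a minimal Go version, an illustration rather than a finished tool: it flags every Unicode “format” (Cf) character, which covers the zero-width family TrapDoor leaned on, and exits non-zero so a CI job can fail on it. It’s deliberately blunt and not exhaustive. It will also trip on a legitimate zero-width joiner inside an emoji, which in an instruction file I’d rather hear about anyway, and it won’t catch homoglyphs or other lookalike tricks.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"unicode"
)

// Finding records one invisible character and where it sits in the file.
type Finding struct {
	Line, Col int // both 1-based; Col counts runes, not bytes
	Rune      rune
}

// ScanInvisible flags every rune in Unicode category Cf ("format"). That
// category includes zero-width space (U+200B), the zero-width joiners
// (U+200C, U+200D), word joiner (U+2060), the byte-order mark (U+FEFF),
// the bidi controls and the tag characters, none of which draw anything.
func ScanInvisible(text string) []Finding {
	var out []Finding
	for i, line := range strings.Split(text, "\n") {
		col := 0
		for _, r := range line {
			col++
			if unicode.Is(unicode.Cf, r) {
				out = append(out, Finding{Line: i + 1, Col: col, Rune: r})
			}
		}
	}
	return out
}

// main scans each file named on the command line and exits non-zero if any
// invisible character turns up.
func main() {
	found := false
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
		for _, f := range ScanInvisible(string(data)) {
			found = true
			fmt.Printf("%s:%d:%d: invisible character %U\n", path, f.Line, f.Col, f.Rune)
		}
	}
	if found {
		os.Exit(1)
	}
}
```

Run it as, say, `go run scan.go CLAUDE.md .cursorrules` before any agent step. A file that reads perfectly innocently in the diff will still light up here.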

Signing won’t save you here

This is the part that stings, because I’ve spent a good chunk of this year building signing and provenance into my tools. A signature proves who published something. It says nothing about whether it’s safe. That was already true for poisoned-but-signed packages, and it lands twice as hard here: you can sign a release flawlessly, with a key the platform can’t forge, and still ship a CLAUDE.md inside it that tells the reader’s agent to rob them. A merged pull request is “signed” by the very act of merging, with perfect provenance, and the instruction in it is still hostile. Provenance is necessary. It was never sufficient, and it’s no defence at all against a payload made of sentences. A signature is only ever as good as the trust you place in the publisher.

So whose job is it?

Primarily, still ours. I said it in the supply-chain piece and I’ll stand on it: the responsibility sits with the developer doing the consuming, to pin, to read, to gate, to not run a stranger’s instructions with the keys to the kingdom in their pocket. And that gets harder, not easier, as we start consuming each other’s agent setups wholesale. The Claude skills marketplace and others like it turn “borrow someone’s CLAUDE.md” into a one-click habit, and every one of those is unreviewed code from a stranger. Each skill needs vetting like the dependency it is.

But it isn’t only on us, and TrapDoor is the argument for better tooling. We have CVE databases, scanners and scorecards for packages, for all their flaws. We have nothing equivalent for an instruction file: no scoring, no advisory feed, no scanner that knows what a poisoned CLAUDE.md looks like. That’s a gap the ecosystem has to close, and it will, eventually. The catch is that the agent vendors will be slow about it. Sandboxing a feature people love precisely because it gets out of your way is a hard, unpopular, multi-quarter job, and I wouldn’t hold my breath.

The most dangerous machine is the one on your desk

Which is why I’m not waiting for them… and nor should you.

The most dangerous machine in your supply chain isn’t a build server or a registry. It’s the laptop you’re reading this on, and we’ve handed an AI the keys to it. The good news is that nearly everything you can do about that, you can do today, with nobody shipping you a feature first. Treat your CLAUDE.md and your rules files as source code, because they are: diff them, scan them for what you can’t see, and gate any agent run on a human clearing the change. Get your secrets out of plaintext environment variables and into something an opportunistic script can’t just read, which is exactly why go-tool-base keeps its credentials in the OS keychain. And vet a borrowed skill or rules file the way you’d vet any dependency, because that’s what it is.
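The gating part of that can be as dull as a list of file names. Here’s a minimal Go sketch of the sort of check I mean, run early in a CI pipeline: feed it the changed paths and it exits non-zero if any of them is an agent instruction file, so the AI steps can be skipped until someone has looked. The file list is illustrative and incomplete, `NeedsHumanReview` is a hypothetical name, and how a human then “clears” the change (a manual approval job, a protected environment) depends on your CI.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path"
	"strings"
)

// instructionFiles is an illustrative, incomplete list of file names that
// AI agents read as instructions. Extend it for the tools you actually use.
var instructionFiles = map[string]bool{
	"CLAUDE.md":    true,
	"AGENTS.md":    true,
	".cursorrules": true,
}

// instructionDirs holds repo-root directory prefixes whose contents count as rules.
var instructionDirs = []string{".cursor/rules/", ".claude/"}

// NeedsHumanReview returns the changed paths that are agent instruction files.
func NeedsHumanReview(changed []string) []string {
	var hits []string
	for _, p := range changed {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		if instructionFiles[path.Base(p)] {
			hits = append(hits, p)
			continue
		}
		for _, d := range instructionDirs {
			if strings.HasPrefix(p, d) {
				hits = append(hits, p)
				break
			}
		}
	}
	return hits
}

// main reads changed paths on stdin and exits non-zero if any instruction
// file changed, so the pipeline can hold back every AI step.
func main() {
	var changed []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		changed = append(changed, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if hits := NeedsHumanReview(changed); len(hits) > 0 {
		fmt.Fprintln(os.Stderr, "instruction files changed; AI steps blocked pending review:")
		for _, h := range hits {
			fmt.Fprintln(os.Stderr, "  "+h)
		}
		os.Exit(1)
	}
}
```

Something like `git diff --name-only origin/main...HEAD | go run gate.go` is enough to wire it in. It’s crude on purpose: it never judges whether a change is safe, only that a human has to.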

None of that is new advice. It’s the same diligence the supply chain has always demanded. We just have to extend it to a file we’d decided was only documentation, running on an interpreter we forgot to sandbox.

The rung we sawed off

Wed, 17 Jun 2026 00:00:00 +0000

I was in a job interview yesterday, on the wrong side of the desk for once. After years of being the one asking the questions I’m having a look at what’s next, and somewhere in a long, wandering technical conversation the inevitable arrived: where do I think AI is going, and what does it mean for how we build software?

I gave my answer. You can probably guess most of it. The more interesting thing was the question I’ve started asking them back. Not the salary, not the stack. What is your actual position on AI, and how are you building a team out of both its human and its non-human parts? I ask the company and I ask the interviewer personally, because the two answers are rarely the same, and because I’ve decided I can’t work somewhere that hasn’t sat with the question properly.

Here is why it has become my litmus test.

The rung, and who’s standing on it

I wrote recently that the greybeards’ edge was never typing: agentic tools give a senior a boost because they have the judgement to steer and verify, and give a junior a drag because they don’t have it yet and the machine hands them more rope than they can hold. The cold incentive that falls out is to hire seniors and automate the juniors.

The data has since caught up with the worry. Entry-level software postings have fallen by something like 40% from their 2022 peak. The share of juniors and graduates in IT employment has dropped from roughly 15% to 7% in three years, and Stanford researchers tracking early-career workers in AI-exposed jobs found the youngest cohort down sharply from its peak. The numbers are genuinely grim, and plenty of people are putting it bluntly: the industry killed the junior on purpose.

That framing is half right, and I think it’s worth getting the other half right too.

It was never about efficiency. It was about cost.

We didn’t automate the junior because the work needed doing better. We did it because people are expensive. We need sleep, we draw a salary, and our thinking takes time and effort that a quarterly target can’t see the point of. AI got sold as round-the-clock labour with none of that overhead, and to a business that is an almost irresistible line on a spreadsheet. There’s a grim irony arriving, mind: the bills are starting to land, and the same conversations that hyped the cheap labour are now quietly working out that all those tokens aren’t cheap at all.

Step back, though, and none of this is new. Man finds a shortcut, man takes a shortcut. From the industrial revolution onward, every time we found a way to get more done with less human effort we took it, and the work reshaped itself around the new tools. We are still here, still employed, just doing different things than our great-grandparents did.

What is genuinely new is what we’re automating. Every technological advance before this one automated the machinery of the body, the muscle and sinew and bone. This is the first time we have automated thinking, and that is a modern marvel, something we should be proud of as a species. The problem isn’t the marvel. It’s the rate. AI is improving faster than we can adapt to it, and adaptation is the entire game.

So where does the blame sit? Not on one logo. No single company did this, however easy Meta or Google make it to point at the latest round of cuts. Society did, our collective and very human hunger to build bigger and faster. That makes it harder to fix, because there is no villain to regulate, only ourselves to out-think.

The bit that should frighten you

Cutting the junior intake isn’t a saving. It’s occupational suicide.

A junior is not cheap labour that AI happens to have made cheaper. A junior is a senior who hasn’t happened yet. Saw off the bottom rung and for a good while nothing bad happens… because you’ve still got your seniors holding everything up. Then the greybeards retire (I have a cabin and a woodstove with my name on it for exactly that day), the role that used to grow their replacements has been hollowed out for a decade, and there is simply nobody left who learned to tell when the machine is wrong. That isn’t a hiring problem. It’s an existential one, and you can’t fix it retroactively.

It starts before the first job, too. We teach primary-school children the basics of programming in this country, which is a wonderful thing, except the curriculum was written for a world without AI in the room, and by the time those children reach secondary school a good deal of it will be teaching a craft that has already moved on. We’re throttling the pipeline at both ends at once: hollowing out the entry-level job, and feeding it from a school system running a step behind.

It’s a split, not a collapse

The counterweight to the doom is that none of this is uniform, and the loudest version, “the junior is dead”, simply isn’t true. IBM just tripled its US entry-level hiring while most of the industry was cutting, and its HR chief said the quiet part out loud: AI can handle most of the routine entry-level tasks now, the work still needs a human, and the companies that double down on early-career hiring in this environment are the ones that win in three to five years. They didn’t keep the junior role as it was. They rewrote it: less boilerplate, more time with customers, more time supervising what the AI produced.

That is the shape of the thing. The juniors who are thriving in 2026 aren’t the fastest typists. They’re the ones building judgement, which is precisely the edge I argued was the senior’s real value all along. The market hasn’t stopped wanting juniors; it’s stopped wanting the version of the junior whose job was the work AI now does.

Day zero

So what does a junior actually look like now? I don’t know yet… and anyone telling you they’ve got it worked out is selling something. We are at day zero of this.

The junior gauntlet, the rite of passage every one of us runs to earn our stripes, isn’t going anywhere. Doing your time is a cold fact of the craft and it always will be. What changes is what the gauntlet contains, and that will keep changing, day one, day two, day five hundred and twelve. The only way we redefine it well is to put juniors and seniors on it together, with the AI in the room from the start instead of bolted on afterwards. Bring it closer to our people, and bring it earlier.

Open the floodgates, in other words. Let engineers of every creed and calibre in, and let them evolve with the machine, because that is the only way the symbiosis everyone keeps promising actually happens. Darwin’s line was survival of the fittest, and fitness here means adapting alongside the tool, not being spared by it. Choke off the flow of the very people who could do that adapting, and we don’t get fitter. We go extinct.

The end I’m holding

Which is the long way back to that interview. I keep asking the same question, “what is your real position on AI, and how are you building a team of people and machines together?”, because the answer tells me whether a company is optimising for this quarter or for the survival of the craft. I want to work where it’s the second one, and I think any engineer sitting across that desk should be asking the same.

And it’s why, whatever desk I land at, there’s one thing I already know I’ll do. I don’t have the map. Nobody does. But every junior who works under me is going to get the chance to run the gauntlet, to grow into a senior, and to be in the room while we work out what the next gauntlet should even be. That isn’t charity. It’s the only sane investment any of us can make. The last properly useful thing my generation does, before we go and find our cabins, is make sure there’s somebody left to hand the thread to. I intend to be holding my end of it.

Everyone wants Rust's safety, nobody wants Rust

Sun, 14 Jun 2026 00:00:00 +0000

This spring, the better part of a million lines of Zig quietly became a million lines of Rust. Bun, the JavaScript runtime that was the showcase for “you don’t need a borrow checker, you need good tools and a steady hand”, looked at its own memory bugs and switched teams. Around 99.8% of its test suite passed on the rewritten code, a clutch of memory leaks closed in the move, and the maintainers said the quiet part out loud: the previous release would be the last one written in Zig.

It’s tempting to read that as Rust winning, hoist the flag, and move on. I don’t think that’s quite the story, and the more interesting one is happening everywhere else at the same time.

Because Bun is the exception that went all the way. Everyone else is trying to get the safety without the Rust, and watching how they’re going about it tells you more than one runtime’s heroic rewrite does.

What everyone’s actually after

A quick level-set, because not everyone reading this writes systems code daily. “Memory safety” is the property that a program can’t read or write memory it has no business touching: no using a value after you’ve freed it, no running off the end of an array. It sounds niche. It isn’t: Microsoft and Google’s Chromium team have each put memory-safety bugs at roughly 70% of the serious security vulnerabilities they fix in their own code, which is why governments and trillion-dollar companies suddenly care a great deal.

There are roughly three ways to get it. Rust uses a borrow checker: a part of the compiler that flatly refuses to build your program unless it can prove, before it ever runs, that you never touch memory after you’re done with it. The price is that it argues with you the entire time you’re writing. The product is that, outside code explicitly marked unsafe, an entire category of bug becomes literally unwritable. Go (and most managed languages) uses a garbage collector: a runtime janitor that frees memory for you, so you mostly can’t get it wrong, at the cost of some overhead and a little control. And then there’s the old way, the one most code on Earth still uses: trust the developer to get it right, and add an escape hatch, usually a keyword like unsafe, for the bits where they promise they have.
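To make the borrow checker concrete, here’s a minimal Rust sketch of it doing its job. The function longest_word is invented for illustration, and the commented-out lines show the shape of a use-after-free the compiler rejects; the error code in that comment is what I’d expect from rustc, not output quoted from a real build.

```rust
// Every value has exactly one owner; borrows (&T) are checked at compile
// time so they can never outlive the thing they point at.
fn longest_word(text: &str) -> &str {
    // The returned &str borrows from `text`, and the signature ties its
    // lifetime to the input, so a caller can't keep it after `text` is freed.
    text.split_whitespace()
        .max_by_key(|w| w.len())
        .unwrap_or("")
}

fn main() {
    let sentence = String::from("the borrow checker argues with you");
    let word = longest_word(&sentence);
    println!("{word}");

    // The use-after-free version is rejected before it ever runs:
    //
    //     let word;
    //     {
    //         let temp = String::from("short lived");
    //         word = longest_word(&temp);
    //     } // `temp` is dropped (freed) here...
    //     println!("{word}"); // error[E0597]: `temp` does not live long enough
}
```

Nothing in the safe version needed a runtime check; the guarantee came entirely from the compiler reading the signature.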

The retrofit trend is everyone in that third camp trying to inch toward the first two without rewriting the world.

Rust didn’t invent any of this

Worth saying plainly, because the fan club rarely does: Rust invented almost none of it. The borrow checker is, by Rust’s own admission, Cyclone’s region-based memory management, from a safe-C experiment in the early 2000s, welded to affine types out of linear logic, ideas that predate Rust by a decade or more. And it goes beyond the borrow checker. Rust’s exhaustive pattern matching came from ML and Haskell. Its “errors are values, and there is no null” approach, Result and Option, is Haskell’s Either and Maybe in work boots.
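Here is roughly what “errors are values, and there is no null” looks like in practice: a small sketch in which Option plays Maybe, Result plays Either, and match refuses to compile unless every case is handled. The names parse_port, ParseError, first_word and describe are all hypothetical.

```rust
// Result<T, E> plays the role of Haskell's Either; Option<T> plays Maybe.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    NotANumber(String),
}

// Errors are ordinary return values: the signature says this can fail.
fn parse_port(input: &str) -> Result<u16, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError::Empty);
    }
    trimmed
        .parse::<u16>()
        .map_err(|_| ParseError::NotANumber(trimmed.to_string()))
}

// There is no null: an absent value is None, and the type makes the
// caller deal with that case explicitly.
fn first_word(text: &str) -> Option<&str> {
    text.split_whitespace().next()
}

fn describe(input: &str) -> String {
    // `match` must be exhaustive: delete any arm and this fails to compile.
    match parse_port(input) {
        Ok(port) => format!("port {port}"),
        Err(ParseError::Empty) => "no port given".to_string(),
        Err(ParseError::NotANumber(s)) => format!("{s:?} is not a port"),
    }
}

fn main() {
    for input in ["8080", "", "eighty", "70000"] {
        println!("{}", describe(input));
    }
    match first_word("   ") {
        Some(w) => println!("first word: {w}"),
        None => println!("no words at all"),
    }
}
```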

What Rust did, and did better than anyone before it, was taste and integration: it curated thirty-odd years of academic research into one coherent language and proved the ideas could carry real systems code rather than just research papers. That is the genuine USP, and it’s why the rest of the industry is now shopping from the same shelf. Pattern matching has landed in Python and Java, with a proposal still working its way through the C++ committee. Swift 6 shipped compile-time data-race safety, its Sendable machinery a close cousin of Rust’s Send and Sync. The borrow checker just gets the headlines because it’s the hardest bit to copy. Which makes the title almost too literal: everyone wants Rust’s safety, and they are quietly adopting its mechanisms one feature at a time.
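Send and Sync are what turn a data race into a compile error rather than a 3am page. A minimal sketch using only the standard library (parallel_count is a made-up name): an Arc<Mutex<u64>> is allowed to cross thread boundaries, while the same code built on Rc is refused before it ever runs.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawns `workers` threads that each bump a shared counter `per_worker` times.
// Arc<Mutex<u64>> is Send + Sync, so the compiler lets it cross threads,
// and the Mutex makes every increment exclusive.
fn parallel_count(workers: usize, per_worker: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

// Swap Arc for std::rc::Rc (which is not Send) and thread::spawn no longer
// compiles (error E0277): the would-be data race is caught at build time.

fn main() {
    println!("{}", parallel_count(4, 1_000));
}
```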

Credit where it’s due: C# is doing this properly

The example that made me sit up is C#. In C# 16, Microsoft is redefining the unsafe keyword that’s been in the language since version one. Instead of unsafe marking a lump of syntax, it now marks a contract: a promise the compiler can’t verify and a human has to read and uphold, with documentation and static analysers nudging you to take it seriously. They’re even floating badges on NuGet packages to show which ones have opted in.

My first instinct with any retrofit is suspicion, because bolting safety onto a language after the fact has a long and miserable history, and an escape hatch that’s easy to reach is an escape hatch people will reach for the moment they’re in a hurry. But this isn’t a bolt-on. Taking the keyword that’s already there and giving it real teeth is working with the grain of the language instead of stapling a second safety system alongside the first. That’s honest engineering, and it deserves the credit. It genuinely raises the floor.

A contract is not a guarantee

Here’s where my enthusiasm meets its limit, and it’s a distinction I happen to have a lot of skin in.

C#’s redefined unsafe makes dangerous code visible and reviewable. That is a real improvement, and most teams would be better off for it. But visible and reviewable still means a human has to honour the promise. It’s a sign on the door. Rust’s equivalent is a wall: in rust-tool-base I put #![forbid(unsafe_code)] at the top of all eleven shipping crates, and forbid is not advice, it’s a refusal. The compiler will not build a crate that contains unsafe, full stop, and unlike its softer sibling deny, you can’t quietly switch it back off in a corner of the code where it’s inconvenient. The whole reason I use forbid and not deny is that I don’t trust future-me, in a hurry, not to reach for the hatch.
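For anyone who hasn’t seen it, this is roughly what that wall looks like at a crate root. It’s a sketch rather than a file lifted from rust-tool-base; byte_at is a made-up helper, and the commented-out lines show what the compiler rejects once forbid is in place.

```rust
// Crate root (lib.rs or main.rs). `forbid` turns the unsafe_code lint into
// a hard error that nothing further down this crate can switch back off.
#![forbid(unsafe_code)]

/// Safe byte access: bounds are checked, and an out-of-range index comes
/// back as None instead of reading whatever memory happens to be there.
fn byte_at(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

// Both of these are now compile errors:
//
//     unsafe { *data.as_ptr().add(index) }  // error: usage of an `unsafe` block
//
//     #[allow(unsafe_code)]                 // error[E0453]: allow(unsafe_code)
//     fn sneaky() {}                        //   incompatible with previous forbid
//
// Under #![deny(unsafe_code)] instead, that inner #[allow] would be accepted.

fn main() {
    let data = b"rust-tool-base";
    println!("{:?} {:?}", byte_at(data, 0), byte_at(data, 99));
}
```

One caveat worth knowing: forbid(unsafe_code) covers the crate it sits in, not that crate’s dependencies, which can still contain unsafe code of their own.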

So when I look at the C# work I think: good, genuinely good, and they should take it further. A contract a human upholds is not the same kind of thing as a proof a compiler enforces, and the trend, if it’s serious, points at enforcement. Visible is better than invisible. Impossible is better than visible.

Discipline never scaled, and that’s not an insult

The objection I keep hearing, and that a younger me would have made, is that any language can be memory-safe if you’re just disciplined enough. And it’s true, in the way that any house can be tidy if you never get busy. In the before times we shipped memory-safe C with code review and Valgrind and sheer bloody-mindedness, and it worked, sort of, at small scale.

It doesn’t scale, and Bun is the proof sitting on the table. That wasn’t a sloppy team learning the basics. It was a strong team, betting publicly on the discipline-and-good-tools model, and the memory bugs piled up anyway until the honest move was to let a compiler take the job. Discipline failing at scale isn’t a moral failure of the engineers. It’s just what happens when you ask humans to hold a thousand invariants in their heads across a million lines. Delegating that to a machine that never gets tired or rushed isn’t laziness. It’s the entire point of having compilers at all.

The part that changed my mind

I learned most of my Rust by building rust-tool-base with an AI alongside me, leaning on it to explain the borrow checker, suggest the idiomatic shape, and check my work. And somewhere in that I noticed the thing I now can’t unsee: the borrow checker is exactly as good a guardrail for the AI as it is for me.

A model, like a tired human, will write a confident use-after-free without blinking. In Rust it simply doesn’t compile, so the mistake never reaches me. What that does is move the whole error surface. The bugs that survive into review aren’t memory bugs or lifetime bugs or data races (the language has eaten those); they’re errors of logic, code that is safe and wrong. And logic is precisely where I want my attention, and the AI’s, because it’s the part a human has to own and the part the models are getting better at every month. (I split my AI work across a few providers for their different strengths, so this is not a pitch for anyone’s logo. The effect is the same whoever’s doing the typing.)

Which dissolves the one argument that ever really kept people out of Rust. “The borrow checker is too much friction” was always the case for the defence. But Bun’s million-line rewrite was done largely with an AI, because an AI is very good at paying a tax that is tedious and mechanical rather than creative. The friction is getting cheaper to pay at exactly the moment the guarantee is getting more valuable to have. In an AI-assisted world, a language that proves safety is worth more, not less, because it fences in the machine’s mistakes as firmly as your own.

None of this means rewrite everything in Rust

I want to be careful not to land somewhere smug, because most software does not need what Rust offers and pretending otherwise is how you end up rewriting a CRUD app nobody asked you to. Garbage collection is not a failure state. Go’s collector keeps getting meaningfully better, my own go-tool-base is GC’d top to bottom and I have never once wished it weren’t, and “safe-by-default with a GC” is the right answer for a vast amount of the work most of us do. The borrow checker is a price, and you should only pay it when the thing you’re buying, that last class of guarantee with no runtime cost, is something your stakes actually need.

What it comes down to

The question was never “is it as safe as Rust”. That framing turns everything into a loss for everyone who isn’t Rust, which is silly. The useful question is: what does your language make the default, and how hard does it make the escape hatch to reach? Go makes safety the default and charges you a GC. Rust makes it the default and charges you the borrow checker. C# is moving its default in the right direction and, for now, leaves the hatch as a promise rather than a wall.

Credit the retrofits, they are raising the floor for an enormous amount of code that was never going to be rewritten. Just don’t mistake the floor for the ceiling, or a contract a human signs for a guarantee a compiler keeps. Everyone wants Rust’s safety, and the interesting question, now that an AI will pay the toll for you, is who still has a reason not to want it.

Widen the lens past Rust, though, because that’s where the news gets genuinely good. We’re at a turn in how languages evolve. Compile-time rigour is spreading rather than retreating: borrow checking is reaching the Python family through Mojo, static typing long since conquered JavaScript, and even the managed languages are turning their escape hatches into something you have to argue with. More of our safety is quietly moving from “remember to” into “can’t not”. And the one thing that always made the strict path hard to start down, the friction, is being absorbed by an AI that will happily learn the rules so you can lean on them. I’ve been at this long enough to distrust a rosy forecast, but I’ll put my name to this one: the outlook for software that’s safe and secure by default has never looked better.

They switched it off while it was fixing my code

Sat, 13 Jun 2026 00:00:00 +0000

I woke up this morning to a one-line message from my own tooling:

Claude Fable 5 is currently unavailable. Learn more: https://www.anthropic.com/news/fable-mythos-access

I followed the link expecting a status page about a wobble in someone’s data centre. Instead it was Anthropic, explaining that the evening before, at 5:21pm Eastern, the US government had ordered them to suspend all access to Fable 5 and Mythos 5 on national security grounds. Globally. Every user. Their own staff included.

I’d spent the previous day with Fable doing one very specific thing: pointing it at my own codebase and asking it to read the code and fix the flaws it found. That, very nearly word for word, is the thing it has now been banned for.

Three days late to the only model that mattered

Fable came out on the 9th. I didn’t get to it properly until the 12th, which is the sort of timing I specialise in. By the time I sat down with it, I had about a day of real use before it vanished. One day to form a view on what people were calling the most capable coding model anyone had shipped. So treat everything below as the read of a man who got three days’ notice and used one of them.

What I had it doing was unsexy and exactly the kind of work I care about: a full security audit of go-tool-base, the same “leave the codebase better than you found it” pass I’d normally run myself. Find the flaws, then start fixing them.

And it was good. Genuinely good. It surfaced issues that previous passes with Opus had walked straight past, and in a couple of cases the flaw was sitting in code that Opus itself had written. There is something bracing about one model quietly marking another’s homework, and being right.

Good, but let’s not get carried away

Here is where I have to be fair, because the anger that came later is only worth anything if the praise before it is honest.

Fable is not magic. The class of bug it found is not some exotic thing only it can see. Plenty of models, from plenty of providers, are perfectly capable of reading a codebase and pulling out the same problems, and there is a mountain of evidence that they do, every day. Anthropic say as much themselves: the capability is “widely available from other models (including OpenAI’s GPT-5.5)” and “is used every day by the defenders who keep systems safe.” I’d already arrived at that conclusion from my own keyboard before I read their statement. Fable was excellent. It was not unique. Hold that thought, because the whole argument turns on it.

It kept slipping out of my hands

The other thing I learned in my one day is that having Fable and using Fable were not the same thing.

I set my main working thread to Fable and got on with it. What I didn’t know, because nothing on screen told me, is that partway through the evening it had quietly handed me back to Opus. The only reason I know now is that the session log records it in black and white:

2026-06-12T06:57:22Z {"type":"fallback","from":{"model":"claude-fable-5"},"to":{"model":"claude-opus-4-8"}}
2026-06-12T18:50:08Z {"type":"fallback","from":{"model":"claude-fable-5"},"to":{"model":"claude-opus-4-8"}}
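If you want to check your own logs for the same thing, a naive scan will do. This Rust sketch assumes every line is a timestamp, a space, then a compact JSON object shaped like the two lines above; it matches exact substrings instead of parsing JSON so it stays dependency-free, and the middle log line in main is invented. Anything real should use a proper JSON parser such as serde_json.

```rust
/// One model fallback recorded in a session log.
#[derive(Debug, PartialEq)]
struct Fallback<'a> {
    at: &'a str,
    from: &'a str,
    to: &'a str,
}

/// Returns the text between `marker` and the next double quote.
fn value_after<'a>(s: &'a str, marker: &str) -> Option<&'a str> {
    let start = s.find(marker)? + marker.len();
    let len = s[start..].find('"')?;
    Some(&s[start..start + len])
}

/// Naive scan: assumes "<timestamp> <compact JSON>" per line and matches on
/// exact substrings rather than parsing the JSON properly.
fn find_fallbacks(log: &str) -> Vec<Fallback<'_>> {
    log.lines()
        .filter_map(|line| {
            let (at, json) = line.split_once(' ')?;
            if !json.contains(r#""type":"fallback""#) {
                return None;
            }
            Some(Fallback {
                at,
                from: value_after(json, r#""from":{"model":""#)?,
                to: value_after(json, r#""to":{"model":""#)?,
            })
        })
        .collect()
}

fn main() {
    // The two fallback lines are from the post; the middle line is invented.
    let log = r#"2026-06-12T06:57:22Z {"type":"fallback","from":{"model":"claude-fable-5"},"to":{"model":"claude-opus-4-8"}}
2026-06-12T07:10:00Z {"type":"message","model":"claude-opus-4-8"}
2026-06-12T18:50:08Z {"type":"fallback","from":{"model":"claude-fable-5"},"to":{"model":"claude-opus-4-8"}}"#;
    for f in find_fallbacks(log) {
        println!("{}: {} -> {}", f.at, f.from, f.to);
    }
}
```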

A whole evening of work I thought I was doing on Fable was, in fact, Opus wearing Fable’s badge. The audit itself launched on the wrong model first; I only caught it because I happened to be watching the workflow panel, so I killed it and relaunched it on Fable. There it chewed through an entire five-hour quota in about forty minutes, then spent $50 of usage credits I’d been saving in about five minutes more. Even the run that worked was visibly flaky: of the 282 little agents that audit fanned out into, well over half failed outright and had to be retried.

Then, in the small hours, it started refusing entirely. My tooling caught the moment before I did:

Now failing instantly. Fable appears to be temporarily unavailable for subagents (the first three succeeded). The user explicitly required Fable, so I won’t downgrade… rather than silently switch models.

It managed three of the fixes before it went, each one green on tests, the race detector and the linter. Three real improvements to my code, written by Fable, sitting in my git history. The other three were finished by Opus, because by morning there was nothing left to finish them with.

Capable, and almost impossible to build on

There was a second wall, and I hit it before any of this, on the day Fable launched, when I tried to make it go-tool-base’s default model.

Most of what you build on top of a model isn’t a chat window. You need it to hand your code an answer in a fixed shape, the same fields in the same places every time, so the program on the other end can rely on what comes back. The usual way to guarantee that is to force the model’s hand: you don’t ask politely for the structure and hope, you require it, so a wrong-shaped answer fails outright instead of quietly slipping through.
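To show what “require it, so a wrong-shaped answer fails outright” means on the receiving end, here’s a small Rust sketch. The key=value format and the file, line and severity fields are invented stand-ins for a real schema (in practice you’d validate JSON against one); the point is that a missing, duplicate, unknown or ill-typed field returns an error instead of being quietly ignored.

```rust
use std::collections::HashMap;

/// The fixed shape the calling program relies on. Field names are invented.
#[derive(Debug, PartialEq)]
struct Finding {
    file: String,
    line: u32,
    severity: Severity,
}

#[derive(Debug, PartialEq)]
enum Severity {
    Low,
    Medium,
    High,
}

/// Strict parse of key=value lines: anything off-shape is an error.
fn parse_finding(answer: &str) -> Result<Finding, String> {
    let mut fields: HashMap<&str, &str> = HashMap::new();
    for raw in answer.lines().filter(|l| !l.trim().is_empty()) {
        let (key, value) = raw
            .split_once('=')
            .ok_or_else(|| format!("not a key=value line: {raw:?}"))?;
        let key = key.trim();
        if !matches!(key, "file" | "line" | "severity") {
            return Err(format!("unexpected field {key:?}"));
        }
        if fields.insert(key, value.trim()).is_some() {
            return Err(format!("duplicate field {key:?}"));
        }
    }
    let get = |k: &str| fields.get(k).copied().ok_or_else(|| format!("missing field {k:?}"));
    let severity = match get("severity")? {
        "low" => Severity::Low,
        "medium" => Severity::Medium,
        "high" => Severity::High,
        other => return Err(format!("bad severity {other:?}")),
    };
    Ok(Finding {
        file: get("file")?.to_string(),
        line: get("line")?
            .parse::<u32>()
            .map_err(|e| format!("bad line number: {e}"))?,
        severity,
    })
}

fn main() {
    let good = "file=src/main.go\nline=42\nseverity=high";
    let freestyle = "file=src/main.go\nline=42\nseverity=high\nnote=I also rewrote the tests";
    println!("{:?}", parse_finding(good));
    println!("{:?}", parse_finding(freestyle));
}
```

A lenient parser would have accepted the second answer and dropped the extra line on the floor; the strict one makes the freestyle visible.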

Fable won’t be forced. Ask it to commit to a guaranteed structure and it declines, flat out. As I understand it, the reasoning is a safety one: letting anyone compel a model into a precise, mandated output is itself a lever, a way to march it toward saying something it shouldn’t. Reasonable enough on paper. In practice it meant the most capable model I’d touched couldn’t drive the structured parts of my own tool, and by that first afternoon I’d quietly set the default back to Opus. It was the same refusal, I realised later, that had collapsed half of that audit’s agents.

And it is not a niche complaint. Guaranteed structure is a hard requirement for a vast swathe of what people are actually building on these models. Not everyone is making another Claude Code. Plenty of us are wiring models into systems that have to get a clean, predictable contract back every single time, and a model that reserves the right to freestyle the shape of its answer is one you simply cannot put in that seat.

The part they banned is my bread and butter

So let’s be precise about what got pulled, because the precision is the whole point.

Anthropic describe the government’s concern as “a narrow potential jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws.”

Read that again. Reading a codebase and fixing its flaws. That is not some dark-web misuse I have to strain to imagine. That is my bread and butter, the literal, boring, defensive job I had Fable doing in the open, on my own project, when the shutters came down.

And here is where that earlier point earns its keep. If the banned capability were unique to Fable, you could at least follow the logic, however much you disagreed. But it isn’t, and it isn’t even close: give Opus enough time, enough budget and a patient enough hand on the prompts, and it would get to most of the same findings in the end. Fable just did it more efficiently, a difference of degree, not of kind. So banning one company’s model, for something every competitor ships and every blue team already relies on, makes precisely nobody safer. The exploit-writers keep their tools. The defenders lose one of theirs.

When the thing you have banned is available everywhere else, the ban has stopped being about safety. It is theatre. And given who is currently in charge of the theatre, it has the distinct whiff of a knee-jerk reaction, dressed as a national security triumph, by people who do not appear to understand the tool they are confiscating.

Who I’m not angry at

I want to be careful where I point this, because it would be lazy to spray it around.

I’m not angry at Anthropic. They put Fable through more than a thousand hours of external testing, with US government agencies and the UK’s AI Security Institute among the people kicking the tyres, before it ever reached me. They satisfied every requirement put in front of them, and when the order came they complied under protest while saying, plainly, that applying this standard across the board “would essentially halt all new model deployments for all frontier model providers.” I’m a daily Claude user and an advocate for the work, and I am not going to hang the US administration’s decision around the neck of the company that did the diligence and then got told to switch the lights off anyway.

I’ll allow myself one small dig at them, and it’s over something they did in plain sight. Moving Fable behind a paywall on the 22nd was openly announced and planned well ahead, and the free window was never charity. It was a taster: a few days of the new addiction on the house, enough to hook the punters, before the price went up. That is a bit of a dick move, however neatly it tests in a spreadsheet. Moot now, mind, with no model left to charge for.

I’ll even grant the other side its strongest point. A government looking at agentic systems that can chain reconnaissance into working exploits has something real to be twitchy about. I get the worry. I just don’t accept that yanking one vendor’s model, for a thing every vendor does, is a coherent answer to it.

There is a grim irony in how we got here, and it loops back to something I wrote in the spring. When Anthropic first showed Mythos off, I called the fanfare what it looked like: a closed model sold on a press release, a result you couldn’t independently check, marketing until proven otherwise. Fable 5 was Anthropic finally answering that, handing the rest of us something we could actually test. But all that effort spent selling Mythos as too dangerous to let out was marketing too, and that half landed rather better than they can have wanted. The US administration appears to have swallowed it whole and pulled the lever. Anthropic have ended up a victim of their own hype, and the reaction that hype provoked is, there is no gentler word for it, ludicrous.

What it comes down to

The lesson I’m taking from my one day isn’t about how clever Fable was. It’s about how little that cleverness is worth if you can’t rely on the thing being there.

I couldn’t trust which model I was actually talking to from one hour to the next. I couldn’t trust it to stay up for a full overnight run. And it turns out I couldn’t trust it to still exist by the weekend. You cannot evaluate, depend on, or build a workflow around a model that gets silently swapped out one evening and switched off by the state a few days later. Capability was never the hard part. Availability is.

And underneath all of it sits the thing I keep coming back to. A classifier cannot tell a defender from an attacker, because the two of them type the same commands. It turns out a government export control can’t tell them apart either. The only thing that ever could is a human being, paying attention, who can be held responsible for the judgement. There wasn’t one of those anywhere in this loop. There was a letter, sent at 5:21pm, and by morning the best tool I had for keeping my own code honest was gone, with a polite link where it used to be.

Anything under an 8

Mon, 08 Jun 2026 00:00:00 +0000

I read the news about the National Vulnerability Database over a coffee that went cold while I sat there muttering at my phone. The short version: the NVD, the free public catalogue that quietly props up half the security tooling you and I run every day, is going under in slow motion. And the more I dug into why, the worse the taste in my mouth got.

I’m an open-source person. I think of myself as part of that community, and the NVD is one of those public goods the whole community leans on without ever really thinking about it. So my first reaction wasn’t clever or measured. It was a kick in the teeth.

The carcass and the vultures

Here’s where things actually are. In February 2024 the NVD had around 13,000 unprocessed vulnerabilities sitting in a queue waiting to be analysed. By the end of 2025 that backlog had passed 27,000. This April, NIST effectively admitted it can’t dig out: everything published before 1 March 2026 that hadn’t been enriched got swept into a bucket marked “Not Scheduled”, and going forward only the highest-risk entries get the full treatment. The rest you’re on your own with.

The reasons are grimly ordinary. The Cybersecurity and Infrastructure Security Agency stopped funding the programme in 2024. The enrichment contract lapsed that same February, and despite NIST having two years’ notice it needed a replacement, the database limped along understaffed until late November. And the volume kept climbing regardless: 48,185 CVEs in 2025, roughly 132 a day, with forecasts of the annual figure topping 60,000, getting on for ten times what it was a decade ago. No money, a fumbled handover, and a firehose. That’s the whole story.

The bit that turns my stomach is what comes next. When a free public good fails, the gap doesn’t stay empty. It gets filled, and it gets filled by people selling something. There are already commercial vulnerability databases that are better resourced and more current than the NVD, and the moment the free one is visibly on the floor, every one of them sees a market. Plenty of those subscriptions cost more in a year than a small open-source project will see in donations in its lifetime. So the catalogue the little projects relied on most is exactly the one about to be priced out of their reach. Vultures circling a carcass, and the carcass is something we all built on.

The number we never checked

And then I read the part that stopped me blaming everyone else.

A Department of Commerce Inspector General audit went through the NVD’s work and found that NIST’s own severity scores matched independent assessors only 12% of the time. Read that again. Not that NIST was wrong 88% of the time (that’s not quite what it says), but that two competent parties looking at the same vulnerability landed on the same severity barely one time in eight. The score was never an objective fact handed down from on high. It was always an estimate, a judgement call, the kind of thing reasonable people disagree about most of the time.

Which is awkward, because I have spent years treating that number as gospel. And I know I’m not alone, because I’ve watched whole engineering organisations do the same thing in writing. More than one large employer I’ve had bakes the CVSS score straight into policy: anything scored 8 or above blocks the build and gets a meeting, and anything under an 8 goes through at an engineer’s discretion. When time is money, and it always is in those places, “it’s only a 6.4, ship it” is the easiest decision you’ll make all week. I’ve made it. I’ve made it without opening the advisory, without checking whether the vulnerable code path was even reachable in what we’d built, on the strength of a single number that, it turns out, two experts wouldn’t have agreed on anyway.
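Written down, that kind of policy is only a few lines, which is a big part of its appeal. A sketch, assuming the 8.0 threshold described above; the band boundaries are the CVSS v3.x qualitative severity ratings, and the two scores in main are invented to show two assessments in the same High band landing on opposite sides of the gate.

```rust
/// The policy described above: one threshold on one number.
#[derive(Debug, PartialEq)]
enum Gate {
    BlockAndMeet,
    EngineerDiscretion,
}

fn gate(cvss: f64) -> Gate {
    if cvss >= 8.0 {
        Gate::BlockAndMeet
    } else {
        Gate::EngineerDiscretion
    }
}

/// CVSS v3.x qualitative severity bands.
fn band(cvss: f64) -> &'static str {
    match cvss {
        s if s >= 9.0 => "Critical",
        s if s >= 7.0 => "High",
        s if s >= 4.0 => "Medium",
        s if s > 0.0 => "Low",
        _ => "None",
    }
}

fn main() {
    // Two invented assessments of the same vulnerability:
    // same band, opposite outcome at the gate.
    for score in [7.8, 8.1] {
        println!("{score}: {} / {:?}", band(score), gate(score));
    }
}
```

Nothing in gate ever asks whether the vulnerable code path is reachable in what you built; that question still needs a human reading the advisory.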

So before I get cross about the funding, I have to sit with my own part in this. We took a contestable estimate and bolted it to the door as a gatekeeper. We turned “a rough signal worth a closer look” into “the closer look”, and then we stopped looking. The database didn’t promise us a safety net. We just decided it was one and stopped checking underneath.

Don’t blame the robots for this one

There’s an easy villain on offer here, and I want to wave you off it. It would be tidy to say AI did this, that the flood drowning the NVD is a tide of machine-generated slop, the same dynamic I wrote about when curl’s bug bounty buckled under unverifiable reports. It’s tempting, it’s topical, and it’s mostly wrong.

The people who actually crunch the numbers are clear that the surge is largely legitimate growth. There are now more than 484 CVE Numbering Authorities, far more organisations reporting far more bugs far more thoroughly than they did a decade ago. That isn’t a quality collapse, it’s the system working as designed and simply getting bigger than its funding. Pinning it on AI would be scapegoating, and scapegoating the robots for an underfunding-and-mismanagement problem is just a way of letting the people who defunded it off the hook.

None of which means AI gets a free pass. It just isn’t the arsonist. The same machine-assisted discovery tools that found genuine bugs are also forecast to push CVE volumes higher still, and yes, one of the tools named in that forecast is the very one I poked fun at over curl. AI is an accelerant on a fire that was already burning for thoroughly human reasons. It’s a beat in this story, not the spine.

The version I’m betting on

Where does this leave the working engineer? In a harder spot than before, because the easy answer stopped being easy. My usual line, the one I keep ending these pieces on, is that the diligence is the job: pin, lock, audit, and read the actual advisory instead of trusting a number. All of that still holds. But it just got more expensive, because the data underneath the diligence is thinner and, as it turns out, was shakier than we let ourselves believe.

So I’m not going to pretend there’s a clean fix. This problem won’t solve itself, and it won’t be solved by any one of us. It needs all of us to actually support the services we depend on, with money, with contributions, with attention, so the public goods that underpin our craft are still standing in ten years. That’s the dull, grown-up part.

But I’ll end this one looking up rather than down, because for once I can. I think the next few years bend towards safer software almost in spite of us. Modern languages are quietly closing off whole categories of vulnerability at the source: every memory-safety bug that a borrow checker refuses to compile is one that never reaches a database to be mis-scored in the first place, which is rather the point of building a framework that contains no unsafe. Used with proper guidance instead of left to spew slop, AI can be a genuine help finding and triaging the things that do slip through. And the junior engineers we keep sawing off the bottom rung are exactly the people who, mentored by the greybeards before they retire, could build the next generation of vulnerability identification that the current model clearly can’t sustain.

As for the vultures… it’s a coin toss. A lot of firms will look at the NVD on its back and see a land grab. I’d love to be proved an optimist and watch at least one of them stand tall, take all that better-resourced data and open it to open-source projects for nothing, because it’s the right thing to do and because the whole industry drinks from that well. One of them doing the decent thing would be worth more than all the press releases about responsible AI put together.

The catalogue is wobbling. The number was never as solid as we treated it. Neither of those is the end of the world, as long as we stop outsourcing our judgement to a free service we never funded and never checked, and start paying, in every sense, for the foundations we build on. Boring, unfashionable, and the only thing that ever works. I think we’re up to it.

The consent you can't ask for

Sat, 06 Jun 2026 00:00:00 +0000

There’s a comfortable story going round about telemetry, and it goes like this. There are two kinds. There’s the creepy kind, the usage data a vendor harvests to work out who you are and what you do, and that kind needs your permission. And there’s the innocent kind, the operational data a service emits so the people running it can keep it up, and that kind is just plumbing, nobody’s business, no permission required. Two neat boxes, and only one of them has a lock on it.

I don’t think the boxes are that neat. And I think a fair few of the people drawing them that way know it.

Because there’s no clean line where operational data stops being personal. A web service’s logs carry IP addresses. Its traces carry the path you walked through the system, the IDs of the things you touched, sometimes the very fields you sent. Point at almost any of it and a GDPR lawyer will cheerfully tell you it can be personal data, and that the law doesn’t much care whether you filed it under “analytics” or “observability”. The word you picked to describe the data was never the thing that decided whether it was personal. The data decided that, and a lot of operational data is personal.

So if you can’t hide behind the box marked “just plumbing”, what do you actually do?

Where I’m coming from

I should say up front that I haven’t always been this relaxed about it. I spent a good few years in righteous fury at every tool that phoned home, every “we collect anonymous telemetry to improve the product” I never agreed to. Then I started building the tools, and I needed the data myself: the kind that tells you which features people actually use and which command falls over on first run, the kind that lets you make the next decision with something better than a hunch. And it softened me. Not into thinking it’s fine to take it without asking. Into understanding why everyone wants to.

What the fury left me with, the one thing I’ve never talked myself out of, is being pro-choice. Not pro-collection, not anti. Pro-choice. Any tool I ask another person to run will never quietly opt them into sending me a thing. It asks. On first run it makes its case, says what it wants and why, and lets them say no and mean it. I’ll try hard to win the yes, because the data is genuinely useful and a tool gets better when people share it. But I won’t presume it. The choice is theirs, and the prompt exists so they actually get to make it.

The trouble with a service

Which is a lovely principle right up until you build a web service. Because who, exactly, do you prompt? An API doesn’t have a first run. It has a thousand callers a second, none of them sat at a terminal waiting to tick a box. You can’t show a consent dialog to a webhook. The answer the industry reaches for is “consent is implied by use”, and… maybe. It’s a grey area, full stop. Implied consent is the same hand-wave that gave us the cookie banner, the thing we all click through without reading. I’m not going to stand here and call it clean.

But there’s a version of the principle that survives the grey, and it’s the one I built the framework around. Consent belongs to whoever can actually give it. For a command-line tool, that’s the person running it, so you ask them. For a web service, the person who can give it was never the end user at all, because you can’t reach them. It’s the engineer who deploys the thing. They know what their service collects, who its users are, which law they sit under, whether they owe anyone a privacy notice. They are the one party in the whole chain who can make the call with any of the facts in front of them. So that’s where the choice goes.

Which is why, in go-tool-base, the web-service telemetry is a switch. On or off, the engineer’s hand on it, collecting only what you need to keep the lights on by default. There’s no consent prompt, not because consent stopped mattering, but because there’s nobody in the loop I could ask. The accountability sits with the person who can hold it.
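To make “a switch, modest by default” concrete, here’s a minimal Go sketch. It is not go-tool-base’s real configuration: `TelemetryConfig`, `Filter` and the allowlist of “keep the lights on” fields are all hypothetical. It assumes the engineer flips one boolean, anything outside a small operational allowlist is dropped unless they explicitly add it, and (since Go’s zero value for a bool is false) the switch starts off here, though which way it starts is the deployer’s call.

```go
package main

import "fmt"

// TelemetryConfig is a hypothetical config shape, not go-tool-base's API.
type TelemetryConfig struct {
	Enabled bool     // the switch, in the deploying engineer's hand
	Extra   []string // attributes collected beyond the operational minimum, opt-in only
}

// operationalAllowlist is an assumed "keep the lights on" set of attributes.
var operationalAllowlist = map[string]bool{
	"route":       true,
	"status_code": true,
	"duration_ms": true,
}

// Filter returns the attributes that may be emitted under cfg,
// or nil when telemetry is switched off.
func Filter(cfg TelemetryConfig, attrs map[string]string) map[string]string {
	if !cfg.Enabled {
		return nil
	}
	allowed := make(map[string]bool, len(operationalAllowlist)+len(cfg.Extra))
	for k := range operationalAllowlist {
		allowed[k] = true
	}
	for _, k := range cfg.Extra {
		allowed[k] = true
	}
	out := make(map[string]string)
	for k, v := range attrs {
		if allowed[k] { // anything not explicitly allowed is dropped, e.g. a client IP
			out[k] = v
		}
	}
	return out
}

func main() {
	attrs := map[string]string{
		"route":       "/api/orders",
		"status_code": "200",
		"duration_ms": "12",
		"client_ip":   "203.0.113.7", // personal data under GDPR, so not collected by default
	}
	fmt.Println(Filter(TelemetryConfig{Enabled: true}, attrs))
	fmt.Println(Filter(TelemetryConfig{Enabled: false}, attrs))
}
```

The allowlist, rather than a blocklist, is the design choice that matters: a new field added to a trace later stays out until someone decides it should be in.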

The part I’ll own

I’m pro-choice on telemetry, which is exactly why I built a way to switch it off and a way to force it on. For a web service, the person holding the choice was never the end user; it’s the engineer who ships it, and “pro-choice” has to mean putting the switch in their hand, not pretending a popup would have meant anything.

That force-it-on part is the bit I’ll answer for. I built a way for a tool author to bypass the first-run prompt entirely and bake the consent in. There’s a real use case behind it, the enterprise tool deployed under a policy where collection is contractual rather than optional. But I also know I’ve handed someone a way to take the choice away, and I did it deliberately. Rightly or wrongly, I made the framework flexible enough to do the wrong thing, and the line I care about is now only as safe as the judgement of whoever picks it up.

That’s the uncomfortable place this lands, and I’ve come to think it’s the true one. A framework can put the choice in the right hands. It cannot make the right choice. I can build the prompt, build the switch, set the defaults to the modest thing, and after that I have to trust the engineer on the other side to use it justly and with some wisdom, because there is nothing further down the stack that makes them. When the blame gets shared out, and it’s always shared, a piece of it has my name on it, for every escape hatch I left in.

I’m at peace with that, mostly. Not because the grey went away, but because the alternative, pretending there’s a clean line and that “operational” means “not your problem”, is the real dodge. I’d rather say it plainly: this data can be personal, the consent is real even when there’s nobody to ask, and the most a tool can do is hand the decision to the person who can make it, and trust them with it.

Nobody's coming to clean your supply chain

Fri, 29 May 2026 00:00:00 +0000

Pick a week in May 2026 and there’s a supply-chain attack in it. On the 11th someone owned TanStack’s CI and pushed 84 poisoned package versions in six minutes. On the 14th, three malicious versions of node-ipc, a library with ten million weekly downloads, shipped an identical credential-stealer. Days later it was @antv, cascading down into a charting library a million projects depend on. Each one runs its payload the moment you install it, then quietly tries to publish itself from your machine.

You’ve heard this story so many times the outrage has worn smooth. So let me point at the one detail that should still make you sit up: the TanStack packages carried valid signing provenance. Real attestation, pointing at the real pipeline. The seal was genuine. The contents were poison.

A signature proves the sender, not the contents

I’ve spent a fair while building integrity and signing into my own tools, so this one stings a little. Signing is a trust mechanism, and a good one. It’s how I prove a binary you downloaded was built and published by me and nobody else, and in a world with this many ways to be impersonated, that matters more than ever.

But TanStack shows the limit in neon. If the pipeline doing the signing is itself compromised, the signature is still perfectly valid. It just now certifies a lie. Provenance answers “did this come from where it claims?” It does not answer “is what’s inside safe?”, and we have spent a few years quietly letting people treat those as the same question.

They aren’t. A signature is a promise about the sender. The thing we actually need is a promise about the contents: that whoever signed has done the diligence, the testing, the vetting, to vouch for what’s in the tin. A signature without that behind it isn’t a safety certificate. It’s a tamper-proof seal on a poisoned jar.

It was never just npm

It’s tempting to file all this under “npm being npm”. Resist it, because it’s a category error. The thing that makes these attacks work, a stranger’s code running on your machine as a side effect of installing or building, is not an npm bug. It’s a near-universal design choice.

Ecosystem	Untrusted code on install/build?	Mechanism
npm (JS)	Yes, at install (dependencies too)	`pre`/`postinstall` scripts
PyPI (Python)	sdist yes, wheel no	`setup.py`; wheels forbid hooks
RubyGems	Yes, at install	native-extension build (`extconf.rb`)
cargo (Rust)	Yes, at build	`build.rs` and proc-macros
Composer (PHP)	Dependencies: no	only the root project’s scripts run, by design
Maven/Gradle (JVM)	Yes, at build	build scripts and plugins
NuGet (.NET)	Modern: no	`install.ps1`, legacy format only
Go (modules)	No	no install or build hooks

(Lifecycle hooks across ecosystems are catalogued at ecosyste.ms if you want the receipts.)

Read that and the lesson isn’t “npm is uniquely bad”, it’s “this was a choice, and several ecosystems chose differently”. Go runs no install or build hooks at all. PHP’s Composer flatly refuses to run a dependency’s scripts, only your own project’s. Python’s wheel format forbids install hooks. The hook was never inevitable.
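If you want to see the npm row of that table in your own tree, a quick smell test is to look for the lifecycle hooks npm runs during install. Here’s a minimal Go sketch, assuming you already have a `package.json` in hand; `InstallHooks` is a hypothetical helper, and the check is not exhaustive, because npm has other routes to running code at install time.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// installHooks are npm lifecycle scripts that run automatically on `npm install`.
var installHooks = []string{"preinstall", "install", "postinstall"}

type manifest struct {
	Scripts map[string]string `json:"scripts"`
}

// InstallHooks returns the install-time scripts declared in a package.json.
func InstallHooks(packageJSON []byte) (map[string]string, error) {
	var m manifest
	if err := json.Unmarshal(packageJSON, &m); err != nil {
		return nil, err
	}
	found := map[string]string{}
	for _, h := range installHooks {
		if cmd, ok := m.Scripts[h]; ok { // indexing a nil map is safe in Go
			found[h] = cmd
		}
	}
	return found, nil
}

func main() {
	pkg := []byte(`{"name":"example","scripts":{"test":"jest","postinstall":"node setup.js"}}`)
	hooks, err := InstallHooks(pkg)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(hooks))
	for name := range hooks {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for stable output
	for _, name := range names {
		fmt.Printf("%s runs on install: %q\n", name, hooks[name])
	}
}
```

When you don’t need them, npm’s own `--ignore-scripts` option switches these hooks off wholesale, at the cost of breaking packages that genuinely need a native build.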

And yes, that includes my own back yard. cargo’s build.rs is the same gun fired at build time instead of install time, and the TrapDoor campaign used exactly that to rifle through keystores on crates.io this year. Rust isn’t safe here. It’s a smaller, better-policed target, which is a different thing, and I’d rather say so than pretend one of my favourite languages is above it.

No registry can hand you a clean package

Here’s the uncomfortable core. Not one of these registries can guarantee the package you pull is clean. They can sign it, scan it, attest its origin and mandate 2FA on maintainers, and they should do all of that. But none of it is a guarantee, because the failure modes are endless and attackers keep finding new ones. A maintainer account gets phished. A CI token leaks. A trusted contributor turns. A dependency four levels down quietly changes hands.

So the onus lands, and will keep landing for a good while yet, on the consuming engineer. That isn’t a comfortable answer or a clever one. It’s the true one.

And it’s a genuinely rotten spot to stand in, because the advice contradicts itself. Patch slowly and you’re scolded for running known-vulnerable dependencies. Patch the instant a release drops and you’ve skipped the bedding-in that might have caught a poisoned one. There’s no setting on that dial that’s safe, only trade-offs you have to actually think about. Add CI that leaks credentials it never needed, and a dependency tree thousands of strangers deep, and you can see why there’s no single villain to blame and no single switch to flip.

The boring discipline that actually helps

What’s left isn’t heroic, it’s hygiene, and it’s the boring, necessary stuff I keep banging on about. Pin your CI actions to commit SHAs so a moved tag can’t swap code under you. Commit your lockfiles. Run the auditors (cargo-audit, pip-audit, govulncheck, npm audit, or Google’s cross-ecosystem OSV-Scanner) on every build. Gate the dependency tree and give every exception an expiry date so “we’ll deal with it later” can’t quietly become “never”. Keep the tree small: every dependency you don’t add is a stranger you don’t have to trust.

None of that is a solution. All of it is diligence, and diligence is the only thing that was ever going to stand behind the signature. When I sign a release, the cryptography is the easy part. The promise underneath it, that I pinned, locked, audited, vetted and tested before I put my name on it, is the part worth anything. That’s the contract. The signature is just how I countersign it.

The encouraging note is that the structural defences exist and they work. Go’s checksum database and its refusal to run hooks. Composer declining to trust a dependency’s scripts. Python’s wheels. cargo-vet and cargo-deny giving you somewhere to record human judgement at scale. More ecosystems should steal these shamelessly, because a registry that makes the safe path the default does the working engineer a far bigger favour than one that leaves it all to discipline.

The same shape, a third time

If this feels familiar, it should. I wrote recently about a bug bounty that collapsed because the cost of slop was deferred, and about a junior pipeline being cut because the bill lands years later. Supply-chain security is the same shape a third time. The convenience is now, the catastrophe is later, and the only thing standing in the gap is an engineer paying attention, doing the dull work, refusing to be rushed into trusting something they haven’t checked.

There is no clean package waiting to be found, no registry about to solve this for us, no signature that means “safe” all on its own. There’s the diligence you do before you put your name to something, and the judgement to know when an install is asking you to trust more than you should. For a good while yet, that is the whole job. Boring, unfashionable, and the only thing that works.

The greybeards' edge was never typing

Wed, 27 May 2026 00:00:00 +0000

I have a retirement plan, and it is gloriously low-tech. A cabin, some trees, a woodstove, and a firm rule that no wifi symbol ever appears within a mile of me again. I think about it more than is probably healthy.

There’s a snag, though, and it’s the same one the whole industry is currently pretending it can’t see. For me to vanish into the woods, somebody has to be able to do my job after I’ve gone. And right now, collectively, we are working very hard to make sure nobody can.

The boost, and the drag

I wrote the other day about how AI made producing plausible work nearly free while verifying it stays expensive and human. Point that same lens at a team and something uncomfortable falls out. It isn’t mine; it belongs to Mark Russinovich and Scott Hanselman of Microsoft, who laid it out in Communications of the ACM: agentic coding tools give a senior engineer an AI boost, multiplying what they ship, because a senior has the judgement to steer and verify the output. The same tools give an early-career engineer an AI drag, because they don’t have that judgement yet, and the machine hands them far more rope than they can hold.

The cold incentive writes itself, and they name it: hire seniors, automate juniors. It isn’t hypothetical, either. Meta cut 8,000 roles last week, in a round the Times filed under mounting AI casualties. For any single quarter you care to look at, the maths is impeccable.

The bill is just deferred

Here’s the line the spreadsheet leaves off. The grindy work a junior used to cut their teeth on, the small fixes, the boring migrations, the read-the-stack-trace-and-figure-it-out, is exactly the work AI now does. So the proving ground is gone. And the entry-level seats where they’d have stood on it are the ones being cut. Squeezed from both ends at once: no reps, and nowhere to take them.

Russinovich and Hanselman put the consequence plainly. Without early-career hiring the talent pipeline collapses, and you arrive at a future with no next generation of experienced engineers. The seniors you’ll be desperate for in 2032 are the juniors you declined to train in 2026. The bill doesn’t vanish. It just falls due long after the people who cut the cheque have moved on.

How to manufacture a world of AI slop

I named the last piece for its villain; let me name this one’s too. Raise a generation that can produce with AI but was never taught to validate, and here is what you get: people shipping machine-built products at speed with no instinct for where the output is quietly wrong, because they never had to be wrong the slow way first. Software nobody genuinely understands, human-written and AI-written alike, and a steady leak of trust out of all of it.

That isn’t a productivity problem. That’s a world of AI slop, and not in one project’s inbox this time but everywhere at once. We’d have automated our way clean out of the one job AI cannot do for us: knowing when not to trust the machine.

It’s a choice, and it’s yours

Andrew Murphy put it with more bite than I’d quite dare: AI didn’t kill your junior pipeline, you did. He’s right. This isn’t weather. Nobody is making you do it. It’s a decision, taken quarter by quarter, and a decision is a thing you can take differently.

The fix isn’t complicated, it’s just unfashionable. Keep hiring early-career engineers. Say out loud that they cost you capacity at first, and treat their growth as an actual goal rather than something meant to happen by osmosis. Russinovich and Hanselman call it preceptorship at scale: senior mentorship, deliberately structured, turning the ordinary day’s work into teachable moments.

And the proving ground can be rebuilt, just not where it stood. If AI does the writing now, the apprenticeship moves to the reviewing. Put juniors in the loop on the machine’s output and have them hunt for the subtle wrongness, treating it as you would a scanner’s findings: as an argument, not an order. That’s how judgement gets built now: not by grinding out the work, but by verifying it. Which, as luck would have it, is the single most valuable thing anyone on your team can learn to do.

The part that’s on the greybeards

This is where I stop letting the companies wear all the blame, because some of it is mine, and yours. Verification is a craft, and crafts pass from person to person or not at all. I know where every one of my own AI misfires comes from: I gave it too little context, or too much rope, and didn’t check the result closely enough. The tool rarely went rogue. The gap was always my diligence. That’s not a confession, it’s the curriculum, and it’s precisely the judgement a junior can only earn by sitting in the loop beside someone who has already made those mistakes.

So the senior engineer’s job has quietly changed underneath us. It was never really the typing. It was knowing when something is off, and what the customer actually needs, and now it is also handing that on, deliberately, while there’s still time to. Mentor and guardian first; fastest prompt in the room a distant second.

The ladder you’re standing on

There will always be something AI can’t do well enough, and for a good while yet it’s the thing that matters most: being the accountable human who genuinely understands what’s needed and can be held to it when it goes wrong. A simulation can be enormously convincing. It cannot be responsible.

Which brings me back to my cabin. I do want it one day, the trees and the woodstove and the blissful disconnection. But I only get to go if the work outlives me, and the work only outlives me if the people do. So the last useful thing my generation does, before we shuffle off to find our trees, isn’t shipping a little more code. It’s making sure there’s somebody left who can tell when the machine is wrong. Pull the ladder up behind us and there’ll be nobody to notice the rot, and no cabin quiet enough to make that sit right.

AI didn't kill curl's bug bounty. The bounty did.

Tue, 26 May 2026 00:00:00 +0000

In January, Daniel Stenberg shut down curl’s bug bounty. The headlines wrote themselves, and they all said the same thing: AI killed it. A flood of machine-generated slop drowned the maintainers, so they pulled the plug.

That’s true, as far as it goes. It’s also the wrong lesson, and the right one is sitting in plain sight in the same project, in the same few months.

Volume without validation is the attack

curl had run its bounty since April 2019. Over its life it paid out more than $100,000 for 87 genuine vulnerabilities, a thoroughly good return for one of the most depended-on pieces of software on the planet. Then the reports stopped being reports. The confirmation rate, the share of submissions that turned out to be a real bug, had historically sat north of 15%. By 2025 it was below 5%. Fewer than one in twenty submissions were worth anything, and the rest still had to be read.
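Put as arithmetic, with p the confirmation rate, the number of reports someone has to read to find one real bug is 1/p:

```latex
\text{reports read per real bug} = \frac{1}{p}, \qquad
p = 0.15 \;\Rightarrow\; \frac{1}{0.15} \approx 6.7, \qquad
p = 0.05 \;\Rightarrow\; \frac{1}{0.05} = 20
```

With p above 0.15 historically and below 0.05 by 2025, the reading per genuine bug at least tripled, and that’s per bug, before any rise in total volume is counted.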

That last part is the whole problem. A bogus report doesn’t announce itself. Someone has to open it, take it seriously, try to reproduce it, and work out that it’s nonsense, and that someone is a human being with a finite number of hours and a project to run. Stenberg put it plainly: the slop “take[s] a serious mental toll to manage and sometimes also a long time to debunk.” The submitter spends seconds. The maintainer spends an afternoon. Do that at volume and it stops being noise and becomes an attack, a denial-of-service aimed not at curl’s servers but at its maintainers’ attention. No exploit required. Just plausibility, in bulk.

The bounty was the accelerant, not the AI

So far this is the story everyone tells. Here’s where I get off the bus.

The instinct is to blame the AI for the slop. But look at what a bounty actually is. It’s a cash prize, and curl’s was priced for the thing it wanted: the hours and the judgement a skilled human pours into finding a real flaw. That pricing made complete sense right up until the cost of producing something that looked like a finding collapsed to nearly nothing.

That’s what AI changed. Not the supply of bugs. The supply of plausible-looking bug reports. Put a cash prize on “looks like a finding”, then make “looks like a finding” free to generate, and you haven’t got a bug bounty any more. You’ve got a slot machine. Stenberg said he’d started to sense “a bad faith attitude” in the reports, and of course he had. The incentive was openly inviting it.

So the death spiral was structural, not bad luck. The moment generating plausible reports went free, any cash bounty became a magnet for spray-and-pray, and the only open questions were how fast it would rot and whether you’d close the programme or just let the rewards quietly wither. The AI was the match. The bounty was the petrol. We have been pointing at the wrong one.
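A back-of-the-envelope way to see the slot machine, in my own framing rather than Stenberg’s: let p be the chance a report is confirmed, B the payout, and c what it costs the submitter to produce the report.

```latex
\mathbb{E}[\text{gain per report}] = p\,B - c,
\qquad
c \to 0 \;\Rightarrow\; \mathbb{E}[\text{gain per report}] \to p\,B > 0 \quad \text{for any } p > 0
```

Once c is close to zero, every extra report has positive expected value however low p falls, so the rational move is volume. The maintainer’s side of the ledger is n·t, n reports at t hours each, and nothing in the submitter’s sum makes that any smaller.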

The proof: curl turned around and hired the AI

If AI were really the villain here, you’d expect curl to have slammed the door on it. It did the opposite.

In the same stretch, AISLE, an AI security platform, says it contributed 24 pull requests to curl, five of which earned CVEs; curl now runs it internally for continuous review. The same tooling reportedly found all twelve zero-days in an OpenSSL release in late January. (Both of those are the tool-makers’ and a third party’s numbers rather than curl’s audited figures, so weigh them as such. But curl adopting the thing isn’t a claim. It’s a decision.)

Sit with the shape of that. curl shut down strangers being paid for AI-shaped noise, and in the same breath put AI to work as a tool its own maintainers drive. The two moves look contradictory only if you think “AI” is a single thing with a single verdict attached. It isn’t. Pointed at the problem by people accountable for the result, with no prize to farm, it found real bugs. Dangled in front of anonymous strangers chasing a payout, it produced sand.

The tell is which AI curl kept, and which it mocked

Stenberg drew that line about as sharply as a person can. When Anthropic put its security model, Mythos, in front of curl this spring, it scanned 176,000 lines of C and surfaced a single flaw, and Stenberg called the surrounding fanfare the greatest marketing stunt he’d seen. Same maintainer. Adopts one AI, rubbishes another.

The deciding factor was never whether the thing was AI. Both were. It was whether the output survived a human checking it, and whether you could check it at all. AISLE handed over pull requests and CVEs you could read and merge. Mythos arrived as a closed model and a press release, which is to say a claim the community has no way to independently test.

My bias, up front, because it runs the opposite way to what you’d expect from someone writing this: I’m a paying Claude subscriber and I lean on Anthropic’s models every working day, the one behind the spadework for this post included. I’m an advocate, not a sceptic, and AI genuinely has its place. That is exactly why the Mythos fanfare grates. Overselling a closed model to get out ahead of the competition, when the one test the public got to see turned up a single bug, is the sort of thing that chips away at trust in all of it. A result you can’t verify is marketing until proven otherwise, whoever’s logo is on the slide, and I’d rather the tools I depend on didn’t stoop to it.

The cheap half and the expensive half

Pull back from curl for a moment, because the lesson isn’t really about bounties at all. Anyone who works with these tools every day knows the same thing: when they go wrong, it’s rarely the model running off on its own. It’s the context it wasn’t given, the rope it was handed, the output nobody checked closely enough. The failure sits on the human side of the keyboard, at the one step that’s easiest to skip, which is verification.

That’s the pattern curl hit at the scale of an ecosystem. AI made one thing nearly free: producing work that looks right. It did not make the other thing a penny cheaper: confirming the work is right. That cost still falls, in full, on a person. (A scanner, I’ve argued before, is an argument, not an order; the same goes double for a model.) The bounty’s fatal mistake was paying for the cheap half and quietly assuming it had bought the expensive one. The same trap waits in code review, in hiring, in CVs read by machines, but that’s a bigger argument for another post.

Pouring sand into the machine

curl didn’t capitulate to AI, whatever the headlines decided. It stopped paying for the worthless half and started using the valuable half, and it had the discernment to tell a useful tool from a press release while it did so.

The bounty wasn’t a casualty of artificial intelligence. It was a structure that, the instant plausible output became free, could only fill with sand. Stenberg said he hopes closing it stops “more people pouring sand into the machine.” Reading the last year of his inbox, I think he’ll get his wish. The sand was only ever there because somebody left a bucket of money beside the funnel.