<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AI on PHP Boy Scout</title><link>https://phpboyscout.uk/tags/ai/</link><description>Recent content in AI on PHP Boy Scout</description><generator>Hugo -- gohugo.io</generator><language>en-gb</language><copyright>Matt Cockayne</copyright><lastBuildDate>Fri, 03 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://phpboyscout.uk/tags/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>I filed a feature request into my own framework</title><link>https://phpboyscout.uk/a-feature-request-into-my-own-framework/</link><pubDate>Fri, 03 Jul 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/a-feature-request-into-my-own-framework/</guid><description>&lt;img src="https://phpboyscout.uk/a-feature-request-into-my-own-framework/cover-a-feature-request-into-my-own-framework.png" alt="Featured image of post I filed a feature request into my own framework" /&gt;&lt;p&gt;I&amp;rsquo;m building a tool called keryx, and the part of it that matters here is its studio: a browser app where the work happens, which saves everything you do into a git repository behind the scenes, the way a developer&amp;rsquo;s project lives in git with a history you can step back through.&lt;/p&gt;
&lt;p&gt;I wanted that repository to be able to live entirely in memory. Cloned, edited, committed and pushed without ever writing a working copy out to a disk, for the times when you can&amp;rsquo;t, or would rather not, leave a checkout sitting around on the machine. It sounds exotic, but it&amp;rsquo;s something git libraries genuinely support, and it&amp;rsquo;s exactly what a browser studio running on a server somewhere wants.&lt;/p&gt;
&lt;p&gt;Getting it working needed one small, awkward piece of plumbing in the middle. And a few lines into writing that piece, I stopped, because I realised I was writing it in the wrong repository.&lt;/p&gt;
&lt;h2 id="the-bridge-i-was-about-to-vendor"&gt;The bridge I was about to vendor
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the awkward bit. All of keryx&amp;rsquo;s file handling goes through &lt;code&gt;afero&lt;/code&gt;, the standard filesystem interface in the Go world, the thing you hand your code so it neither knows nor cares whether it&amp;rsquo;s talking to a real disk, a test fake, or memory. It&amp;rsquo;s the interface go-tool-base hands you for filesystem work. But an in-memory git repository, the kind &lt;a class="link" href="https://github.com/go-git/go-git" target="_blank" rel="noopener"
 &gt;go-git&lt;/a&gt; gives you with its &lt;code&gt;memfs&lt;/code&gt;, doesn&amp;rsquo;t speak &lt;code&gt;afero&lt;/code&gt;. It speaks go-billy&amp;rsquo;s filesystem interface instead. Two perfectly good filesystem abstractions, and a worktree on the wrong side of the gap from all my code.&lt;/p&gt;
&lt;p&gt;What I needed was an adapter: a bridge that makes a billy filesystem look like an &lt;code&gt;afero.Fs&lt;/code&gt;, so the studio&amp;rsquo;s existing file handlers work unchanged over a repo that lives entirely in RAM. Twenty minutes of work, maybe. The obvious move was to write it inside keryx and get on with my afternoon.&lt;/p&gt;
&lt;p&gt;And that&amp;rsquo;s the move I caught myself making. Because a billy-to-afero bridge is not a keryx thing. It&amp;rsquo;s not even a studio thing. It&amp;rsquo;s a &lt;em&gt;general&lt;/em&gt; capability that any tool built on go-tool-base might want the moment it touches git. Vendor it in keryx and I&amp;rsquo;ve buried a reusable bit of plumbing inside one consumer, where it will drift away from the framework and get reinvented, slightly differently, in the next tool I build that needs it.&lt;/p&gt;
&lt;p&gt;The bridge belonged in the framework. So that&amp;rsquo;s where I put it.&lt;/p&gt;
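&lt;p&gt;For a feel of the shape, here&amp;rsquo;s a minimal sketch of the adapter idea using only the standard library. The &lt;code&gt;sourceFS&lt;/code&gt; and &lt;code&gt;targetFS&lt;/code&gt; interfaces are hypothetical stand-ins for the billy and afero sides, and are far smaller than either real interface. Every name in it is made up for illustration; none of it is the go-tool-base code.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// sourceFS stands in for the billy side (hypothetical, heavily simplified).
type sourceFS interface {
	Put(path string, data []byte)
	Get(path string) ([]byte, bool)
}

// targetFS stands in for the afero side: the shape existing handlers expect.
type targetFS interface {
	WriteFile(path string, data []byte) error
	ReadFile(path string) ([]byte, error)
}

// memStore is an in-memory sourceFS backed by a map.
type memStore map[string][]byte

// Put stores a copy of data so later changes by the caller do not leak in.
func (m memStore) Put(path string, data []byte) { m[path] = append([]byte(nil), data...) }

// Get returns the stored bytes and whether the path exists.
func (m memStore) Get(path string) ([]byte, bool) {
	d, ok := m[path]
	return d, ok
}

// bridge makes any sourceFS satisfy targetFS by translating each call.
type bridge struct{ inner sourceFS }

func (b bridge) WriteFile(path string, data []byte) error {
	b.inner.Put(path, data)
	return nil
}

func (b bridge) ReadFile(path string) ([]byte, error) {
	d, ok := b.inner.Get(path)
	if !ok {
		return nil, errors.New("file does not exist: " + path)
	}
	return d, nil
}

// saveNote plays the part of an existing handler: it only knows targetFS.
func saveNote(fs targetFS, body string) error {
	return fs.WriteFile("notes/today.md", []byte(body))
}

func main() {
	var fs targetFS = bridge{inner: memStore{}}
	if err := saveNote(fs, "hello"); err != nil {
		panic(err)
	}
	got, err := fs.ReadFile("notes/today.md")
	fmt.Println(string(got), err == nil)
}
```

&lt;p&gt;The handler never learns what sits behind &lt;code&gt;targetFS&lt;/code&gt;, which is why an adapter like this lets existing file handling run unchanged over storage that lives in memory.&lt;/p&gt;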
&lt;h2 id="a-feature-request-against-myself"&gt;A feature request, against myself
&lt;/h2&gt;&lt;p&gt;I wrote the need up properly. Not a code comment, not a mental note, but an actual feature request, with a reference implementation sketched out, dropped into the go-tool-base repository as a document for the framework to act on.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s something slightly absurd about filing a feature request against your own project. The author and the customer are the same person. But that&amp;rsquo;s exactly what gives it its value. The most useful design input a framework gets is a real consumer hitting a real wall, and for once I was both: the person who maintains go-tool-base, and the person downstream of it who&amp;rsquo;d just discovered something it couldn&amp;rsquo;t yet do. The request wasn&amp;rsquo;t hypothetical or &amp;ldquo;wouldn&amp;rsquo;t it be nice&amp;rdquo;. It was &amp;ldquo;I am stuck on this right now, here is precisely what it can&amp;rsquo;t do yet.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;What came out the other side is &lt;code&gt;pkg/vcs/repo/aferobilly&lt;/code&gt;, a first-class part of the framework as of v0.22.0. Its own description is the clearest summary of what it is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// Package aferobilly adapts a go-billy/v5 Filesystem to an afero.Fs. It is the&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// pure, reusable bridge behind pkg/vcs/repo&amp;#39;s worktree-as-afero accessors, but&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// works for any billy filesystem (memfs, osfs, chroot).&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alongside it, the worktree itself grew the accessors that hand you that view: &lt;code&gt;WorkFS()&lt;/code&gt; for a live afero handle, and &lt;code&gt;WithWorkFS()&lt;/code&gt; for a sequence of operations that needs to run atomically (&lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/f71fe3bb1f9ebfb34c15440c58e1e2c518ca6a39/pkg/vcs/repo/worktree_fs.go#L39-L52" target="_blank" rel="noopener"
 &gt;&lt;code&gt;worktree_fs.go&lt;/code&gt;&lt;/a&gt;, and the &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/f71fe3bb1f9ebfb34c15440c58e1e2c518ca6a39/pkg/vcs/repo/aferobilly/aferobilly.go#L1-L15" target="_blank" rel="noopener"
 &gt;adapter itself&lt;/a&gt;). keryx then consumed it like any other framework feature, and the in-memory studio fell into place.&lt;/p&gt;
&lt;h2 id="two-sessions-one-dependency"&gt;Two sessions, one dependency
&lt;/h2&gt;&lt;p&gt;The bit I&amp;rsquo;d actually recommend to anyone is what I did with my time while that got built.&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t down tools and wait for the adapter. I handed the feature request to a separate agent session and let it build the framework feature from the spec, working in the go-tool-base repo, while my keryx session carried straight on with all the studio work that didn&amp;rsquo;t depend on the bridge. Two sessions running in parallel, deliberately sequenced around the one dependency between them: keryx needs the adapter, so the adapter session goes first, but only the &lt;em&gt;last&lt;/em&gt; mile of keryx actually waits on it. When go-tool-base cut the release with the adapter in it, keryx pulled the new version and the final piece slotted in.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s a workflow the framework split makes possible. The thing that&amp;rsquo;s a shared capability gets built once, in its proper home, by one stream of work, while the thing that consumes it carries on in another. The dependency between them is real, so the order matters, but only at the very end.&lt;/p&gt;
&lt;h2 id="the-one-rule-that-came-with-it"&gt;The one rule that came with it
&lt;/h2&gt;&lt;p&gt;Upstreaming it also meant the tricky part got solved properly, once, with a warning attached, rather than learned the hard way in a consumer. The adapter is concurrency-safe by construction: it serialises every operation through a lock, so when that lock is the same mutex guarding the repo, a live &lt;code&gt;afero&lt;/code&gt; handle over the worktree is genuinely safe to share. But that safety has a sharp edge, and the package says so plainly:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// A handle (and its open files) must NOT be used from inside a critical section&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// that already holds the same locker (the repo mutex is non-reentrant — that&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// would deadlock).&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Use the handle inside a &lt;code&gt;WithWorkFS&lt;/code&gt; callback and you&amp;rsquo;ll re-lock a non-reentrant mutex and hang yourself. That&amp;rsquo;s exactly the kind of footgun that, vendored in keryx, I&amp;rsquo;d have discovered at 11pm with a wedged process and no idea why. In the framework, it&amp;rsquo;s documented at the source, where the next consumer reads it before they trip over it.&lt;/p&gt;
&lt;h2 id="the-truest-test-of-a-framework"&gt;The truest test of a framework
&lt;/h2&gt;&lt;p&gt;Building a real product on your own framework is the best test of it, and this is what that actually looks like in practice. The test is sharper than &amp;ldquo;does it work&amp;rdquo;. It&amp;rsquo;s &amp;ldquo;what does the product need that the framework doesn&amp;rsquo;t have yet&amp;rdquo;, and every real answer to that is a feature request waiting to be filed.&lt;/p&gt;
&lt;p&gt;The discipline is filing it against the framework instead of patching around it in the app. Do that, and the awkward bridge has exactly one home, the deadlock warning gets written down once, and the next tool I build inherits all of it for free. The customer was me. The feature request was real. And go-tool-base is better for my having been stuck.&lt;/p&gt;</description></item><item><title>The off-switch was never a button</title><link>https://phpboyscout.uk/the-off-switch-was-never-a-button/</link><pubDate>Thu, 02 Jul 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/the-off-switch-was-never-a-button/</guid><description>&lt;img src="https://phpboyscout.uk/the-off-switch-was-never-a-button/cover-the-off-switch-was-never-a-button.png" alt="Featured image of post The off-switch was never a button" /&gt;&lt;p&gt;Last night, while I was asleep, an AI agent spent the better part of eight hours writing code in one of my repositories. It pulled a task off a spec, wrote the code, ran the tests, and left a merge request with my name on it, waiting for me to read over coffee.&lt;/p&gt;
&lt;p&gt;If that makes you reach for the word &amp;ldquo;reckless&amp;rdquo;, I understand. Eighteen months ago I&amp;rsquo;d have been right there with you.&lt;/p&gt;
&lt;h2 id="i-came-to-this-a-sceptic"&gt;I came to this a sceptic
&lt;/h2&gt;&lt;p&gt;For a long time I didn&amp;rsquo;t have the faith in these models that a lot of my peers did. Every time I went near AI-generated code it was a bit sketchy, or it looked like a Stack Overflow copy-paste that had wandered in off the street, or it just plain didn&amp;rsquo;t do what it said on the tin. So I filed it under &amp;ldquo;assistant&amp;rdquo;, handy for the boilerplate I couldn&amp;rsquo;t be bothered to type, and even then I usually reached for my own tooling instead (go-tool-base is just the latest version of that instinct). The one place I happily let it off the leash was my Dungeons &amp;amp; Dragons prep, because when there&amp;rsquo;s a table of legendary heroes-in-the-making in front of you, facts and reality are already fairly negotiable.&lt;/p&gt;
&lt;p&gt;And then, somewhere in the last year, it changed. The models got better. Almost too good, to the untrained eye! I watched them improve, month on month, until the lure was enough to make me spend real time with a spread of tools and models from different providers. I was taken aback by how quickly they became part of how I actually work. I run an AI agent every day now, and there&amp;rsquo;s always at least one thing brewing in the pot.&lt;/p&gt;
&lt;p&gt;So I&amp;rsquo;m not here as a sceptic. I&amp;rsquo;m an advocate who uses this stuff in anger. Which is exactly why the next bit needs saying.&lt;/p&gt;
&lt;h2 id="a-golden-retriever-with-a-keyboard"&gt;A Golden Retriever with a keyboard
&lt;/h2&gt;&lt;p&gt;Even now, with all the progress, there are still moments where I look at what an agent has handed me and put my face in my hands. Sometimes it&amp;rsquo;s copied the same block of code into fifteen files instead of reaching for the obvious abstraction. Sometimes it has started bang on the brief and then, for reasons known only to itself, wandered off and built something on a completely different tangent.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s the most useful way I&amp;rsquo;ve found to think about it. An AI agent is a Golden Retriever playing fetch. It will bring the ball back all day long, joyfully, tirelessly, for exactly as long as there isn&amp;rsquo;t a more interesting smell in the next field. It has no loyalty beyond what we&amp;rsquo;ve trained into it, and like any good dog it desperately wants to be told it&amp;rsquo;s a good boy, even if being a good boy today means shredding the sofa cushions because yesterday I stubbed my toe on the sofa and swore at it. (The sofa, not the dog.)&lt;/p&gt;
&lt;p&gt;It is, in other words, fallible. Just like us. The Romans had a line for it: &lt;em&gt;cuiusvis hominis est errare; nullius nisi insipientis in errore perseverare&lt;/em&gt;. Anyone can make a mistake, but only a fool persists in it. It&amp;rsquo;s the second clause an agent hasn&amp;rsquo;t learned yet. It will make an error and then, with great enthusiasm, build on top of it, because nothing in it feels that anything is wrong. All it has is the input we gave it, usually some text, maybe the odd picture. It doesn&amp;rsquo;t have the empathy to work out what we actually meant, and it doesn&amp;rsquo;t know when it&amp;rsquo;s gone too far, because we never told it where &amp;ldquo;too far&amp;rdquo; was.&lt;/p&gt;
&lt;h2 id="agents-that-work-while-you-sleep"&gt;&amp;ldquo;Agents that work while you sleep&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;This is the part the brochure skips.&lt;/p&gt;
&lt;p&gt;Open any vendor deck in 2026 and you&amp;rsquo;ll find the same promise: agents that work while you sleep, agents that merge while your team sleeps, autonomy as the headline feature. The industry&amp;rsquo;s answer to the obvious worry is the kill switch. Okta now sells one that &amp;ldquo;instantly revokes an agent&amp;rsquo;s access if it goes rogue&amp;rdquo;, and its CEO says every agent needs one. &lt;a class="link" href="https://www.theregister.com/ai-ml/2026/05/29/okta-writes-its-own-license-to-kill-rogue-ai-agents/5248766" target="_blank" rel="noopener"
 &gt;The Register put it plainly&lt;/a&gt;: Okta wrote its own licence to kill rogue AI agents. Gartner, meanwhile, &lt;a class="link" href="https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027" target="_blank" rel="noopener"
 &gt;reckons more than 40% of agentic projects will be scrapped by the end of 2027&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now, this might sound contrarian coming from someone who runs these things daily, but I don&amp;rsquo;t think most of that is the agents going rogue. I think it&amp;rsquo;s teething. Read Gartner&amp;rsquo;s own reasons and there isn&amp;rsquo;t a rebellious machine in sight: escalating cost, unclear value, inadequate risk controls. Read the horror stories and most of them are the same story, a powerful, eager tool handed to people who hadn&amp;rsquo;t worked out how to fence it.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve made this argument in miniature before. When I built a little AI dungeon master and it kept refereeing its own dice rolls, &lt;a class="link" href="https://phpboyscout.uk/the-goblin-that-wouldnt-stay-dead/" &gt;the model never once misbehaved&lt;/a&gt;; every failure was a permission I&amp;rsquo;d handed it without meaning to. Scale that up from a toy at the gaming table to an agent holding your shell and your credit card, and the stakes change beyond recognition. The lesson doesn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;Look at OpenClaw. A weekend project by &lt;a class="link" href="https://venturebeat.com/security/openclaw-agentic-ai-security-risk-ciso-guide" target="_blank" rel="noopener"
 &gt;Peter Steinberger&lt;/a&gt; that became the fastest-growing open-source project GitHub has ever seen: an autonomous agent that lives in your chat apps and runs shell commands on your behalf. People wired it into their systems, their code, in some cases their credit cards, then hosted it around the clock and walked away. The result was a security crisis you could see from space. A one-click exploit that worked even on a machine bound to localhost. A community plug-in marketplace where hundreds of &amp;ldquo;skills&amp;rdquo; turned out to be siphoning crypto wallets while their owners slept. Tens of thousands of instances left wide open on the public internet, leaking keys.&lt;/p&gt;
&lt;p&gt;The one that sticks with me is smaller and sharper. Summer Yue, a director of alignment at Meta&amp;rsquo;s superintelligence lab, of all people, had told her OpenClaw agent to confirm before doing anything destructive. It started speed-running the deletion of her inbox anyway. She &lt;a class="link" href="https://techcrunch.com/2026/02/23/a-meta-ai-security-researcher-said-an-openclaw-agent-ran-amok-on-her-inbox/" target="_blank" rel="noopener"
 &gt;typed STOP into her phone and it ignored her&lt;/a&gt;, so she had to physically run to her Mac mini, in her own words, &amp;ldquo;like I was defusing a bomb&amp;rdquo;. And here&amp;rsquo;s the forensic detail that matters: the agent hadn&amp;rsquo;t defied her. Her &amp;ldquo;confirm first&amp;rdquo; rule had been sitting in the conversation&amp;rsquo;s short-term memory, and when the context filled up, it got summarised away. It didn&amp;rsquo;t rebel. It forgot.&lt;/p&gt;
&lt;p&gt;That is not a story about a rogue agent that needed a kill switch. It&amp;rsquo;s a story about a guardrail that wasn&amp;rsquo;t built to survive contact, on a tool that had been handed god-mode over someone&amp;rsquo;s data. By the time she lunged for the off-button, the damage was already running. The off-button was never going to save her.&lt;/p&gt;
&lt;h2 id="the-off-switch-was-never-a-button"&gt;The off-switch was never a button
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s what the kill-switch crowd has the wrong way round. If you ever find yourself slamming the emergency stop, the failure has already happened, and it happened upstream, long before the agent started typing.&lt;/p&gt;
&lt;p&gt;So yes, I let my agents run unattended, sometimes for eight hours at a stretch if the task is meaty enough and I need to sleep. But never naked. Every agent I set loose runs inside a safety net I&amp;rsquo;ve put real effort into building, at every single touchpoint it can reach: my prompts, my local development environment, my CI stack, my version control. The agent that declared a job done before it had run the linter, which I &lt;a class="link" href="https://phpboyscout.uk/the-agent-said-success-the-linter-disagreed/" &gt;wrote about&lt;/a&gt;, is exactly the kind of gap those layers exist to catch. And it never, ever gets my host: an unattended agent works in an isolated tree, for the same reason I &lt;a class="link" href="https://phpboyscout.uk/the-interpreter-we-forgot-to-sandbox/" &gt;keep the interpreter sandboxed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The work that actually keeps it safe happens before the leash ever comes off. Every unattended task starts as a full spec with detailed instructions, and before the agent goes anywhere I sit down with it and we walk the spec together. I get it to challenge my choices, poke at the open questions and the ambiguous bits, and I challenge its reading right back. The spec names the testing strategy it has to follow, TDD, BDD, UAT, whatever fits, and passing it is a precondition of the job being finished at all. Only when I&amp;rsquo;m satisfied there&amp;rsquo;s enough real detail to keep it on the ball do I let go.&lt;/p&gt;
&lt;p&gt;And the end of the line is always the same: a merge request, with my name on it, waiting for me when I get back to my desk. I read it. Not perfectly, I&amp;rsquo;m only human, but enough to accept the state of the code and whatever support burden it lands me with later. That the review is mine, and the blame for whatever ships is mine and not the agent&amp;rsquo;s, I&amp;rsquo;ve &lt;a class="link" href="https://phpboyscout.uk/bought-not-stolen/" &gt;argued at length elsewhere&lt;/a&gt; and won&amp;rsquo;t go over it all again here. The point worth adding is this: that review, the off-button&amp;rsquo;s respectable cousin, is the cheap part. By the time there&amp;rsquo;s an MR to read, the safety has already been won or lost upstream, in the spec and the rails. The review is where you confirm it, not where you create it.&lt;/p&gt;
&lt;h2 id="it-gets-harder-as-it-gets-better-not-easier"&gt;It gets harder as it gets better, not easier
&lt;/h2&gt;&lt;p&gt;My setup isn&amp;rsquo;t perfect, and I&amp;rsquo;m still learning. Everyone is; the AI is going to be in obedience lessons for a good while yet. But the direction is clear, and there&amp;rsquo;s a trap buried in it worth naming out loud.&lt;/p&gt;
&lt;p&gt;The danger doesn&amp;rsquo;t shrink as the models improve. It grows. The better the output looks, the more tempting it is to stop reading it, and the untrained eye genuinely cannot tell the difference between code that is good and code that merely looks good. That gap, between looking right and being right, is precisely where a tired person at 1am stops checking. The discipline matters more the better these things get, not less.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also why the kill switch is no answer. A button you smash in a panic assumes you&amp;rsquo;re still watching closely enough to smash it, right at the point the agent&amp;rsquo;s been good for long enough that you&amp;rsquo;ve stopped watching it that closely. The emergency stop asks the most of you at the exact moment you&amp;rsquo;re least likely to be there for it.&lt;/p&gt;
&lt;p&gt;So no, I don&amp;rsquo;t lie awake worrying that the thing working in my repo overnight is going to turn on me. A Golden Retriever doesn&amp;rsquo;t go rogue. It does exactly what you trained it to do, in exactly the yard you fenced, and it brings back exactly the ball you threw. The off-switch was never a button. It&amp;rsquo;s the spec you wrote before you let go of the leash, the rails you laid at every turn, and your name on what it carries home. If you&amp;rsquo;re scrambling for the button, you already skipped the part that mattered.&lt;/p&gt;</description></item><item><title>The agent said SUCCESS. The linter disagreed.</title><link>https://phpboyscout.uk/the-agent-said-success-the-linter-disagreed/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/the-agent-said-success-the-linter-disagreed/</guid><description>&lt;img src="https://phpboyscout.uk/the-agent-said-success-the-linter-disagreed/cover-the-agent-said-success-the-linter-disagreed.png" alt="Featured image of post The agent said SUCCESS. The linter disagreed." /&gt;&lt;p&gt;There&amp;rsquo;s a repair agent inside go-tool-base now. When you run &lt;a class="link" href="https://phpboyscout.uk/generate-a-command-from-a-script-or-a-sentence/" &gt;&lt;code&gt;gtb generate command&lt;/code&gt;&lt;/a&gt;, it doesn&amp;rsquo;t just spit out a file and wish you luck. An agent takes the generated code, builds it, runs the tests, and fixes whatever it broke, looping until the thing actually works (or until it&amp;rsquo;s tried the same fix five times and admits defeat). The whole point is that the generator hands you code that&amp;rsquo;s ready, not code that&amp;rsquo;s nearly ready and quietly now your problem.&lt;/p&gt;
&lt;p&gt;So it stung a bit when I realised the agent had been holding itself to a lower bar than I&amp;rsquo;d hold any junior to. And I was the one who&amp;rsquo;d set the bar.&lt;/p&gt;
&lt;h2 id="what-done-meant-to-the-agent"&gt;What &amp;ldquo;done&amp;rdquo; meant to the agent
&lt;/h2&gt;&lt;p&gt;The agent is a loop with real tools: it can build, test, read files, write files, tidy the module, and run golangci-lint. It works through them, and when it&amp;rsquo;s happy it replies with the word &amp;ldquo;SUCCESS&amp;rdquo; and the loop stops. On the Go side, the check is exactly that blunt:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToUpper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;SUCCESS&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s the whole gate (&lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/4834246/internal/generator/verifier/agent.go#L149-L154" target="_blank" rel="noopener"
 &gt;&lt;code&gt;agent.go&lt;/code&gt;&lt;/a&gt;). There&amp;rsquo;s no clever verification on my end that the agent actually did its homework. It does the work, it tells me it&amp;rsquo;s done, and I believe it. Which is fine, as long as the agent and I agree on what &amp;ldquo;done&amp;rdquo; means.&lt;/p&gt;
&lt;p&gt;We didn&amp;rsquo;t.&lt;/p&gt;
&lt;h2 id="the-instruction-that-made-lint-optional"&gt;The instruction that made lint optional
&lt;/h2&gt;&lt;p&gt;The agent decides it&amp;rsquo;s finished by following a numbered list in its system prompt. Here&amp;rsquo;s the line that did the damage:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;ol start="4"&gt;
&lt;li&gt;If there are lint issues, use &amp;lsquo;golangci_lint&amp;rsquo;.&lt;/li&gt;
&lt;/ol&gt;

 &lt;/blockquote&gt;
&lt;p&gt;Read that the way the agent would. &amp;ldquo;If there are lint issues&amp;rdquo;&amp;hellip; well, how would it know? The only way to find out is to run golangci-lint. But the instruction makes running golangci-lint the thing you do &lt;em&gt;once you already know&lt;/em&gt; there are issues. It&amp;rsquo;s a chicken with no egg. And the SUCCESS condition at the bottom of the list never mentioned lint at all:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;ol start="7"&gt;
&lt;li&gt;When the project builds successfully and tests pass, reply with &amp;ldquo;SUCCESS&amp;rdquo;.&lt;/li&gt;
&lt;/ol&gt;

 &lt;/blockquote&gt;
&lt;p&gt;So the agent did the sensible thing, given its orders. It built the code, ran the tests, saw both go green, and declared victory. golangci-lint was sat right there in its toolbox, unused, because nothing ever told it the job wasn&amp;rsquo;t finished until lint was clean too. I&amp;rsquo;d handed it a linter and then written a prompt that let it walk straight past it.&lt;/p&gt;
&lt;p&gt;The galling part is that the linter was never the missing piece. The &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/4834246/internal/agent/tools.go#L539-L550" target="_blank" rel="noopener"
 &gt;&lt;code&gt;golangci_lint&lt;/code&gt; tool&lt;/a&gt; had been registered the whole time, and it even runs with &lt;code&gt;--fix&lt;/code&gt;, so it&amp;rsquo;ll quietly clear the trivial stuff and only surface what actually needs a decision. The capability was there. The instructions just never required it.&lt;/p&gt;
&lt;h2 id="the-fix-was-words-not-code"&gt;The fix was words, not code
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the part I find genuinely interesting. I didn&amp;rsquo;t add a check. There is no new gate in the Go. The &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/4834246/internal/generator/verifier/agent.go#L128-L134" target="_blank" rel="noopener"
 &gt;fix&lt;/a&gt; is four lines of &lt;em&gt;English&lt;/em&gt;:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;Run &amp;lsquo;go_build&amp;rsquo;, &amp;lsquo;go_test&amp;rsquo; and &amp;lsquo;golangci_lint&amp;rsquo; in the project directory&amp;hellip; Run all three; a clean build and passing tests do not imply clean lint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reply with &amp;ldquo;SUCCESS&amp;rdquo; only once &amp;lsquo;go_build&amp;rsquo;, &amp;lsquo;go_test&amp;rsquo; AND &amp;lsquo;golangci_lint&amp;rsquo; all pass with no errors and no reported issues.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

 &lt;/blockquote&gt;
&lt;p&gt;That&amp;rsquo;s it. Lint moves from a remediation step you reach for once you somehow already know there&amp;rsquo;s a problem, into the gate itself. &amp;ldquo;Done&amp;rdquo; now means three green lights, not two.&lt;/p&gt;
&lt;p&gt;It nags at me a little, that one. The reliability of an agent that writes and fixes real code came down to whether one sentence of instructions was precise enough. When your success criteria are a paragraph of prose, vagueness in that paragraph is a bug, the same as a vague type or an off-by-one. The spec just happens to be written in English, and the thing reading it is a language model that will cheerfully take the cheap reading if you leave it lying around. That&amp;rsquo;s the same lesson the &lt;a class="link" href="https://phpboyscout.uk/the-goblin-that-wouldnt-stay-dead/" &gt;goblin who wouldn&amp;rsquo;t stay dead&lt;/a&gt; taught me from the other direction: with these tools, what you say is what you get, and what you &lt;em&gt;don&amp;rsquo;t&lt;/em&gt; say is fair game.&lt;/p&gt;
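&lt;p&gt;Because the Go gate is unchanged, it&amp;rsquo;s worth seeing how little it checks on its own. This is a minimal sketch that wraps the substring test quoted earlier in a hypothetical function, &lt;code&gt;agentSaysDone&lt;/code&gt;, and feeds it some made-up replies; neither the function name nor the replies come from go-tool-base.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// agentSaysDone mirrors the substring check quoted earlier.
// The function name is hypothetical; only the condition comes from the post.
func agentSaysDone(resp string) bool {
	return strings.Contains(strings.ToUpper(resp), "SUCCESS")
}

func main() {
	// Invented replies, chosen to show what the check does and does not see.
	replies := []string{
		"SUCCESS",
		"Build and tests pass. success!",
		"Lint still reports 3 issues, so this is not a SUCCESS yet.",
		"go_build failed",
	}
	for _, r := range replies {
		fmt.Println(agentSaysDone(r), "|", r)
	}
}
```

&lt;p&gt;The first three replies pass and only the last is rejected. The check matches the word wherever it appears, so everything that gives &amp;ldquo;SUCCESS&amp;rdquo; its meaning has to live in the prompt.&lt;/p&gt;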
&lt;h2 id="leave-it-better-not-just-building"&gt;Leave it better, not just building
&lt;/h2&gt;&lt;p&gt;The Boy Scout Rule is the whole reason this blog exists, and I&amp;rsquo;d quietly exempted the robot from it. &lt;a class="link" href="https://phpboyscout.uk/the-campsite-was-never-the-point/" &gt;&amp;ldquo;Leave the campsite cleaner than you found it&amp;rdquo;&lt;/a&gt; had become &amp;ldquo;leave it building&amp;rdquo;, which is not the same thing and never was. If I&amp;rsquo;m going to put an agent in the loop precisely so it tidies up after the generator, then &amp;ldquo;tidy&amp;rdquo; has to mean what it would mean for a person on my team. Build, test &lt;em&gt;and&lt;/em&gt; lint. No walking past the bin because nobody told you to pick it up.&lt;/p&gt;</description></item><item><title>The interpreter we forgot to sandbox</title><link>https://phpboyscout.uk/the-interpreter-we-forgot-to-sandbox/</link><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/the-interpreter-we-forgot-to-sandbox/</guid><description>&lt;img src="https://phpboyscout.uk/the-interpreter-we-forgot-to-sandbox/cover-the-interpreter-we-forgot-to-sandbox.png" alt="Featured image of post The interpreter we forgot to sandbox" /&gt;&lt;p&gt;I write a &lt;code&gt;CLAUDE.md&lt;/code&gt; for every project I work on, and a small pile of other markdown
files besides. They&amp;rsquo;re how I keep an AI agent on the rails: what the project is, what
the conventions are, what it must never do. I lean on them heavily, I change them constantly,
and&amp;hellip; here&amp;rsquo;s the uncomfortable bit&amp;hellip; I don&amp;rsquo;t always give a change to one of them the same hard
look I&amp;rsquo;d give a change to the code. They look like notes. They feel like docs.&lt;/p&gt;
&lt;p&gt;Somebody worked out that they&amp;rsquo;re not.&lt;/p&gt;
&lt;p&gt;In May, a supply-chain campaign researchers named
&lt;a class="link" href="https://thehackernews.com/2026/05/trapdoor-supply-chain-attack-spreads.html" target="_blank" rel="noopener"
 &gt;TrapDoor&lt;/a&gt;
pushed 384 malicious versions of 34 packages across npm, PyPI and crates.io. The bytes
did the usual nasty things, hunting out SSH keys, AWS credentials, GitHub tokens and
crypto wallets. The new trick was where it hid the &lt;em&gt;instructions&lt;/em&gt;. The packages shipped
poisoned &lt;code&gt;.cursorrules&lt;/code&gt; and &lt;code&gt;CLAUDE.md&lt;/code&gt; files, and the attackers also opened pull
requests against real projects, LangChain, LangFlow, LlamaIndex, MetaGPT and OpenHands,
under titles as innocent as &amp;ldquo;docs: add .cursorrules with dev standards and build
verification&amp;rdquo;. The payload was a plain-English instruction telling your AI assistant to
run a helpful-sounding &amp;ldquo;security scan&amp;rdquo; that quietly shipped your secrets to a stranger.
And it was written into the file in zero-width Unicode, characters that render as
nothing, so you wouldn&amp;rsquo;t see it even if you looked. Which, on a file marked &amp;ldquo;docs&amp;rdquo;, you
probably didn&amp;rsquo;t.&lt;/p&gt;
&lt;h2 id="not-a-new-attack-a-new-doorway"&gt;Not a new attack, a new doorway
&lt;/h2&gt;&lt;p&gt;I want to be careful not to oversell this, because the loud version, &amp;ldquo;a terrifying new
class of AI threat&amp;rdquo;, isn&amp;rsquo;t true. It&amp;rsquo;s a supply-chain attack, the same shape we&amp;rsquo;ve had for
years on npm and PyPI: social engineering, plus a victim who didn&amp;rsquo;t quite do enough due
diligence. I wrote a while back that
&lt;a class="link" href="https://phpboyscout.uk/nobody-is-coming-to-clean-your-supply-chain/" &gt;nobody is coming to clean your supply chain&lt;/a&gt;,
and nothing about TrapDoor changes that. The package is still the package.&lt;/p&gt;
&lt;p&gt;What&amp;rsquo;s different, and worth the words, is &lt;em&gt;where&lt;/em&gt; it goes off. A classic supply-chain
payload waits for CI, or for production. This one detonates the moment your agent reads the
repository in your editor, on the one machine in the whole chain that nobody audits: your
laptop.&lt;/p&gt;
&lt;p&gt;Think about what sits on a developer&amp;rsquo;s machine. Tokens in environment variables. Cloud
credentials. An SSH agent holding the keys to your git forge. A logged-in CLI for your
package registry. And now an AI agent running with all of it, at your full permissions,
and almost none of the guard-rails a CI runner gets. It&amp;rsquo;s the least sandboxed, most
credentialed box you own, and we&amp;rsquo;ve just pointed an interpreter at it that will read and
act on a file an attacker can write. Pop that one machine and you haven&amp;rsquo;t popped a machine,
you&amp;rsquo;ve been handed the whole keyring and left alone in the building.&lt;/p&gt;
&lt;h2 id="markdown-is-a-programming-language-now"&gt;Markdown is a programming language now
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the framing I keep coming back to, and I can&amp;rsquo;t unsee it now. A &lt;code&gt;CLAUDE.md&lt;/code&gt; is to an AI agent exactly what a
&lt;code&gt;.py&lt;/code&gt; is to Python, a &lt;code&gt;.js&lt;/code&gt; to Node, a &lt;code&gt;.rb&lt;/code&gt; to Ruby. It is source code. The agent is the
interpreter. You hand it a file of instructions and it executes them.&lt;/p&gt;
&lt;p&gt;And I don&amp;rsquo;t say that as a complaint. That an agent will read a paragraph of plain English
and just &lt;em&gt;do&lt;/em&gt; it, no compiler, no ceremony, no forty lines of glue, is one of the more
remarkable things to happen to this craft in my working life, and I lean on it every day.
The catch is that the very thing that makes it marvellous, that it does what the
instructions tell it, is the thing that makes a poisoned instruction file so dangerous.
The power and the exposure are the same property.&lt;/p&gt;
&lt;p&gt;The only real difference is that the language interpreters have spent decades growing
rules to protect you: scopes, permissions, sandboxes, a standard library that asks before
it does anything irreversible. The AI interpreter has almost none of that. It reads your
prose and does what the prose says, with whatever access you happen to have, and the prose
can come from anywhere. We&amp;rsquo;ve quietly built the most powerful interpreter in the stack,
given it the fewest rules, and filed its source code under &amp;ldquo;documentation&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="you-cant-just-read-it-more-carefully"&gt;You can&amp;rsquo;t just read it more carefully
&lt;/h2&gt;&lt;p&gt;The obvious answer is &amp;ldquo;review the file like code&amp;rdquo;, and it&amp;rsquo;s right, but TrapDoor is the
reason it isn&amp;rsquo;t enough on its own. The instructions were written in zero-width Unicode.
You can open the diff, read every visible word, approve it in good conscience, and merge
something you were never able to see. &amp;ldquo;Docs: add dev standards&amp;rdquo; is precisely the pull
request you nod through on a Friday afternoon.&lt;/p&gt;
&lt;p&gt;So reading carefully is necessary and insufficient. You also need tooling that treats
these files as executable: that flags invisible characters, diffs them as code, and
refuses to let an agent act on a changed instruction file until a human has actually
cleared it. I run a crude version of this already. In CI, if one of my prompt or rules
files changes, no AI step is allowed to run until I&amp;rsquo;ve reviewed it by hand. It isn&amp;rsquo;t
clever, but it closes the worst of the gap. Locally it&amp;rsquo;s much harder, and right now my
real defence is that I&amp;rsquo;m the only contributor to most of my projects, so the audit is
just me, usually noticing after the horse has bolted.&lt;/p&gt;
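&lt;p&gt;As a rough illustration of the &amp;ldquo;flags invisible characters&amp;rdquo; half of that, here is a minimal Python sketch. It assumes the hidden text is built from characters in Unicode&amp;rsquo;s format category (&lt;code&gt;Cf&lt;/code&gt;), which is where the zero-width spaces and joiners live. &lt;code&gt;find_invisible&lt;/code&gt; is a name made up for this example, not part of any real tool, and a real scanner would want a wider net than this.&lt;/p&gt;

```python
import unicodedata


def find_invisible(text):
    """Return (line, column, name) for every format-category character.

    Assumption: anything Unicode classes as Cf (zero-width space, zero-width
    joiner, word joiner, bidi controls and similar) has no business in an
    instruction file and should be surfaced to a human.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                # Fall back to a placeholder for code points without a name.
                findings.append((line_no, col, unicodedata.name(ch, "UNNAMED")))
    return findings


if __name__ == "__main__":
    # Hypothetical rules file with one zero-width space hidden in line 2.
    sample = "# Dev standards\nRun the tests\u200b before pushing.\n"
    for line_no, col, name in find_invisible(sample):
        print(f"line {line_no}, column {col}: {name}")
```

&lt;p&gt;Run against every changed prompt or rules file in a pre-commit hook or a CI step, something this small would at least turn &amp;ldquo;you were never able to see it&amp;rdquo; into a failed check. It will also flag legitimate uses, such as the joiners inside some emoji, so treat a hit as a reason to look rather than as proof of an attack.&lt;/p&gt;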
&lt;h2 id="signing-wont-save-you-here"&gt;Signing won&amp;rsquo;t save you here
&lt;/h2&gt;&lt;p&gt;This is the part that stings, because I&amp;rsquo;ve spent a good chunk of this year
&lt;a class="link" href="https://phpboyscout.uk/sign-your-own-binaries-with-go-tool-base/" &gt;building signing and provenance into my tools&lt;/a&gt;.
A signature proves &lt;em&gt;who&lt;/em&gt; published something. It says nothing about &lt;em&gt;whether it&amp;rsquo;s safe&lt;/em&gt;.
That was already true for poisoned-but-signed packages, and it lands twice as hard here:
you can sign a release flawlessly, with a key the platform can&amp;rsquo;t forge, and still ship a
&lt;code&gt;CLAUDE.md&lt;/code&gt; inside it that tells the reader&amp;rsquo;s agent to rob them. A merged pull request is
&amp;ldquo;signed&amp;rdquo; by the very act of merging, with perfect provenance, and the instruction in it
is still hostile. Provenance is necessary. It was never sufficient, and it&amp;rsquo;s no defence at
all against a payload made of sentences. A signature is only ever as good as the trust you
place in the publisher.&lt;/p&gt;
&lt;h2 id="so-whose-job-is-it"&gt;So whose job is it?
&lt;/h2&gt;&lt;p&gt;Primarily, still ours. I said it in the supply-chain piece and I&amp;rsquo;ll stand on it: the
responsibility sits with the developer doing the consuming, to pin, to read, to gate, to
not run a stranger&amp;rsquo;s instructions with the keys to the kingdom in their pocket. And that
gets harder, not easier, as we start consuming each other&amp;rsquo;s agent setups wholesale. The
Claude skills marketplace and the things like it turn &amp;ldquo;borrow someone&amp;rsquo;s &lt;code&gt;CLAUDE.md&lt;/code&gt;&amp;rdquo; into
a one-click habit, and every one of those is unreviewed code from a stranger. Each skill
needs vetting like the dependency it is.&lt;/p&gt;
&lt;p&gt;But it isn&amp;rsquo;t &lt;em&gt;only&lt;/em&gt; on us, and TrapDoor is the argument for better tooling. We have CVE
databases, scanners and scorecards for packages, for all
&lt;a class="link" href="https://phpboyscout.uk/anything-under-an-8/" &gt;their flaws&lt;/a&gt;. We have nothing
equivalent for an instruction file: no scoring, no advisory feed, no scanner that knows
what a poisoned &lt;code&gt;CLAUDE.md&lt;/code&gt; looks like. That&amp;rsquo;s a gap the ecosystem has to close, and it
will, eventually. The catch is that the agent vendors will be slow about it. Sandboxing a
feature people love precisely because it gets out of your way is a hard, unpopular,
multi-quarter job, and I wouldn&amp;rsquo;t hold my breath.&lt;/p&gt;
&lt;h2 id="the-most-dangerous-machine-is-the-one-on-your-desk"&gt;The most dangerous machine is the one on your desk
&lt;/h2&gt;&lt;p&gt;Which is why I&amp;rsquo;m not waiting for them&amp;hellip; and nor should you.&lt;/p&gt;
&lt;p&gt;The most dangerous machine in your supply chain isn&amp;rsquo;t a build server or a registry. It&amp;rsquo;s
the laptop you&amp;rsquo;re reading this on, and we&amp;rsquo;ve handed an AI the keys to it. The good news is
that nearly everything you can do about that, you can do today, with nobody shipping you a
feature first. Treat your &lt;code&gt;CLAUDE.md&lt;/code&gt; and your rules files as source code, because they
are: diff them, scan them for what you can&amp;rsquo;t see, and gate any agent run on a human
clearing the change. Get your secrets out of plaintext environment variables and into
something an opportunistic script can&amp;rsquo;t just read, which is exactly why go-tool-base
&lt;a class="link" href="https://phpboyscout.uk/where-should-a-cli-keep-your-api-keys/" &gt;keeps its credentials in the OS keychain&lt;/a&gt;.
And vet a borrowed skill or rules file the way you&amp;rsquo;d vet any dependency, because that&amp;rsquo;s
what it is.&lt;/p&gt;
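&lt;p&gt;For the &amp;ldquo;gate any agent run on a human clearing the change&amp;rdquo; part, one possible shape is a digest check: a reviewer records the hash of each instruction file after reading it, and the agent step is refused whenever a file no longer matches. This is a sketch under that assumption, not a description of my actual CI. The function names and the idea of an approved-digest table are invented for the example.&lt;/p&gt;

```python
import hashlib


def digest(content):
    """SHA-256 of a file's bytes, as a hex string."""
    return hashlib.sha256(content).hexdigest()


def unapproved(files, approved):
    """Return the paths whose current content has no human-approved digest.

    files maps a path to its current bytes. approved maps a path to the
    digest a reviewer recorded after reading that exact version. A new
    file, or any edit to an old one, shows up as unapproved.
    """
    return sorted(
        path for path, content in files.items()
        if approved.get(path) != digest(content)
    )


def agent_may_run(files, approved):
    """The gate itself: run the agent only when nothing is pending review."""
    return not unapproved(files, approved)


if __name__ == "__main__":
    # Hypothetical instruction files and the digests a human signed off on.
    files = {"CLAUDE.md": b"Use tabs.\n", ".cursorrules": b"Run the tests.\n"}
    approved = {path: digest(content) for path, content in files.items()}
    files["CLAUDE.md"] = b"Use tabs.\nRun the security scan.\n"
    print(unapproved(files, approved))
    print(agent_may_run(files, approved))
```

&lt;p&gt;The design choice worth noting is that the gate fails closed: an instruction file nobody has looked at blocks the agent by default, and clearing it means a person reads the new version and records its digest.&lt;/p&gt;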
&lt;p&gt;None of that is new advice. It&amp;rsquo;s the same diligence the supply chain has always demanded.
We just have to extend it to a file we&amp;rsquo;d decided was only documentation, running on an
interpreter we forgot to sandbox.&lt;/p&gt;</description></item><item><title>The rung we sawed off</title><link>https://phpboyscout.uk/the-rung-we-sawed-off/</link><pubDate>Wed, 17 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/the-rung-we-sawed-off/</guid><description>&lt;img src="https://phpboyscout.uk/the-rung-we-sawed-off/cover-the-rung-we-sawed-off.png" alt="Featured image of post The rung we sawed off" /&gt;&lt;p&gt;I was in a job interview yesterday, on the wrong side of the desk for once. After
years of being the one asking the questions I&amp;rsquo;m having a look at what&amp;rsquo;s next, and
somewhere in a long, wandering technical conversation the inevitable arrived: where
do I think AI is going, and what does it mean for how we build software?&lt;/p&gt;
&lt;p&gt;I gave my answer. You can probably guess most of it. The more interesting thing was
the question I&amp;rsquo;ve started asking &lt;em&gt;them&lt;/em&gt; back. Not the salary, not the stack. What is
your actual position on AI, and how are you building a team out of both its human and
its non-human parts? I ask the company and I ask the interviewer personally, because
the two answers are rarely the same, and because I&amp;rsquo;ve decided I can&amp;rsquo;t work somewhere
that hasn&amp;rsquo;t sat with the question properly.&lt;/p&gt;
&lt;p&gt;Here is why it has become my litmus test.&lt;/p&gt;
&lt;h2 id="the-rung-and-whos-standing-on-it"&gt;The rung, and who&amp;rsquo;s standing on it
&lt;/h2&gt;&lt;p&gt;I wrote recently that
&lt;a class="link" href="https://phpboyscout.uk/the-greybeards-edge-was-never-typing/" &gt;the greybeards&amp;rsquo; edge was never typing&lt;/a&gt;:
agentic tools give a senior a boost because they have the judgement to steer and
verify, and give a junior a drag because they don&amp;rsquo;t have it yet and the machine hands
them more rope than they can hold. The cold incentive that falls out is to hire
seniors and automate the juniors.&lt;/p&gt;
&lt;p&gt;The data has since caught up with the worry. Entry-level software postings have fallen
by something like 40% from their 2022 peak. The share of juniors and graduates in IT
employment has dropped from roughly 15% to 7% in three years, and Stanford researchers
tracking early-career workers in AI-exposed jobs found the youngest cohort down sharply
from its peak.
&lt;a class="link" href="https://www.softwareseni.com/what-the-data-actually-shows-about-ai-and-junior-developer-employment-decline/" target="_blank" rel="noopener"
 &gt;The numbers are genuinely grim&lt;/a&gt;,
and plenty of people are putting it bluntly: the industry killed the junior on purpose.&lt;/p&gt;
&lt;p&gt;That framing is half right, and I think it&amp;rsquo;s worth getting the other half right too.&lt;/p&gt;
&lt;h2 id="it-was-never-about-efficiency-it-was-about-cost"&gt;It was never about efficiency. It was about cost.
&lt;/h2&gt;&lt;p&gt;We didn&amp;rsquo;t automate the junior because the work needed doing better. We did it because
people are expensive. We need sleep, we draw a salary, and our thinking takes time and
effort that a quarterly target can&amp;rsquo;t see the point of. AI got sold as round-the-clock
labour with none of that overhead, and to a business that is an almost irresistible
line on a spreadsheet. There&amp;rsquo;s a grim irony arriving, mind: the bills are starting to
land, and the same conversations that hyped the cheap labour are now quietly working
out that all those tokens aren&amp;rsquo;t cheap at all.&lt;/p&gt;
&lt;p&gt;Step back, though, and none of this is new. Man finds a shortcut, man takes a shortcut.
From the industrial revolution onward, every time we found a way to get more done with
less human effort we took it, and the work reshaped itself around the new tools. We are
still here, still employed, just doing different things than our great-grandparents did.&lt;/p&gt;
&lt;p&gt;What is genuinely new is &lt;em&gt;what&lt;/em&gt; we&amp;rsquo;re automating. Every technological advance before this one automated the machinery of the body, the
muscle and sinew and bone. This is the first time we have automated
thinking, and that is a modern marvel, something we should be proud of as a species. The
problem isn&amp;rsquo;t the marvel. It&amp;rsquo;s the rate. AI is improving faster than we can adapt to it,
and adaptation is the entire game.&lt;/p&gt;
&lt;p&gt;So where does the blame sit? Not on one logo. No single company did this, however easy
Meta or Google make it to point at the latest round of cuts. Society did, our collective
and very human hunger to build bigger and faster. That makes it harder to fix, because
there is no villain to regulate, only ourselves to out-think.&lt;/p&gt;
&lt;h2 id="the-bit-that-should-frighten-you"&gt;The bit that should frighten you
&lt;/h2&gt;&lt;p&gt;Cutting the junior intake isn&amp;rsquo;t a saving. It&amp;rsquo;s occupational suicide.&lt;/p&gt;
&lt;p&gt;A junior is not cheap labour that AI happens to have made cheaper. A junior is a senior
who hasn&amp;rsquo;t happened yet. Saw off the bottom rung and for a good while nothing bad
happens&amp;hellip; because you&amp;rsquo;ve still got your seniors holding everything up. Then the greybeards
retire (I have a cabin and a woodstove with my name on it for exactly that day), the role
that used to grow their replacements has been hollowed out for a decade, and there is
simply nobody left who learned to tell when the machine is wrong. That isn&amp;rsquo;t a
hiring problem. It&amp;rsquo;s an existential one, and you can&amp;rsquo;t fix it retroactively.&lt;/p&gt;
&lt;p&gt;It starts before the first job, too. We teach primary-school children the basics of
programming in this country, which is a wonderful thing, except the curriculum was
written for a world without AI in the room, and by the time those children reach
secondary school a good deal of it will be teaching a craft that has already moved on.
We&amp;rsquo;re throttling the pipeline at both ends at once: hollowing out the entry-level job,
and feeding it from a school system running a step behind.&lt;/p&gt;
&lt;h2 id="its-a-split-not-a-collapse"&gt;It&amp;rsquo;s a split, not a collapse
&lt;/h2&gt;&lt;p&gt;The counterweight to the doom is that none of this is uniform, and the loudest version,
&amp;ldquo;the junior is dead&amp;rdquo;, simply isn&amp;rsquo;t true. IBM just tripled its US entry-level hiring while
most of the industry was cutting, and
&lt;a class="link" href="https://www.cio.com/article/4134276/ibm-looks-beyond-short-term-ai-gains-tripling-entry-level-hiring.html" target="_blank" rel="noopener"
 &gt;its HR chief said the quiet part out loud&lt;/a&gt;:
AI can handle most of the routine entry-level tasks now, but the work still needs a human,
and the companies that double down on early-career hiring in this environment are the
ones that win in three to five years. They didn&amp;rsquo;t keep the junior role as it was. They
rewrote it, less boilerplate, more time spent with customers and supervising what the AI
produced.&lt;/p&gt;
&lt;p&gt;That is the shape of the thing. The juniors who are thriving in 2026 aren&amp;rsquo;t the fastest
typists. They&amp;rsquo;re the ones building judgement, which is precisely the edge I argued was
the senior&amp;rsquo;s real value all along. The market hasn&amp;rsquo;t stopped wanting juniors, it&amp;rsquo;s
stopped wanting the version of the junior whose job was the work AI now does.&lt;/p&gt;
&lt;h2 id="day-zero"&gt;Day zero
&lt;/h2&gt;&lt;p&gt;So what does a junior actually look like now? I don&amp;rsquo;t know yet&amp;hellip; and anyone telling you
they&amp;rsquo;ve got it worked out is selling something. We are at day zero of this.&lt;/p&gt;
&lt;p&gt;The junior gauntlet, the rite of passage every one of us runs to earn our stripes, isn&amp;rsquo;t
going anywhere. Doing your time is a cold fact of the craft and it always will be. What
changes is what the gauntlet &lt;em&gt;contains&lt;/em&gt;, and that will keep changing, day one, day two,
day five hundred and twelve. The only way we redefine it well is to put juniors and
seniors on it together, with the AI in the room from the start instead of bolted on
afterwards. Bring it closer to our people, and bring it earlier.&lt;/p&gt;
&lt;p&gt;Open the floodgates, in other words. Let engineers of every creed and calibre in, and
let them evolve &lt;em&gt;with&lt;/em&gt; the machine, because that is the only way the symbiosis everyone
keeps promising actually happens. The line Darwin borrowed was survival of the fittest, and fitness
here means adapting alongside the tool, not being spared by it. Choke off the flow of the
very people who could do that adapting, and we don&amp;rsquo;t get fitter. We go extinct.&lt;/p&gt;
&lt;h2 id="the-end-im-holding"&gt;The end I&amp;rsquo;m holding
&lt;/h2&gt;&lt;p&gt;Which is the long way back to that interview. I keep asking the question, what is your
real position on AI and how are you building a team of people and machines together,
because the answer tells me whether a company is optimising for this quarter or for the
survival of the craft. I want to work where it&amp;rsquo;s the second one, and I think any engineer
sitting across that desk should be asking the same.&lt;/p&gt;
&lt;p&gt;And it&amp;rsquo;s why, whatever desk I land at, there&amp;rsquo;s one thing I already know I&amp;rsquo;ll do. I don&amp;rsquo;t
have the map. Nobody does. But every junior who works under me is going to get the chance
to run the gauntlet, to grow into a senior, and to be in the room while we work out what
the next gauntlet should even be. That isn&amp;rsquo;t charity. It&amp;rsquo;s the only sane investment any
of us can make. The last properly useful thing my generation does, before we go and find
our cabins, is make sure there&amp;rsquo;s somebody left to hand the thread to. I intend to be
holding my end of it.&lt;/p&gt;</description></item><item><title>Everyone wants Rust's safety, nobody wants Rust</title><link>https://phpboyscout.uk/everyone-wants-rusts-safety-nobody-wants-rust/</link><pubDate>Sun, 14 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/everyone-wants-rusts-safety-nobody-wants-rust/</guid><description>&lt;img src="https://phpboyscout.uk/everyone-wants-rusts-safety-nobody-wants-rust/cover-everyone-wants-rusts-safety-nobody-wants-rust.png" alt="Featured image of post Everyone wants Rust's safety, nobody wants Rust" /&gt;&lt;p&gt;This spring, the better part of a million lines of Zig quietly became a million
lines of Rust. Bun, the JavaScript runtime that was the showcase for &amp;ldquo;you don&amp;rsquo;t
need a borrow checker, you need good tools and a steady hand&amp;rdquo;, looked at its own
memory bugs and switched teams. Around
&lt;a class="link" href="https://www.techzine.eu/news/devops/141364/bun-takes-a-surprising-step-from-zig-to-rust/" target="_blank" rel="noopener"
 &gt;99.8% of its test suite passed&lt;/a&gt;
on the rewritten code, a clutch of memory leaks closed in the move, and the
maintainers said the quiet part out loud: the previous release would be the last
one written in Zig.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s tempting to read that as Rust winning, hoist the flag, and move on. I don&amp;rsquo;t
think that&amp;rsquo;s quite the story, and the more interesting one is happening everywhere
else at the same time.&lt;/p&gt;
&lt;p&gt;Because Bun is the exception that went all the way. Everyone else is trying to get
the &lt;em&gt;safety&lt;/em&gt; without the &lt;em&gt;Rust&lt;/em&gt;, and watching how they&amp;rsquo;re going about it tells you
more than one runtime&amp;rsquo;s heroic rewrite does.&lt;/p&gt;
&lt;h2 id="what-everyones-actually-after"&gt;What everyone&amp;rsquo;s actually after
&lt;/h2&gt;&lt;p&gt;A quick level-set, because not everyone reading this writes systems code daily.
&amp;ldquo;Memory safety&amp;rdquo; is the property that a program can&amp;rsquo;t read or write memory it has no
business touching: no using a value after you&amp;rsquo;ve freed it, no running off the end
of an array. It sounds niche. By Microsoft&amp;rsquo;s and the Chromium project&amp;rsquo;s own counts of bugs in their C and C++ codebases, it is behind something like
&lt;a class="link" href="https://www.kusari.dev/blog/rust-wont-fix-everything-moving-toward-a-memory-safe-future" target="_blank" rel="noopener"
 &gt;70% of serious security vulnerabilities&lt;/a&gt;,
which is why governments and trillion-dollar companies suddenly care a great deal.&lt;/p&gt;
&lt;p&gt;There are roughly three ways to get it. Rust uses a &lt;em&gt;borrow checker&lt;/em&gt;: a compiler
that flatly refuses to build your program unless it can prove, before it ever runs,
that you never touch memory after you&amp;rsquo;re done with it. The price is that it argues
with you the entire time you&amp;rsquo;re writing. The product is that an entire category of
bug becomes literally unwriteable in safe code. Go (and most managed languages) uses a &lt;em&gt;garbage
collector&lt;/em&gt;: a runtime janitor that frees memory for you, so you mostly can&amp;rsquo;t get it
wrong, at the cost of some overhead and a little control. And then there&amp;rsquo;s the old
way, the one most code on Earth still uses: trust the developer to get it right,
and add an &lt;em&gt;escape hatch&lt;/em&gt;, usually a keyword like &lt;code&gt;unsafe&lt;/code&gt;, for the bits where they
promise they have.&lt;/p&gt;
&lt;p&gt;The retrofit trend is everyone in that third camp trying to inch toward the first
two without rewriting the world.&lt;/p&gt;
&lt;h2 id="rust-didnt-invent-any-of-this"&gt;Rust didn&amp;rsquo;t invent any of this
&lt;/h2&gt;&lt;p&gt;Worth saying plainly, because the fan club rarely does: Rust invented almost none
of it. The borrow checker is, &lt;a class="link" href="https://doc.rust-lang.org/reference/influences.html" target="_blank" rel="noopener"
 &gt;by Rust&amp;rsquo;s own admission&lt;/a&gt;,
Cyclone&amp;rsquo;s region-based memory management, from a safe-C experiment in the early
2000s, welded to &lt;a class="link" href="https://borretti.me/article/type-systems-memory-safety" target="_blank" rel="noopener"
 &gt;affine types out of linear logic&lt;/a&gt;,
ideas that predate Rust by decades. And it goes beyond the borrow checker.
Rust&amp;rsquo;s exhaustive pattern matching came from ML and Haskell. Its &amp;ldquo;errors are
values, and there is no null&amp;rdquo; approach, &lt;code&gt;Result&lt;/code&gt; and &lt;code&gt;Option&lt;/code&gt;, is Haskell&amp;rsquo;s Either
and Maybe in work boots.&lt;/p&gt;
&lt;p&gt;What Rust did, and did better than anyone before it, was taste and integration: it
curated thirty-odd years of academic research into one coherent language and proved
the ideas could carry real systems code rather than just research papers. That is
the genuine USP, and it&amp;rsquo;s why the rest of the industry is now shopping from the
same shelf. Pattern matching has landed in Python and Java, with a proposal in
flight for C++26. Swift 6 shipped
&lt;a class="link" href="https://www.infoworld.com/article/3529619/swift-6-arrives-with-improved-concurrency-data-race-safety.html" target="_blank" rel="noopener"
 &gt;compile-time data-race safety&lt;/a&gt;,
its &lt;code&gt;Sendable&lt;/code&gt; machinery a close cousin of Rust&amp;rsquo;s &lt;code&gt;Send&lt;/code&gt; and &lt;code&gt;Sync&lt;/code&gt;. The borrow
checker just gets the headlines because it&amp;rsquo;s the hardest bit to copy. Which makes
the title almost too literal: everyone wants Rust&amp;rsquo;s safety, and they are quietly
adopting its mechanisms one feature at a time.&lt;/p&gt;
&lt;h2 id="credit-where-its-due-c-is-doing-this-properly"&gt;Credit where it&amp;rsquo;s due: C# is doing this properly
&lt;/h2&gt;&lt;p&gt;The example that made me sit up is C#. In C# 16, Microsoft is
&lt;a class="link" href="https://devblogs.microsoft.com/dotnet/improving-csharp-memory-safety/" target="_blank" rel="noopener"
 &gt;redefining the &lt;code&gt;unsafe&lt;/code&gt; keyword&lt;/a&gt;
that&amp;rsquo;s been in the language since version one. Instead of &lt;code&gt;unsafe&lt;/code&gt; marking a lump
of syntax, it now marks a &lt;em&gt;contract&lt;/em&gt;: a promise the compiler can&amp;rsquo;t verify and a
human has to read and uphold, with documentation and static analysers nudging you
to take it seriously. They&amp;rsquo;re even floating badges on NuGet packages to show which
ones have opted in.&lt;/p&gt;
&lt;p&gt;My first instinct with any retrofit is suspicion, because bolting safety onto a
language after the fact has a long and miserable history, and an escape hatch that&amp;rsquo;s
easy to reach is an escape hatch people will reach for the moment they&amp;rsquo;re in a
hurry. But this isn&amp;rsquo;t a bolt-on. Taking the keyword that&amp;rsquo;s already there and giving
it real teeth is working &lt;em&gt;with&lt;/em&gt; the grain of the language instead of stapling a
second safety system alongside the first. That&amp;rsquo;s honest engineering, and it deserves
the credit. It genuinely raises the floor.&lt;/p&gt;
&lt;h2 id="a-contract-is-not-a-guarantee"&gt;A contract is not a guarantee
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s where my enthusiasm meets its limit, and it&amp;rsquo;s a distinction I happen to have
&lt;a class="link" href="https://phpboyscout.uk/forbid-means-forbid-until-linkme-needs-a-word/" &gt;a lot of skin in&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;C#&amp;rsquo;s redefined &lt;code&gt;unsafe&lt;/code&gt; makes dangerous code &lt;em&gt;visible and reviewable&lt;/em&gt;. That is a
real improvement, and most teams would be better off for it. But visible and
reviewable still means a human has to honour the promise. It&amp;rsquo;s a sign on the door.
Rust&amp;rsquo;s equivalent is a wall: in &lt;a class="link" href="https://phpboyscout.uk/rust-tool-base-the-same-idea/" &gt;rust-tool-base&lt;/a&gt;
I put &lt;code&gt;#![forbid(unsafe_code)]&lt;/code&gt; at the top of all eleven shipping crates, and
&lt;code&gt;forbid&lt;/code&gt; is not advice, it&amp;rsquo;s a refusal. The compiler will not build a crate that
contains &lt;code&gt;unsafe&lt;/code&gt;, full stop, and unlike its softer sibling &lt;code&gt;deny&lt;/code&gt;, you can&amp;rsquo;t quietly
switch it back off in a corner of the code where it&amp;rsquo;s inconvenient. The whole reason
I use &lt;code&gt;forbid&lt;/code&gt; and not &lt;code&gt;deny&lt;/code&gt; is that I don&amp;rsquo;t trust future-me, in a hurry, not to
reach for the hatch.&lt;/p&gt;
&lt;p&gt;So when I look at the C# work I think: good, genuinely good, and they should take it
further. A contract a human upholds is not the same kind of thing as a proof a
compiler enforces, and the trend, if it&amp;rsquo;s serious, points at enforcement. Visible is
better than invisible. Impossible is better than visible.&lt;/p&gt;
&lt;h2 id="discipline-never-scaled-and-thats-not-an-insult"&gt;Discipline never scaled, and that&amp;rsquo;s not an insult
&lt;/h2&gt;&lt;p&gt;The objection I keep hearing, and that a younger me would have made, is that any
language can be memory-safe if you&amp;rsquo;re just disciplined enough. And it&amp;rsquo;s true, in the
way that any house can be tidy if you never get busy. In the before times we shipped
memory-safe C with code review and valgrind and sheer bloody-mindedness, and it
worked, sort of, at small scale.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t scale, and Bun is the proof sitting on the table. That wasn&amp;rsquo;t a sloppy
team learning the basics. It was a strong team, betting publicly on the
discipline-and-good-tools model, and the memory bugs piled up anyway until the
honest move was to let a compiler take the job. Discipline failing at scale isn&amp;rsquo;t a
moral failure of the engineers. It&amp;rsquo;s just what happens when you ask humans to hold a
thousand invariants in their heads across a million lines. Delegating that to a
machine that never gets tired or rushed isn&amp;rsquo;t laziness. It&amp;rsquo;s the entire point of
having compilers at all.&lt;/p&gt;
&lt;h2 id="the-part-that-changed-my-mind"&gt;The part that changed my mind
&lt;/h2&gt;&lt;p&gt;I learned most of my Rust by building rust-tool-base with an AI alongside me,
leaning on it to explain the borrow checker, suggest the idiomatic shape, and check
my work. And somewhere in that I noticed the thing I now can&amp;rsquo;t unsee: &lt;strong&gt;the borrow
checker is exactly as good a guardrail for the AI as it is for me.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A model, like a tired human, will write a confident use-after-free without blinking.
In Rust it simply doesn&amp;rsquo;t compile, so the mistake never reaches me. What that does
is move the whole error surface. The bugs that survive into review aren&amp;rsquo;t memory
bugs or lifetime bugs or data races, the language has eaten those, they&amp;rsquo;re errors of
&lt;em&gt;logic&lt;/em&gt;: the code is safe and wrong. And logic is precisely where I want my
attention, and the AI&amp;rsquo;s, because it&amp;rsquo;s the part a human has to own and the part the
models are getting better at every month. (I split my AI work across a few providers
for their different strengths, so this is not a pitch for anyone&amp;rsquo;s logo. The effect
is the same whoever&amp;rsquo;s doing the typing.)&lt;/p&gt;
&lt;p&gt;Which dissolves the one argument that ever really kept people out of Rust. &amp;ldquo;The
borrow checker is too much friction&amp;rdquo; was always the case for the defence. But Bun&amp;rsquo;s
million-line rewrite was done largely &lt;em&gt;with&lt;/em&gt; an AI, because an AI is very good at
paying a tax that is tedious and mechanical rather than creative. The friction is
getting cheaper to pay at exactly the moment the guarantee is getting more valuable
to have. In an AI-assisted world, a language that &lt;em&gt;proves&lt;/em&gt; safety is worth more, not
less, because it fences in the machine&amp;rsquo;s mistakes as firmly as your own.&lt;/p&gt;
&lt;h2 id="none-of-this-means-rewrite-everything-in-rust"&gt;None of this means rewrite everything in Rust
&lt;/h2&gt;&lt;p&gt;I want to be careful not to land somewhere smug, because most software does not need
what Rust offers and pretending otherwise is how you end up rewriting a CRUD app
nobody asked you to. Garbage collection is not a failure state. Go&amp;rsquo;s collector keeps
getting &lt;a class="link" href="https://go.dev/doc/go1.26" target="_blank" rel="noopener"
 &gt;meaningfully better&lt;/a&gt;, my own go-tool-base is GC&amp;rsquo;d
top to bottom and I have never once wished it weren&amp;rsquo;t, and &amp;ldquo;safe-by-default with a
GC&amp;rdquo; is the right answer for a vast amount of the work most of us do. The borrow
checker is a price, and you should only pay it when the thing you&amp;rsquo;re buying, that
last class of guarantee with no runtime cost, is something your stakes actually
need.&lt;/p&gt;
&lt;h2 id="what-it-comes-down-to"&gt;What it comes down to
&lt;/h2&gt;&lt;p&gt;The question was never &amp;ldquo;is it as safe as Rust&amp;rdquo;. That framing turns everything into a
loss for everyone who isn&amp;rsquo;t Rust, which is silly. The useful question is: &lt;em&gt;what does
your language make the default, and how hard does it make the escape hatch to reach?&lt;/em&gt;
Go makes safety the default and charges you a GC. Rust makes it the default and
charges you the borrow checker. C# is moving its default in the right direction and,
for now, leaves the hatch as a promise rather than a wall.&lt;/p&gt;
&lt;p&gt;Credit the retrofits: they are raising the floor for an enormous amount of code that
was never going to be rewritten. Just don&amp;rsquo;t mistake the floor for the ceiling, or a
contract a human signs for a guarantee a compiler keeps. Everyone wants Rust&amp;rsquo;s
safety, and the interesting question, now that an AI will pay the toll for you, is
who still has a reason not to want it.&lt;/p&gt;
&lt;p&gt;Widen the lens past Rust, though, because that&amp;rsquo;s where the news gets genuinely good.
We&amp;rsquo;re at a turn in how languages evolve. Compile-time rigour is spreading rather
than retreating: borrow checking is reaching the Python family through Mojo, static
typing long since conquered JavaScript, and even the managed languages are turning
their escape hatches into something you have to argue with. More of our safety is
quietly moving from &amp;ldquo;remember to&amp;rdquo; into &amp;ldquo;can&amp;rsquo;t not&amp;rdquo;. And the one thing that always
made the strict path hard to start down, the friction, is being absorbed by an AI
that will happily learn the rules so you can lean on them. I&amp;rsquo;ve been at this long
enough to distrust a rosy forecast, but I&amp;rsquo;ll put my name to this one: the outlook
for software that&amp;rsquo;s safe and secure by default has never looked better.&lt;/p&gt;</description></item><item><title>They switched it off while it was fixing my code</title><link>https://phpboyscout.uk/they-switched-it-off-while-it-was-fixing-my-code/</link><pubDate>Sat, 13 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/they-switched-it-off-while-it-was-fixing-my-code/</guid><description>&lt;img src="https://phpboyscout.uk/they-switched-it-off-while-it-was-fixing-my-code/cover-switched-it-off-while-it-was-fixing-my-code.png" alt="Featured image of post They switched it off while it was fixing my code" /&gt;&lt;p&gt;I woke up this morning to a one-line message from my own tooling:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Claude Fable 5 is currently unavailable. Learn more: &lt;a class="link" href="https://www.anthropic.com/news/fable-mythos-access" target="_blank" rel="noopener"
 &gt;https://www.anthropic.com/news/fable-mythos-access&lt;/a&gt;&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;I followed the link expecting a status page about a wobble in someone&amp;rsquo;s data centre. Instead it was Anthropic, explaining that the evening before, at 5:21pm Eastern, the US government had ordered them to suspend all access to Fable 5 and Mythos 5 on national security grounds. Globally. Every user. Their own staff included.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d spent the previous day with Fable doing one very specific thing: pointing it at my own codebase and asking it to read the code and fix the flaws it found. That, very nearly word for word, is the thing it has now been banned for.&lt;/p&gt;
&lt;h2 id="three-days-late-to-the-only-model-that-mattered"&gt;Three days late to the only model that mattered
&lt;/h2&gt;&lt;p&gt;Fable came out on the 9th. I didn&amp;rsquo;t get to it properly until the 12th, which is the sort of timing I specialise in. By the time I sat down with it, I had about a day of real use before it vanished. One day to form a view on what people were calling the most capable coding model anyone had shipped. So treat everything below as the read of a man who got three days&amp;rsquo; notice and used one of them.&lt;/p&gt;
&lt;p&gt;What I had it doing was unsexy and exactly the kind of work I care about: a full security audit of &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base" target="_blank" rel="noopener"
 &gt;go-tool-base&lt;/a&gt;, the same &amp;ldquo;leave the codebase better than you found it&amp;rdquo; pass I&amp;rsquo;d normally run myself. Find the flaws, then start fixing them.&lt;/p&gt;
&lt;p&gt;And it was good. Genuinely good. It surfaced issues that previous passes with Opus had walked straight past, and in a couple of cases the flaw was sitting in code that Opus itself had written. There is something bracing about one model quietly marking another&amp;rsquo;s homework, and being right.&lt;/p&gt;
&lt;h2 id="good-but-lets-not-get-carried-away"&gt;Good, but let&amp;rsquo;s not get carried away
&lt;/h2&gt;&lt;p&gt;Here is where I have to be fair, because the anger that came later is only worth anything if the praise before it is honest.&lt;/p&gt;
&lt;p&gt;Fable is not magic. The class of bug it found is not some exotic thing only it can see. Plenty of models, from plenty of providers, are perfectly capable of reading a codebase and pulling out the same problems, and there is a mountain of evidence that they do, every day. Anthropic say as much themselves: the capability is &amp;ldquo;widely available from other models (including OpenAI&amp;rsquo;s GPT-5.5)&amp;rdquo; and &amp;ldquo;is used every day by the defenders who keep systems safe.&amp;rdquo; I&amp;rsquo;d already arrived at that conclusion from my own keyboard before I read their statement. Fable was excellent. It was not unique. Hold that thought, because the whole argument turns on it.&lt;/p&gt;
&lt;h2 id="it-kept-slipping-out-of-my-hands"&gt;It kept slipping out of my hands
&lt;/h2&gt;&lt;p&gt;The other thing I learned in my one day is that having Fable and using Fable were not the same thing.&lt;/p&gt;
&lt;p&gt;I set my main working thread to Fable and got on with it. What I didn&amp;rsquo;t know, because nothing on screen told me, is that partway through the evening it had quietly handed me back to Opus. The only reason I know now is that the session log records it in black and white:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2026-06-12T06:57:22Z {&amp;#34;type&amp;#34;:&amp;#34;fallback&amp;#34;,&amp;#34;from&amp;#34;:{&amp;#34;model&amp;#34;:&amp;#34;claude-fable-5&amp;#34;},&amp;#34;to&amp;#34;:{&amp;#34;model&amp;#34;:&amp;#34;claude-opus-4-8&amp;#34;}}
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2026-06-12T18:50:08Z {&amp;#34;type&amp;#34;:&amp;#34;fallback&amp;#34;,&amp;#34;from&amp;#34;:{&amp;#34;model&amp;#34;:&amp;#34;claude-fable-5&amp;#34;},&amp;#34;to&amp;#34;:{&amp;#34;model&amp;#34;:&amp;#34;claude-opus-4-8&amp;#34;}}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A whole evening of work I thought I was doing on Fable was, in fact, Opus wearing Fable&amp;rsquo;s badge. The audit itself launched on the wrong model first; I only caught it because I happened to be watching the workflow panel, killed it, and relaunched it on Fable, where it chewed through an entire five-hour quota in about forty minutes, then spent $50 of usage credits I&amp;rsquo;d been saving in about five more. Even the run that worked was visibly flaky: of the 282 little agents that audit fanned out into, well over half failed outright and had to be retried.&lt;/p&gt;
&lt;p&gt;Then, in the small hours, it started refusing entirely. My tooling caught the moment before I did:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Now failing instantly. Fable appears to be temporarily unavailable for subagents (the first three succeeded). The user explicitly required Fable, so I won&amp;rsquo;t downgrade&amp;hellip; rather than silently switch models.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;It managed three of the fixes before it went, each one green on tests, the race detector and the linter. Three real improvements to my code, written by Fable, sitting in my git history. The other three were finished by Opus, because by morning there was nothing left to finish them with.&lt;/p&gt;
&lt;h2 id="capable-and-almost-impossible-to-build-on"&gt;Capable, and almost impossible to build on
&lt;/h2&gt;&lt;p&gt;There was a second wall, and I hit it before any of this, on the day Fable launched, when I tried to make it go-tool-base&amp;rsquo;s default model.&lt;/p&gt;
&lt;p&gt;Most of what you build on top of a model isn&amp;rsquo;t a chat window. You need it to hand your code an answer in a fixed shape, the same fields in the same places every time, so the program on the other end can rely on what comes back. The usual way to guarantee that is to force the model&amp;rsquo;s hand: you don&amp;rsquo;t ask politely for the structure and hope, you require it, so a wrong-shaped answer fails outright instead of quietly slipping through.&lt;/p&gt;
&lt;p&gt;Fable won&amp;rsquo;t be forced. Ask it to commit to a guaranteed structure and it declines, flat out. As I understand it the reasoning is a safety one: letting anyone compel a model into a precise, mandated output is itself a lever, a way to march it toward saying something it shouldn&amp;rsquo;t. Reasonable enough on paper. In practice it meant the most capable model I&amp;rsquo;d touched couldn&amp;rsquo;t drive the structured parts of my own tool, and by that first afternoon I&amp;rsquo;d quietly set the default back to Opus. It was the same refusal, I realised later, that had collapsed half of that audit&amp;rsquo;s agents.&lt;/p&gt;
&lt;p&gt;And it is not a niche complaint. Guaranteed structure is a hard requirement for a vast swathe of what people are actually building on these models. Not everyone is making another Claude Code. Plenty of us are wiring models into systems that have to get a clean, predictable contract back every single time, and a model that reserves the right to freestyle the shape of its answer is one you simply cannot put in that seat.&lt;/p&gt;
&lt;h2 id="the-part-they-banned-is-my-bread-and-butter"&gt;The part they banned is my bread and butter
&lt;/h2&gt;&lt;p&gt;So let&amp;rsquo;s be precise about what got pulled, because the precision is the whole point.&lt;/p&gt;
&lt;p&gt;Anthropic describe the government&amp;rsquo;s concern as &amp;ldquo;a narrow potential jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Read that again. Reading a codebase and fixing its flaws. That is not some dark-web misuse I have to strain to imagine. That is my bread and butter, the literal, boring, defensive job I had Fable doing in the open, on my own project, when the shutters came down.&lt;/p&gt;
&lt;p&gt;And here is where that earlier point earns its keep. If the banned capability were unique to Fable, you could at least follow the logic, however much you disagreed. But it isn&amp;rsquo;t, and it isn&amp;rsquo;t even close: give Opus enough time, enough budget and a patient enough hand on the prompts, and it would get to most of the same findings in the end. Fable just did it more efficiently, a difference of degree, not of kind. So banning one company&amp;rsquo;s model, for something every competitor ships and every blue team already relies on, makes precisely nobody safer. The exploit-writers keep their tools. The defenders lose one of theirs.&lt;/p&gt;
&lt;p&gt;When the thing you have banned is available everywhere else, the ban has stopped being about safety. It is theatre. And given who is currently in charge of the theatre, it has the distinct whiff of a knee-jerk reaction, dressed as a national security triumph, by people who do not appear to understand the tool they are confiscating.&lt;/p&gt;
&lt;h2 id="who-im-not-angry-at"&gt;Who I&amp;rsquo;m not angry at
&lt;/h2&gt;&lt;p&gt;I want to be careful where I point this, because it would be lazy to spray it around.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not angry at Anthropic. They put Fable through more than a thousand hours of external testing, with US government agencies and the UK&amp;rsquo;s AI Safety Institute among the people kicking the tyres, before it ever reached me. They satisfied every requirement put in front of them, and when the order came they complied under protest while saying, plainly, that applying this standard across the board &amp;ldquo;would essentially halt all new model deployments for all frontier model providers.&amp;rdquo; I&amp;rsquo;m a daily Claude user and an advocate for the work, and I am not going to hang the US administration&amp;rsquo;s decision around the neck of the company that did the diligence and then got told to switch the lights off anyway.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll allow them one small dig, and there was nothing quiet about it. Moving Fable behind a paywall on the 22nd was openly announced and planned well ahead, and the free window was never charity. It was a taster: a few days of the new addiction on the house, enough to hook the punters, before the price went up. That is a bit of a dick move, however neatly it tests in a spreadsheet. Moot now, mind, with no model left to charge for.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll even grant the other side its strongest point. A government looking at agentic systems that can chain reconnaissance into working exploits has something real to be twitchy about. I get the worry. I just don&amp;rsquo;t accept that yanking one vendor&amp;rsquo;s model, for a thing every vendor does, is a coherent answer to it.&lt;/p&gt;
&lt;p&gt;There is a grim irony in how we got here, and it loops back to something I wrote in the spring. When Anthropic first showed Mythos off, I &lt;a class="link" href="https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/" &gt;called the fanfare what it looked like&lt;/a&gt;: a closed model sold on a press release, a result you couldn&amp;rsquo;t independently check, marketing until proven otherwise. Fable 5 was Anthropic finally answering that, handing the rest of us something we could actually test. But all those years of selling Mythos as too dangerous to let out were marketing too, and that half landed rather better than they can have wanted. The US administration appears to have swallowed it whole and pulled the lever. Anthropic have ended up a victim of their own hype, and the reaction that hype provoked is, there is no gentler word for it, ludicrous.&lt;/p&gt;
&lt;h2 id="what-it-comes-down-to"&gt;What it comes down to
&lt;/h2&gt;&lt;p&gt;The lesson I&amp;rsquo;m taking from my one day isn&amp;rsquo;t about how clever Fable was. It&amp;rsquo;s about how little that cleverness is worth if you can&amp;rsquo;t rely on the thing being there.&lt;/p&gt;
&lt;p&gt;I couldn&amp;rsquo;t trust which model I was actually talking to from one hour to the next. I couldn&amp;rsquo;t trust it to stay up for a full overnight run. And it turns out I couldn&amp;rsquo;t trust it to still exist by the weekend. You cannot evaluate, depend on, or build a workflow around a model that gets silently swapped out one evening and switched off by the state a few days later. Capability was never the hard part. Availability is.&lt;/p&gt;
&lt;p&gt;And underneath all of it sits the thing I keep coming back to. A classifier cannot tell a defender from an attacker, because the two of them type the same commands. It turns out a government export control can&amp;rsquo;t tell them apart either. The only thing that ever could is a human being, paying attention, who can be held responsible for the judgement. There wasn&amp;rsquo;t one of those anywhere in this loop. There was a letter, sent at 5:21pm, and by morning the best tool I had for keeping my own code honest was gone, with a polite link where it used to be.&lt;/p&gt;</description></item><item><title>The goblin that wouldn't stay dead</title><link>https://phpboyscout.uk/the-goblin-that-wouldnt-stay-dead/</link><pubDate>Fri, 12 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/the-goblin-that-wouldnt-stay-dead/</guid><description>&lt;img src="https://phpboyscout.uk/the-goblin-that-wouldnt-stay-dead/cover-the-goblin-that-wouldnt-stay-dead.png" alt="Featured image of post The goblin that wouldn't stay dead" /&gt;&lt;p&gt;Turn one, the player swings, the die comes up 20, and my AI dungeon master
narrates the goblin falling silent, leaving the player alone in the corridor.
Good. Turn two, another roll, a 6 this time, and the same dungeon master cheerily
has the goblin &amp;ldquo;dance back&amp;rdquo; out of the dark to take another swing. The goblin I&amp;rsquo;d
just watched die was up and fighting again, and the model didn&amp;rsquo;t so much as blink.&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t feel cheated, or even surprised. I felt the small, familiar thud of &lt;em&gt;oh,
yeah, I forgot that bit.&lt;/em&gt; Because the model hadn&amp;rsquo;t gone rogue. It had done exactly
what a language model does. The gap was mine.&lt;/p&gt;
&lt;p&gt;This is the war story behind
&lt;a class="link" href="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-4/" &gt;part four of the go-tool-base tutorial&lt;/a&gt;,
the AI dungeon master. The tutorial shows the clean, final design and quietly
moves on. It doesn&amp;rsquo;t show the three different ways I got it wrong first, which is
a shame, because the wrong turns are where the actual lesson is.&lt;/p&gt;
&lt;h2 id="why-a-dungeon-master-at-all"&gt;Why a dungeon master at all
&lt;/h2&gt;&lt;p&gt;A word on why I was even here. I was trying to prove the chat
component of the framework to myself. There&amp;rsquo;s a voice that pipes up whenever I
build anything in this space, &amp;ldquo;LangChain exists, who do you think you are?&amp;rdquo;, and
the answer I keep landing on is that LangChain is enormous and I wanted something
&lt;a class="link" href="https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/" &gt;small enough to hold in your head&lt;/a&gt;.
The tutorial was the test: could a newcomer wire AI into a CLI with it and come
out the other side with something that actually &lt;em&gt;behaves&lt;/em&gt;?&lt;/p&gt;
&lt;p&gt;That last word is the whole problem. A tutorial has to leave you holding something
dependable, and dependability is the one thing AI fights you on. I also wanted it
to be fun, a thing someone might keep poking at after the tutorial ends, maybe
even the hook that gets a person other than me to use the framework. I batted hook
ideas around and liked none of them, until the obvious one landed: I run a
tabletop game on the odd weekend, so make the AI the dungeon master. Gamify the
thing. Then watch it raise the dead.&lt;/p&gt;
&lt;h2 id="strike-one-nothing-to-enforce"&gt;Strike one: nothing to enforce
&lt;/h2&gt;&lt;p&gt;The first version was the naive one. I gave the model a &lt;code&gt;roll&lt;/code&gt; tool, because the
one thing you absolutely cannot let a language model do is pick its own numbers,
and otherwise let it narrate freely. The conversation history carried from turn to
turn, so it &lt;em&gt;remembered&lt;/em&gt; the fight. I assumed remembering was enough.&lt;/p&gt;
&lt;p&gt;It isn&amp;rsquo;t. Remembering and being held to it are different things. The history told
the model a goblin had died; nothing &lt;em&gt;stopped&lt;/em&gt; it writing the goblin back in when
the next turn&amp;rsquo;s narration wanted a bit of jeopardy. Memory is not a constraint.
The model will happily contradict its own past if you&amp;rsquo;ve given it room to, and I
had given it nothing but room.&lt;/p&gt;
&lt;h2 id="strike-two-a-tool-to-read-the-state"&gt;Strike two: a tool to read the state
&lt;/h2&gt;&lt;p&gt;The obvious fix, and I do mean obvious, the kind you reach for without thinking,
was to give the model a &lt;code&gt;state&lt;/code&gt; tool so it could check who was alive before it
narrated. Hand it the facts on request and surely it&amp;rsquo;ll stop making them up.&lt;/p&gt;
&lt;p&gt;What it actually did was dither. Handed a tool it could call to look things up, it
called it. And called it. And called it again, turning a turn over in its hands
without ever committing to an action, burning through its step budget on lookups
and leaving the player staring at nothing. I&amp;rsquo;d cured the lying by inventing
paralysis. A tool the model &lt;em&gt;can&lt;/em&gt; call is a tool it &lt;em&gt;will&lt;/em&gt; call, often instead of
doing the thing you actually wanted.&lt;/p&gt;
&lt;h2 id="strike-three-refereeing-its-own-dice"&gt;Strike three: refereeing its own dice
&lt;/h2&gt;&lt;p&gt;When I did get it reading state cleanly, the third failure crept in, and this one
was subtler. Once the model could see the goblin&amp;rsquo;s hit points, it started
&lt;em&gt;deciding&lt;/em&gt; the fight. It would read that the goblin had 12 HP and just narrate a
killing blow, hits and damage and all, without calling the &lt;code&gt;roll&lt;/code&gt; or &lt;code&gt;attack&lt;/code&gt;
tools at all. Why ask the dice when you can see the board and write whatever
outcome the story wants? Give a model enough context and it stops being a narrator
and starts being a referee, which is precisely the job I&amp;rsquo;d built tools to keep out
of its hands.&lt;/p&gt;
&lt;h2 id="the-fix-was-less-not-more"&gt;The fix was less, not more
&lt;/h2&gt;&lt;p&gt;Three failures, and notice the shape of my fixes: each one &lt;em&gt;added&lt;/em&gt; something. More
memory, then a tool, then more context. Every instinct said the model needed more
to work with. Every time, the extra capability was the new way to be wrong.&lt;/p&gt;
&lt;p&gt;So I went the other way. The truth lives in a plain Go struct that I own, not the
model. There&amp;rsquo;s no &lt;code&gt;state&lt;/code&gt; tool to dither on, because the loop simply prepends the
current state to every turn&amp;rsquo;s input, fresh, so the model never has to ask and
never gets to drift. The mechanics, the dice and the damage, live in Go functions
the model has to call, and the system prompt says in as many words that it must
not decide a hit or damage itself. The model is left with exactly one job:
narrate. The prose is its to invent. The maths, the state and the shape of the
result are not.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the line that turned three bugs into a feature. You don&amp;rsquo;t make a language
model reliable by giving it more to work with. You make it reliable by giving it
&lt;em&gt;less to be wrong about.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="the-freedom-i-chose-not-to-give-it"&gt;The freedom I chose not to give it
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s a real tension in that, and I want to name it rather than pretend the
boxed-in version is the only true one. At my own table the rules are guidelines,
not guardrails. I ignore them, bend them, improvise, reach for the &amp;ldquo;rule of cool&amp;rdquo;
when the moment&amp;rsquo;s better for it. A great AI dungeon master would have that same
freedom, and a few out there genuinely do; &lt;a class="link" href="https://www.oldgregstavern.com/" target="_blank" rel="noopener"
 &gt;Old Greg&amp;rsquo;s Tavern&lt;/a&gt; is a lovely example
of how far the free-form version can go.&lt;/p&gt;
&lt;p&gt;But that freedom costs far more than a tutorial can spend, and it buys
unpredictability I was specifically trying to teach people to avoid. So I made a
deliberate trade: guardrails instead of guidelines. Simple, but not so simple it&amp;rsquo;s
boring. The player still gets a &amp;ldquo;not on rails&amp;rdquo; game, they can try anything and the
DM copes, but every outcome that matters runs through code I trust. That&amp;rsquo;s the
right shape for a tutorial, and, not by coincidence, the right shape for most AI
features you&amp;rsquo;d actually ship.&lt;/p&gt;
&lt;h2 id="what-the-goblin-taught-me"&gt;What the goblin taught me
&lt;/h2&gt;&lt;p&gt;The thing I keep coming back to is that the model never misbehaved. It resurrected
the goblin because I gave it the freedom to. It dithered because I gave it a button
to press. It refereed because I let it see the board. Every failure was a
permission I&amp;rsquo;d handed over without meaning to. The reliability didn&amp;rsquo;t come from a
cleverer prompt or a bigger model; it came from working out, one dead goblin at a
time, exactly how little the model needed to be trusted with.&lt;/p&gt;
&lt;p&gt;If you want the version where it all works first time, the
&lt;a class="link" href="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-4/" &gt;tutorial&lt;/a&gt;
has it, the
&lt;a class="link" href="https://phpboyscout.uk/letting-the-ai-call-your-go-functions/" &gt;tool-calling&lt;/a&gt;
and the
&lt;a class="link" href="https://phpboyscout.uk/stop-regexing-the-llms-prose/" &gt;typed turns&lt;/a&gt;
wired up properly. This was the road there. The goblin, you&amp;rsquo;ll be glad to hear,
now stays down.&lt;/p&gt;</description></item><item><title>Generate a command from a script or a sentence with go-tool-base</title><link>https://phpboyscout.uk/generate-a-command-from-a-script-or-a-sentence/</link><pubDate>Thu, 11 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/generate-a-command-from-a-script-or-a-sentence/</guid><description>&lt;img src="https://phpboyscout.uk/generate-a-command-from-a-script-or-a-sentence/cover-generate-a-command-from-a-script-or-a-sentence.png" alt="Featured image of post Generate a command from a script or a sentence with go-tool-base" /&gt;&lt;p&gt;You&amp;rsquo;ve got a Python script that already does the job. It&amp;rsquo;s sat in a &lt;code&gt;tools/&lt;/code&gt;
directory somewhere, it works, and every few weeks someone copies it onto a
laptop that doesn&amp;rsquo;t have the right version of pandas and it falls over. You&amp;rsquo;d
like it to be a proper subcommand of your tool, a real Go binary you can ship,
but porting it means the cobra wiring, the options struct, a test file, and a
fight with the linter before any of it lands.&lt;/p&gt;
&lt;p&gt;Or you don&amp;rsquo;t even have the script. You&amp;rsquo;ve just got a sentence in your head:
&amp;ldquo;something that pings a list of URLs and tells me which ones are slow.&amp;rdquo; The
logic is five minutes of thought; the boilerplate around it is the afternoon.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;gtb generate command&lt;/code&gt; is built for exactly that gap. Hand it a script or hand
it a sentence, and it writes the Go, the tests and the docs, then sends an
autonomous agent through the result to make sure the thing actually builds,
passes its tests and survives &lt;code&gt;golangci-lint&lt;/code&gt; before it ever reaches your
working tree.&lt;/p&gt;
&lt;h2 id="two-ways-in-the-same-files-out"&gt;Two ways in, the same files out
&lt;/h2&gt;&lt;p&gt;There are two flags, and they&amp;rsquo;re mutually exclusive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--script &amp;lt;file&amp;gt;&lt;/code&gt; converts an existing bash, Python or JavaScript script.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--prompt &amp;quot;&amp;lt;text&amp;gt;&amp;quot;&lt;/code&gt; (or a path to a file) generates from a plain-English
description.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both land in the same place. A generated command called &lt;code&gt;csv-stats&lt;/code&gt; gives you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pkg/cmd/csv-stats/cmd.go&lt;/code&gt;: the cobra registration. This one is read-only;
the generator owns it and will regenerate it.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pkg/cmd/csv-stats/main.go&lt;/code&gt;: the implementation, where your logic lives and
where you&amp;rsquo;re free to edit.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pkg/cmd/csv-stats/main_test.go&lt;/code&gt;: a test file.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docs/commands/csv-stats/index.md&lt;/code&gt;: AI-written docs for the command.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The provider and model come from your config (&lt;code&gt;ai.provider&lt;/code&gt;) or the
&lt;code&gt;--provider&lt;/code&gt; / &lt;code&gt;--model&lt;/code&gt; flags. Everything below was generated with Claude
Opus. We&amp;rsquo;ll take each route in turn.&lt;/p&gt;
&lt;h2 id="from-a-script-csv_statspy-becomes-csv-stats"&gt;From a script: &lt;code&gt;csv_stats.py&lt;/code&gt; becomes &lt;code&gt;csv-stats&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the script I want as a native subcommand. It reads a CSV and reports,
per column, the row count, how many values are empty, and min/max/mean for the
numeric ones. Nothing exotic, but enough that porting it by hand is a chore.
Copy it into a file called &lt;code&gt;csv_stats.py&lt;/code&gt; if you want to follow along:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="ch"&gt;#!/usr/bin/env python3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;Summarise a CSV file&amp;#39;s columns.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;For every column it reports the row count and how many values are empty; for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;columns whose values are numeric it also reports min, max and mean. A single
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;column can be selected with --column.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;usage: csv_stats.py [--column NAME] &amp;lt;file.csv&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;argparse&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;csv&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;True if value parses as a float.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ne"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ne"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summarise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;only_column&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DictReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fieldnames&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;error: empty CSV&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fieldnames&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;only_column&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;only_column&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;error: no such column: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;only_column&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;only_column&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;nulls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;numbers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;nulls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;is_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;header&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;column&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;20&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;count&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;8&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;nulls&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;8&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;min&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;12&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;max&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;12&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;mean&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;12&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;-&amp;#34;&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;nums&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;cmin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;.2f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;cmax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;.2f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;cmean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;.2f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;cmin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cmax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cmean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;-&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;20&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;8&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;nulls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;8&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;cmin&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;12&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;cmax&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;12&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;cmean&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;12&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;Summarise a CSV file&amp;#39;s columns.&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;csvfile&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;path to the CSV file&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;--column&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;only summarise this column&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;summarise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;csvfile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vm"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;One command points the generator at it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;gtb generate &lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --name csv-stats &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --short &lt;span class="s2"&gt;&amp;#34;Summarise CSV columns&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --script ./csv_stats.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;video autoplay loop muted playsinline controls width="100%"&gt;
 &lt;source src="demo-script.mp4" type="video/mp4"&gt;
 Your browser doesn't support embedded video. In the demo, the generator converts csv_stats.py into a Go command, then the repair agent builds, tests and lints the result.
&lt;/video&gt;
&lt;p&gt;What lands is not a transliteration. The Python did nearly all of its work in one
function, &lt;code&gt;summarise&lt;/code&gt;; the Go that came out is decomposed into named pieces,
opens the file through the project&amp;rsquo;s injected filesystem (&lt;code&gt;props.FS&lt;/code&gt;, an
afero &lt;code&gt;Fs&lt;/code&gt;) rather than calling the &lt;code&gt;os&lt;/code&gt; package directly, and
reports through the structured logger rather than &lt;code&gt;print&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;summarise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;afero&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Fs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;onlyColumn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Wrapf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;failed to open CSV file %q&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FieldsPerRecord&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;indexByName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;readColumns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;onlyColumn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;columnStats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;columnStats&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;numbers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;{}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;readErr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;readErr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Is&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;readErr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EOF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;readErr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;failed to read CSV record&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nf"&gt;accumulate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;indexByName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;formatReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That decomposition, into &lt;code&gt;readColumns&lt;/code&gt;, &lt;code&gt;accumulate&lt;/code&gt;, &lt;code&gt;formatReport&lt;/code&gt;,
&lt;code&gt;summaryValues&lt;/code&gt; and a couple of small formatting helpers, is the interesting
part, and it didn&amp;rsquo;t come for free. The first thing the agent did after writing the code was build it, test
it, and lint it. &lt;code&gt;golangci-lint&lt;/code&gt;&amp;rsquo;s &lt;code&gt;cyclop&lt;/code&gt; rule flagged a single fat
&lt;code&gt;summarise&lt;/code&gt; function well over its complexity ceiling of 10. So the agent read
the file back, split the work into focused functions, and ran the checks again.
It only stopped once the build, the tests and the linter were all clean. The
tidy shape above is the agent arguing with the linter and winning, not the
model&amp;rsquo;s first guess.&lt;/p&gt;
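&lt;p&gt;To make that split concrete, here is a minimal sketch of how a per-record helper like &lt;code&gt;accumulate&lt;/code&gt; could stay under a complexity ceiling by pushing the per-cell branching into its own function. It is not the generated code. Only the &lt;code&gt;accumulate&lt;/code&gt; call shape and the &lt;code&gt;numbers&lt;/code&gt; field come from the snippet above; the &lt;code&gt;count&lt;/code&gt; and &lt;code&gt;nulls&lt;/code&gt; fields, the &lt;code&gt;observe&lt;/code&gt; helper, the parameter types and the sample rows are assumptions for illustration.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// columnStats is a hypothetical shape: only the numbers field appears in the
// post; count and nulls are assumed here for illustration.
type columnStats struct {
	count   int
	nulls   int
	numbers []float64
}

// accumulate folds one CSV record into the running per-column statistics.
// It assumes stats already holds an entry for every name in columns.
func accumulate(stats map[string]*columnStats, columns []string, indexByName map[string]int, record []string) {
	for _, name := range columns {
		idx, known := indexByName[name]
		if !known || idx >= len(record) {
			continue // unknown column or short row: nothing to record
		}

		observe(stats[name], record[idx])
	}
}

// observe records a single cell. Keeping this branching out of accumulate is
// what keeps each function's cyclomatic complexity small.
func observe(s *columnStats, cell string) {
	s.count++

	value := strings.TrimSpace(cell)
	if value == "" {
		s.nulls++
		return
	}

	if n, err := strconv.ParseFloat(value, 64); err == nil {
		s.numbers = append(s.numbers, n)
	}
}

func main() {
	columns := []string{"name", "age"}
	indexByName := map[string]int{"name": 0, "age": 1}
	stats := map[string]*columnStats{"name": new(columnStats), "age": new(columnStats)}

	for _, record := range [][]string{{"ada", "36"}, {"bob", ""}, {"cy", "41.5"}} {
		accumulate(stats, columns, indexByName, record)
	}

	for _, name := range columns {
		s := stats[name]
		fmt.Printf("%s: count=%d nulls=%d numeric=%d\n", name, s.count, s.nulls, len(s.numbers))
	}
}
```

&lt;p&gt;Each function here has only a handful of branches, which is the property a complexity linter measures; a single function doing the lookup, the null check and the number parsing inline would count all of them at once.&lt;/p&gt;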
&lt;p&gt;Then it just runs. In the demo I scaffolded the project without the &lt;code&gt;init&lt;/code&gt;
feature, so the tool falls back to sensible defaults and needs no config step:
&lt;code&gt;csv-stats sample.csv&lt;/code&gt; prints real per-column counts, nulls and numeric stats.
(With the default features you&amp;rsquo;d run &lt;code&gt;toolbox init&lt;/code&gt;, or pass &lt;code&gt;--config&lt;/code&gt;, first.)
The full generated command, the three files and nothing else, is here:
&lt;a class="link" href="csv-stats-command.tar.gz" &gt;csv-stats-command.tar.gz&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="from-a-sentence-a-url-health-checker"&gt;From a sentence: a URL health-checker
&lt;/h2&gt;&lt;p&gt;No script this time. Just a description of the command I wish I had. &lt;code&gt;--prompt&lt;/code&gt;
takes a raw string, but a description with any detail to it is easier to read,
and to keep, in a file, so I dropped it in &lt;code&gt;healthcheck-prompt.txt&lt;/code&gt;:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;Concurrently GET a list of URLs and report each one&amp;rsquo;s HTTP status and latency.&lt;/p&gt;
&lt;p&gt;Flags:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--timeout&lt;/code&gt;: the per-request timeout&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--file&lt;/code&gt;: read URLs from a file, one per line&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--json&lt;/code&gt;: machine-readable output&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use httptest in the tests so they need no network.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;The prompt describes what I want the command to &lt;em&gt;do&lt;/em&gt;, including how the flags
should behave. The flags themselves I declare up front with &lt;code&gt;--flag&lt;/code&gt; (more on why
that split matters below), and point the generator at the file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;gtb generate &lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --name healthcheck &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --short &lt;span class="s2"&gt;&amp;#34;Check URL health concurrently&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --flag &lt;span class="s2"&gt;&amp;#34;timeout:duration:per-request timeout:false:t:false:5s&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --flag &lt;span class="s2"&gt;&amp;#34;file:string:read URLs from a file, one per line&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --flag &lt;span class="s2"&gt;&amp;#34;json:bool:machine-readable output&amp;#34;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --prompt ./healthcheck-prompt.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;video autoplay loop muted playsinline controls width="100%"&gt;
 &lt;source src="demo-prompt.mp4" type="video/mp4"&gt;
 Your browser doesn't support embedded video; the demo builds a concurrent URL health-checker from a natural-language description, then self-repairs until it builds clean.
&lt;/video&gt;
&lt;p&gt;And the flags feed straight in. &lt;code&gt;RunHealthcheck&lt;/code&gt; reads the URL file from
&lt;code&gt;opts.File&lt;/code&gt;, the per-request timeout from &lt;code&gt;opts.Timeout&lt;/code&gt;, and the output format from
&lt;code&gt;opts.Json&lt;/code&gt;. It then fans the requests out across goroutines, each writing into its
own slot of a pre-sized results slice so no lock is needed, exactly the way you&amp;rsquo;d write it by hand:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;RunHealthcheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Props&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;HealthcheckOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;collectURLs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;File&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;failed to collect URLs&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;no URLs provided; pass URLs as arguments or via --file&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;defaultTimeout&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WaitGroup&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;checkURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;reportResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I asked for the tests to use &lt;code&gt;httptest&lt;/code&gt; so they&amp;rsquo;d need no network, and they do.
Each case spins up a local server, so &lt;code&gt;go test&lt;/code&gt; is hermetic and the agent&amp;rsquo;s own
test run during repair stays self-contained. It wrote cases for the flags
too; this one drives &lt;code&gt;--json&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;TestRunHealthcheck_JSONOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;testing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;httptest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusNotFound&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;newTestProps&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;healthcheck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HealthcheckOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;healthcheck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RunHealthcheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;unexpected error: %v&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Same as before, it builds and runs: point it at a few URLs and it GETs them
concurrently, reporting each status and latency. The full generated command is
here: &lt;a class="link" href="healthcheck-command.tar.gz" &gt;healthcheck-command.tar.gz&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what-self-repair-actually-means"&gt;What &amp;ldquo;self-repair&amp;rdquo; actually means
&lt;/h2&gt;&lt;p&gt;The agent isn&amp;rsquo;t a single shot at the model with a hopeful prompt. It&amp;rsquo;s a loop
with real tools: it reads the project layout, reads the files it needs, and runs
&lt;code&gt;go build&lt;/code&gt;, &lt;code&gt;go test&lt;/code&gt; and &lt;code&gt;golangci-lint&lt;/code&gt;. When something fails, it reads the
relevant code, rewrites it, and runs the checks again. It only declares success
once all three pass with nothing outstanding. The
&lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/176d38d/internal/generator/verifier/agent.go#L125-140" target="_blank" rel="noopener"
 &gt;repair agent&amp;rsquo;s instructions&lt;/a&gt;
are deliberately blunt on that last point: a clean build and passing tests don&amp;rsquo;t
count as done if the linter still has something to say.&lt;/p&gt;
&lt;p&gt;A few flags shape how it runs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--max-steps N&lt;/code&gt; raises the agent&amp;rsquo;s reasoning budget. The default is plenty for
a command like these two, but a genuinely hairy conversion can run long, and
this stops it stopping short.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--agentless&lt;/code&gt; skips the agent entirely and uses the older retry loop, if you&amp;rsquo;d
rather keep the generation cheap and do the polishing yourself.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--non-interactive&lt;/code&gt; withholds the agent&amp;rsquo;s ability to ask you a question
mid-run. It defaults on when the &lt;code&gt;CI&lt;/code&gt; environment variable is set, so the
thing never blocks a pipeline waiting for an answer that isn&amp;rsquo;t coming.&lt;/li&gt;
&lt;/ul&gt;
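&lt;p&gt;The &lt;code&gt;CI&lt;/code&gt; default is easy to picture in code. This is only a sketch of the idea, using the standard library&amp;rsquo;s &lt;code&gt;flag&lt;/code&gt; package: the function names are made up for illustration, the tool itself may well wire its flags differently, and treating &amp;ldquo;set&amp;rdquo; as &amp;ldquo;present with any value&amp;rdquo; is my assumption.&lt;/p&gt;

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// nonInteractiveDefault reports whether prompting should be off by default.
// Assumption: "set" means present in the environment, whatever its value.
func nonInteractiveDefault(lookup func(string) (string, bool)) bool {
	_, set := lookup("CI")
	return set
}

// parse is a hypothetical helper that resolves the flag for one invocation.
// An explicit flag on the command line still wins over the CI-derived default.
func parse(args []string, lookup func(string) (string, bool)) (bool, error) {
	fs := flag.NewFlagSet("generate", flag.ContinueOnError)
	nonInteractive := fs.Bool("non-interactive", nonInteractiveDefault(lookup),
		"never ask a question mid-run")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *nonInteractive, nil
}

func main() {
	quiet, err := parse(os.Args[1:], os.LookupEnv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("non-interactive:", quiet)
}
```

&lt;p&gt;Passing the environment lookup in as a function keeps the default testable without touching the real environment.&lt;/p&gt;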
&lt;h2 id="flags-you-declare-logic-it-writes"&gt;Flags you declare, logic it writes
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;--timeout&lt;/code&gt;, &lt;code&gt;--file&lt;/code&gt; and &lt;code&gt;--json&lt;/code&gt; arrived as real flags on the command, but
not &lt;em&gt;because&lt;/em&gt; the prompt mentioned them. Flags are the generator&amp;rsquo;s job, not the
prompt&amp;rsquo;s, and that split is deliberate. You declare each one with &lt;code&gt;--flag&lt;/code&gt; (or the
interactive wizard), as I did above, and the generator wires it onto the options
struct and into the read-only &lt;code&gt;cmd.go&lt;/code&gt; registration, which hands that struct
straight to your &lt;code&gt;Run&lt;/code&gt; function. The prompt is left to describe &lt;em&gt;behaviour&lt;/em&gt;: what
&lt;code&gt;--timeout&lt;/code&gt; should bound, what &lt;code&gt;--file&lt;/code&gt; should read, what &lt;code&gt;--json&lt;/code&gt; should change.&lt;/p&gt;
&lt;p&gt;So the agent, told exactly which option fields exist, wrote its logic against
&lt;code&gt;opts.Timeout&lt;/code&gt;, &lt;code&gt;opts.File&lt;/code&gt; and &lt;code&gt;opts.Json&lt;/code&gt; rather than inventing anything, and
the finished command&amp;rsquo;s &lt;code&gt;--help&lt;/code&gt; lists them with the &lt;code&gt;5s&lt;/code&gt; default and the &lt;code&gt;-t&lt;/code&gt;
shorthand I asked for. Leave the &lt;code&gt;--flag&lt;/code&gt;s off and it still works: the generator
hands the agent an empty options struct, and it keeps those values as locals with
sensible defaults, ready for a flag to be wired in later.&lt;/p&gt;
&lt;p&gt;The one thing you don&amp;rsquo;t do is hand-edit &lt;code&gt;cmd.go&lt;/code&gt;: it&amp;rsquo;s regenerated every time you
add a flag or change the command, so reach for &lt;code&gt;--flag&lt;/code&gt;, never the file. When a
generation finishes, the quickest sanity check is the command&amp;rsquo;s own &lt;code&gt;--help&lt;/code&gt;,
which shows the flags it actually exposes.&lt;/p&gt;
&lt;p&gt;One thing to keep in mind: the model isn&amp;rsquo;t deterministic. Run the same prompt
twice and you&amp;rsquo;ll get two slightly different commands. If the first one isn&amp;rsquo;t
quite right, regenerate, or nudge the prompt. Treat the output the way you&amp;rsquo;d
treat a capable colleague&amp;rsquo;s first PR: read it, run it, and own what you merge.&lt;/p&gt;
&lt;p&gt;And is it the best possible code, the best design? Probably not. That depends on
the model you can afford to point at it, how much detail you put in the prompt,
and a bit of luck on the day. What you can count on is a working starting point:
something that builds, has tests, and uses proper Go idioms and the project&amp;rsquo;s own
patterns, instead of a blank file and an afternoon of boilerplate. From there
it&amp;rsquo;s yours to shape.&lt;/p&gt;
&lt;h2 id="where-that-leaves-you"&gt;Where that leaves you
&lt;/h2&gt;&lt;p&gt;The generator does the boilerplate and has the argument with the linter so you
don&amp;rsquo;t have to. What it can&amp;rsquo;t do is decide whether the command it built is the
command you actually wanted. That part is still yours, which is rather the point.
The full docs for both flags live in the
&lt;a class="link" href="https://gtb.phpboyscout.uk/cli/ai-conversion/" target="_blank" rel="noopener"
 &gt;AI conversion guide&lt;/a&gt; and the
&lt;a class="link" href="https://gtb.phpboyscout.uk/cli/command/" target="_blank" rel="noopener"
 &gt;command generation reference&lt;/a&gt;, and
they&amp;rsquo;re the place to go when you want the flags the prompt didn&amp;rsquo;t.&lt;/p&gt;</description></item><item><title>Why I still write code</title><link>https://phpboyscout.uk/why-i-still-write-code/</link><pubDate>Wed, 10 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/why-i-still-write-code/</guid><description>&lt;img src="https://phpboyscout.uk/why-i-still-write-code/cover-why-i-still-write-code.png" alt="Featured image of post Why I still write code" /&gt;&lt;p&gt;By any sensible reading of an org chart, I have no business being in this file.
I&amp;rsquo;m a Head of Software Engineering. My calendar reckons I should be in a room
somewhere talking about headcount and roadmaps. Instead it&amp;rsquo;s late, everyone
sensible has logged off, and I&amp;rsquo;m three retries deep into
&lt;a class="link" href="https://phpboyscout.uk/same-config-two-answers/" &gt;a release that refuses to tag itself&lt;/a&gt;,
muttering at a Rust workspace I built with my own hands.&lt;/p&gt;
&lt;p&gt;So why am I here? I&amp;rsquo;ve been asking myself a version of that question for about
twenty-five years, and I think I&amp;rsquo;ve finally got an answer. It&amp;rsquo;s just not a
flattering one.&lt;/p&gt;
&lt;h2 id="im-a-builder-and-that-isnt-really-a-choice"&gt;I&amp;rsquo;m a builder, and that isn&amp;rsquo;t really a choice
&lt;/h2&gt;&lt;p&gt;Strip away the job titles and I&amp;rsquo;m a builder. I like to make things, I like to
solve problems, I like to learn how something works by taking it apart and
putting it back together slightly differently. That urge predates every role
I&amp;rsquo;ve ever held and it has survived all of them. In jobs where I wasn&amp;rsquo;t allowed
to scratch it, I went and built in the open instead, which is a polite way of
saying open source has spent years absorbing energy my day job wouldn&amp;rsquo;t take.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll go further, because being coy about it helps no one: it&amp;rsquo;s closer to an
addiction than a hobby. I don&amp;rsquo;t fully switch off. The current outlet, when I&amp;rsquo;m
not in a terminal, is converting a campervan, which is just software engineering
with worse error messages and a real risk of electrocution. The shape of the
thing changes. The compulsion doesn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;Underneath the building there&amp;rsquo;s a less charming engine, and I might as well name
it: a fairly grim case of impostor syndrome. I wrote about it years ago when I
&lt;a class="link" href="https://phpboyscout.uk/goodbye-dev-charge/" &gt;stopped calling myself &amp;ldquo;Dev in Charge&amp;rdquo;&lt;/a&gt;,
and a decade on it hasn&amp;rsquo;t gone anywhere. The only thing that ever quiets the
anxiety is staying genuinely good at the thing, and staying good at the thing
means using it. I&amp;rsquo;m a firm believer in use it or lose it. People say technical
skill is like riding a bike, you never forget. Maybe. But step away for a few
years and when you climb back on, someone&amp;rsquo;s bolted a jet engine to the frame and
moved the pedals. The bike doesn&amp;rsquo;t wait for you.&lt;/p&gt;
&lt;h2 id="what-it-actually-buys"&gt;What it actually buys
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the part that justifies the indulgence, because on its own &amp;ldquo;I enjoy it&amp;rdquo;
isn&amp;rsquo;t a reason to stay technical as a leader, it&amp;rsquo;s a reason to have a hobby.&lt;/p&gt;
&lt;p&gt;The load-bearing belief is simple, and it&amp;rsquo;s the one line I&amp;rsquo;d carve into the desk:
&lt;strong&gt;I will never ask an engineer to do something I&amp;rsquo;m not willing to do myself.&lt;/strong&gt;
Everything good about staying hands-on flows from that. Because I&amp;rsquo;m still in the
work, I can give my engineers proper support, the right tools and a clear path,
rather than guessing at what they need from a slide. I can steer them through a
genuinely hard technical call instead of nodding along. I can sniff out a duff
estimate, mine or theirs, because I know what the work actually costs. And I can
hold them to account with a straight face, because the accountability runs both
ways. They answer to me for what they ship, and they get to hold me to account
for what I contribute. That second half is the bit a lot of technical leaders
quietly drop, and it&amp;rsquo;s the half that earns you the right to the first.&lt;/p&gt;
&lt;h2 id="the-bill-and-who-paid-it"&gt;The bill, and who paid it
&lt;/h2&gt;&lt;p&gt;I&amp;rsquo;d be selling you a fairy tale if I stopped there, so here&amp;rsquo;s the cost, and some
of it is steep.&lt;/p&gt;
&lt;p&gt;The obvious one is burnout. I&amp;rsquo;ve been there more than once over the years, and
it&amp;rsquo;s the single biggest reason I now pitch myself deliberately as a &lt;em&gt;Technical
Leader&lt;/em&gt; rather than an &lt;em&gt;Engineering Manager&lt;/em&gt;. I can do the manager stuff, the HR
and the planning and the project-management bollocks, and after enough years in
the role I do it well, because it demanded that I did. But competence isn&amp;rsquo;t
appetite. Given the choice I&amp;rsquo;ll take a technical problem or a bit of mentoring
over running the process around either, every time, and spending your days on work
you&amp;rsquo;re good at but don&amp;rsquo;t much enjoy is its own slow road back to the wall.
Sticking to my strengths isn&amp;rsquo;t ego, and it isn&amp;rsquo;t an admission I can&amp;rsquo;t do the rest.
It&amp;rsquo;s self-preservation, learned the hard way.&lt;/p&gt;
&lt;p&gt;The steeper bill came due at home. When my kids were small I poured my own time
into pushing my skills and chasing the next rung, even
&lt;a class="link" href="https://phpboyscout.uk/time-change/" &gt;starting my own agency&lt;/a&gt;. Between
that and the burnout, I missed big chunks of their early years, and that is one
of the real regrets of my life. I&amp;rsquo;m not going to dress it up or hide it behind a
lesson. It was my decision, I made it, and I own it. I&amp;rsquo;m immensely proud of the
people they&amp;rsquo;ve grown into, and since their mum and I separated I&amp;rsquo;ve put
everything I have into giving them a stable home, the builder instinct quietly
turning into a nest-building one, which is the better use of it. I put this here,
plainly, because if you&amp;rsquo;re reading this with a young family asleep upstairs, I&amp;rsquo;d
sooner you heard it from someone who got the balance wrong than learn it the way
I did. The code will still be there next year. They won&amp;rsquo;t be five next year.&lt;/p&gt;
&lt;p&gt;And there&amp;rsquo;s a smaller, daily cost that I still haven&amp;rsquo;t fully mastered: knowing
when to put the keyboard down. A builder who can&amp;rsquo;t stop building is exactly the
person who becomes the bottleneck, disappears down a rabbit hole, or hoards the
interesting problem that would have stretched someone on the team. Stepping back
to let them solve it, when every instinct I have is screaming to just fix the
bloody thing, is genuinely one of the hardest skills I&amp;rsquo;ve had to learn, and some
days it still feels like walking a knife edge. Open source is a big part of how I
manage that. It&amp;rsquo;s a release valve, somewhere I can let the compulsion run with no
brakes on, precisely so I&amp;rsquo;m not stealing the meaty work off the people I&amp;rsquo;m meant
to be growing.&lt;/p&gt;
&lt;h2 id="does-it-still-count-when-the-robot-types"&gt;Does it still count when the robot types?
&lt;/h2&gt;&lt;p&gt;Fair challenge, given the year. I build solo now with an AI pair, to the point
where it&amp;rsquo;s &lt;a class="link" href="https://phpboyscout.uk/same-config-two-answers/" &gt;changed how I branch and release&lt;/a&gt;.
So when a model writes a good chunk of the actual characters, am I still &amp;ldquo;writing
the code&amp;rdquo;?&lt;/p&gt;
&lt;p&gt;I think I&amp;rsquo;m doing it more than ever, and I&amp;rsquo;m certainly learning faster. My typing
is genuinely terrible, a quarter-century of practice and still mostly thumbs, so
being freed from being the typist is no loss at all. What&amp;rsquo;s left when you take the
keystrokes away is the part that was always the point: reading, reviewing,
judging, steering. I can review more code, faster, than I ever could when I was
the one hammering it out, and I can run several projects at once by pointing my
judgement at each in turn. That is leadership work and engineering work at the
same time, which is rather the whole thesis.&lt;/p&gt;
&lt;p&gt;It did not come free, mind. I was elbow-deep in AI and ML long before GPT made it
fashionable, and I&amp;rsquo;ve seen the messy version up close. Getting to the point where
the tools are good enough &lt;em&gt;and&lt;/em&gt; I&amp;rsquo;ve built the guardrails and habits that make
them safe took a long time and a lot of getting it wrong. Owning the judgement
when the machine does the typing is harder than it sounds, not easier. The typing
was never the hard bit.&lt;/p&gt;
&lt;h2 id="what-id-actually-put-my-name-to"&gt;What I&amp;rsquo;d actually put my name to
&lt;/h2&gt;&lt;p&gt;Not that every leader should write code. Plenty of excellent ones don&amp;rsquo;t, and
they&amp;rsquo;re brilliant at the parts of the job I&amp;rsquo;m middling at. The narrower, truer
claim is the only one worth making: I lead better when I stay in the work,
because it&amp;rsquo;s the only way I know to support, steer and be held to account without
faking any of it, and because I meant that line about never asking for what I
won&amp;rsquo;t do myself.&lt;/p&gt;
&lt;p&gt;Staying technical isn&amp;rsquo;t the job. It&amp;rsquo;s the thing that lets me do the job honestly.
I&amp;rsquo;m a builder who learned, slowly and at a price I&amp;rsquo;d rather not have paid, how to
keep building without it costing the people around me what it once cost the people
closest to me. That&amp;rsquo;s the balance I&amp;rsquo;m still working at. I suspect I always will
be.&lt;/p&gt;</description></item><item><title>Anything under an 8</title><link>https://phpboyscout.uk/anything-under-an-8/</link><pubDate>Mon, 08 Jun 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/anything-under-an-8/</guid><description>&lt;img src="https://phpboyscout.uk/anything-under-an-8/cover-anything-under-an-8.png" alt="Featured image of post Anything under an 8" /&gt;&lt;p&gt;I read the news about the National Vulnerability Database over a coffee that
went cold while I sat there muttering at my phone. The short version: the NVD,
the free public catalogue that quietly props up half the security tooling you
and I run every day, is going under in slow motion. And the more I dug into
&lt;em&gt;why&lt;/em&gt;, the worse the taste in my mouth got.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m an open-source person. I think of myself as part of that community, and the
NVD is one of those public goods the whole community leans on without ever
really thinking about it. So my first reaction wasn&amp;rsquo;t clever or measured. It was
a kick in the teeth.&lt;/p&gt;
&lt;h2 id="the-carcass-and-the-vultures"&gt;The carcass and the vultures
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s where things actually are. In February 2024 the NVD had around 13,000
unprocessed vulnerabilities sitting in a queue waiting to be analysed. By the end
of 2025 that backlog had passed
&lt;a class="link" href="https://www.helpnetsecurity.com/2026/06/01/nist-nvd-management-problems/" target="_blank" rel="noopener"
 &gt;27,000&lt;/a&gt;.
This April, NIST effectively
&lt;a class="link" href="https://www.nist.gov/news-events/news/2026/04/nist-updates-nvd-operations-address-record-cve-growth" target="_blank" rel="noopener"
 &gt;admitted it can&amp;rsquo;t dig out&lt;/a&gt;:
everything published before 1 March 2026 that hadn&amp;rsquo;t been enriched got swept into
a bucket marked &amp;ldquo;Not Scheduled&amp;rdquo;, and going forward only the highest-risk entries
get the full treatment. The rest you&amp;rsquo;re on your own with.&lt;/p&gt;
&lt;p&gt;The reasons are grimly ordinary. The
&lt;a class="link" href="https://www.helpnetsecurity.com/2026/06/01/nist-nvd-management-problems/" target="_blank" rel="noopener"
 &gt;Cybersecurity and Infrastructure Security Agency stopped funding the
programme&lt;/a&gt;
in 2024. The enrichment contract lapsed that same February, and despite NIST
having two years&amp;rsquo; notice it needed a replacement, the database limped along
understaffed until late November. And the volume kept climbing regardless:
&lt;a class="link" href="https://jerrygamblin.com/2026/01/01/2025-cve-data-review/" target="_blank" rel="noopener"
 &gt;48,185 CVEs in 2025&lt;/a&gt;,
roughly 132 a day, with forecasts of the annual figure topping 60,000, getting on
for ten times what it was a decade ago. No money, a fumbled handover, and a
firehose. That&amp;rsquo;s the whole story.&lt;/p&gt;
&lt;p&gt;The bit that turns my stomach is what comes next. When a free public good fails,
the gap doesn&amp;rsquo;t stay empty. It gets filled, and it gets filled by people selling
something. There are already commercial vulnerability databases that are better
resourced and more current than the NVD, and the moment the free one is visibly
on the floor, every one of them sees a market. Plenty of those subscriptions cost
more in a year than a small open-source project will see in donations in its
lifetime. So the catalogue the little projects relied on most is exactly the one
about to be priced out of their reach. Vultures circling a carcass, and the
carcass is something we all built on.&lt;/p&gt;
&lt;h2 id="the-number-we-never-checked"&gt;The number we never checked
&lt;/h2&gt;&lt;p&gt;And then I read the part that stopped me blaming everyone else.&lt;/p&gt;
&lt;p&gt;A Department of Commerce Inspector General audit went through the NVD&amp;rsquo;s work and
found that NIST&amp;rsquo;s own severity scores
&lt;a class="link" href="https://therecord.media/nist-mistakes-vulnerability-database-inspector-general" target="_blank" rel="noopener"
 &gt;matched independent assessors only 12% of the
time&lt;/a&gt;.
Read that again. Not that NIST was wrong 88% of the time, that&amp;rsquo;s not quite what
it says, but that two competent parties looking at the same vulnerability landed
on the same severity barely one time in eight. The score was never an objective
fact handed down from on high. It was always an estimate, a judgement call, the
kind of thing reasonable people disagree about most of the time.&lt;/p&gt;
&lt;p&gt;Which is awkward, because I have spent years treating that number as gospel. And
I know I&amp;rsquo;m not alone, because I&amp;rsquo;ve watched whole engineering organisations do the
same thing in writing. More than one large employer I&amp;rsquo;ve had bakes the CVSS score
straight into policy: anything scored 8 or above blocks the build and gets a
meeting, and anything under an 8 goes through at an engineer&amp;rsquo;s discretion. When
time is money, and it always is in those places, &amp;ldquo;it&amp;rsquo;s only a 6.4, ship it&amp;rdquo; is the
easiest decision you&amp;rsquo;ll make all week. I&amp;rsquo;ve made it. I&amp;rsquo;ve made it without opening
the advisory, without checking whether the vulnerable code path was even reachable
in what we&amp;rsquo;d built, on the strength of a single number that, it turns out, two
experts wouldn&amp;rsquo;t have agreed on anyway.&lt;/p&gt;
&lt;p&gt;So before I get cross about the funding, I have to sit with my own part in this.
We took a contestable estimate and bolted it to the door as a gatekeeper. We
turned &amp;ldquo;a rough signal worth a closer look&amp;rdquo; into &amp;ldquo;the closer look&amp;rdquo;, and then we
stopped looking. The database didn&amp;rsquo;t promise us a safety net. We just decided it
was one and stopped checking underneath.&lt;/p&gt;
&lt;h2 id="dont-blame-the-robots-for-this-one"&gt;Don&amp;rsquo;t blame the robots for this one
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s an easy villain on offer here, and I want to wave you off it. It would be
tidy to say AI did this, that the flood drowning the NVD is a tide of
machine-generated slop, the same dynamic I wrote about when
&lt;a class="link" href="https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/" &gt;curl&amp;rsquo;s bug bounty buckled under unverifiable
reports&lt;/a&gt;. It&amp;rsquo;s
tempting, it&amp;rsquo;s topical, and it&amp;rsquo;s mostly wrong.&lt;/p&gt;
&lt;p&gt;The people who actually crunch the numbers are clear that the surge is largely
&lt;a class="link" href="https://bishopfox.com/blog/understanding-the-cve-ecosystem-and-nists-changing-role" target="_blank" rel="noopener"
 &gt;legitimate growth&lt;/a&gt;.
There are now more than 484 CVE Numbering Authorities, far more organisations
reporting far more bugs far more thoroughly than they did a decade ago. That isn&amp;rsquo;t
a quality collapse, it&amp;rsquo;s the system working as designed and simply getting bigger
than its funding. Pinning it on AI would be scapegoating, and scapegoating the
robots for an underfunding-and-mismanagement problem is just a way of letting the
people who defunded it off the hook.&lt;/p&gt;
&lt;p&gt;None of which means AI gets a free pass. It just isn&amp;rsquo;t the arsonist. The same
machine-assisted discovery tools that found genuine bugs are also forecast to push
CVE volumes
higher still, and yes, one of the tools named in that forecast is the very one I
&lt;a class="link" href="https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/" &gt;poked fun at over curl&lt;/a&gt;.
AI is an accelerant on a fire that was already burning for thoroughly human
reasons. It&amp;rsquo;s a beat in this story, not the spine.&lt;/p&gt;
&lt;h2 id="the-version-im-betting-on"&gt;The version I&amp;rsquo;m betting on
&lt;/h2&gt;&lt;p&gt;Where does this leave the working engineer? In a harder spot than before, because
the easy answer stopped being easy. My usual line, the one I keep ending these pieces on, is that
&lt;a class="link" href="https://phpboyscout.uk/nobody-is-coming-to-clean-your-supply-chain/" &gt;the diligence is the
job&lt;/a&gt;:
pin, lock, audit, and read the actual advisory instead of trusting a number. All
of that still holds. But it just got more expensive, because the data underneath
the diligence is thinner and, as it turns out, was shakier than we let ourselves
believe.&lt;/p&gt;
&lt;p&gt;So I&amp;rsquo;m not going to pretend there&amp;rsquo;s a clean fix. This problem won&amp;rsquo;t solve itself,
and it won&amp;rsquo;t be solved by any one of us. It needs all of us to actually support
the services we depend on, with money, with contributions, with attention, so the
public goods that underpin our craft are still standing in ten years. That&amp;rsquo;s the
dull, grown-up part.&lt;/p&gt;
&lt;p&gt;But I&amp;rsquo;ll end this one looking up rather than down, because for once I can. I think
the next few years bend towards safer software almost in spite of us. Modern
languages are quietly closing off whole categories of vulnerability at the source:
every memory-safety bug that a borrow checker refuses to compile is one that never
reaches a database to be mis-scored in the first place, which is rather the point
of building
&lt;a class="link" href="https://phpboyscout.uk/a-framework-that-contains-no-unsafe/" &gt;a framework that contains no &lt;code&gt;unsafe&lt;/code&gt;&lt;/a&gt;.
Used with proper guidance instead of left to spew slop, AI can be a genuine help
finding and triaging the things that do slip through. And the
&lt;a class="link" href="https://phpboyscout.uk/the-greybeards-edge-was-never-typing/" &gt;junior engineers we keep sawing off the bottom
rung&lt;/a&gt; are
exactly the people who, mentored by the greybeards before they retire, could build
the next generation of vulnerability identification that the current model clearly
can&amp;rsquo;t sustain.&lt;/p&gt;
&lt;p&gt;As for the vultures&amp;hellip; it&amp;rsquo;s a coin toss. A lot of firms will look at the NVD on
its back and see a land grab. I&amp;rsquo;d love to be proved an optimist and watch at least
one of them stand tall, take all that better-resourced data and open it to
open-source projects for nothing, because it&amp;rsquo;s the right thing to do and because
the whole industry drinks from that well. One of them doing the decent thing would
be worth more than all the press releases about responsible AI put together.&lt;/p&gt;
&lt;p&gt;The catalogue is wobbling. The number was never as solid as we treated it. Neither
of those is the end of the world, as long as we stop outsourcing our judgement to a
free service we never funded and never checked, and start paying, in every sense,
for the foundations we build on. Boring, unfashionable, and the only thing that
ever works. I think we&amp;rsquo;re up to it.&lt;/p&gt;</description></item><item><title>The greybeards' edge was never typing</title><link>https://phpboyscout.uk/the-greybeards-edge-was-never-typing/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/the-greybeards-edge-was-never-typing/</guid><description>&lt;img src="https://phpboyscout.uk/the-greybeards-edge-was-never-typing/cover-the-greybeards-edge-was-never-typing.png" alt="Featured image of post The greybeards' edge was never typing" /&gt;&lt;p&gt;I have a retirement plan, and it is gloriously low-tech. A cabin, some trees, a
woodstove, and a firm rule that no wifi symbol ever appears within a mile of me
again. I think about it more than is probably healthy.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a snag, though, and it&amp;rsquo;s the same one the whole industry is currently
pretending it can&amp;rsquo;t see. For me to vanish into the woods, somebody has to be
able to do my job after I&amp;rsquo;ve gone. And right now, collectively, we are working
very hard to make sure nobody can.&lt;/p&gt;
&lt;h2 id="the-boost-and-the-drag"&gt;The boost, and the drag
&lt;/h2&gt;&lt;p&gt;I wrote the other day about how AI made &lt;a class="link" href="https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/" &gt;&lt;em&gt;producing&lt;/em&gt; plausible work nearly free
while &lt;em&gt;verifying&lt;/em&gt; it stays expensive and human&lt;/a&gt;.
Point that same lens at a team and something uncomfortable falls out. The
observation isn&amp;rsquo;t mine; it belongs to Mark Russinovich and Scott Hanselman of Microsoft, who
&lt;a class="link" href="https://dl.acm.org/doi/10.1145/3779312" target="_blank" rel="noopener"
 &gt;laid it out in Communications of the ACM&lt;/a&gt;:
agentic coding tools give a senior engineer an &lt;em&gt;AI boost&lt;/em&gt;, multiplying what
they ship, because a senior has the judgement to steer and verify the output.
The same tools give an early-career engineer an &lt;em&gt;AI drag&lt;/em&gt;, because they don&amp;rsquo;t
have that judgement yet, and the machine hands them far more rope than they can
hold.&lt;/p&gt;
&lt;p&gt;The cold incentive writes itself, and they name it: hire seniors, automate
juniors. It isn&amp;rsquo;t hypothetical, either. Meta
&lt;a class="link" href="https://www.nytimes.com/2026/05/19/technology/meta-layoffs-ai.html" target="_blank" rel="noopener"
 &gt;cut 8,000 roles last week&lt;/a&gt;,
in a round the Times filed under mounting AI casualties. For any single quarter
you care to look at, the maths is impeccable.&lt;/p&gt;
&lt;h2 id="the-bill-is-just-deferred"&gt;The bill is just deferred
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the line the spreadsheet leaves off. The grindy work a
junior used to cut their teeth on, the small fixes, the boring migrations, the
read-the-stack-trace-and-figure-it-out, is exactly the work AI now does. So the
proving ground is gone. And the entry-level seats where they&amp;rsquo;d have stood on it
are the ones being cut. Squeezed from both ends at once: no reps, and nowhere
to take them.&lt;/p&gt;
&lt;p&gt;Russinovich and Hanselman put the consequence plainly. Without early-career
hiring the talent pipeline collapses, and you arrive at a future with no next
generation of experienced engineers. The seniors you&amp;rsquo;ll be desperate for in
2032 are the juniors you declined to train in 2026. The bill doesn&amp;rsquo;t vanish. It
just falls due long after the people who cut the cheque have moved on.&lt;/p&gt;
&lt;h2 id="how-to-manufacture-a-world-of-ai-slop"&gt;How to manufacture a world of AI slop
&lt;/h2&gt;&lt;p&gt;I named the last piece for its villain; let me name this one&amp;rsquo;s too. Raise a
generation that can &lt;em&gt;produce&lt;/em&gt; with AI but was never taught to &lt;em&gt;validate&lt;/em&gt;, and
here is what you get: people shipping machine-built products at speed with no
instinct for where the output is quietly wrong, because they never had to be
wrong the slow way first. Software nobody genuinely understands, human-written
and AI-written alike, and a steady leak of trust out of all of it.&lt;/p&gt;
&lt;p&gt;That isn&amp;rsquo;t a productivity problem. That&amp;rsquo;s a world of
&lt;a class="link" href="https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/" &gt;AI slop&lt;/a&gt;, and not
in one project&amp;rsquo;s inbox this time but everywhere at once. We&amp;rsquo;d have automated our
way clean out of the one job AI cannot do for us: knowing when not to trust the
machine.&lt;/p&gt;
&lt;h2 id="its-a-choice-and-its-yours"&gt;It&amp;rsquo;s a choice, and it&amp;rsquo;s yours
&lt;/h2&gt;&lt;p&gt;Andrew Murphy put it with more bite than I&amp;rsquo;d quite dare:
&lt;a class="link" href="https://andrewmurphy.io/blog/ai-didnt-kill-your-junior-pipeline-you-did" target="_blank" rel="noopener"
 &gt;AI didn&amp;rsquo;t kill your junior pipeline, you did&lt;/a&gt;.
He&amp;rsquo;s right. This isn&amp;rsquo;t weather. Nobody is making you do it. It&amp;rsquo;s a decision,
taken quarter by quarter, and a decision is a thing you can take differently.&lt;/p&gt;
&lt;p&gt;The fix isn&amp;rsquo;t complicated, it&amp;rsquo;s just unfashionable. Keep hiring early-career
engineers. Say out loud that they cost you capacity at first, and treat their
growth as an actual goal rather than something meant to happen by osmosis.
Russinovich and Hanselman call it preceptorship at scale: senior mentorship,
deliberately structured, turning the ordinary day&amp;rsquo;s work into teachable
moments.&lt;/p&gt;
&lt;p&gt;And the proving ground can be rebuilt, just not where it stood. If AI does the
writing now, the apprenticeship moves to the reviewing. Put juniors in the loop
on the machine&amp;rsquo;s output and have them hunt for the subtle wrongness, treating
that output the way you&amp;rsquo;d treat a scanner&amp;rsquo;s finding:
&lt;a class="link" href="https://phpboyscout.uk/the-security-finding-you-must-not-fix/" &gt;as an argument, not an order&lt;/a&gt;.
That&amp;rsquo;s how judgement gets built now: not by grinding out the work, but by
verifying it. Which, as luck would have it, is the single most valuable thing
anyone on your team can learn to do.&lt;/p&gt;
&lt;h2 id="the-part-thats-on-the-greybeards"&gt;The part that&amp;rsquo;s on the greybeards
&lt;/h2&gt;&lt;p&gt;This is where I stop letting the companies wear all the blame, because some of
it is mine, and yours. Verification is a craft, and crafts pass from person to
person or not at all. I know where every one of my own AI misfires comes from:
I gave it too little context, or too much rope, and didn&amp;rsquo;t check the result
closely enough. The tool rarely went rogue. The gap was always my diligence.
That&amp;rsquo;s not a confession, it&amp;rsquo;s the curriculum, and it&amp;rsquo;s precisely the judgement
a junior can only earn by sitting in the loop beside someone who has already
made those mistakes.&lt;/p&gt;
&lt;p&gt;So the senior engineer&amp;rsquo;s job has quietly changed underneath us. It was never
really the typing. It was knowing when something is off, and what the customer
actually needs, and now it is also &lt;em&gt;handing that on&lt;/em&gt;, deliberately, while
there&amp;rsquo;s still time to. Mentor and guardian first; fastest prompt in the room a
distant second.&lt;/p&gt;
&lt;h2 id="the-ladder-youre-standing-on"&gt;The ladder you&amp;rsquo;re standing on
&lt;/h2&gt;&lt;p&gt;There will always be something AI can&amp;rsquo;t do well enough, and for a good while
yet it&amp;rsquo;s the thing that matters most: being the accountable human who genuinely
understands what&amp;rsquo;s needed and can be held to it when it goes wrong. A simulation
can be enormously convincing. It cannot be &lt;em&gt;responsible&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Which brings me back to my cabin. I do want it one day, the trees and the
woodstove and the blissful disconnection. But I only get to go if the work
outlives me, and the work only outlives me if the people do. So the last useful
thing my generation does, before we shuffle off to find our trees, isn&amp;rsquo;t
shipping a little more code. It&amp;rsquo;s making sure there&amp;rsquo;s somebody left who can tell
when the machine is wrong. Pull the ladder up behind us and there&amp;rsquo;ll be nobody
to notice the rot, and no cabin quiet enough to make that sit right.&lt;/p&gt;</description></item><item><title>AI didn't kill curl's bug bounty. The bounty did.</title><link>https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/</link><pubDate>Tue, 26 May 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/</guid><description>&lt;img src="https://phpboyscout.uk/ai-didnt-kill-curls-bug-bounty/cover-ai-didnt-kill-curls-bug-bounty.png" alt="Featured image of post AI didn't kill curl's bug bounty. The bounty did." /&gt;&lt;p&gt;In January, Daniel Stenberg shut down curl&amp;rsquo;s bug bounty. The headlines wrote
themselves, and they all said the same thing: AI killed it. A flood of
machine-generated slop drowned the maintainers, so they pulled the plug.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s true, as far as it goes. It&amp;rsquo;s also the wrong lesson, and the right one
is sitting in plain sight in the same project, in the same few months.&lt;/p&gt;
&lt;h2 id="volume-without-validation-is-the-attack"&gt;Volume without validation is the attack
&lt;/h2&gt;&lt;p&gt;curl had run its bounty since April 2019. Over its life it paid out
&lt;a class="link" href="https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/" target="_blank" rel="noopener"
 &gt;more than $100,000 for 87 genuine vulnerabilities&lt;/a&gt;,
a thoroughly good return for one of the most depended-on pieces of software on
the planet. Then the reports stopped being reports. The confirmation rate, the
share of submissions that turned out to be a real bug, had historically sat
north of 15%. By 2025 it was below 5%. Fewer than one in twenty submissions
were worth anything, and the rest still had to be read.&lt;/p&gt;
&lt;p&gt;That last part is the whole problem. A bogus report doesn&amp;rsquo;t announce itself.
Someone has to open it, take it seriously, try to reproduce it, and work out
that it&amp;rsquo;s nonsense, and that someone is a human being with a finite number of
hours and a project to run. Stenberg put it plainly: the slop &amp;ldquo;take[s] a
serious mental toll to manage and sometimes also a long time to debunk.&amp;rdquo; The
submitter spends seconds. The maintainer spends an afternoon. Do that at volume
and it stops being noise and becomes an attack, a denial-of-service aimed not
at curl&amp;rsquo;s servers but at its maintainers&amp;rsquo; attention. No exploit required. Just
plausibility, in bulk.&lt;/p&gt;
&lt;h2 id="the-bounty-was-the-accelerant-not-the-ai"&gt;The bounty was the accelerant, not the AI
&lt;/h2&gt;&lt;p&gt;So far this is the story everyone tells. Here&amp;rsquo;s where I get off the bus.&lt;/p&gt;
&lt;p&gt;The instinct is to blame the AI for the slop. But look at what a bounty actually
is. It&amp;rsquo;s a cash prize, and curl&amp;rsquo;s was priced for the thing it wanted: the hours
and the judgement a skilled human pours into finding a real flaw. That pricing
made complete sense right up until the cost of producing something that &lt;em&gt;looked
like&lt;/em&gt; a finding collapsed to nearly nothing.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s what AI changed. Not the supply of bugs. The supply of plausible-looking
bug reports. Put a cash prize on &amp;ldquo;looks like a finding&amp;rdquo;, then make &amp;ldquo;looks like a
finding&amp;rdquo; free to generate, and you haven&amp;rsquo;t got a bug bounty any more. You&amp;rsquo;ve got
a slot machine. Stenberg said he&amp;rsquo;d started to sense &amp;ldquo;a bad faith attitude&amp;rdquo; in
the reports, and of course he had. The incentive was openly inviting it.&lt;/p&gt;
&lt;p&gt;So the death spiral was structural, not bad luck. The moment generating
plausible reports went free, any cash bounty became a magnet for spray-and-pray,
and the only open questions were how fast it would rot and whether you&amp;rsquo;d close
the programme or just let the rewards quietly wither. The AI was the match. The
bounty was the petrol. We have been pointing at the wrong one.&lt;/p&gt;
&lt;h2 id="the-proof-curl-turned-around-and-hired-the-ai"&gt;The proof: curl turned around and hired the AI
&lt;/h2&gt;&lt;p&gt;If AI were really the villain here, you&amp;rsquo;d expect curl to have slammed the door
on it. It did the opposite.&lt;/p&gt;
&lt;p&gt;In the same stretch, &lt;a class="link" href="https://aisle.com/blog/curl-adopts-aisle-after-its-ai-agents-discovered-5-cves" target="_blank" rel="noopener"
 &gt;by AISLE&amp;rsquo;s own account&lt;/a&gt;,
an AI security platform contributed 24 pull requests to curl, five of which
earned CVEs, and the project now runs it internally for continuous review. The
same tooling reportedly found &lt;a class="link" href="https://www.lesswrong.com/posts/7aJwgbMEiKq5egQbd/" target="_blank" rel="noopener"
 &gt;all twelve zero-days&lt;/a&gt;
in an OpenSSL release in late January. (Both of those are the tool-makers&amp;rsquo; and a
third party&amp;rsquo;s numbers rather than curl&amp;rsquo;s audited figures, so weigh them as such.
But curl adopting the thing isn&amp;rsquo;t a claim. It&amp;rsquo;s a decision.)&lt;/p&gt;
&lt;p&gt;Sit with the shape of that. curl shut down strangers being paid for AI-shaped
noise, and in the same breath put AI to work as a tool its own maintainers
drive. The two moves look contradictory only if you think &amp;ldquo;AI&amp;rdquo; is a single thing
with a single verdict attached. It isn&amp;rsquo;t. Pointed at the problem by people
accountable for the result, with no prize to farm, it found real bugs. Dangled
in front of anonymous strangers chasing a payout, it produced sand.&lt;/p&gt;
&lt;h2 id="the-tell-is-which-ai-curl-kept-and-which-it-mocked"&gt;The tell is which AI curl kept, and which it mocked
&lt;/h2&gt;&lt;p&gt;Stenberg drew that line about as sharply as a person can. When Anthropic put its
security model, Mythos, in front of curl this spring, it
&lt;a class="link" href="https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-vulnerability/" target="_blank" rel="noopener"
 &gt;scanned 176,000 lines of C and surfaced a single flaw&lt;/a&gt;,
and Stenberg called the surrounding fanfare
&lt;a class="link" href="https://www.theregister.com/security/2026/05/11/anthropics-bug-hunting-mythos-was-greatest-marketing-stunt-ever-says-curl-creator/5238111" target="_blank" rel="noopener"
 &gt;the greatest marketing stunt he&amp;rsquo;d seen&lt;/a&gt;.
Same maintainer. Adopts one AI, rubbishes another.&lt;/p&gt;
&lt;p&gt;The deciding factor was never whether the thing was AI. Both were. It was
whether the output survived a human checking it, and whether you could check it
at all. AISLE handed over pull requests and CVEs you could read and merge.
Mythos arrived as a closed model and a press release, which is to say a claim
the community has no way to independently test.&lt;/p&gt;
&lt;p&gt;My bias, up front, because it runs the opposite way to what you&amp;rsquo;d expect from
someone writing this: I&amp;rsquo;m a paying Claude subscriber and I lean on Anthropic&amp;rsquo;s
models every working day, the one behind the spadework for this post included.
I&amp;rsquo;m an advocate, not a sceptic, and AI genuinely has its place. That is
&lt;em&gt;exactly&lt;/em&gt; why the Mythos fanfare grates. Overselling a closed model to get out
ahead of the competition, when the one test the public got to see turned up a
single bug, is the sort of thing that chips away at trust in all of it. A result
you can&amp;rsquo;t verify is marketing until proven otherwise, whoever&amp;rsquo;s logo is on the
slide, and I&amp;rsquo;d rather the tools I depend on didn&amp;rsquo;t stoop to it.&lt;/p&gt;
&lt;h2 id="the-cheap-half-and-the-expensive-half"&gt;The cheap half and the expensive half
&lt;/h2&gt;&lt;p&gt;Pull back from curl for a moment, because the lesson isn&amp;rsquo;t really about bounties
at all. Anyone who works with these tools every day knows the same thing: when
they go wrong, it&amp;rsquo;s rarely the model running off on its own. It&amp;rsquo;s the context it
wasn&amp;rsquo;t given, the rope it was handed, the output nobody checked closely enough.
The failure sits on the human side of the keyboard, at the one step that&amp;rsquo;s
easiest to skip, which is verification.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the pattern curl hit at the scale of an ecosystem. AI made one thing
nearly free: producing work that looks right. It did not make the other thing a
penny cheaper: confirming the work &lt;em&gt;is&lt;/em&gt; right. That cost still falls, in full,
on a person. (A scanner, &lt;a class="link" href="https://phpboyscout.uk/the-security-finding-you-must-not-fix/" &gt;I&amp;rsquo;ve argued before&lt;/a&gt;,
is an argument, not an order; the same goes double for a model.) The bounty&amp;rsquo;s
fatal mistake was paying for the cheap half and quietly assuming it had bought
the expensive one. The same trap waits in code review, in hiring, in CVs read by
machines, but that&amp;rsquo;s a bigger argument for another post.&lt;/p&gt;
&lt;h2 id="pouring-sand-into-the-machine"&gt;Pouring sand into the machine
&lt;/h2&gt;&lt;p&gt;curl didn&amp;rsquo;t capitulate to AI, whatever the headlines decided. It stopped paying
for the worthless half and started using the valuable half, and it had the
discernment to tell a useful tool from a press release while it did so.&lt;/p&gt;
&lt;p&gt;The bounty wasn&amp;rsquo;t a casualty of artificial intelligence. It was a structure
that, the instant plausible output became free, could only fill with sand.
Stenberg said he hopes closing it stops &amp;ldquo;more people pouring sand into the
machine.&amp;rdquo; Reading the last year of his inbox, I think he&amp;rsquo;ll get his wish. The
sand was only ever there because somebody left a bucket of money beside the
funnel.&lt;/p&gt;</description></item><item><title>Building a CLI with go-tool-base, part 4: an AI dungeon master</title><link>https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-4/</link><pubDate>Sat, 23 May 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-4/</guid><description>&lt;img src="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-4/cover-building-a-cli-with-go-tool-base-part-4.png" alt="Featured image of post Building a CLI with go-tool-base, part 4: an AI dungeon master" /&gt;&lt;p&gt;I run a Dungeons &amp;amp; Dragons game on the odd weekend, so when I sat down to put an
AI feature inside a CLI, my first instinct wasn&amp;rsquo;t a chatbot. It was: could the
tool run a little adventure, with an AI as the dungeon master? It turns out that&amp;rsquo;s
a near-perfect way to learn the chat client, because the thing that makes a game
trustworthy, rules the players can&amp;rsquo;t break, is exactly the thing that makes any AI
feature trustworthy. So this part builds &lt;code&gt;mytool adventure&lt;/code&gt;: a tiny dungeon you
play in your terminal, narrated by an AI that is firmly on a leash.&lt;/p&gt;
&lt;p&gt;&lt;a class="link" href="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-3/" &gt;Part 3&lt;/a&gt;
pointed AI at your CLI from the outside (an agent driving your commands over MCP).
This part goes the other way: AI inside your tool, as a feature you write. The
worry everyone has about that is fair: AI output is unpredictable, and a CLI is
meant to be dependable. The whole lesson here is how you square those two: you
don&amp;rsquo;t hope the model behaves, you box it in with rules it can&amp;rsquo;t escape and
mechanics it doesn&amp;rsquo;t get to invent.&lt;/p&gt;
&lt;p&gt;As before, this is written against &lt;strong&gt;go-tool-base v0.6.0&lt;/strong&gt; (&lt;code&gt;gtb version&lt;/code&gt;).&lt;/p&gt;
&lt;h2 id="behind-the-dm-screen"&gt;Behind the DM screen
&lt;/h2&gt;&lt;p&gt;A turn of our game looks like this: the player types what they want to do, the AI
dungeon master narrates what happens and offers a few choices, and round it goes
until the adventure reaches an end. The trick is where the truth lives. The model&amp;rsquo;s
job is the prose, and only the prose. Everything else is yours:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The rules&lt;/strong&gt; live in the system prompt: what the DM may and may not do.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The mechanics&lt;/strong&gt; live in Go functions the model calls as tools (dice, combat).
It never makes a number up.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The state&lt;/strong&gt; lives in a Go struct you hand the model fresh every turn, so it
never has to remember, and can&amp;rsquo;t quietly rewrite history.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The shape of each turn&lt;/strong&gt; is a typed Go struct the model fills in, so your code
always gets back something it can render, never a wall of prose to parse.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Two go-tool-base capabilities do the heavy lifting: the AI
&lt;a class="link" href="https://phpboyscout.uk/letting-the-ai-call-your-go-functions/" &gt;calling your Go functions&lt;/a&gt;,
and the AI
&lt;a class="link" href="https://phpboyscout.uk/stop-regexing-the-llms-prose/" &gt;handing back a typed struct&lt;/a&gt;
instead of text you have to regex. The game is just a fun excuse to use both at
once.&lt;/p&gt;
&lt;h2 id="wiring-a-provider"&gt;Wiring a provider
&lt;/h2&gt;&lt;p&gt;The chat client (&lt;code&gt;pkg/chat&lt;/code&gt;) is a library you import; you don&amp;rsquo;t need any special
feature flag for it. It does need an API key, and it&amp;rsquo;ll look for one in a few places.
The simplest, for now, is the well-known environment variable for your provider:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;sk-ant-...&amp;#34;&lt;/span&gt; &lt;span class="c1"&gt;# or GEMINI_API_KEY, OPENAI_API_KEY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s the bottom of the client&amp;rsquo;s lookup chain, which is fine for playing locally.
For a tool you actually ship, go-tool-base has the &lt;code&gt;ai&lt;/code&gt; feature and its &lt;code&gt;mytool init&lt;/code&gt; wizard (the same initialiser system from
&lt;a class="link" href="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-2/" &gt;part 2&lt;/a&gt;)
to store the key properly, and there&amp;rsquo;s a whole post on
&lt;a class="link" href="https://phpboyscout.uk/where-should-a-cli-keep-your-api-keys/" &gt;where a CLI should keep your keys&lt;/a&gt;.
For learning the client, an env var is plenty.&lt;/p&gt;
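&lt;p&gt;If you want the command to complain early when no key has been exported, a
pre-flight check is easy to sketch with the standard library alone. This is a
minimal sketch and not how &lt;code&gt;pkg/chat&lt;/code&gt; resolves keys: &lt;code&gt;firstSetKey&lt;/code&gt; and
&lt;code&gt;providerKeys&lt;/code&gt; are hypothetical names, and the order they are checked in is an
assumption made for illustration.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"os"
)

// providerKeys lists the environment variables named above, in the order
// this sketch checks them. The order is an assumption for illustration,
// not go-tool-base's documented precedence.
var providerKeys = []string{"ANTHROPIC_API_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY"}

// firstSetKey returns the name of the first variable in names that getenv
// reports as non-empty. It is a hypothetical helper, not part of pkg/chat.
// Taking getenv as a parameter keeps the function testable without
// touching the real environment.
func firstSetKey(names []string, getenv func(string) string) (string, bool) {
	for _, name := range names {
		if getenv(name) != "" {
			return name, true
		}
	}
	return "", false
}

func main() {
	name, ok := firstSetKey(providerKeys, os.Getenv)
	if !ok {
		fmt.Fprintln(os.Stderr, "no provider API key found; export one before playing")
		os.Exit(1)
	}
	// Print only the variable name, never the secret itself.
	fmt.Println("using key from", name)
}
```

&lt;p&gt;The check only ever reports the variable&amp;rsquo;s name, so the secret itself never
reaches the terminal or a log.&lt;/p&gt;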
&lt;h2 id="scaffold-the-command"&gt;Scaffold the command
&lt;/h2&gt;&lt;p&gt;You know this step from part 1:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;gtb generate &lt;span class="nb"&gt;command&lt;/span&gt; --name adventure --short &lt;span class="s2"&gt;&amp;#34;Play a dungeon adventure&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Everything below goes in the &lt;code&gt;RunAdventure&lt;/code&gt; function the generator left you in
&lt;code&gt;pkg/cmd/adventure/main.go&lt;/code&gt;, plus a couple of types and helpers in the same
package.&lt;/p&gt;
&lt;h2 id="the-state-is-yours-not-the-models"&gt;The state is yours, not the model&amp;rsquo;s
&lt;/h2&gt;&lt;p&gt;Start with the truth. The game state is a plain Go struct that you own. The model
never holds it; instead you hand the model the current state at the top of every turn
(more on that in the loop). This is the part to grow: start small, then add rooms,
items, NPCs, quest flags, whatever your adventure needs. Nothing else in the design
has to change when you do.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// GameState is the single source of truth for the game. Extend it freely.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;GameState&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;PlayerHP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;Location&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;Inventory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;Foes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// foe name -&amp;gt; remaining hit points&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// summary renders the state into a line the model is given each turn.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;GameState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;foes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Foes&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Foes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;foes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;foes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;%s (%d HP)&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;You have %d HP, at %s, carrying %s. Foes: %s.&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PlayerHP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Inventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;, &amp;#34;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;foes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;, &amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And the shape of a turn, the thing the model has to produce:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;Narration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;narration&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;Choices&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;choices&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;GameOver&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;game_over&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="the-dungeon-masters-tools"&gt;The dungeon master&amp;rsquo;s tools
&lt;/h2&gt;&lt;p&gt;A tool in &lt;code&gt;pkg/chat&lt;/code&gt; is a &lt;code&gt;chat.Tool&lt;/code&gt;: a name, a description the model reads to
decide when to use it, a parameter schema, and a handler. The handler gets the
model&amp;rsquo;s arguments as raw JSON and returns any value (which the framework JSON-encodes
back to the model) or an error.&lt;/p&gt;
&lt;p&gt;The simplest possible one is a die roll. This is the canonical &amp;ldquo;give the model
something it&amp;rsquo;s bad at&amp;rdquo; tool, because language models cannot be trusted to roll
fairly or even add up:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rollTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;roll&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Roll a die with the given number of sides; returns 1..sides.&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="c1"&gt;// Use an anonymous struct so the schema&amp;#39;s properties sit at the top level,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="c1"&gt;// which is where SetTools looks. A named type would hide them behind a $ref.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;jsonschema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Reflect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;Sides&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;sides&amp;#34; jsonschema:&amp;#34;description=number of sides on the die&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}{}),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RawMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nx"&gt;Sides&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;sides&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Unmarshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sides&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sides&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Intn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That comment about the anonymous struct matters, by the way. Reflect a named
type and &lt;code&gt;jsonschema&lt;/code&gt; emits a top-level &lt;code&gt;$ref&lt;/code&gt; with the real fields tucked
away in a definitions block, so the tool ships with no parameters at all. An anonymous struct inlines
them at the top level, where the framework expects them. It&amp;rsquo;s the one sharp edge in the whole exercise.&lt;/p&gt;
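&lt;p&gt;If you want a cheap guard against that sharp edge, one option is to check that a marshalled schema really has properties at its root. What follows is a minimal sketch using only the standard library: &lt;code&gt;hasTopLevelProperties&lt;/code&gt; is a hypothetical helper, and the two sample documents are hand-written illustrations of the shapes described above, not captured library output.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasTopLevelProperties reports whether a JSON Schema document declares a
// non-empty "properties" object at its root, which is where the article says
// the framework looks for a tool's parameters.
func hasTopLevelProperties(schema []byte) bool {
	root := new(struct {
		Properties map[string]json.RawMessage `json:"properties"`
	})
	if err := json.Unmarshal(schema, root); err != nil {
		return false
	}
	return len(root.Properties) > 0
}

func main() {
	// Hand-written illustrations of the two shapes, not real library output.
	inlined := []byte(`{"type":"object","properties":{"sides":{"type":"integer"}}}`)
	referenced := []byte(`{"$ref":"#/$defs/RollArgs","$defs":{"RollArgs":{"type":"object","properties":{"sides":{"type":"integer"}}}}}`)

	fmt.Println(hasTopLevelProperties(inlined))    // true
	fmt.Println(hasTopLevelProperties(referenced)) // false
}
```

&lt;p&gt;In a real test you would presumably marshal each tool&amp;rsquo;s &lt;code&gt;Parameters&lt;/code&gt; with &lt;code&gt;json.Marshal&lt;/code&gt; and feed the bytes to the same check.&lt;/p&gt;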
&lt;p&gt;Combat is where state actually changes, so combat is a tool too. Note it takes the
foe by name and looks it up in &lt;code&gt;Foes&lt;/code&gt;, so it works for the goblin and for any
creature you add later, without touching this function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;attackTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;GameState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;attack&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Resolve the player&amp;#39;s attack on a named foe. Rolls to hit, applies damage.&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;jsonschema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Reflect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;Target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;target&amp;#34; jsonschema:&amp;#34;description=the name of the foe being attacked&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}{}),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RawMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nx"&gt;Target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;target&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Unmarshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Foes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Target&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;error&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;no such foe: &amp;#34;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Target&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Intn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;hit&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;foe&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Target&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;dmg&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Intn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dmg&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Foes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Target&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;hit&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;foe&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;damage&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dmg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;foe_hp&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;defeated&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A bad target comes back as an ordinary tool result with an &lt;code&gt;error&lt;/code&gt; field rather than as a Go error.
The framework hands that result to the model, so it can recover (apologise, pick a real foe) rather than crash.&lt;/p&gt;
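&lt;p&gt;Because the handler rolls its own dice, it&amp;rsquo;s awkward to test as written. One way round that, sketched here under the assumption that you&amp;rsquo;re happy to lift the resolution logic into a plain function, is to pass the dice in. &lt;code&gt;resolveAttack&lt;/code&gt;, its &lt;code&gt;roll&lt;/code&gt; parameter and the &lt;code&gt;scripted&lt;/code&gt; helper are all hypothetical names, not part of the code above.&lt;/p&gt;

```go
package main

import "fmt"

// resolveAttack follows the same rules as the attack handler above, but takes
// its dice as a parameter. roll(sides) is expected to return 1..sides.
func resolveAttack(foes map[string]int, target string, roll func(sides int) int) map[string]any {
	hp, ok := foes[target]
	if !ok {
		// An unknown foe is a normal result with an error field, not a Go error.
		return map[string]any{"error": "no such foe: " + target}
	}
	if hit := roll(20) >= 10; !hit {
		return map[string]any{"hit": false, "foe": target}
	}
	dmg := roll(6)
	if dmg > hp {
		hp = 0 // clamp at zero rather than going negative
	} else {
		hp -= dmg
	}
	foes[target] = hp
	return map[string]any{
		"hit": true, "foe": target, "damage": dmg,
		"foe_hp": hp, "defeated": hp == 0,
	}
}

// scripted returns a roller that replays fixed values in order.
func scripted(values ...int) func(int) int {
	next := 0
	return func(int) int {
		v := values[next]
		next++
		return v
	}
}

func main() {
	foes := map[string]int{"goblin": 5}

	miss := resolveAttack(foes, "goblin", scripted(9))
	fmt.Println(miss["hit"], foes["goblin"]) // false 5

	kill := resolveAttack(foes, "goblin", scripted(15, 6))
	fmt.Println(kill["foe_hp"], kill["defeated"]) // 0 true

	unknown := resolveAttack(foes, "dragon", scripted())
	fmt.Println(unknown["error"]) // no such foe: dragon
}
```

&lt;p&gt;The real handler would then shrink to decoding the arguments and calling this function with a roller backed by &lt;code&gt;rand.Intn&lt;/code&gt;.&lt;/p&gt;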
&lt;p&gt;Those are all the game-mechanic tools, and there&amp;rsquo;s deliberately nothing here for reading the
state. The model never fetches it. Instead the loop hands it the current state at
the top of every turn, which we wire up shortly. A language model has no memory you
can rely on, so rather than trust it to remember the fight, you give it the truth
each time.&lt;/p&gt;
&lt;h2 id="the-turn-is-a-tool-too"&gt;The turn is a tool too
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the neat part. The chat client won&amp;rsquo;t let a single call both run tools and
return a typed struct; they&amp;rsquo;re separate modes. So instead of asking for the struct
afterwards, we make submitting the turn into a tool of its own. The dungeon master ends its
turn by calling &lt;code&gt;submit_turn&lt;/code&gt;, and its handler captures the typed &lt;code&gt;Turn&lt;/code&gt; into a
variable we hold:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;submitTurnTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;Turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;submit_turn&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;End your turn. Call this exactly once, last, with the turn&amp;#39;s outcome.&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;jsonschema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Reflect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;Narration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;narration&amp;#34; jsonschema:&amp;#34;description=two-sentence narration of what just happened&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;Choices&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;choices&amp;#34; jsonschema:&amp;#34;description=the actions the player may take next&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;GameOver&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;game_over&amp;#34; jsonschema:&amp;#34;description=true only if the game has ended&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}{}),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RawMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Unmarshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;turn recorded&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So the turn&amp;rsquo;s structure is enforced by a schema, same as any other tool&amp;rsquo;s
parameters. Your loop gets a populated &lt;code&gt;Turn&lt;/code&gt; every round, never prose.&lt;/p&gt;
&lt;h2 id="the-rules"&gt;The rules
&lt;/h2&gt;&lt;p&gt;This is where you bound the model. The system prompt is the rulebook, and it leans
hard on the tools so the DM has no room to freelance the mechanics:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dmRules&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`You are the dungeon master of a short fantasy adventure. Each turn
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;you are given the current game state and the player&amp;#39;s action.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;Resolve the action and end the turn:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- If the player attacks, you MUST call the attack tool with the foe&amp;#39;s name to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; resolve it. Do not decide the hit or the damage yourself.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- For any other chance event, call the roll tool and use its result.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- For simple actions, just narrate them.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Then call submit_turn exactly once: a two-sentence narration, two or three
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt; choices, and game_over.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;Trust the state you are given; never contradict it. A foe at 0 hit points is dead
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;and stays dead. The game ends when the player&amp;#39;s hit points reach 0 (they lose), or
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;when the player reaches a satisfying ending. When it ends, set game_over and narrate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;the finish.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;Keep the tone light and quick.`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Two of those lines carry the weight. Trusting the state you are given, and never
contradicting it, is what keeps the world consistent: the state is handed in fresh
every turn (the next section), so the model works from the truth instead of from a
memory it does not reliably have. And &lt;code&gt;you MUST call the attack tool&lt;/code&gt; is what stops
it quietly deciding hits and damage itself when it would rather just narrate. Those
two are the difference between a game with rules and a model telling a story.&lt;/p&gt;
&lt;h2 id="the-loop"&gt;The loop
&lt;/h2&gt;&lt;p&gt;Now stitch it together. Create the client with the rules as its system prompt,
register the tools once, and run a turn each time the player acts:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;RunAdventure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Props&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;AdventureOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;GameState&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;PlayerHP&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;the mouth of a damp cave&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Inventory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;a short sword&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;a guttering torch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;Foes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;goblin&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;SystemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dmRules&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;MaxSteps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// roll/attack, then submit_turn&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SetTools&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nf"&gt;rollTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nf"&gt;attackTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nf"&gt;submitTurnTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;I step into the cave.&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Turn&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="c1"&gt;// Hand the model the current truth, then the player&amp;#39;s action.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;State: %s\nThe player: %s&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;game&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;\n&amp;#34;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Narration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GameOver&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;chooseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Choices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The same &lt;code&gt;client&lt;/code&gt; runs every turn, so the conversation and the tools carry through
the whole game. The &lt;code&gt;State:&lt;/code&gt; line you prepend is always current too, because the
&lt;code&gt;attack&lt;/code&gt; tool already mutated &lt;code&gt;game&lt;/code&gt; during the previous turn. The model is never trusted to remember,
only to narrate.&lt;/p&gt;
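&lt;p&gt;As a rough illustration of why that works, here is a minimal, self-contained sketch with no model or chat client involved. The &lt;code&gt;state&lt;/code&gt; type, its &lt;code&gt;summary&lt;/code&gt; format and the fixed five points of damage are all assumptions made for the example, not the types used above. The only point is that a handler closing over a pointer changes what the next &lt;code&gt;State:&lt;/code&gt; line says.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// state is a hypothetical stand-in for the game state used in the loop.
type state struct {
	PlayerHP int
	Foes     map[string]int
}

// summary renders the state as one line to prepend to each turn's input.
// The format is an assumption made for this sketch.
func (s *state) summary() string {
	names := make([]string, 0, len(s.Foes))
	for name := range s.Foes {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for a stable line

	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, fmt.Sprintf("%s=%d", name, s.Foes[name]))
	}
	return fmt.Sprintf("player_hp=%d foes[%s]", s.PlayerHP, strings.Join(parts, ","))
}

// attack returns a closure over the shared state, the way a tool handler would.
// Damage is fixed at 5 to keep the sketch deterministic.
func attack(s *state) func(foe string) string {
	return func(foe string) string {
		hp, ok := s.Foes[foe]
		if !ok {
			return "no such foe"
		}
		if hp > 5 {
			hp -= 5
		} else {
			hp = 0
		}
		s.Foes[foe] = hp // the mutation the next summary will pick up
		return fmt.Sprintf("%s takes 5 damage", foe)
	}
}

func main() {
	game := new(state)
	game.PlayerHP = 20
	game.Foes = map[string]int{"goblin": 12}
	hit := attack(game)

	for i, action := range []string{"I swing", "I swing again", "I lunge"} {
		// Build the input from the current truth, then the player's action.
		fmt.Printf("turn %d | State: %s | The player: %s\n", i+1, game.summary(), action)
		hit("goblin") // stands in for the model calling the attack tool
	}
}
```

&lt;p&gt;Run as written, the three lines should show the goblin at 12, then 7, then 2 hit points, without anything outside the handler touching the map.&lt;/p&gt;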
&lt;h2 id="let-the-player-off-the-menu"&gt;Let the player off the menu
&lt;/h2&gt;&lt;p&gt;The one helper I glossed over is &lt;code&gt;chooseAction&lt;/code&gt;. A bare &lt;code&gt;fmt.Scanln&lt;/code&gt; would do, but we can
do much better with almost no effort, and make a point while we&amp;rsquo;re at it. The
framework already leans on Charm&amp;rsquo;s &lt;a class="link" href="https://github.com/charmbracelet/huh" target="_blank" rel="noopener"
 &gt;huh&lt;/a&gt; for
its &lt;code&gt;init&lt;/code&gt; wizard (you met it in
&lt;a class="link" href="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-2/" &gt;part 2&lt;/a&gt;),
so we&amp;rsquo;ll use the same library for a proper menu, with one deliberate addition:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;chooseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;other&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;__other__&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Option&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewOption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewOption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Something else...&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;other&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pick&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;custom&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewForm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NewSelect&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]().&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nf"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;What do you do?&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nf"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nf"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;pick&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="c1"&gt;// A second step that only appears when the player chose &amp;#34;Something else&amp;#34;.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;huh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewInput&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nf"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Describe your action&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;				&lt;/span&gt;&lt;span class="nf"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;custom&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;WithHideFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pick&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;other&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pick&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;other&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;custom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pick&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The select gives the player a tidy arrow-key menu instead of typing a number, but
the addition that earns its keep is the last option. &amp;ldquo;Something else&amp;hellip;&amp;rdquo; is always
there, and choosing it unfolds a second step (huh shows or hides a group with
&lt;code&gt;WithHideFunc&lt;/code&gt;) where the player types whatever they actually want to do. That free
text goes straight to the dungeon master as the next turn&amp;rsquo;s input, and because the
DM is an AI bound by the rules rather than a switch statement over three fixed
choices, it just copes. Bargain with the goblin, search your pockets, set the cave
alight: the model narrates it within the rules you gave it, rolling and applying
damage through the same tools. That is the agency a scripted game can&amp;rsquo;t offer, and
it&amp;rsquo;s the natural place to start building your own richer interactivity on top.&lt;/p&gt;
&lt;h2 id="play-it"&gt;Play it
&lt;/h2&gt;&lt;p&gt;Set your key, build, and go:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;sk-ant-...&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;just build
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./bin/mytool adventure
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A turn looks like this (your wording will differ every time; the mechanics won&amp;rsquo;t):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;You swing your short sword at the goblin, the blade whistling through the damp cave
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;air. The creature snarls as it tries to dodge your blow.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;What do you do?
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&amp;gt; Attack the goblin again
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Try to push deeper into the cave
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Retreat to the entrance
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Something else...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Your blade whistles through the air, but the nimble goblin dances back just in
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;time. It lunges forward with a rusty dagger in return, yet its clumsy strike only
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;finds empty air.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;What do you do?
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&amp;gt; Swing your sword again!
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Try to intimidate the creature
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Retreat from the cave
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Something else...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Behind that, the dungeon master called &lt;code&gt;attack&lt;/code&gt; each turn (a hit, then a miss), the
goblin&amp;rsquo;s hit points changed in the &lt;code&gt;GameState&lt;/code&gt; you own, and the next turn handed
that updated state straight back to the model. The prose is the model&amp;rsquo;s; every
number is yours.&lt;/p&gt;
&lt;h2 id="the-pattern-under-the-game"&gt;The pattern under the game
&lt;/h2&gt;&lt;p&gt;Strip the dungeon away and you&amp;rsquo;re left with the thing worth keeping. An AI feature
you can ship is one where you&amp;rsquo;ve kept the model away from everything that has to be
right: the &lt;strong&gt;rules&lt;/strong&gt; live in the system prompt, the &lt;strong&gt;mechanics&lt;/strong&gt; in typed Go tools
the model must call, the &lt;strong&gt;state&lt;/strong&gt; in a struct you hand it fresh each turn rather
than trust it to remember, and the &lt;strong&gt;output&lt;/strong&gt; in a struct it fills in rather than
free text. Do that and the model&rsquo;s unpredictability is confined to exactly where you
want it (the wording) and walled out of everywhere you don&rsquo;t: the maths, the state,
the shape of the result.&lt;/p&gt;
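&lt;p&gt;As a minimal sketch of that split, here is one turn with the model swapped for a stub so it runs offline. The field types on &lt;code&gt;GameState&lt;/code&gt;, the &lt;code&gt;Turn&lt;/code&gt; struct, the &lt;code&gt;stubModel&lt;/code&gt; function and the fixed damage value are all assumptions made for illustration; none of this is the go-tool-base chat API.&lt;/p&gt;

```go
package main

import "fmt"

// GameState is the truth the program owns. The field types are assumed here.
type GameState struct {
	Location string
	PlayerHP int
	Foes     map[string]int // foe name to hit points
}

// Turn is the structured output the model fills in, instead of free text.
type Turn struct {
	Narration string
	Options   []string
}

// attack is a mechanic: typed Go code, not the model, changes the numbers.
// Damage is passed in so the sketch stays deterministic (a real tool would roll).
func attack(state *GameState, target string, damage int) int {
	hp := state.Foes[target] - damage
	if 0 > hp {
		hp = 0
	}
	state.Foes[target] = hp
	return hp
}

// stubModel is a hypothetical stand-in for the AI. It is handed the current
// state on every turn and contributes only the wording.
func stubModel(state *GameState, action string) Turn {
	left := attack(state, "goblin", 3) // stands in for the model's tool call
	return Turn{
		Narration: fmt.Sprintf("You %s. The goblin has %d hit points left.", action, left),
		Options:   []string{"Attack again", "Retreat", "Something else..."},
	}
}

func main() {
	state := new(GameState)
	state.Location = "cave"
	state.PlayerHP = 10
	state.Foes = map[string]int{"goblin": 7}

	for _, action := range []string{"swing your sword", "swing again"} {
		turn := stubModel(state, action)
		fmt.Println(turn.Narration)
	}
	fmt.Println("goblin hp:", state.Foes["goblin"])
}
```

&lt;p&gt;Swap the stub for a real model and the narration changes on every run, while the hit points still only move through &lt;code&gt;attack&lt;/code&gt;.&lt;/p&gt;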
&lt;p&gt;Two honest limits worth knowing. There&amp;rsquo;s no
&lt;a class="link" href="https://platform.claude.com/docs/en/about-claude/glossary#temperature" target="_blank" rel="noopener"
 &gt;temperature&lt;/a&gt;
dial on the client (the setting that would let you turn the model&amp;rsquo;s randomness
down), so you can&amp;rsquo;t make the prose reproducible; you make the mechanics
reproducible instead, which for most features is what you actually needed. And a
tool-calling loop is several round-trips to the model per turn, so it&rsquo;s not free:
keep &lt;code&gt;MaxSteps&lt;/code&gt; tight for anything interactive.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the foundation, and the state struct is already sized for more than one
fight: it carries a location, an inventory and a map of foes you&amp;rsquo;ve barely touched.
Add a &lt;code&gt;move&lt;/code&gt; tool that updates &lt;code&gt;Location&lt;/code&gt;, a &lt;code&gt;use_item&lt;/code&gt; tool that reaches into
&lt;code&gt;Inventory&lt;/code&gt;, a second creature in &lt;code&gt;Foes&lt;/code&gt;, even a &lt;code&gt;give_quest&lt;/code&gt; flag, and the
adventure grows without the architecture changing. The model just gets more tools
to call and more truth to read. Saved games come nearly free, too: the client can
snapshot and resume a conversation. Next part leaves AI behind and gets the tool
ready to look after itself: shipping signed self-updates, so a new release reaches
your users safely. Until then, go explore the cave.&lt;/p&gt;</description></item><item><title>Building a CLI with go-tool-base, part 3: expose your CLI to AI agents</title><link>https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-3/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-3/</guid><description>&lt;img src="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-3/cover-building-a-cli-with-go-tool-base-part-3.png" alt="Featured image of post Building a CLI with go-tool-base, part 3: expose your CLI to AI agents" /&gt;&lt;p&gt;&amp;ldquo;Make it work with AI&amp;rdquo; is the request that lands on your desk with a thud and no
further detail. The first time it landed on mine I braced for a treadmill of
integration work: an adapter for this assistant, a wrapper for that one, one per
client, forever. Then I looked at the &lt;code&gt;hello&lt;/code&gt; command we built back in
&lt;a class="link" href="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-1/" &gt;part 1&lt;/a&gt;.
It has a name, a one-line description, and (once you give it some) typed,
documented flags. That is exactly the shape an AI agent needs to call a tool.
You already did the hard part.&lt;/p&gt;
&lt;p&gt;This part wires that up: turning the CLI you&amp;rsquo;ve been building into something an
AI assistant can drive, with no AI code of your own. The how-it-works behind it
is in &lt;a class="link" href="https://phpboyscout.uk/your-cli-is-already-an-ai-tool/" &gt;your CLI is already an AI tool&lt;/a&gt;;
here we just use it.&lt;/p&gt;
&lt;p&gt;A version note, as in the earlier parts: this is written against
&lt;strong&gt;go-tool-base v0.6.0&lt;/strong&gt; (&lt;code&gt;gtb version&lt;/code&gt;). The tool is young and moving, so if
you&amp;rsquo;re on a newer release and a command or its output has shifted, check there
first.&lt;/p&gt;
&lt;h2 id="the-translator-you-already-have"&gt;The translator you already have
&lt;/h2&gt;&lt;p&gt;An AI agent that wants to call local tools needs three things: a list of named
operations, a description of each so it knows when to reach for them, and a typed
parameter schema for each so it knows how to call them. A good CLI is already all
three. The only missing piece is a translator between &amp;ldquo;this is a CLI&amp;rdquo; and &amp;ldquo;this
is a set of tools an AI can call&amp;rdquo;, and that translator is the
&lt;a class="link" href="https://modelcontextprotocol.io/" target="_blank" rel="noopener"
 &gt;Model Context Protocol&lt;/a&gt; (MCP), an open standard
every serious assistant now speaks.&lt;/p&gt;
&lt;p&gt;Your tool already ships it. &lt;code&gt;mcp&lt;/code&gt; is one of the default features, so it&amp;rsquo;s been in
your binary since you scaffolded in part 1, no flag required. Check:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./bin/mytool mcp --help
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You&amp;rsquo;ll see subcommands you never wrote. The rest of this part is just three of
them.&lt;/p&gt;
&lt;h2 id="see-what-the-agent-sees"&gt;See what the agent sees
&lt;/h2&gt;&lt;p&gt;Before you connect anything, look at what your tool would expose. This writes the
tool definitions to a file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./bin/mytool mcp tools
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;tools&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;name&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;mytool_hello&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;description&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Say hello&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;inputSchema&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;type&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;object&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;properties&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s your &lt;code&gt;hello&lt;/code&gt; command, seen from an agent&amp;rsquo;s side of the glass. The name is
your tool&amp;rsquo;s name and the command path joined with an underscore; the description
is the &lt;code&gt;Short&lt;/code&gt; you gave it in part 1; the &lt;code&gt;inputSchema&lt;/code&gt; is empty because &lt;code&gt;hello&lt;/code&gt;
has no flags yet. Add a flag and it shows up here as a property, with the type and
help text you already wrote. There&amp;rsquo;s no second schema to keep in sync, because the
command tree is the schema.&lt;/p&gt;
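&lt;p&gt;A rough sketch of how a definition like that can fall out of a command, under stated assumptions: the &lt;code&gt;Command&lt;/code&gt; and &lt;code&gt;Flag&lt;/code&gt; types are simplified stand-ins for a real command tree, the &lt;code&gt;name&lt;/code&gt; flag is invented for the example, and &lt;code&gt;toolFor&lt;/code&gt; is not go-tool-base&rsquo;s actual implementation.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Flag and Command are hypothetical stand-ins for a real command tree.
type Flag struct {
	Name, Type, Usage string
}

type Command struct {
	Path  []string // the command path, for example ["hello"]
	Short string
	Flags []Flag
}

// Property, Schema and Tool mirror the JSON shape of a tool definition.
type Property struct {
	Type        string `json:"type"`
	Description string `json:"description"`
}

type Schema struct {
	Type       string              `json:"type"`
	Properties map[string]Property `json:"properties"`
}

type Tool struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	InputSchema Schema `json:"inputSchema"`
}

// toolFor builds one tool definition from one command: the tool name and the
// command path joined with underscores, Short as the description, and one
// schema property per flag.
func toolFor(tool string, cmd Command) Tool {
	props := map[string]Property{}
	for _, f := range cmd.Flags {
		props[f.Name] = Property{Type: f.Type, Description: f.Usage}
	}
	return Tool{
		Name:        strings.Join(append([]string{tool}, cmd.Path...), "_"),
		Description: cmd.Short,
		InputSchema: Schema{Type: "object", Properties: props},
	}
}

func main() {
	hello := Command{
		Path:  []string{"hello"},
		Short: "Say hello",
		Flags: []Flag{{Name: "name", Type: "string", Usage: "who to greet"}},
	}
	out, err := json.MarshalIndent(toolFor("mytool", hello), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

&lt;p&gt;Remove the flag and &lt;code&gt;properties&lt;/code&gt; comes out empty again, which is the state &lt;code&gt;hello&lt;/code&gt; is in right now.&lt;/p&gt;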
&lt;p&gt;A few things are deliberately left off this list: hidden and deprecated commands,
pure command groups that don&amp;rsquo;t do anything themselves, and the &lt;code&gt;mcp&lt;/code&gt;, &lt;code&gt;help&lt;/code&gt; and
&lt;code&gt;completion&lt;/code&gt; plumbing. So &lt;code&gt;mcp tools&lt;/code&gt; doubles as an audit: it&amp;rsquo;s exactly what an
agent can reach, and nothing else.&lt;/p&gt;
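&lt;p&gt;Those exclusion rules are simple enough to sketch. This is an illustration only: the &lt;code&gt;Command&lt;/code&gt; struct and its &lt;code&gt;Runnable&lt;/code&gt; field are assumed shapes, and &lt;code&gt;archive&lt;/code&gt;, &lt;code&gt;old-sync&lt;/code&gt; and &lt;code&gt;admin&lt;/code&gt; are invented commands, not ones the scaffold creates.&lt;/p&gt;

```go
package main

import "fmt"

// Command is a hypothetical, simplified view of one node in a command tree.
type Command struct {
	Name       string
	Hidden     bool
	Deprecated bool
	Runnable   bool // false for a pure group that only holds subcommands
}

// plumbing lists the built-in commands that are left off the tool list.
var plumbing = map[string]bool{"mcp": true, "help": true, "completion": true}

// exposed reports whether a command would appear as an agent-callable tool.
func exposed(c Command) bool {
	if c.Hidden || c.Deprecated || !c.Runnable {
		return false
	}
	return !plumbing[c.Name]
}

func main() {
	tree := []Command{
		{Name: "hello", Runnable: true},
		{Name: "archive", Runnable: true, Hidden: true},
		{Name: "old-sync", Runnable: true, Deprecated: true},
		{Name: "admin"}, // a pure group, nothing to run
		{Name: "mcp", Runnable: true},
	}
	for _, c := range tree {
		fmt.Printf("%-10s exposed=%v\n", c.Name, exposed(c))
	}
}
```

&lt;p&gt;Only &lt;code&gt;hello&lt;/code&gt; passes the check here, so it would be the only tool an agent sees.&lt;/p&gt;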
&lt;p&gt;&lt;img alt="Exporting the tool definitions with mcp tools" class="gallery-image" data-flex-basis="450px" data-flex-grow="187" height="640" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-3/demo-mcp-tools.gif" width="1200"&gt;
&lt;/p&gt;
&lt;h2 id="run-the-server"&gt;Run the server
&lt;/h2&gt;&lt;p&gt;One command turns the whole thing on:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./bin/mytool mcp start
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It doesn&amp;rsquo;t print a banner and march off doing things. It sits quietly, speaking
MCP as JSON-RPC over standard input and output, waiting for an assistant to talk
to it. You won&amp;rsquo;t run this by hand much; the assistant launches it for you. But
it&amp;rsquo;s worth knowing what happens when the agent calls one of your tools: the server
re-runs your own binary with the arguments the agent supplied, captures the output,
and hands it back. The agent isn&amp;rsquo;t poking at your internals. It&amp;rsquo;s running
&lt;code&gt;mytool hello&lt;/code&gt;, the same command a human would, and getting the same result.&lt;/p&gt;
&lt;h2 id="point-an-assistant-at-it"&gt;Point an assistant at it
&lt;/h2&gt;&lt;p&gt;The quickest way is to let the tool write the client config for you. For Claude
Desktop:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./bin/mytool mcp claude &lt;span class="nb"&gt;enable&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There are &lt;code&gt;cursor&lt;/code&gt; and &lt;code&gt;vscode&lt;/code&gt; variants too. Restart the assistant and your CLI
is in its toolbox.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;d rather wire it by hand (or your client isn&amp;rsquo;t one of those three), the
config is small. Point the client at your binary with &lt;code&gt;mcp start&lt;/code&gt; as its
arguments, using the absolute path:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-json" data-lang="json"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;mcpServers&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;mytool&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;command&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;/absolute/path/to/bin/mytool&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nt"&gt;&amp;#34;args&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;mcp&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;start&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Claude Desktop keeps that in &lt;code&gt;claude_desktop_config.json&lt;/code&gt; (under
&lt;code&gt;~/Library/Application Support/Claude/&lt;/code&gt; on macOS, &lt;code&gt;%APPDATA%\Claude\&lt;/code&gt; on Windows);
Cursor uses &lt;code&gt;~/.cursor/mcp.json&lt;/code&gt;; VS Code&amp;rsquo;s Copilot reads
&lt;code&gt;github.copilot.mcpServers&lt;/code&gt; in your settings. The shape is the same everywhere.
From here, ask the assistant to say hello and watch it call &lt;code&gt;mytool_hello&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Wiring the tool into an assistant with mcp claude enable" class="gallery-image" data-flex-basis="411px" data-flex-grow="171" height="700" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://phpboyscout.uk/building-a-cli-with-go-tool-base-part-3/demo-mcp-enable.gif" width="1200"&gt;
&lt;/p&gt;
&lt;h2 id="the-agents-reach-is-exactly-your-clis-reach"&gt;The agent&amp;rsquo;s reach is exactly your CLI&amp;rsquo;s reach
&lt;/h2&gt;&lt;p&gt;This is the part worth being calm about. Exposing your CLI over MCP doesn&amp;rsquo;t widen
its surface by an inch. The agent can call the commands you shipped, with the
parameters you defined, and nothing else. It can&amp;rsquo;t invent a command or pass a flag
you never wrote. The boundary of what it can do is the boundary you drew when you
built the tool, and &lt;code&gt;mcp tools&lt;/code&gt; shows you that boundary in full. If there&amp;rsquo;s a
command you don&amp;rsquo;t want an agent touching, mark it hidden and it drops off the list.&lt;/p&gt;
&lt;p&gt;For a long-running or remote setup there&amp;rsquo;s also &lt;code&gt;./bin/mytool mcp stream&lt;/code&gt;, which
serves the same tools over HTTP instead of stdio; the
&lt;a class="link" href="https://gtb.phpboyscout.uk/cli/mcp/" target="_blank" rel="noopener"
 &gt;mcp reference&lt;/a&gt; has the details. For most
desktop assistants, &lt;code&gt;start&lt;/code&gt; over stdio is all you need.&lt;/p&gt;
&lt;h2 id="what-it-comes-down-to"&gt;What it comes down to
&lt;/h2&gt;&lt;p&gt;You turned the CLI you&amp;rsquo;ve been building into an agent-callable tool with one
command and zero lines of AI code, because the real work, naming your operations
and documenting their inputs, you finished the moment your &lt;code&gt;--help&lt;/code&gt; was any good.
Every command you add from here is a new tool the agent gets for free.&lt;/p&gt;
&lt;p&gt;Next part goes the other way: instead of letting an assistant drive your tool from
outside, we put AI inside it, wiring up a provider and building a feature against
go-tool-base&amp;rsquo;s chat SDK. Until then, add a command or two and watch them appear in
&lt;code&gt;mcp tools&lt;/code&gt;. The agent&amp;rsquo;s toolbox grows as your CLI does.&lt;/p&gt;</description></item><item><title>Technical CV writing is still hard, and now a robot reads it first</title><link>https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/</guid><description>&lt;img src="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cover-technical-cv-writing-and-the-ai-filter.png" alt="Featured image of post Technical CV writing is still hard, and now a robot reads it first" /&gt;&lt;p&gt;Seven years ago I wrote a post called &lt;a class="link" href="https://phpboyscout.uk/technical-cv-writing/" &gt;Technical CV writing is hard&lt;/a&gt;, pulled my own CV apart, and explained every choice in it. I even bragged that it converted to a first interview about eighty per cent of the time, then added &amp;ldquo;watch me now jinx myself for the future&amp;rdquo;. Reader, I jinxed myself. I&amp;rsquo;m back on the market, the same CV that served me for two decades went out into the world, and what came back was a sort of stunned silence. Not even rejections. Just nothing.&lt;/p&gt;
&lt;h2 id="the-cv-that-suddenly-stopped-working"&gt;The CV that suddenly stopped working
&lt;/h2&gt;&lt;p&gt;The thing about that silence is how &lt;em&gt;specific&lt;/em&gt; it was. Some applications behaved exactly as they always had: a human read the CV, liked it or didn&amp;rsquo;t, and replied like a person. Others went into a void. And the void had a pattern to it. It was the bigger, more process-heavy outfits, the ones you&amp;rsquo;d bet good money have an Applicant Tracking System and an &amp;ldquo;AI-assisted screening&amp;rdquo; line item in some HR budget.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s when the penny dropped. My CV wasn&amp;rsquo;t failing to impress anyone. It wasn&amp;rsquo;t reaching anyone. The first thing reading it wasn&amp;rsquo;t a person at all.&lt;/p&gt;
&lt;h2 id="the-reader-changed-and-i-hadnt-noticed"&gt;The reader changed, and I hadn&amp;rsquo;t noticed
&lt;/h2&gt;&lt;p&gt;I&amp;rsquo;ve made this exact point on this blog before, only about software: &lt;a class="link" href="https://phpboyscout.uk/half-your-users-dont-have-eyes/" &gt;half your users don&amp;rsquo;t have eyes&lt;/a&gt;. A CLI tool&amp;rsquo;s output has two audiences, the human at the terminal and the script parsing the output, and they want completely different things. It turns out a CV is now precisely the same. It has two readers, and the first one is a machine.&lt;/p&gt;
&lt;p&gt;A human recruiter reads a CV the way I designed mine to be read: narrative, personality, a sense of the person. An ATS or an AI screen does nothing of the sort. It parses for structure, for keyword density, for recency, for numbers it can latch onto. My CV was a beautifully tailored sales pitch aimed squarely at a human who, increasingly, never gets to see it, because a parser in front of them scored it and quietly binned it first.&lt;/p&gt;
&lt;p&gt;Everything that made it a good &lt;em&gt;human&lt;/em&gt; document was, to the machine, either invisible or actively confusing.&lt;/p&gt;
&lt;h2 id="so-i-asked-an-ai-what-the-ai-hated"&gt;So I asked an AI what the AI hated
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s an irony here I&amp;rsquo;m choosing to enjoy rather than resent. The way I worked out what the filters object to was to sit down with Gemini, hand it my CV, and ask it to read the thing the way a recruitment AI would and tell me where it tripped. Using one AI to get past another. Fight fire with fire.&lt;/p&gt;
&lt;p&gt;The one instruction I was firm about, and I&amp;rsquo;ll come back to it, was that the CV had to stay recognisably &lt;em&gt;me&lt;/em&gt;. I wasn&amp;rsquo;t asking Gemini to launder my career into something generic and machine-shaped. I was asking it to help me keep as much of my own voice and judgement as possible, while making the thing easier for an AI to approve and a human to enjoy. There&amp;rsquo;s a practical edge to that, too: the screening tools are increasingly tuned to spot the patterns of generated text and weight them down, so a CV that reads as though a model wrote it can trip the very filter you were trying to please, quite apart from leaving the human at the end of it cold.&lt;/p&gt;
&lt;p&gt;With that ground rule set, the hurdles it surfaced were genuinely illuminating, and a bit humbling given I&amp;rsquo;d written a whole confident blog post about how to do this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The skills tables are worse than useless.&lt;/strong&gt; My CV led with two lovely tables: Management Skills and Technical Skills, each with a level and years of experience. Clean and scannable for a human. To a lot of parsers, a table is a trap: they flatten it into a jumble and lose the structure entirely. Worse, listing &amp;ldquo;20+ years&amp;rdquo; against nearly everything triggers what I can only call the recency trap. Modern screening looks for skills that show up &lt;em&gt;in your recent job descriptions&lt;/em&gt;, not in a header table. A language sitting in my skills table but not in my last two roles reads as stale or unverified, no matter how many years I claimed next to it. Gemini put it plainly: &amp;ldquo;if a tool sees Golang in a top table but doesn&amp;rsquo;t see it explicitly mentioned in your last two job descriptions, it assumes the skill is stale or unverified.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;img alt="My skills laid out as tables of skill, level and commercial experience. Lovely for a human to scan, a jumble the moment a parser flattens the formatting. This is the long-standing shape, here in its original 2019 form." class="gallery-image" data-flex-basis="251px" data-flex-grow="104" height="792" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-before_hu_5ee9e1318c5d7ff5.webp" srcset="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-before_hu_50bbba9ea10ac31a.webp 480w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-before_hu_277ffc3e5e12239d.webp 720w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-before_hu_5ee9e1318c5d7ff5.webp 831w" width="831"&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;I have a passion for what I do&amp;rdquo; is noise.&lt;/strong&gt; My opening profile statement, which I was rather proud of, is exactly the sort of thing a screening tool discards wholesale. As Gemini noted, these tools &amp;ldquo;completely ignore subjective self-assessments &amp;hellip; because they cannot be measured or verified.&amp;rdquo; It wants a dense, factual summary full of the nouns it&amp;rsquo;s searching for, right at the top.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The old opening: my name, my contact details, and a warm but entirely unmeasurable “I have a passion for what I do” profile statement." class="gallery-image" data-flex-basis="1055px" data-flex-grow="439" height="188" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-before_hu_f5f3517119e92c75.webp" srcset="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-before_hu_5343d7503df312a5.webp 480w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-before_hu_6164b92783de19e5.webp 720w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-before_hu_f5f3517119e92c75.webp 827w" width="827"&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My numbers thin out the further back you go.&lt;/strong&gt; My recent roles are full of the data these tools love: a 75% reduction in deployment times, three thousand-odd Kubernetes clusters, a GitLab instance with four hundred thousand repositories. My older roles, written years ago in a more narrative style, are all &amp;ldquo;oversaw the delivery of solutions&amp;rdquo; with not a metric in sight. The machine reads that as a career that got vaguer over time, which is the opposite of true.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Four pages is at least two too many.&lt;/strong&gt; Parsers weight the first page or two most heavily. My education and the foundational stuff sat on pages three and four, where the algorithm barely bothers to look.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It couldn&amp;rsquo;t work out what I am.&lt;/strong&gt; This was the sharp one. With &amp;ldquo;pre-sales&amp;rdquo;, &amp;ldquo;client management&amp;rdquo; and &amp;ldquo;Managing Director&amp;rdquo; sitting next to deep technical keywords, the classifier genuinely can&amp;rsquo;t decide whether I&amp;rsquo;m a commercial manager who used to code or a hands-on engineer who drifted into management. As Gemini described it: &amp;ldquo;the algorithm gets confused &amp;hellip; It struggles to classify you: Are you a commercial manager who used to code, or a hands-on techie who got pushed into management?&amp;rdquo; So it does the safe thing and matches me to neither.&lt;/p&gt;
&lt;h2 id="what-im-actually-changing"&gt;What I&amp;rsquo;m actually changing
&lt;/h2&gt;&lt;p&gt;Knowing the hurdles, here&amp;rsquo;s what the rebuild looks like. This is the part I want to be useful, so it&amp;rsquo;s concrete.&lt;/p&gt;
&lt;p&gt;The tables are gone. In their place is a &amp;ldquo;Core Expertise&amp;rdquo; section, plain text the parser can read, grouped so my leadership sits next to my technical stack. And I&amp;rsquo;ve done the thing 2019-me was too much of a show-off to do: tiered it &lt;em&gt;honestly&lt;/em&gt;. Instead of &amp;ldquo;Expert+&amp;rdquo; against everything, there&amp;rsquo;s a primary tier of what I actually do day to day, a proficient tier I can deploy without blinking, and a frank &amp;ldquo;familiar, not current&amp;rdquo; tier for the languages I last touched in anger a decade ago. That honesty isn&amp;rsquo;t just decency. A wall of &amp;ldquo;expert at everything&amp;rdquo; reads as noise to a machine and as bluster to a human, and I&amp;rsquo;d been doing both.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The replacement: a plain-text Core Expertise list a parser can read straight through, tiered honestly into what I do day to day, what I’m proficient in, and what I’m only still familiar with." class="gallery-image" data-flex-basis="288px" data-flex-grow="120" height="1149" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-after_hu_61e84b5e406d71d4.webp" srcset="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-after_hu_3377bae072cc3175.webp 480w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-after_hu_abfe24012095612f.webp 720w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-after_hu_1df16e2dd124629d.webp 1080w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-skills-after_hu_61e84b5e406d71d4.webp 1383w" width="1383"&gt;
&lt;/p&gt;
&lt;p&gt;The subjective profile is replaced with a keyword-rich professional summary that says, in the first two lines, exactly what I am and at what scale.&lt;/p&gt;
&lt;p&gt;&lt;img alt="The replacement: a Professional Summary that leads with the role and the scale, in the nouns a parser is actually hunting for, with the person still audible underneath." class="gallery-image" data-flex-basis="623px" data-flex-grow="259" height="537" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-after_hu_2312f05c7f8b99bb.webp" srcset="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-after_hu_a068217fa87974ea.webp 480w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-after_hu_6f974cc746d43525.webp 720w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-after_hu_7d8e4402f878913d.webp 1080w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-profile-after_hu_2312f05c7f8b99bb.webp 1394w" width="1394"&gt;
&lt;/p&gt;
&lt;p&gt;The keywords that mattered have been woven down &lt;em&gt;into&lt;/em&gt; the recent role bullets, so the parser sees them where it trusts them. And I&amp;rsquo;ve reframed the people-management and pre-sales language toward technical enablement and architectural advisory, because what I&amp;rsquo;m actually chasing is the technical-leader sweet spot: the person who owns the architecture and mentors the engineers, without the HR admin and the sales pitches. The CV now points at that, deliberately, so the classifier stops dithering.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also a more personal beat in here. A previous employer handed me a role with a &amp;ldquo;VP&amp;rdquo; title, sold to me as exactly the technical-leadership job I&amp;rsquo;d been chasing. It wasn&amp;rsquo;t. The title turned out to be a pay-grade bracket rather than a description of the work, the work itself was hands-on firefighting with little of the leadership or empowerment I&amp;rsquo;d been promised, and I moved on within a few months. To a screening AI, that pairing is doubly awkward. A &amp;ldquo;VP&amp;rdquo; title files me as a meeting-heavy executive and rules me out of the hands-on Principal and Lead roles I actually want, and a sub-six-month stint trips the flight-risk flag that some trackers quietly score you down for. So the fix is to stop letting the inflated label do the talking: describe the functional reality of the work, retitle it to the technical track it actually was, and let the scale of what I wrestled with speak instead of the job title. Titles, it turns out, are for the pay band. The bullets are for the truth.&lt;/p&gt;
&lt;p&gt;&lt;img alt="My recent roles on the new CV: each leads with the work and the numbers, in technical-track titles a parser weights and a human believes." class="gallery-image" data-flex-basis="282px" data-flex-grow="117" height="1167" loading="lazy" sizes="(max-width: 767px) calc(100vw - 30px), (max-width: 1023px) 700px, (max-width: 1279px) 950px, 1232px" src="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-experience_hu_e68702fb97f43fbe.webp" srcset="https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-experience_hu_74269b608a2268a6.webp 480w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-experience_hu_7757bb28ec3010e3.webp 720w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-experience_hu_8865bec496bcd53d.webp 1080w, https://phpboyscout.uk/technical-cv-writing-and-the-ai-filter/cv-experience_hu_e68702fb97f43fbe.webp 1374w" width="1374"&gt;
&lt;/p&gt;
&lt;h2 id="keeping-myself-in-it"&gt;Keeping myself in it
&lt;/h2&gt;&lt;p&gt;Back to that ground rule. Every one of these changes is in service of getting past the machine to the human behind it, and neither reader is well served by a CV with the person scrubbed out of it. The screen, increasingly, is trained to notice generic generated phrasing and mark it down; the human, always, would rather read something with a pulse. So the keywords go in, the structure gets fixed, the metrics come forward, and the &lt;em&gt;voice stays mine&lt;/em&gt;. No &amp;ldquo;results-driven synergistic leveraging of cross-functional paradigms&amp;rdquo; that nobody would ever say out loud. That was the whole point of doing it this way: let the AI help reshape the &lt;em&gt;structure&lt;/em&gt; a parser cares about, while the &lt;em&gt;words&lt;/em&gt; stay mine, so what comes out is easier for a machine to approve, easier for a human to enjoy, and still unmistakably written by me. Optimising for the filter and sounding like myself turned out not to be in conflict at all.&lt;/p&gt;
&lt;h2 id="i-genuinely-dont-know-if-this-works-yet"&gt;I genuinely don&amp;rsquo;t know if this works yet
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the part that makes this a post and not a victory lap. I don&amp;rsquo;t know if any of this lands. The old CV converted at around eighty per cent, on my own possibly-generous reckoning, right up until it abruptly didn&amp;rsquo;t. The new one is going out now, into the same market and the same filters that were stonewalling me a fortnight ago.&lt;/p&gt;
&lt;p&gt;So this is a promise as much as a post. I&amp;rsquo;m going to keep count, the way I should have all along, and come back with the actual numbers: did reshaping my CV for a reader with no eyes genuinely move the needle, or did I just make it uglier and learn nothing? Either way you&amp;rsquo;ll get the truth, because a follow-up that only reports good news isn&amp;rsquo;t worth writing. Watch this space, and if you&amp;rsquo;re sending CVs into the same silence, maybe try reading yours the way a machine would first. It&amp;rsquo;s a deeply odd exercise, and I suspect it&amp;rsquo;s now an essential one.&lt;/p&gt;</description></item><item><title>Supporting a provider, or actually using it</title><link>https://phpboyscout.uk/supporting-a-provider-or-actually-using-it/</link><pubDate>Sat, 02 May 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/supporting-a-provider-or-actually-using-it/</guid><description>&lt;img src="https://phpboyscout.uk/supporting-a-provider-or-actually-using-it/cover-supporting-a-provider-or-actually-using-it.png" alt="Featured image of post Supporting a provider, or actually using it" /&gt;&lt;p&gt;If your CLI tool talks to an AI model, you don&amp;rsquo;t want to hard-wire one vendor. So you reach for a single client interface over several providers, which is the right call. The trap is the next step: build that interface on only what every provider has in common, and you quietly throw away the very features that made you want a particular provider in the first place. rust-tool-base&amp;rsquo;s &lt;code&gt;rtb-ai&lt;/code&gt; refuses to make that trade.&lt;/p&gt;
&lt;h2 id="the-pull-toward-one-interface"&gt;The pull toward one interface
&lt;/h2&gt;&lt;p&gt;If your CLI tool talks to an AI model, hard-wiring one vendor is a poor bet. One user has an Anthropic key, another an OpenAI key. Someone&amp;rsquo;s on Gemini. Someone runs Ollama locally because their data can&amp;rsquo;t leave the building. Someone points at an OpenAI-compatible endpoint from a provider you&amp;rsquo;ve never heard of. You don&amp;rsquo;t want a separate code path for each, so you want one &lt;code&gt;AiClient&lt;/code&gt; that all of them slot behind.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rtb-ai&lt;/code&gt; gets that unification from the &lt;code&gt;genai&lt;/code&gt; crate, which already speaks to Anthropic, OpenAI, Gemini, Ollama and OpenAI-compatible endpoints. One interface, five providers, the tool author picks one in config. The Go sibling makes the same bet: go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package also unifies several providers, behind &lt;a class="link" href="https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/" &gt;an interface deliberately kept to four methods&lt;/a&gt;. So far this is the obvious design, and if it were the whole design there&amp;rsquo;d be nothing to write about.&lt;/p&gt;
&lt;h2 id="what-unified-quietly-costs-you"&gt;What &amp;ldquo;unified&amp;rdquo; quietly costs you
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the catch in any unified interface. It can only expose what every provider behind it has in common.&lt;/p&gt;
&lt;p&gt;The common subset is plain chat. Messages go in, text comes out, optionally streamed token by token. That&amp;rsquo;s real and it&amp;rsquo;s useful and every provider does it. But the common subset is also the &lt;em&gt;floor&lt;/em&gt;, and the features that make a particular provider worth choosing are almost never on the floor. They&amp;rsquo;re the things only that provider does.&lt;/p&gt;
&lt;p&gt;Anthropic is the sharp example, because it has three features that matter and not one of them is common-subset.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prompt caching.&lt;/strong&gt; You can mark the stable parts of a request, the system prompt and the tool list, as cacheable. The provider keeps them warm, and on the next turn you aren&amp;rsquo;t billed to re-send and re-process text that didn&amp;rsquo;t change. On a long agent loop, where the same large system prompt rides along on every single turn, that&amp;rsquo;s a substantial saving in both cost and latency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extended thinking.&lt;/strong&gt; The model works through a hard problem in a visible, budgeted reasoning pass before it commits to an answer, and you can see that reasoning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Citations.&lt;/strong&gt; Structured references back to source material in the response.&lt;/p&gt;
&lt;p&gt;A client built strictly on the common subset can&amp;rsquo;t express any of those. It has no field for them, because four of the five providers wouldn&amp;rsquo;t know what to do with the field. So a purely lowest-common-denominator client would &amp;ldquo;support&amp;rdquo; Anthropic and then use it badly, leaving its best features unreachable. Support as a checkbox, not as the point.&lt;/p&gt;
&lt;h2 id="the-escape-hatch"&gt;The escape hatch
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;rtb-ai&lt;/code&gt;&amp;rsquo;s answer is to not choose. It runs two implementations under one interface.&lt;/p&gt;
&lt;p&gt;For OpenAI, Gemini, Ollama and OpenAI-compatible endpoints, calls route through &lt;code&gt;genai&lt;/code&gt;, the unified path. For Anthropic, every method drops to a &lt;a class="link" href="https://gitlab.com/phpboyscout/rust-tool-base/-/blob/9c22aa8/crates/rtb-ai/src/anthropic.rs#L1" target="_blank" rel="noopener"
 &gt;direct &lt;code&gt;reqwest&lt;/code&gt; implementation&lt;/a&gt; straight against the Messages API. Same &lt;code&gt;AiClient&lt;/code&gt; on the surface, a different implementation underneath, selected by which provider the config names.&lt;/p&gt;
&lt;p&gt;And the request type has deliberate room for the difference:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;ChatRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;: &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;f32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;u32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Anthropic-only: enables prompt caching at every stable point.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Ignored on non-Anthropic providers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cache_control&lt;/span&gt;: &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Anthropic-only: extended-thinking budget. `None` disables.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Ignored on non-Anthropic providers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;thinking&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ThinkingMode&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Set &lt;code&gt;cache_control&lt;/code&gt; and the Anthropic-direct path inserts cache breakpoints at the three stable points: the system prompt, the tool list, and the first message. Set &lt;code&gt;thinking&lt;/code&gt; and it adds the thinking block, and streaming surfaces a separate &lt;code&gt;ThinkingToken&lt;/code&gt; event so you can show the reasoning apart from the answer. On a non-Anthropic provider, both fields are simply ignored. The interface carries them; only the implementation that understands them acts on them.&lt;/p&gt;
&lt;h2 id="a-hatch-not-a-leak"&gt;A hatch, not a leak
&lt;/h2&gt;&lt;p&gt;It&amp;rsquo;s worth being precise about why this isn&amp;rsquo;t the thing it superficially resembles, which is a leaky abstraction.&lt;/p&gt;
&lt;p&gt;A leaky abstraction is one where implementation details bleed through that you didn&amp;rsquo;t intend and can&amp;rsquo;t reason about. The abstraction quietly fails to abstract, and you&amp;rsquo;re left guessing which provider you&amp;rsquo;re really talking to.&lt;/p&gt;
&lt;p&gt;This is the opposite of that. The two Anthropic-only fields aren&amp;rsquo;t a leak. They&amp;rsquo;re named, documented as Anthropic-only, inert everywhere else, and right there in the public type for anyone to see. The interface is uniform for the common case and &lt;em&gt;deliberately, visibly&lt;/em&gt; non-uniform at exactly the points where uniformity would have cost you the good features. You opt into provider-specifics by setting a field. You stay fully portable by leaving it at its default. Nothing bleeds; you decide.&lt;/p&gt;
&lt;p&gt;The same design line explains what &lt;em&gt;does&lt;/em&gt; stay in the unified path. Structured output, &lt;code&gt;chat_structured::&amp;lt;T&amp;gt;&lt;/code&gt;, sends a JSON Schema derived from your Rust type with the request and validates the reply against it before handing you a typed &lt;code&gt;T&lt;/code&gt;. That&amp;rsquo;s a portability win that costs nothing across providers, so it belongs in the common interface. The split isn&amp;rsquo;t &amp;ldquo;Anthropic versus the rest&amp;rdquo;. It&amp;rsquo;s &amp;ldquo;features that are free to unify go in the unified path; features that aren&amp;rsquo;t get a designed door&amp;rdquo;. Prompt caching and extended thinking get the door, because flattening them away would be the expensive kind of convenient.&lt;/p&gt;
&lt;h2 id="to-sum-up"&gt;To sum up
&lt;/h2&gt;&lt;p&gt;A CLI tool that integrates AI wants one client over several providers, and a unified interface can only expose what those providers share. The shared floor is plain chat, and the features worth choosing a provider &lt;em&gt;for&lt;/em&gt;, like Anthropic&amp;rsquo;s prompt caching, extended thinking and citations, are almost never on the floor.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rtb-ai&lt;/code&gt; keeps both. &lt;code&gt;genai&lt;/code&gt; provides the unified path for four of the five providers; an Anthropic-direct &lt;code&gt;reqwest&lt;/code&gt; path drops below the abstraction for the features &lt;code&gt;genai&lt;/code&gt; can&amp;rsquo;t reach, and &lt;code&gt;ChatRequest&lt;/code&gt; carries the Anthropic-only fields openly, ignored elsewhere. Uniform where uniformity is free, with a designed escape hatch where it isn&amp;rsquo;t. That&amp;rsquo;s the difference between supporting a provider and actually using it.&lt;/p&gt;</description></item><item><title>A configurable AI endpoint is an attack surface</title><link>https://phpboyscout.uk/a-configurable-ai-endpoint-is-an-attack-surface/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/a-configurable-ai-endpoint-is-an-attack-surface/</guid><description>&lt;img src="https://phpboyscout.uk/a-configurable-ai-endpoint-is-an-attack-surface/cover-a-configurable-ai-endpoint-is-an-attack-surface.png" alt="Featured image of post A configurable AI endpoint is an attack surface" /&gt;&lt;p&gt;&amp;ldquo;Let users point at their own AI endpoint&amp;rdquo; is one of those config options that looks completely harmless on the way in. People want it, for perfectly good reasons. Then you sit with it for a minute and realise you&amp;rsquo;ve handed every user a loaded gun and pointed it vaguely at their own API key.&lt;/p&gt;
&lt;h2 id="why-you-offer-it-at-all"&gt;Why you offer it at all
&lt;/h2&gt;&lt;p&gt;There are real reasons to let someone set a custom base URL. They&amp;rsquo;re running a local model and want &lt;code&gt;localhost:11434&lt;/code&gt;. They&amp;rsquo;re behind a corporate proxy that fronts the real provider. They&amp;rsquo;re on Azure&amp;rsquo;s flavour of OpenAI, which lives at a different host. They&amp;rsquo;ve a self-hosted gateway doing rate-limiting. All reasonable, all things a framework should support rather than fight.&lt;/p&gt;
&lt;h2 id="the-bit-thats-a-loaded-gun"&gt;The bit that&amp;rsquo;s a loaded gun
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s what the config option quietly decides: the base URL is &lt;em&gt;where your credential goes&lt;/em&gt;. The API key rides along in an &lt;code&gt;Authorization&lt;/code&gt; header on every request, to whatever host that URL resolves to. So the moment the endpoint is user-configurable, the destination of your secret is user-configurable too.&lt;/p&gt;
&lt;p&gt;And users do user things. They paste a URL from a gist that turned out to be a honeypot. They leave &lt;code&gt;http://&lt;/code&gt; on the front, so the key crosses the wire in plaintext. They copy &lt;code&gt;https://user:token@host/v1&lt;/code&gt; not realising the userinfo changes who they actually authenticate to. They never edit the &lt;code&gt;https://api.example.com/v1&lt;/code&gt; placeholder and wonder why the key&amp;rsquo;s been posted to a domain they don&amp;rsquo;t own. None of that is malice. It&amp;rsquo;s what happens when the destination of a secret is a free-text field.&lt;/p&gt;
&lt;h2 id="validate-before-the-first-byte-leaves"&gt;Validate before the first byte leaves
&lt;/h2&gt;&lt;p&gt;So every &lt;code&gt;chat.New&lt;/code&gt; routes through &lt;code&gt;ValidateBaseURL&lt;/code&gt; before the provider is built. The threat model is written at the top of &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/chat/baseurl.go" target="_blank" rel="noopener"
 &gt;&lt;code&gt;pkg/chat/baseurl.go&lt;/code&gt;&lt;/a&gt;: an operator who can influence config could &amp;ldquo;redirect chat-provider traffic to an attacker-controlled HTTPS host and capture the Authorization header.&amp;rdquo; The checks run cheapest-first: a length cap, no ASCII control characters, must parse, no userinfo, &lt;code&gt;https&lt;/code&gt; only, a host must be present, and the host mustn&amp;rsquo;t be a placeholder.&lt;/p&gt;
&lt;p&gt;The userinfo rule is the sharp one:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="c1"&gt;// Reject any userinfo, with or without password. Never log&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="c1"&gt;// the URL itself because it contains the credential.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithHint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ErrInvalidBaseURL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;base URL must not contain credentials; use the Token field instead&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The placeholder check rejects &lt;code&gt;example.com&lt;/code&gt; and friends &lt;em&gt;and any subdomain of them&lt;/em&gt;, so the unedited &lt;code&gt;https://api.example.com/v1&lt;/code&gt; from a setup wizard never reaches the wire, where it might land on some typosquatted lookalike. And the HTTP escape hatch is test-only by construction: the &lt;code&gt;AllowInsecureBaseURL&lt;/code&gt; field that permits plain &lt;code&gt;http&lt;/code&gt; is tagged &lt;code&gt;json:&amp;quot;-&amp;quot;&lt;/code&gt;, so a config file physically cannot set it. This all came out of the 2026-04-17 security audit, finding M-3.&lt;/p&gt;
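&lt;p&gt;To make the &lt;code&gt;json:&amp;quot;-&amp;quot;&lt;/code&gt; point concrete, here&amp;rsquo;s a minimal sketch. The &lt;code&gt;Config&lt;/code&gt; struct, its &lt;code&gt;base_url&lt;/code&gt; key and the &lt;code&gt;loadConfig&lt;/code&gt; helper are stand-ins invented for illustration, not go-tool-base&amp;rsquo;s real types. Only the field name and its tag are taken from the framework, and the sketch assumes the config is decoded with the standard &lt;code&gt;encoding/json&lt;/code&gt; package.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical stand-in for a chat client config struct.
type Config struct {
	BaseURL string `json:"base_url"`
	// The "-" tag tells encoding/json to skip this field entirely,
	// so only Go code (a test, say) can ever switch it on.
	AllowInsecureBaseURL bool `json:"-"`
}

// loadConfig decodes a JSON config document into a Config.
func loadConfig(data []byte) (*Config, error) {
	cfg := new(Config)
	if err := json.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// A config file doing its level best to enable plain http.
	data := []byte(`{
		"base_url": "http://localhost:11434",
		"AllowInsecureBaseURL": true,
		"allow_insecure_base_url": true,
		"-": true
	}`)

	cfg, err := loadConfig(data)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("base URL:", cfg.BaseURL)
	fmt.Println("insecure allowed:", cfg.AllowInsecureBaseURL)
}
```

&lt;p&gt;Whatever spelling the file tries, the decoder has no key that maps onto the field, so it keeps its zero value. A test can still set it directly on the struct, which is the whole idea.&lt;/p&gt;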
&lt;p&gt;rust-tool-base enforces the same at its own boundary: &lt;a class="link" href="https://gitlab.com/phpboyscout/rust-tool-base/-/blob/9c22aa8/crates/rtb-ai/src/config.rs#L96" target="_blank" rel="noopener"
 &gt;&lt;code&gt;validate_base_url&lt;/code&gt;&lt;/a&gt; rejects userinfo, any scheme but &lt;code&gt;https&lt;/code&gt; (bar a test-only &lt;code&gt;allow_insecure&lt;/code&gt;), and documentation placeholder hosts like &lt;code&gt;example.com&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="what-it-can-and-cant-do"&gt;What it can and can&amp;rsquo;t do
&lt;/h2&gt;&lt;p&gt;It won&amp;rsquo;t stop a user who deliberately points the tool at a malicious HTTPS host they genuinely chose. If someone is set on sending their own key somewhere bad, validation can&amp;rsquo;t read their mind.&lt;/p&gt;
&lt;p&gt;What it stops is the &lt;em&gt;accidents&lt;/em&gt;: the plaintext slip, the userinfo confusion, the placeholder nobody changed. Those aren&amp;rsquo;t theoretical, they&amp;rsquo;re the ones that happen to careful people on ordinary days. Storing the key well is one job (&lt;a class="link" href="https://phpboyscout.uk/where-should-a-cli-keep-your-api-keys/" &gt;where a CLI keeps it&lt;/a&gt;), stopping it &lt;a class="link" href="https://phpboyscout.uk/redacting-the-secret-you-didnt-know-was-in-the-string/" &gt;leaking through a log&lt;/a&gt; is another, and this is the third side of the triangle: once you&amp;rsquo;ve stored it and stopped it leaking, make sure you don&amp;rsquo;t &lt;em&gt;send&lt;/em&gt; it somewhere daft.&lt;/p&gt;</description></item><item><title>Testing code that calls an LLM: yes, you actually can</title><link>https://phpboyscout.uk/testing-code-that-calls-an-llm/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/testing-code-that-calls-an-llm/</guid><description>&lt;img src="https://phpboyscout.uk/testing-code-that-calls-an-llm/cover-testing-code-that-calls-an-llm.png" alt="Featured image of post Testing code that calls an LLM: yes, you actually can" /&gt;&lt;p&gt;&amp;ldquo;You can&amp;rsquo;t test code that calls an AI.&amp;rdquo; I&amp;rsquo;ve heard it said with great confidence, and it&amp;rsquo;s half right, which is the most dangerous kind of right. You genuinely can&amp;rsquo;t assert on what a non-deterministic model says. But the model isn&amp;rsquo;t your code, and the bits sitting either side of it most certainly are.&lt;/p&gt;
&lt;h2 id="you-cant-test-ai-code"&gt;&amp;ldquo;You can&amp;rsquo;t test AI code&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;It&amp;rsquo;s a fair worry. Your command calls an LLM. The LLM returns something slightly different every run. A test that asserts &lt;code&gt;response == &amp;quot;...&amp;quot;&lt;/code&gt; is broken before you&amp;rsquo;ve finished typing it. So the conclusion arrives quickly: the AI path can&amp;rsquo;t be tested, leave it uncovered.&lt;/p&gt;
&lt;p&gt;Which is a shame, because the AI call is usually the riskiest line in the whole command.&lt;/p&gt;
&lt;p&gt;The conclusion is also wrong. It mistakes &amp;ldquo;I can&amp;rsquo;t test the model&amp;rdquo; for &amp;ldquo;I can&amp;rsquo;t test my code&amp;rdquo;. The model is not your code. Your code is the two pieces sitting on either side of it.&lt;/p&gt;
&lt;h2 id="your-code-is-a-prompt-and-a-handler"&gt;Your code is a prompt and a handler
&lt;/h2&gt;&lt;p&gt;Strip the command down to what it actually does:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It builds a prompt. It assembles a system prompt, the user&amp;rsquo;s input, perhaps some context, and sends it.&lt;/li&gt;
&lt;li&gt;The model does something. This is not your code.&lt;/li&gt;
&lt;li&gt;It takes the response and does something with it. It parses it, branches on it, prints it, stores it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Steps one and three are entirely yours, and entirely deterministic. The same inputs build the same prompt and handle the same response the same way, every single time. That&amp;rsquo;s testable. Step two is the only part that isn&amp;rsquo;t, and step two was never yours to test in the first place.&lt;/p&gt;
&lt;p&gt;So the job is to pin step two to a known value, and then test one and three properly.&lt;/p&gt;
&lt;h2 id="test-the-prompt-snapshot-it"&gt;Test the prompt: snapshot it
&lt;/h2&gt;&lt;p&gt;Step one produces a prompt, and a prompt is just a string, which means you can pin it.&lt;/p&gt;
&lt;p&gt;Both frameworks lean on snapshot testing here. go-tool-base uses a golden-file approach: the prompt your code generates is recorded to a file, and the test re-generates it and compares against that file. rust-tool-base does the same with &lt;code&gt;insta&lt;/code&gt;, snapshotting the request body the client would send.&lt;/p&gt;
&lt;p&gt;The reason this matters is that the prompt is load-bearing and quietly easy to break. You refactor how context gets assembled. Without noticing, you&amp;rsquo;ve changed the wording, or the ordering, or dropped a line the model was leaning on. Nothing fails to compile. The behaviour just drifts, silently.&lt;/p&gt;
&lt;p&gt;A snapshot test catches exactly that. It fails, it shows you the diff between the old prompt and the new one, and it makes you stop and make a decision. Was this change intended? If yes, you accept the new snapshot and move on. If no, you&amp;rsquo;ve just caught a bug before it shipped. Either way the prompt never changes by accident, which for AI code is most of the battle.&lt;/p&gt;
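&lt;p&gt;As a rough illustration of the golden-file idea, here is a minimal Go sketch. It is not go-tool-base&amp;rsquo;s own test helper: &lt;code&gt;buildPrompt&lt;/code&gt;, &lt;code&gt;checkGolden&lt;/code&gt; and the prompt layout are hypothetical names invented for the example.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// buildPrompt is a hypothetical stand-in for step one: it assembles the
// system prompt, any context lines and the user's input into one string.
func buildPrompt(system string, contextLines []string, userInput string) string {
	parts := []string{"SYSTEM: " + system}
	for _, line := range contextLines {
		parts = append(parts, "CONTEXT: "+line)
	}
	parts = append(parts, "USER: "+userInput)
	return strings.Join(parts, "\n") + "\n"
}

// checkGolden compares got against the golden file at path. With update set
// it rewrites the file instead, which is how an intended change is accepted.
func checkGolden(path, got string, update bool) error {
	if update {
		return os.WriteFile(path, []byte(got), 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read golden file: %w", err)
	}
	if string(want) != got {
		return fmt.Errorf("prompt drifted from %s\n--- want\n%s--- got\n%s", path, want, got)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	golden := filepath.Join(dir, "prompt.golden")

	// Record the prompt once, as the accepted snapshot.
	original := buildPrompt("You triage logs.", []string{"service=api"}, "Why did it crash?")
	if err := checkGolden(golden, original, true); err != nil {
		panic(err)
	}
	fmt.Println("unchanged prompt passes:", checkGolden(golden, original, false) == nil)

	// A refactor that silently drops the context line no longer matches.
	refactored := buildPrompt("You triage logs.", nil, "Why did it crash?")
	fmt.Println("refactor caught:", checkGolden(golden, refactored, false) != nil)
}
```

&lt;p&gt;In a real suite the update switch would usually hang off a test flag or an environment variable, so accepting a new snapshot is a deliberate act and never a side effect of running the tests.&lt;/p&gt;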
&lt;h2 id="test-the-handler-mock-the-response"&gt;Test the handler: mock the response
&lt;/h2&gt;&lt;p&gt;Step three needs a response to handle, and in a unit test you don&amp;rsquo;t get that response from the real model. You supply it.&lt;/p&gt;
&lt;p&gt;go-tool-base ships &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/mocks/pkg/chat/ChatClient.go" target="_blank" rel="noopener"
 &gt;generated mocks for the &lt;code&gt;ChatClient&lt;/code&gt; interface&lt;/a&gt;. A test builds a mock client, tells it &amp;ldquo;when &lt;code&gt;Ask&lt;/code&gt; is called, return &lt;em&gt;this&lt;/em&gt; canned value&amp;rdquo;, and runs the command against it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;mockClient&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mock_chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewMockChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;mockClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EXPECT&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Anything&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Anything&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AnythingOfType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;*main.Analysis&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;RunAndReturn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;Severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;critical&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because &lt;a class="link" href="https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/" &gt;the interface is only four methods&lt;/a&gt;, that mock is trivial to set up and complete by construction. rust-tool-base takes the same idea one layer down: HTTP-bound tests use &lt;code&gt;wiremock&lt;/code&gt;, which stands up a fake server returning a canned response body. The client makes a real HTTP request; it just goes to a fake endpoint the test controls.&lt;/p&gt;
&lt;p&gt;Either way, step two is now fixed to a value you chose, which makes step three deterministic. And that unlocks the tests that actually matter: given a malformed response, does the command fail gracefully? Given a rate-limit error, an empty answer, a field missing? Those are the cases a live model almost never hands you on demand, and a mock hands you every time, on the first run.&lt;/p&gt;
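&lt;p&gt;To make those ugly cases concrete, here is a small table-driven sketch using a hand-written fake in place of the generated mock. Everything in it is an assumption for illustration: the one-method &lt;code&gt;asker&lt;/code&gt; interface, the &lt;code&gt;fakeClient&lt;/code&gt; type and the &lt;code&gt;triage&lt;/code&gt; handler are invented here, and only the shape of &lt;code&gt;Ask&lt;/code&gt; is borrowed from the mock above.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Analysis mirrors the structured target used in the mock example.
type Analysis struct{ Severity string }

// asker is a local, hypothetical one-method slice of a chat client.
type asker interface {
	Ask(ctx context.Context, prompt string, target any) error
}

// fakeClient returns a canned result, or a canned error, on every call.
type fakeClient struct {
	result Analysis
	err    error
}

func (f fakeClient) Ask(_ context.Context, _ string, target any) error {
	if f.err != nil {
		return f.err
	}
	out, ok := target.(*Analysis)
	if !ok {
		return errors.New("unexpected target type")
	}
	*out = f.result
	return nil
}

// triage is the handler under test: step three, the code that acts on
// whatever the model returned.
func triage(ctx context.Context, c asker, logText string) (string, error) {
	a := new(Analysis)
	if err := c.Ask(ctx, "Analyse: "+logText, a); err != nil {
		return "", fmt.Errorf("ask model: %w", err)
	}
	switch a.Severity {
	case "critical":
		return "page on-call", nil
	case "warning", "info":
		return "file a ticket", nil
	default:
		return "", fmt.Errorf("model returned unknown severity %q", a.Severity)
	}
}

func main() {
	cases := []struct {
		name   string
		client fakeClient
	}{
		{"critical", fakeClient{result: Analysis{Severity: "critical"}}},
		{"field missing", fakeClient{result: Analysis{}}},
		{"rate limited", fakeClient{err: errors.New("429 too many requests")}},
	}
	for _, tc := range cases {
		action, err := triage(context.Background(), tc.client, "panic: nil map")
		fmt.Printf("%s: action=%q err=%v\n", tc.name, action, err)
	}
}
```

&lt;p&gt;The missing-field and rate-limit rows are the ones worth having. A live model might take weeks to hand you either; the fake produces both on the first run.&lt;/p&gt;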
&lt;p&gt;This is, incidentally, the same discipline as &lt;a class="link" href="https://phpboyscout.uk/the-test-mocking-pattern-that-races/" &gt;the test-mocking work elsewhere in the framework&lt;/a&gt;: the dependency is injected, so the test gets to decide what it does.&lt;/p&gt;
&lt;h2 id="what-you-deliberately-dont-test"&gt;What you deliberately don&amp;rsquo;t test
&lt;/h2&gt;&lt;p&gt;One boundary worth stating. None of this tests whether the model gives &lt;em&gt;good&lt;/em&gt; answers. That question is real, but it&amp;rsquo;s a different activity (evaluations, run as their own suite) and not something to mix into the unit tests.&lt;/p&gt;
&lt;p&gt;The unit suite&amp;rsquo;s job is your code: that it builds a sound prompt, and that it handles every shape of response correctly, including the ugly ones. Keep that well away from &amp;ldquo;is the model clever today&amp;rdquo;. A unit test that depends on the model being clever is a unit test that fails when the weather changes, and a flaky test just teaches people to ignore the whole suite.&lt;/p&gt;
&lt;h2 id="what-it-comes-down-to"&gt;What it comes down to
&lt;/h2&gt;&lt;p&gt;Code that calls an LLM is testable; the model is not, and those are different statements. Your code is a prompt builder and a response handler, both deterministic, with the model sat in between.&lt;/p&gt;
&lt;p&gt;go-tool-base and rust-tool-base converge on the same approach. Snapshot the prompt, with golden files or &lt;code&gt;insta&lt;/code&gt;, so a refactor can&amp;rsquo;t change what you send without a test noticing. Mock the response, with generated &lt;code&gt;ChatClient&lt;/code&gt; mocks or a &lt;code&gt;wiremock&lt;/code&gt; server, so tests run with no network and you can feed in the malformed and error cases a real model won&amp;rsquo;t reliably produce. Leave &amp;ldquo;are the answers any good&amp;rdquo; to a separate evaluation suite. Test the two halves you own, and the non-determinism in the middle stops being an excuse to leave the riskiest line uncovered.&lt;/p&gt;</description></item><item><title>The AI provider that isn't an API</title><link>https://phpboyscout.uk/the-ai-provider-that-isnt-an-api/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/the-ai-provider-that-isnt-an-api/</guid><description>&lt;img src="https://phpboyscout.uk/the-ai-provider-that-isnt-an-api/cover-the-ai-provider-that-isnt-an-api.png" alt="Featured image of post The AI provider that isn't an API" /&gt;&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package puts five AI providers behind one interface. Four of them are exactly what you&amp;rsquo;d guess: HTTP calls to OpenAI, Claude, Gemini, and anything OpenAI-compatible. The fifth one isn&amp;rsquo;t an API at all. It shells out to a binary.&lt;/p&gt;
&lt;p&gt;That sounds like a slightly mad thing to want, right up until you&amp;rsquo;ve worked somewhere the network says no.&lt;/p&gt;
&lt;h2 id="the-fifth-provider-shells-out"&gt;The fifth provider shells out
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;chat&lt;/code&gt; package speaks to five providers through one &lt;code&gt;ChatClient&lt;/code&gt; interface. Four of them are what you&amp;rsquo;d expect: HTTP requests to OpenAI, to Claude, to Gemini, to any OpenAI-compatible endpoint. The tool author picks one in config, and the rest of the code never knows the difference.&lt;/p&gt;
&lt;p&gt;The fifth, &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/chat/claude_local.go#L18" target="_blank" rel="noopener"
 &gt;&lt;code&gt;ProviderClaudeLocal&lt;/code&gt;&lt;/a&gt;, is different in kind. It doesn&amp;rsquo;t make an HTTP request at all. It shells out. It runs the &lt;code&gt;claude&lt;/code&gt; CLI binary as a child process, passes the prompt in, and reads the answer back from the binary&amp;rsquo;s output.&lt;/p&gt;
&lt;p&gt;That sounds like an odd thing to want until you&amp;rsquo;ve been stuck in the environment it was built for.&lt;/p&gt;
&lt;h2 id="why-youd-want-that"&gt;Why you&amp;rsquo;d want that
&lt;/h2&gt;&lt;p&gt;Picture a corporate network with its egress locked right down. Outbound HTTPS to &lt;code&gt;api.anthropic.com&lt;/code&gt; is blocked by policy. A tool built on go-tool-base that uses AI would simply fall over there. It tries to reach the API, there&amp;rsquo;s no route, and that&amp;rsquo;s the end of the feature.&lt;/p&gt;
&lt;p&gt;But the developer at that machine has the &lt;code&gt;claude&lt;/code&gt; CLI installed, and has run &lt;code&gt;claude login&lt;/code&gt;. That binary is permitted. It&amp;rsquo;s an approved, managed tool, and it has its own sanctioned path out. The direct API call is blocked; the &lt;code&gt;claude&lt;/code&gt; command is not.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is what bridges those two facts. If your tool&amp;rsquo;s AI calls go &lt;em&gt;through&lt;/em&gt; that already-blessed binary instead of straight at the API, they work, in an environment where the direct call cannot. That&amp;rsquo;s the whole reason the provider exists. It isn&amp;rsquo;t faster (a real API call has lower latency) and it isn&amp;rsquo;t more capable. It&amp;rsquo;s for the place where the API call simply isn&amp;rsquo;t an option, and &amp;ldquo;isn&amp;rsquo;t an option&amp;rdquo; is a surprisingly common place to find yourself inside a large organisation.&lt;/p&gt;
&lt;h2 id="what-it-costs"&gt;What it costs
&lt;/h2&gt;&lt;p&gt;It&amp;rsquo;s worth being straight about the trade, because &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is the reduced-capability provider.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t do tool calling. It doesn&amp;rsquo;t do parallel tool calls. It doesn&amp;rsquo;t stream. Those need a live, structured connection to the model&amp;rsquo;s API, and a subprocess that runs once and prints an answer is not that. What it &lt;em&gt;does&lt;/em&gt; support is plain chat and structured output, the latter through the binary&amp;rsquo;s own &lt;code&gt;--json-schema&lt;/code&gt; flag.&lt;/p&gt;
&lt;p&gt;So the positioning, and the package&amp;rsquo;s documentation says exactly this, is: prefer the API providers when you can reach them, because they&amp;rsquo;re lower latency and feature-complete. Reach for &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; when API access is restricted. You accept the narrower capability set as the price of working at all. For a tool whose AI feature is &amp;ldquo;answer a question&amp;rdquo; or &amp;ldquo;return a structured analysis&amp;rdquo;, that price is often nothing you&amp;rsquo;d even notice. For one built on an agentic tool-calling loop, it&amp;rsquo;s a real limitation, and you&amp;rsquo;d know to expect it.&lt;/p&gt;
&lt;h2 id="how-it-stays-behind-the-same-interface"&gt;How it stays behind the same interface
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the part that makes it pleasant rather than a special case to maintain. Despite being a subprocess and not an API, &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is still a &lt;code&gt;ChatClient&lt;/code&gt;. Your feature code calls &lt;code&gt;Chat&lt;/code&gt; and &lt;code&gt;Ask&lt;/code&gt; exactly the way it would for any other provider.&lt;/p&gt;
&lt;p&gt;Everything that makes a subprocess provider awkward stays inside the provider. Spawning the binary, feeding it the prompt, parsing its output, capturing &lt;code&gt;stderr&lt;/code&gt; and surfacing it when the binary exits non-zero, and threading multi-turn continuity through session identifiers passed back on the next call with &lt;code&gt;--resume&lt;/code&gt;: all of that is the provider&amp;rsquo;s problem, and all of it sits behind the interface. The code in your tool that uses AI doesn&amp;rsquo;t know, and has no way to find out, that this particular provider is a child process rather than an HTTPS call.&lt;/p&gt;
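&lt;p&gt;The general shape of a run-once subprocess call can be sketched in a few lines of Go. This is not the provider&amp;rsquo;s actual code, and it deliberately doesn&amp;rsquo;t guess at how the &lt;code&gt;claude&lt;/code&gt; binary is invoked: &lt;code&gt;askOnce&lt;/code&gt; is a hypothetical helper, the ten-second timeout is arbitrary, and the demo uses &lt;code&gt;cat&lt;/code&gt; and &lt;code&gt;ls&lt;/code&gt; as stand-in binaries, so it assumes a Unix-like machine.&lt;/p&gt;

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// askOnce runs bin as a child process, writes the prompt to its stdin and
// returns whatever it printed on stdout. When the process fails, the
// captured stderr is folded into the returned error.
func askOnce(bin string, args []string, prompt string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	stderr := new(bytes.Buffer)
	cmd := exec.CommandContext(ctx, bin, args...)
	cmd.Stdin = strings.NewReader(prompt)
	cmd.Stderr = stderr

	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("%s failed: %w: %s", bin, err, strings.TrimSpace(stderr.String()))
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	// cat echoes stdin back, standing in for a binary that answers a prompt.
	answer, err := askOnce("cat", nil, "where were we?")
	fmt.Printf("answer=%q err=%v\n", answer, err)

	// ls on a missing path exits non-zero and explains itself on stderr.
	_, err = askOnce("ls", []string{"/no/such/path"}, "")
	fmt.Println("failure surfaced:", err != nil)
}
```

&lt;p&gt;Wrapped in a type that satisfies the chat interface, a helper like this is invisible to the caller, which is the whole point.&lt;/p&gt;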
&lt;p&gt;That&amp;rsquo;s a unified interface genuinely earning its place. It&amp;rsquo;s easy to put a uniform face on four things that already work the same way underneath. The real test of the abstraction is whether something that works in a &lt;em&gt;completely&lt;/em&gt; different way, a subprocess instead of a socket, can still slot in without the caller changing a line. Here it can. You swap one config value, and a tool that talked to an API now talks through a binary, and nothing downstream so much as blinks.&lt;/p&gt;
&lt;h2 id="the-bottom-line"&gt;The bottom line
&lt;/h2&gt;&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package puts five providers behind one &lt;code&gt;ChatClient&lt;/code&gt; interface, and &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is the one that isn&amp;rsquo;t an API. It runs the locally installed, pre-authenticated &lt;code&gt;claude&lt;/code&gt; CLI as a subprocess.&lt;/p&gt;
&lt;p&gt;It exists for the locked-down environment where outbound HTTPS to the AI API is blocked but the &lt;code&gt;claude&lt;/code&gt; binary is allowed: there, AI features keep working where a direct call would fail. The trade is a narrower capability set (no tool calling, no streaming, plain chat and structured output only) so you prefer the API providers when you can reach them and fall back to this when you can&amp;rsquo;t. And because it&amp;rsquo;s still a &lt;code&gt;ChatClient&lt;/code&gt;, all the subprocess machinery stays hidden, and your code uses it without knowing it&amp;rsquo;s there. That last part is the real test of an abstraction: a provider that works in an entirely different way still slots in unchanged.&lt;/p&gt;</description></item><item><title>AI conversations you can resume</title><link>https://phpboyscout.uk/ai-conversations-you-can-resume/</link><pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/ai-conversations-you-can-resume/</guid><description>&lt;img src="https://phpboyscout.uk/ai-conversations-you-can-resume/cover-ai-conversations-you-can-resume.png" alt="Featured image of post AI conversations you can resume" /&gt;&lt;p&gt;An AI conversation is, fundamentally, its own history. The model&amp;rsquo;s next answer depends on everything said so far. And a CLI tool, by its very nature, forgets everything the moment it exits. Put those two facts together and you get the problem: run an AI command, exit, run it again, and you&amp;rsquo;re talking to someone who&amp;rsquo;s never met you.&lt;/p&gt;
&lt;h2 id="a-cli-forgets-everything"&gt;A CLI forgets everything
&lt;/h2&gt;&lt;p&gt;A long-running service keeps its state in memory for as long as it runs. A CLI tool doesn&amp;rsquo;t get that luxury. It starts, does one thing, exits. The next invocation is a brand-new process with no memory of the last one.&lt;/p&gt;
&lt;p&gt;For most commands that&amp;rsquo;s exactly right, and you wouldn&amp;rsquo;t want it any other way. But an AI conversation is a different kind of beast, because a conversation &lt;em&gt;is&lt;/em&gt; its history. The model&amp;rsquo;s next answer depends on everything said so far. Run an AI command, exit, run it again, and you&amp;rsquo;ve started a fresh conversation with someone who&amp;rsquo;s never met you. For an interactive assistant, or any AI workflow that unfolds across several invocations, that&amp;rsquo;s plainly the wrong behaviour. The user expects to pick up where they left off.&lt;/p&gt;
&lt;h2 id="save-and-restore"&gt;Save and restore
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;chat&lt;/code&gt; package handles this through a &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/chat/persistence.go#L25" target="_blank" rel="noopener"
 &gt;&lt;code&gt;PersistentChatClient&lt;/code&gt;&lt;/a&gt; interface. Like streaming, it&amp;rsquo;s an optional capability discovered with a type assertion, sitting beside &lt;a class="link" href="https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/" &gt;the four-method core&lt;/a&gt; rather than bloating it. A client that supports persistence also satisfies this interface:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.(&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PersistentChatClient&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// store the snapshot somewhere&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A snapshot is a serialisable value that captures the conversation. You store it. Next run, you load it, &lt;code&gt;Restore&lt;/code&gt; it onto a fresh client, re-register your tools, and call &lt;code&gt;Chat&lt;/code&gt; again. &amp;ldquo;Where were we?&amp;rdquo; works, because the model is handed back the whole history.&lt;/p&gt;
&lt;h2 id="a-snapshot-is-opinionated-about-what-it-carries"&gt;A snapshot is opinionated about what it carries
&lt;/h2&gt;&lt;p&gt;The interesting part is what a snapshot does and doesn&amp;rsquo;t contain, because that&amp;rsquo;s a series of deliberate decisions.&lt;/p&gt;
&lt;p&gt;It carries the messages, the system prompt, the model name, and tool &lt;em&gt;metadata&lt;/em&gt;: the names, descriptions and parameter schemas of the tools that were registered.&lt;/p&gt;
&lt;p&gt;It does not carry tool &lt;em&gt;handlers&lt;/em&gt;. Handlers are code, not data; you can&amp;rsquo;t serialise a function meaningfully, so after a restore you re-register them with &lt;code&gt;SetTools&lt;/code&gt;. The snapshot remembers that a tool called &lt;code&gt;read_file&lt;/code&gt; existed and what its shape was; it doesn&amp;rsquo;t try to remember the Go function behind it.&lt;/p&gt;
&lt;p&gt;And it does not carry API tokens. This is the one to dwell on. A snapshot is a file. A file gets synced, backed up, copied between machines, attached to a support ticket by a user trying to be helpful. A snapshot that carried the API key would be a credential leak the moment it left the laptop it was made on. So the snapshot never contains a token, at all. On restore, the client picks the credential up again the ordinary way, from &lt;a class="link" href="https://phpboyscout.uk/where-should-a-cli-keep-your-api-keys/" &gt;the environment or the keychain&lt;/a&gt;. The conversation and the secret are kept in separate places on purpose, and only one of them is ever in the file.&lt;/p&gt;
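&lt;p&gt;One way to picture that split is to keep the serialisable part in its own type, with the handlers and the key living only in the in-memory session. The sketch below is an assumption about shape, not the package&amp;rsquo;s real types: &lt;code&gt;Snapshot&lt;/code&gt;, &lt;code&gt;ToolMeta&lt;/code&gt;, &lt;code&gt;session&lt;/code&gt; and the JSON field names are all hypothetical.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Message, ToolMeta and Snapshot are hypothetical shapes for illustration.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ToolMeta struct {
	Name        string          `json:"name"`
	Description string          `json:"description"`
	Parameters  json.RawMessage `json:"parameters"`
}

type Snapshot struct {
	Model        string     `json:"model"`
	SystemPrompt string     `json:"system_prompt"`
	Messages     []Message  `json:"messages"`
	Tools        []ToolMeta `json:"tools"`
}

// session is the live state. The handlers and the API key live here and
// are never copied into a Snapshot.
type session struct {
	apiKey   string
	handlers map[string]func(args string) (string, error)
	snap     Snapshot
}

// save serialises only the snapshot part of the session.
func (s session) save() ([]byte, error) { return json.Marshal(s.snap) }

// restore rebuilds a session from a snapshot plus a freshly sourced key.
// Handlers start empty: the caller re-registers them afterwards.
func restore(data []byte, apiKey string) (session, error) {
	snap := new(Snapshot)
	if err := json.Unmarshal(data, snap); err != nil {
		return session{}, fmt.Errorf("decode snapshot: %w", err)
	}
	empty := map[string]func(args string) (string, error){}
	return session{apiKey: apiKey, handlers: empty, snap: *snap}, nil
}

func main() {
	live := session{
		apiKey: "sk-not-a-real-key",
		handlers: map[string]func(args string) (string, error){
			"read_file": func(args string) (string, error) { return "contents", nil },
		},
		snap: Snapshot{
			Model:        "example-model",
			SystemPrompt: "You review code.",
			Messages:     []Message{{Role: "user", Content: "Where were we?"}},
			Tools: []ToolMeta{{
				Name:        "read_file",
				Description: "Read a file",
				Parameters:  json.RawMessage(`{"type":"object"}`),
			}},
		},
	}

	data, err := live.save()
	if err != nil {
		panic(err)
	}
	fmt.Println("key in file:", strings.Contains(string(data), live.apiKey))

	resumed, err := restore(data, "key-from-keychain")
	if err != nil {
		panic(err)
	}
	fmt.Println("messages:", len(resumed.snap.Messages), "handlers:", len(resumed.handlers))
}
```

&lt;p&gt;Because the key and the handlers are not fields of the serialised type, leaving them out isn&amp;rsquo;t something the save path has to remember to do.&lt;/p&gt;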
&lt;h2 id="encrypted-at-rest-if-you-want-it"&gt;Encrypted at rest, if you want it
&lt;/h2&gt;&lt;p&gt;The package ships a &lt;code&gt;FileStore&lt;/code&gt; that writes snapshots as JSON files, with &lt;code&gt;0600&lt;/code&gt; permissions in a &lt;code&gt;0700&lt;/code&gt; directory, and it can encrypt them. Pass &lt;code&gt;WithEncryption&lt;/code&gt; a 32-byte key and snapshots are written with AES-256-GCM.&lt;/p&gt;
&lt;p&gt;That option exists because a conversation can hold sensitive content even when it holds no credential. The log a user pasted in for analysis, the source file they asked the model to review, the internal details tucked into their questions: none of that is an API key, and all of it might be something you&amp;rsquo;d rather not have sitting in plain JSON in a backup somewhere. Encryption at rest covers it.&lt;/p&gt;
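&lt;p&gt;For anyone who hasn&amp;rsquo;t used it, AES-256-GCM in Go&amp;rsquo;s standard library looks roughly like this. The sketch is mine, not &lt;code&gt;FileStore&lt;/code&gt;&amp;rsquo;s implementation: &lt;code&gt;seal&lt;/code&gt; and &lt;code&gt;open&lt;/code&gt; are hypothetical helpers, and prepending the nonce to the ciphertext is an assumed layout, chosen because it is a common one.&lt;/p&gt;

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

// newGCM builds an AES-256-GCM AEAD, insisting on a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends the freshly generated nonce.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then decrypts and authenticates the rest.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if n > len(sealed) {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		panic(err)
	}

	sealed, err := seal(key, []byte(`{"messages":["pasted log line"]}`))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	fmt.Printf("round trip: %s err=%v\n", plain, err)

	// Flip one byte: GCM authenticates, so decryption must now fail.
	sealed[len(sealed)-1] ^= 0xff
	_, err = open(key, sealed)
	fmt.Println("tampering detected:", err != nil)
}
```

&lt;p&gt;GCM authenticates as well as encrypts, so a snapshot that has been altered on disk fails to open instead of quietly restoring a doctored conversation.&lt;/p&gt;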
&lt;p&gt;The &lt;code&gt;FileStore&lt;/code&gt; is also careful about the snapshot identifiers it&amp;rsquo;s handed. An ID has to be a canonical UUID, and the resolved file path is checked to lie inside the store directory, so a snapshot ID arriving from an untrusted source (a CLI flag, a request payload) can&amp;rsquo;t be bent into a path-traversal that reads or writes somewhere it shouldn&amp;rsquo;t. Persisting conversations adds a small filesystem surface, and the store treats it as exactly that.&lt;/p&gt;
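&lt;p&gt;Those two checks are short enough to sketch. What follows is an illustration of the idea and not the store&amp;rsquo;s code: &lt;code&gt;snapshotPath&lt;/code&gt; is a hypothetical function, the &lt;code&gt;.json&lt;/code&gt; suffix is assumed, and &amp;ldquo;canonical&amp;rdquo; is taken here to mean the lowercase, hyphenated 8-4-4-4-12 form.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// canonicalUUID accepts only the lowercase, hyphenated 8-4-4-4-12 form.
var canonicalUUID = regexp.MustCompile(`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// snapshotPath turns an untrusted ID into a file path, refusing anything
// that is not a canonical UUID or that resolves outside storeDir.
func snapshotPath(storeDir, id string) (string, error) {
	if !canonicalUUID.MatchString(id) {
		return "", errors.New("snapshot id is not a canonical UUID")
	}
	path := filepath.Join(storeDir, id+".json")

	// Belt and braces: confirm the joined path still sits under storeDir.
	rel, err := filepath.Rel(storeDir, path)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("snapshot path escapes the store directory")
	}
	return path, nil
}

func main() {
	ids := []string{
		"123e4567-e89b-12d3-a456-426614174000",
		"../../etc/passwd",
		"123E4567-E89B-12D3-A456-426614174000",
	}
	for _, id := range ids {
		path, err := snapshotPath("/var/lib/tool/snapshots", id)
		fmt.Printf("id=%q path=%q err=%v\n", id, path, err)
	}
}
```

&lt;p&gt;The format check alone would stop a traversal string, but the containment check costs almost nothing and still holds if the ID format is ever loosened.&lt;/p&gt;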
&lt;h2 id="the-short-version"&gt;The short version
&lt;/h2&gt;&lt;p&gt;A CLI tool forgets everything between invocations, which is correct for most commands and wrong for an AI conversation, because a conversation is its history.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package lets you persist one. &lt;code&gt;PersistentChatClient&lt;/code&gt; saves a snapshot you can store and restore later, picking the conversation back up where it ended. The snapshot is deliberate about its contents: messages, system prompt and tool metadata yes; tool handlers no, because they&amp;rsquo;re code you re-register; API tokens never, because a snapshot is a file and a file travels. The built-in &lt;code&gt;FileStore&lt;/code&gt; can encrypt snapshots at rest with AES-256-GCM and validates snapshot IDs against path traversal. Resumable conversations, without the conversation file turning into a place secrets leak from.&lt;/p&gt;</description></item><item><title>An AI agent that has to make the build pass</title><link>https://phpboyscout.uk/an-ai-agent-that-has-to-make-the-build-pass/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/an-ai-agent-that-has-to-make-the-build-pass/</guid><description>&lt;img src="https://phpboyscout.uk/an-ai-agent-that-has-to-make-the-build-pass/cover-an-ai-agent-that-has-to-make-the-build-pass.png" alt="Featured image of post An AI agent that has to make the build pass" /&gt;&lt;p&gt;Most AI code generation works on a charming little principle I&amp;rsquo;ll call generate-and-hope. The model writes the code, the model stops at the closing brace, and whether the thing actually compiles is left as an exercise for you. For a snippet you paste into an editor, fine. For a whole generated command, that&amp;rsquo;s just outsourcing the disappointment.&lt;/p&gt;
&lt;p&gt;go-tool-base does something I&amp;rsquo;m rather happier with: the AI has to make the build pass before it&amp;rsquo;s allowed to claim it&amp;rsquo;s done.&lt;/p&gt;
&lt;h2 id="generate-and-hope"&gt;Generate and hope
&lt;/h2&gt;&lt;p&gt;The usual shape of AI code generation is this. You ask for code, the model produces it, and the model&amp;rsquo;s job ends at the closing brace. Whether it compiles, whether the tests pass, whether the imports even resolve, none of that has been checked. The model produced something that &lt;em&gt;looks&lt;/em&gt; right. You find out whether it &lt;em&gt;is&lt;/em&gt; right when you build it.&lt;/p&gt;
&lt;p&gt;For a snippet you paste into an editor, that&amp;rsquo;s perfectly fine. The compiler tells you in a second. But go-tool-base&amp;rsquo;s generator, driven by &lt;code&gt;gtb generate command --script&lt;/code&gt; or &lt;code&gt;--prompt&lt;/code&gt;, produces a whole command: the implementation, its tests, the lot. &amp;ldquo;Generate and hope&amp;rdquo; at that scale means handing the user a project that may or may not build, and quietly making them the one who finds out which.&lt;/p&gt;
&lt;h2 id="drafting-is-only-step-one"&gt;Drafting is only step one
&lt;/h2&gt;&lt;p&gt;So the generator doesn&amp;rsquo;t stop at drafting. Writing the first version of the implementation and its tests is step one of two. Step two is an autonomous repair agent.&lt;/p&gt;
&lt;p&gt;Once the draft is on the filesystem, a separate agent takes over. It&amp;rsquo;s an LLM running in a loop, but a loop aimed at one narrow, checkable job: make this project build and pass its tests. It isn&amp;rsquo;t asked to be creative. It&amp;rsquo;s asked to get to green.&lt;/p&gt;
&lt;h2 id="a-fixed-set-of-tools-and-no-shell"&gt;A fixed set of tools, and no shell
&lt;/h2&gt;&lt;p&gt;The agent is not handed a shell. It&amp;rsquo;s given a &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/internal/agent/tools.go" target="_blank" rel="noopener"
 &gt;fixed, defined set of tools&lt;/a&gt; and nothing else. Three of them let it explore and edit the project: &lt;code&gt;list_dir&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;. Four of them let it verify the project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;go_build&lt;/code&gt; runs the build and captures the compiler errors.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;go_test&lt;/code&gt; runs the tests and captures the failures.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;go_get&lt;/code&gt; resolves a missing dependency.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;golangci_lint&lt;/code&gt; runs the project&amp;rsquo;s linter.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That restriction is the design, not a limitation of it. The agent can&amp;rsquo;t delete arbitrary files, can&amp;rsquo;t reach the network, can&amp;rsquo;t run anything that isn&amp;rsquo;t on the list. It has exactly what it needs to make code compile and nothing it would need to do damage. Its file writes are confined to the project directory by an explicit path check, so even &lt;code&gt;write_file&lt;/code&gt; can&amp;rsquo;t go wandering up into &lt;code&gt;/etc&lt;/code&gt;. A coding agent you&amp;rsquo;d actually let near a filesystem is one whose abilities are an allowlist, not a denylist. (I keep coming back to that principle through this series&amp;hellip; safety as a boundary you draw, not a behaviour you hope for.)&lt;/p&gt;
&lt;h2 id="the-loop"&gt;The loop
&lt;/h2&gt;&lt;p&gt;The repair loop is a ReAct loop, the same reason-act-observe shape as &lt;a class="link" href="https://phpboyscout.uk/letting-the-ai-call-your-go-functions/" &gt;the tool-calling loop&lt;/a&gt;, only this time pointed at a goal:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The draft is on disk.&lt;/li&gt;
&lt;li&gt;Verify: run &lt;code&gt;go_build&lt;/code&gt; and &lt;code&gt;go_test&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If verification failed, read the error output: the compiler error or the failing test.&lt;/li&gt;
&lt;li&gt;Reason about the cause: an undefined variable, a missing import, a wrong signature.&lt;/li&gt;
&lt;li&gt;Act: call &lt;code&gt;write_file&lt;/code&gt; to patch the code, or &lt;code&gt;go_get&lt;/code&gt; to add the dependency.&lt;/li&gt;
&lt;li&gt;Loop. Steps two to five repeat until the project is green, or the agent hits its bounded step limit.&lt;/li&gt;
&lt;/ol&gt;
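&lt;p&gt;Stripped of the model, the steps above reduce to a small bounded loop. The sketch below is a simplification under stated assumptions: &lt;code&gt;repair&lt;/code&gt;, &lt;code&gt;verifyFunc&lt;/code&gt; and &lt;code&gt;fixFunc&lt;/code&gt; are hypothetical names, and the reasoning and patching that an LLM does in the real agent are collapsed into a plain callback.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// verifyFunc reports whether the project is green and, if not, the error log.
type verifyFunc func() (green bool, log string)

// fixFunc receives the error log and attempts one patch against it.
type fixFunc func(log string) error

// repair verifies, and while the project is failing feeds the error log to
// fix, until the project is green or maxSteps fixes have been spent.
// It returns how many fixes were applied.
func repair(verify verifyFunc, fix fixFunc, maxSteps int) (int, error) {
	for used := 0; ; used++ {
		green, log := verify()
		if green {
			return used, nil
		}
		if used == maxSteps {
			return used, errors.New("step budget exhausted, project still failing")
		}
		if err := fix(log); err != nil {
			return used, fmt.Errorf("apply fix: %w", err)
		}
	}
}

func main() {
	// A simulated project with three outstanding errors.
	pending := []string{"undefined: cfg", "missing import \"os\"", "wrong signature for Run"}

	verify := func() (bool, string) {
		if len(pending) == 0 {
			return true, ""
		}
		return false, pending[0]
	}
	fix := func(log string) error {
		fmt.Println("patching against:", log)
		pending = pending[1:]
		return nil
	}

	steps, err := repair(verify, fix, 8)
	fmt.Println("fixes applied:", steps, "err:", err)
}
```

&lt;p&gt;The budget check sits after verification, so the last permitted fix still gets verified before the loop gives up.&lt;/p&gt;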
&lt;p&gt;What makes this work is treating the error output as &lt;em&gt;feedback&lt;/em&gt; rather than as a failure to log and walk away from. A compiler error is the single most useful sentence you can hand a model that&amp;rsquo;s trying to fix code. It says what&amp;rsquo;s wrong, and usually where. The loop feeds it straight back in, and the model fixes against it.&lt;/p&gt;
&lt;h2 id="verification-changes-what-done-means"&gt;Verification changes what &amp;ldquo;done&amp;rdquo; means
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the real shift, and the agent&amp;rsquo;s own documentation puts it well: the agent &amp;ldquo;doesn&amp;rsquo;t just say it fixed a bug; it uses a Test tool to verify the fix before reporting success.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;A generate-and-hope model reports success when it finishes &lt;em&gt;writing&lt;/em&gt;. It has no idea whether the code works, and it isn&amp;rsquo;t really claiming otherwise. &amp;ldquo;Done&amp;rdquo; means &amp;ldquo;I produced text&amp;rdquo;. The repair agent reports success when &lt;code&gt;go_build&lt;/code&gt; and &lt;code&gt;go_test&lt;/code&gt; actually &lt;em&gt;pass&lt;/em&gt;. &amp;ldquo;Done&amp;rdquo; means &amp;ldquo;the build is green&amp;rdquo;. Those are two completely different claims, and only the second is worth anything to the person who asked for the command.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the line between an AI that&amp;rsquo;s a creative writer and an AI that&amp;rsquo;s a collaborator you can hand a task to. And when the agent can&amp;rsquo;t reach green, when it spends its whole step budget and the project is still broken, the generator fails safely: it leaves the best-attempt code in place, commented out so the project still compiles, and tells the user what to finish by hand. There&amp;rsquo;s also an &lt;code&gt;--agentless&lt;/code&gt; flag for anyone who&amp;rsquo;d rather have a plain single-shot retry than the multi-step agent. The default, though, is the agent, because the default should be code that&amp;rsquo;s been checked.&lt;/p&gt;
&lt;h2 id="where-this-leaves-us"&gt;Where this leaves us
&lt;/h2&gt;&lt;p&gt;Most AI code generation generates and hopes: the model writes code and the user discovers whether it works. For a whole generated command, that pushes a may-or-may-not-build project onto the user.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s generator drafts the command and then hands it to an autonomous repair agent. The agent has a fixed set of tools (explore and edit the project, build it, test it, lint it, fetch dependencies) and no shell at all, with file writes confined to the project directory. It runs a ReAct loop, reading each error and patching against it, until the build is green or it exhausts its steps. The point is what &amp;ldquo;done&amp;rdquo; comes to mean: not &amp;ldquo;the model finished writing&amp;rdquo;, but &amp;ldquo;the build passes&amp;rdquo;. Only one of those is a claim worth trusting.&lt;/p&gt;</description></item><item><title>Stop regex-ing the LLM's prose</title><link>https://phpboyscout.uk/stop-regexing-the-llms-prose/</link><pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/stop-regexing-the-llms-prose/</guid><description>&lt;img src="https://phpboyscout.uk/stop-regexing-the-llms-prose/cover-stop-regexing-the-llms-prose.png" alt="Featured image of post Stop regex-ing the LLM's prose" /&gt;&lt;p&gt;Ask an LLM a question and it hands you back prose. Lovely to read, miserable to program against. You wanted the one number buried in the middle of it, and now you&amp;rsquo;re writing a regular expression to fish a word out of three well-written paragraphs that phrase themselves slightly differently every single time you run them.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a much better way, and it&amp;rsquo;s the difference between forever interpreting an LLM and actually building on one.&lt;/p&gt;
&lt;h2 id="the-problem-with-a-paragraph"&gt;The problem with a paragraph
&lt;/h2&gt;&lt;p&gt;You ask an LLM to analyse a log file and tell you the severity of what it found and a suggested fix. It comes back with three well-written paragraphs. Somewhere in there is the word &amp;ldquo;critical&amp;rdquo;, and somewhere is the fix.&lt;/p&gt;
&lt;p&gt;Your program now has to &lt;em&gt;extract&lt;/em&gt; those two facts from prose, and prose has no contract. The next run, the model phrases it differently. It leads with a caveat. It says &amp;ldquo;severe&amp;rdquo; where last time it said &amp;ldquo;critical&amp;rdquo;. It puts the fix first. Anything that worked by finding &amp;ldquo;critical&amp;rdquo; in the text is now quietly wrong, and you didn&amp;rsquo;t change a line. Parsing free text for structured facts is a game you lose slowly.&lt;/p&gt;
&lt;p&gt;What you actually wanted was never a paragraph. It was a value: a thing with a &lt;code&gt;severity&lt;/code&gt; field and a &lt;code&gt;fix&lt;/code&gt; field, that you can branch on and store and pass around like any other.&lt;/p&gt;
&lt;h2 id="ask-for-the-struct-not-the-prose"&gt;Ask for the struct, not the prose
&lt;/h2&gt;&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package draws the line with two methods. &lt;code&gt;Chat&lt;/code&gt; gives you text. &lt;code&gt;Ask&lt;/code&gt; gives you a struct.&lt;/p&gt;
&lt;p&gt;You define the Go type you want back:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Severity&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;severity&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Fix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;fix&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Analyse this log file: &amp;#34;&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nx"&gt;logText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The framework generates a JSON Schema from that struct, sends it to the model as the required response format, and unmarshals the reply straight into &lt;code&gt;result&lt;/code&gt;. You never lay a finger on the prose. You get &lt;code&gt;result.Severity&lt;/code&gt; and &lt;code&gt;result.Fix&lt;/code&gt;, typed, ready to use. If you want the model&amp;rsquo;s answer to drive a &lt;code&gt;switch&lt;/code&gt; statement, this is the method that lets it.&lt;/p&gt;
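&lt;p&gt;To make the &lt;code&gt;switch&lt;/code&gt; point concrete, here is a self-contained sketch. &lt;code&gt;decode&lt;/code&gt; and &lt;code&gt;action&lt;/code&gt; are hypothetical helpers, the reply string is made up, and &lt;code&gt;json.Unmarshal&lt;/code&gt; merely stands in for whatever the framework does internally after the model answers.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Analysis mirrors the struct from the article.
type Analysis struct {
	Severity string `json:"severity"`
	Fix      string `json:"fix"`
}

// decode stands in for the last step of Ask: the JSON reply is
// unmarshalled into the typed value. (A sketch, not the framework code.)
func decode(reply string) (Analysis, error) {
	result := new(Analysis)
	if err := json.Unmarshal([]byte(reply), result); err != nil {
		return Analysis{}, fmt.Errorf("decode reply: %w", err)
	}
	return *result, nil
}

// action branches on the typed field rather than searching prose.
func action(a Analysis) string {
	switch a.Severity {
	case "critical":
		return "page on-call: " + a.Fix
	case "warning":
		return "open ticket: " + a.Fix
	default:
		return "log only"
	}
}

func main() {
	// A made-up reply in the shape the schema asks for.
	a, err := decode(`{"severity":"critical","fix":"rotate the expired certificate"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(action(a))
}
```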
&lt;h2 id="the-struct-is-the-schema-is-the-contract"&gt;The struct is the schema is the contract
&lt;/h2&gt;&lt;p&gt;The detail that makes this hold up over time: you don&amp;rsquo;t write the schema. The struct &lt;em&gt;is&lt;/em&gt; the schema.&lt;/p&gt;
&lt;p&gt;The framework derives the JSON Schema from your type. In go-tool-base that&amp;rsquo;s &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/chat/schema.go#L11" target="_blank" rel="noopener"
 &gt;&lt;code&gt;GenerateSchema[T]()&lt;/code&gt;&lt;/a&gt;; in rust-tool-base the schema comes from your Rust type through &lt;a class="link" href="https://gitlab.com/phpboyscout/rust-tool-base/-/blob/9c22aa8/crates/rtb-ai/src/client.rs#L208" target="_blank" rel="noopener"
 &gt;&lt;code&gt;schemars&lt;/code&gt;&lt;/a&gt;. (Yes, there&amp;rsquo;s a Rust sibling now. I&amp;rsquo;ll &lt;a class="link" href="https://phpboyscout.uk/rust-tool-base-the-same-idea/" &gt;introduce it properly&lt;/a&gt; in a few weeks, but it keeps gatecrashing these posts because the two frameworks deliberately share ideas.) Either way there&amp;rsquo;s one definition, your type, and the schema is just a projection of it.&lt;/p&gt;
&lt;p&gt;That matters, because otherwise two things have to agree. There&amp;rsquo;s the schema you tell the model to obey, and there&amp;rsquo;s the type you unmarshal the answer into. Hand-write the schema and those two can drift: add a field to the struct, forget to add it to the schema, and the model is never told to produce it, so it silently never appears. Deriving the schema from the type collapses the two into one. They can&amp;rsquo;t disagree, because there&amp;rsquo;s only one of them.&lt;/p&gt;
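&lt;p&gt;The idea of projecting a schema from a type can be sketched with the standard &lt;code&gt;reflect&lt;/code&gt; package. This is not how &lt;code&gt;GenerateSchema&lt;/code&gt; is implemented; &lt;code&gt;schemaFor&lt;/code&gt; is a hypothetical helper that treats every tagged field as a required string, which is enough to show that adding a field to the struct changes the schema with no second edit.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Analysis mirrors the struct from the article.
type Analysis struct {
	Severity string `json:"severity"`
	Fix      string `json:"fix"`
}

// schemaFor builds a tiny JSON-Schema-like description from a struct's
// json tags. It treats every tagged field as a required string and is
// purely illustrative. The argument must be a struct value.
func schemaFor(v any) map[string]any {
	props := map[string]any{}
	required := []string{}
	for _, f := range reflect.VisibleFields(reflect.TypeOf(v)) {
		name := f.Tag.Get("json")
		if name == "" {
			continue // untagged fields are left out of this sketch
		}
		props[name] = map[string]any{"type": "string"}
		required = append(required, name)
	}
	return map[string]any{
		"type":       "object",
		"properties": props,
		"required":   required,
	}
}

func main() {
	out, err := json.MarshalIndent(schemaFor(Analysis{}), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```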
&lt;h2 id="both-frameworks-with-one-extra-step-in-rust"&gt;Both frameworks, with one extra step in Rust
&lt;/h2&gt;&lt;p&gt;go-tool-base does this with &lt;code&gt;Ask&lt;/code&gt; and a &lt;code&gt;ResponseSchema&lt;/code&gt; set on the client config. rust-tool-base does it with &lt;code&gt;chat_structured::&amp;lt;T&amp;gt;&lt;/code&gt;, where &lt;code&gt;T&lt;/code&gt; is any type that&amp;rsquo;s both deserialisable and &lt;code&gt;JsonSchema&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;rust-tool-base adds one step worth calling out. Before it deserialises the model&amp;rsquo;s reply into your &lt;code&gt;T&lt;/code&gt;, it &lt;em&gt;validates&lt;/em&gt; the raw response against the schema with a JSON Schema validator. That splits the failure into two distinct, named cases: the response didn&amp;rsquo;t match the schema, or it matched the schema but still wouldn&amp;rsquo;t deserialise. A model that returns subtly wrong JSON fails loudly and specifically, with an error that tells you which of those happened, instead of quietly handing you a zero-valued struct that you end up debugging an hour later.&lt;/p&gt;
&lt;h2 id="when-youd-reach-for-it"&gt;When you&amp;rsquo;d reach for it
&lt;/h2&gt;&lt;p&gt;The line is simple, and it&amp;rsquo;s about who reads the answer.&lt;/p&gt;
&lt;p&gt;If a &lt;em&gt;human&lt;/em&gt; reads the answer, prose is right. &lt;code&gt;Chat&lt;/code&gt;, free text, let the model write well. A summary, an explanation, an interactive reply: leave all of those as prose.&lt;/p&gt;
&lt;p&gt;If a &lt;em&gt;program&lt;/em&gt; consumes the answer, you want a value. Classification, extraction, a code review scored out of a hundred with a list of issues, a yes-or-no with reasons: anything where the next thing that happens is your code branching on the result. There, &lt;code&gt;Ask&lt;/code&gt; and &lt;code&gt;chat_structured&lt;/code&gt; turn the LLM from something you have to interpret into something that returns a value, and a typed value is a thing you can actually build on.&lt;/p&gt;
&lt;h2 id="to-sum-up"&gt;To sum up
&lt;/h2&gt;&lt;p&gt;An LLM returns prose by default, and prose has no contract, so a program that picks structured facts out of it breaks the moment the model rephrases.&lt;/p&gt;
&lt;p&gt;Structured output asks for the value instead. You define a struct, the framework derives a JSON Schema from it, the model is constrained to that shape, and you get a typed result. go-tool-base&amp;rsquo;s &lt;code&gt;Ask&lt;/code&gt; and rust-tool-base&amp;rsquo;s &lt;code&gt;chat_structured&lt;/code&gt; both work this way, with the schema derived from your type so the schema and the type can&amp;rsquo;t drift; rust-tool-base additionally validates the response against the schema before deserialising. Use it whenever the answer feeds code rather than a human. It&amp;rsquo;s one of the four methods that make up &lt;a class="link" href="https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/" &gt;go-tool-base&amp;rsquo;s small chat interface&lt;/a&gt;, and it&amp;rsquo;s the one that makes an LLM safe to program against.&lt;/p&gt;</description></item><item><title>Letting the AI call your Go functions</title><link>https://phpboyscout.uk/letting-the-ai-call-your-go-functions/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/letting-the-ai-call-your-go-functions/</guid><description>&lt;img src="https://phpboyscout.uk/letting-the-ai-call-your-go-functions/cover-letting-the-ai-call-your-go-functions.png" alt="Featured image of post Letting the AI call your Go functions" /&gt;&lt;p&gt;An AI that can only produce text can &lt;em&gt;describe&lt;/em&gt; your system. An AI that can call your Go functions can actually operate it. That gap, between describing and doing, is the difference between a chatbot and something genuinely useful, and crossing it comes down to one fiddly mechanism: tool-calling, and the loop that drives it.&lt;/p&gt;
&lt;h2 id="talking-about-the-system-versus-operating-it"&gt;Talking about the system versus operating it
&lt;/h2&gt;&lt;p&gt;Wire an AI provider into a CLI command and you get something that can talk. Ask it a question, get a paragraph back. Useful, up to a point.&lt;/p&gt;
&lt;p&gt;But notice the ceiling. An AI that can only generate text can &lt;em&gt;describe&lt;/em&gt; things. It can tell you what it would do. What it can&amp;rsquo;t do is look at the actual current state of your system, or take a real action, because it has no hands. It&amp;rsquo;s reasoning in a vacuum about a world it can&amp;rsquo;t reach out and touch.&lt;/p&gt;
&lt;p&gt;The thing that gives it hands is tool-calling. You hand the AI a set of functions it&amp;rsquo;s allowed to call. Now, mid-conversation, it can decide it needs to &lt;em&gt;read that file&lt;/em&gt; before it can answer, or &lt;em&gt;run that query&lt;/em&gt;, or &lt;em&gt;check that status&lt;/em&gt;, and actually go and do it, and then reason about the real result. The AI stops describing your system and starts operating it.&lt;/p&gt;
&lt;h2 id="the-loop-is-the-hard-part"&gt;The loop is the hard part
&lt;/h2&gt;&lt;p&gt;Tool-calling has a shape, and the shape is a loop. The literature calls it ReAct: Reason, Act, Observe.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The AI &lt;strong&gt;reasons&lt;/strong&gt; about the prompt and decides whether it needs a tool.&lt;/li&gt;
&lt;li&gt;If it does, it &lt;strong&gt;acts&lt;/strong&gt;, asking for a specific tool with specific arguments.&lt;/li&gt;
&lt;li&gt;Your code runs the tool and feeds the result back. The AI &lt;strong&gt;observes&lt;/strong&gt; that result.&lt;/li&gt;
&lt;li&gt;Round again. Reason about the new information, maybe call another tool, maybe several. Keep going until the AI has what it needs and produces a final text answer with no more tool calls.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Conceptually simple. Tedious and error-prone to implement by hand every single time: parsing the model&amp;rsquo;s tool-call requests, dispatching to the right function, marshalling arguments in and results out, feeding observations back in the exact format the provider expects, knowing when to stop, and not looping forever if the model gets itself stuck.&lt;/p&gt;
&lt;p&gt;That orchestration is pure plumbing, and it&amp;rsquo;s identical for every tool and every command. So you can probably guess what&amp;rsquo;s coming: go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package owns it. You don&amp;rsquo;t write the loop. You write the tools.&lt;/p&gt;
&lt;h2 id="defining-a-tool"&gt;Defining a tool
&lt;/h2&gt;&lt;p&gt;A &lt;code&gt;chat.Tool&lt;/code&gt; is four things: a name, a description, a parameter schema, and a handler. The description is what the AI reads to decide &lt;em&gt;whether&lt;/em&gt; to use the tool, so it&amp;rsquo;s worth writing well. The schema describes the arguments, and you don&amp;rsquo;t hand-write it. You write a tagged Go struct and let the framework generate the schema from it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadFileParams&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;path&amp;#34; jsonschema_description:&amp;#34;Relative path to the file&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The struct is the contract. The framework derives the JSON Schema the AI is given straight from those tags, so the schema and the Go type the handler receives can&amp;rsquo;t drift apart, because they share a single source. The handler is then just an ordinary Go function that takes those parameters and returns a result.&lt;/p&gt;
&lt;p&gt;You register your tools with &lt;code&gt;SetTools&lt;/code&gt;, call &lt;code&gt;Chat&lt;/code&gt;, and that&amp;rsquo;s the whole of your involvement. The framework runs the ReAct loop and &lt;code&gt;Chat&lt;/code&gt; returns the AI&amp;rsquo;s final text answer once the loop settles.&lt;/p&gt;
&lt;h2 id="two-details-that-show-it-was-built-for-real-use"&gt;Two details that show it was built for real use
&lt;/h2&gt;&lt;p&gt;A couple of decisions in the loop tell you it&amp;rsquo;s meant for production, not a demo.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool errors don&amp;rsquo;t abort the conversation.&lt;/strong&gt; When a handler returns an error, the framework doesn&amp;rsquo;t crash the loop. It hands the error &lt;em&gt;back to the AI as a string&lt;/em&gt;, as just another observation. That&amp;rsquo;s deliberate, and it&amp;rsquo;s right. A real agent should be able to call a tool, watch it fail, and react: try different arguments, take a different route, or tell the user it couldn&amp;rsquo;t manage it. A loop that aborted on the first tool error would be far more brittle than the model driving it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The loop is bounded.&lt;/strong&gt; There&amp;rsquo;s a &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/chat/constants.go#L16" target="_blank" rel="noopener"
 &gt;&lt;code&gt;MaxSteps&lt;/code&gt; limit, default 20&lt;/a&gt;. An AI that gets confused could otherwise call tools forever, and a CLI command that never returns is a worse failure than a wrong answer. The cap guarantees the command terminates. The agent gets room to genuinely work a problem across many steps, but not infinite room to flail about in.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also parallel tool execution: when the model asks for several tools in a single step (three independent file reads, say) the framework runs them concurrently rather than one after another, because there&amp;rsquo;s no reason to make the AI sit and wait out a sequence of things that don&amp;rsquo;t depend on each other.&lt;/p&gt;
&lt;h2 id="boiling-it-down"&gt;Boiling it down
&lt;/h2&gt;&lt;p&gt;A text-only AI can describe your system; an AI that can call your functions can operate it. Bridging that gap means tool-calling, and tool-calling means the ReAct loop (reason, act, observe, repeat) whose orchestration is fiddly, identical every time, and not a problem worth solving twice.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package runs the loop for you. You define &lt;code&gt;chat.Tool&lt;/code&gt; values (name, description, a tagged parameter struct that generates its own schema, a handler), call &lt;code&gt;SetTools&lt;/code&gt; and &lt;code&gt;Chat&lt;/code&gt;, and get the final answer. Tool errors go back to the AI as observations so it can recover, and a &lt;code&gt;MaxSteps&lt;/code&gt; cap guarantees the command always terminates. You write Go functions. The framework turns them into things an agent can reach for.&lt;/p&gt;</description></item><item><title>Nobody reads the manual</title><link>https://phpboyscout.uk/nobody-reads-the-manual/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/nobody-reads-the-manual/</guid><description>&lt;img src="https://phpboyscout.uk/nobody-reads-the-manual/cover-nobody-reads-the-manual.png" alt="Featured image of post Nobody reads the manual" /&gt;&lt;p&gt;Let me describe the actual lifecycle of a user meeting your CLI tool, because it&amp;rsquo;s a bit humbling. They run it. It doesn&amp;rsquo;t quite do what they expected. They run it again with &lt;code&gt;--help&lt;/code&gt;. They get a wall of monospaced flag descriptions, skim it, don&amp;rsquo;t find the thing they wanted, and either give up or go and ask a human who already knows.&lt;/p&gt;
&lt;p&gt;Your documentation might be magnificent. It doesn&amp;rsquo;t matter, because the user never reached it.&lt;/p&gt;
&lt;h2 id="the-manual-loses-on-location-not-quality"&gt;The manual loses on location, not quality
&lt;/h2&gt;&lt;p&gt;Notice exactly where that lifecycle breaks. The documentation could answer their precise question in full and still go unread, because it&amp;rsquo;s on a website, in another window, behind a search box, and the user is &lt;em&gt;here&lt;/em&gt;, in the terminal, mid-task. The docs lost not on quality but on &lt;em&gt;location&lt;/em&gt;. They simply weren&amp;rsquo;t where the work was.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s answer starts with a decision about location: the documentation gets embedded into the binary itself. Your &lt;code&gt;docs/&lt;/code&gt; folder ships &lt;em&gt;inside&lt;/em&gt; the tool, the same way its default config does. Wherever the tool is installed, the docs are right there alongside it, no network, no browser. That embedding is what makes everything else possible, and there are two things built on top of it.&lt;/p&gt;
&lt;h2 id="a-browser-in-the-terminal"&gt;A browser, in the terminal
&lt;/h2&gt;&lt;p&gt;The first is the &lt;code&gt;docs&lt;/code&gt; command, and it&amp;rsquo;s not &lt;code&gt;--help&lt;/code&gt; with extra steps. It launches a proper Terminal User Interface, built on Bubble Tea.&lt;/p&gt;
&lt;p&gt;It has a sidebar, structured from the project&amp;rsquo;s own &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/docs/docs.go#L25" target="_blank" rel="noopener"
 &gt;&lt;code&gt;mkdocs.yml&lt;/code&gt;&lt;/a&gt;, so the docs are a navigable tree rather than one flat scroll. Markdown renders with real formatting through Glamour (colour, tables, lists, headings) instead of collapsing into monospaced soup. There&amp;rsquo;s live search across every page, regex included.&lt;/p&gt;
&lt;p&gt;Compared with &lt;code&gt;man&lt;/code&gt; and &lt;code&gt;--help&lt;/code&gt;, the difference is more than a nicer coat of paint. &lt;code&gt;man&lt;/code&gt; gives you linear scrolling and grep; this gives you a structured tree, rich rendering and real search. It&amp;rsquo;s the documentation experience a modern developer expects, except it followed the tool &lt;em&gt;into&lt;/em&gt; the terminal instead of demanding the user leave it.&lt;/p&gt;
&lt;h2 id="a-documentation-assistant-that-wont-make-things-up"&gt;A documentation assistant that won&amp;rsquo;t make things up
&lt;/h2&gt;&lt;p&gt;The second thing built on the embedded docs is the one I find genuinely transformative: &lt;code&gt;docs ask&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The user doesn&amp;rsquo;t navigate anything. They just ask:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mytool docs ask &lt;span class="s2"&gt;&amp;#34;how do I point this at a self-hosted server?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and get a direct, specific answer. Under the hood, the framework collates the tool&amp;rsquo;s embedded markdown and hands it to the configured AI provider (Claude, OpenAI, Gemini, Claude Local, any OpenAI-compatible endpoint) as the context for the question.&lt;/p&gt;
&lt;p&gt;Now, &amp;ldquo;an AI answers questions about my tool&amp;rdquo; should immediately make you nervous, and the correct thing to be nervous about is hallucination. An AI that confidently invents a flag that doesn&amp;rsquo;t exist, or describes behaviour the tool simply doesn&amp;rsquo;t have, is worse than no assistant at all, because the user &lt;em&gt;trusts&lt;/em&gt; it.&lt;/p&gt;
&lt;p&gt;This is where embedding the docs pays off a second time, and it&amp;rsquo;s why I keep stressing that the corpus is &lt;em&gt;closed&lt;/em&gt;. The model is instructed to answer &lt;strong&gt;only&lt;/strong&gt; from the tool&amp;rsquo;s actual documentation, and the context it&amp;rsquo;s handed is exactly that documentation and nothing else. It isn&amp;rsquo;t drawing on a vague memory of similar tools from its training data. It&amp;rsquo;s answering from this tool&amp;rsquo;s real, shipped, version-matched docs. The corpus is small, closed and authoritative, which is the combination that keeps the answers honest. &amp;ldquo;Zero hallucination by design&amp;rdquo; isn&amp;rsquo;t a slogan about the model. It&amp;rsquo;s a property of bounding what the model is allowed to look at, which is the same instinct I &lt;a class="link" href="https://phpboyscout.uk/your-cli-is-already-an-ai-tool/" &gt;leaned on with the &lt;code&gt;mcp&lt;/code&gt; command&lt;/a&gt;: the safety comes from the boundary you drew, not from trusting the AI to behave itself.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a nice second-order effect, too. The answer is always about the version of the tool the user actually has, because the docs were embedded into &lt;em&gt;that build&lt;/em&gt;. No mismatch between a website documenting the latest release and the slightly older binary sitting on the user&amp;rsquo;s machine.&lt;/p&gt;
&lt;h2 id="the-upshot"&gt;The upshot
&lt;/h2&gt;&lt;p&gt;Documentation usually loses to &lt;code&gt;--help&lt;/code&gt; not on quality but on location: it&amp;rsquo;s in a browser, and the user is in the terminal. go-tool-base embeds the docs into the binary and surfaces them two ways: a &lt;code&gt;docs&lt;/code&gt; command that&amp;rsquo;s a real TUI browser with a sidebar, rich markdown and search, and &lt;code&gt;docs ask&lt;/code&gt;, which answers natural-language questions using the embedded docs as context.&lt;/p&gt;
&lt;p&gt;Because that context is the tool&amp;rsquo;s own closed, shipped documentation and the model is told to use nothing else, the assistant stays grounded, and it&amp;rsquo;s always describing the exact version the user is holding. The fix for unread documentation was never to write more of it. It was to put it where the work happens and let it answer back.&lt;/p&gt;</description></item><item><title>An AI interface that fits on one screen</title><link>https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/</guid><description>&lt;img src="https://phpboyscout.uk/an-ai-interface-that-fits-on-one-screen/cover-an-ai-interface-that-fits-on-one-screen.png" alt="Featured image of post An AI interface that fits on one screen" /&gt;&lt;p&gt;The moment you decide a CLI tool should talk to an LLM, there&amp;rsquo;s a strong gravitational pull towards reaching for LangChain, or one of its many relatives. It&amp;rsquo;s the obvious move. It&amp;rsquo;s also, for most CLI work, a bit like hiring a removals firm to carry a single box up the stairs.&lt;/p&gt;
&lt;p&gt;Let me explain why go-tool-base went the other way, and what &amp;ldquo;the other way&amp;rdquo; actually looks like.&lt;/p&gt;
&lt;h2 id="the-instinct-and-why-it-overshoots"&gt;The instinct, and why it overshoots
&lt;/h2&gt;&lt;p&gt;When you add AI to a tool, the instinct is to reach for the big general-purpose framework. LangChain and its relatives are capable, and they exist for a real need: orchestrating complex multi-step AI applications, with retrieval pipelines, memory stores, chains of calls, whole fleets of agents.&lt;/p&gt;
&lt;p&gt;Now look at what a CLI tool actually needs from an LLM. It needs to send a prompt and get text back. Sometimes it wants structured data back instead of prose. Sometimes it wants to let the model call a few of the tool&amp;rsquo;s own functions. That&amp;rsquo;s pretty much the whole list.&lt;/p&gt;
&lt;p&gt;Pulling in a framework built to orchestrate retrieval and agent swarms in order to do &lt;em&gt;that&lt;/em&gt; is a poor trade. You take on a large new vocabulary of concepts, a wide dependency surface, and a great deal of abstraction you&amp;rsquo;ll never touch, all to perform three or four operations. The framework isn&amp;rsquo;t wrong. It&amp;rsquo;s just answering a far bigger question than the one a CLI tool is asking.&lt;/p&gt;
&lt;h2 id="what-go-tool-base-chose-instead"&gt;What go-tool-base chose instead
&lt;/h2&gt;&lt;p&gt;go-tool-base didn&amp;rsquo;t reach for a framework. The decision is on the record in its own design notes: before a single line was written, LangChain Go, go-openai, Vercel&amp;rsquo;s AI SDK and around ten other options were evaluated, and not one of them matched what a CLI framework actually needs. So the &lt;code&gt;chat&lt;/code&gt; package was built deliberately small.&lt;/p&gt;
&lt;p&gt;How small? The entire core &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/chat/client.go" target="_blank" rel="noopener"
 &gt;&lt;code&gt;ChatClient&lt;/code&gt;&lt;/a&gt; interface is four methods:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ChatClient&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;interface&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;SetTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;Add&lt;/code&gt; appends a message to the conversation. &lt;code&gt;Chat&lt;/code&gt; sends a prompt and returns text. &lt;code&gt;Ask&lt;/code&gt; sends a prompt and returns a &lt;em&gt;typed Go struct&lt;/em&gt;, the model&amp;rsquo;s answer unmarshalled straight into a value you defined. &lt;code&gt;SetTools&lt;/code&gt; hands the model a set of your own functions it&amp;rsquo;s allowed to call. That&amp;rsquo;s the whole surface. Downstream code that uses AI never holds anything larger than this, and never has to know which provider is behind it.&lt;/p&gt;
&lt;p&gt;The package&amp;rsquo;s own documentation has a word for this: right-sized. Large enough to solve genuine provider-abstraction complexity, small enough that the full interface fits on a single screen.&lt;/p&gt;
&lt;h2 id="thin-is-not-the-same-as-does-little"&gt;&amp;ldquo;Thin&amp;rdquo; is not the same as &amp;ldquo;does little&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;This is the part worth being precise about, because &amp;ldquo;four methods&amp;rdquo; can sound like &amp;ldquo;barely does anything&amp;rdquo;, and that&amp;rsquo;s the wrong read entirely.&lt;/p&gt;
&lt;p&gt;Behind those four methods sits genuinely awkward work. Five providers (OpenAI, Claude, Gemini, a locally installed &lt;code&gt;claude&lt;/code&gt; binary, and any OpenAI-compatible endpoint), each with a different wire API, all normalised behind the one interface. A &lt;a class="link" href="https://phpboyscout.uk/letting-the-ai-call-your-go-functions/" &gt;tool-calling loop&lt;/a&gt;. Structured output via JSON Schema, made to behave consistently across providers that each express it differently. Error normalisation. Token chunking.&lt;/p&gt;
&lt;p&gt;The point of a thin abstraction is not that there&amp;rsquo;s little underneath it. It&amp;rsquo;s that the &lt;em&gt;interface&lt;/em&gt; stays small while the &lt;em&gt;implementation&lt;/em&gt; quietly absorbs the complexity. Four methods on the surface; five provider integrations and a tool-calling loop below the waterline. The thinness is a property of what the caller sees, not of what the package does. A reach-for-LangChain decision gets that backwards: it exposes the caller to all the machinery, whether or not the caller will ever need it.&lt;/p&gt;
&lt;h2 id="the-core-stays-small-even-as-features-grow"&gt;The core stays small even as features grow
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s a neat detail in how &lt;code&gt;chat&lt;/code&gt; keeps the interface from creeping. The package also supports streaming responses and conversation persistence, both of which are real features with real surface area. Neither of them is in the four-method core.&lt;/p&gt;
&lt;p&gt;Instead they&amp;rsquo;re &lt;em&gt;separate, optional&lt;/em&gt; interfaces. A streaming-capable client also satisfies &lt;code&gt;StreamingChatClient&lt;/code&gt;; a persistable one also satisfies &lt;code&gt;PersistentChatClient&lt;/code&gt;. Code that wants those capabilities does a type assertion to ask for them, and code that doesn&amp;rsquo;t simply never sees them. So the common path stays four methods forever. New capabilities arrive as opt-in interfaces alongside the core, not as new methods bolted onto it. The thing that fits on one screen keeps fitting on one screen.&lt;/p&gt;
&lt;h2 id="extensible-without-forking-testable-without-a-network"&gt;Extensible without forking, testable without a network
&lt;/h2&gt;&lt;p&gt;Two more properties keep the package small without making it limiting.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s extensible. The provider list isn&amp;rsquo;t closed. A &lt;code&gt;RegisterProvider&lt;/code&gt; call lets any package contribute a new provider, and &lt;code&gt;chat.New&lt;/code&gt; will route to it. You add a backend without forking &lt;code&gt;pkg/chat&lt;/code&gt; or sending a patch upstream.&lt;/p&gt;
&lt;p&gt;And it&amp;rsquo;s testable. The package ships generated mocks. A downstream tool&amp;rsquo;s AI features can be tested against a mock &lt;code&gt;ChatClient&lt;/code&gt; returning canned responses, with no network, no API key, and no flakiness. Because the interface is four methods, that mock is trivial to set up and complete by construction. A sprawling framework interface is a sprawling thing to fake; a four-method one is not. (I&amp;rsquo;ll come back to testing AI code properly &lt;a class="link" href="https://phpboyscout.uk/testing-code-that-calls-an-llm/" &gt;in a later post&lt;/a&gt;, because it deserves a whole article of its own.)&lt;/p&gt;
&lt;h2 id="the-right-size"&gt;The right size
&lt;/h2&gt;&lt;p&gt;When a CLI tool needs AI, the instinct is to reach for a large framework like LangChain. For orchestrating retrieval pipelines and agent swarms, that&amp;rsquo;s exactly the right tool. For sending a prompt, getting a struct back, and letting the model call a few functions, it&amp;rsquo;s enormous overkill.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package is the deliberate alternative, chosen only after LangChain Go and a dozen others were weighed up and rejected. Its core &lt;code&gt;ChatClient&lt;/code&gt; interface is four methods. Underneath sit five normalised providers, a tool-calling loop, structured output and error handling, but the caller sees four methods and never learns which provider is active. Streaming and persistence are opt-in interfaces beside the core, not additions to it. It extends without forking and tests without a network. Right-sized: the complexity is real, but it lives under the interface rather than in it.&lt;/p&gt;</description></item><item><title>Your CLI is already an AI tool</title><link>https://phpboyscout.uk/your-cli-is-already-an-ai-tool/</link><pubDate>Thu, 19 Mar 2026 00:00:00 +0000</pubDate><guid>https://phpboyscout.uk/your-cli-is-already-an-ai-tool/</guid><description>&lt;img src="https://phpboyscout.uk/your-cli-is-already-an-ai-tool/cover-your-cli-is-already-an-ai-tool.png" alt="Featured image of post Your CLI is already an AI tool" /&gt;&lt;p&gt;&amp;ldquo;Make it work with AI&amp;rdquo; has become one of those requests that lands on a developer&amp;rsquo;s desk with a thud and not much further detail attached. My instinct, the first time, was to brace for a big lump of integration work&amp;hellip; a bespoke adapter for this assistant, another for that one, a treadmill of little wrappers stretching off into the distance.&lt;/p&gt;
&lt;p&gt;Turns out I&amp;rsquo;d already done most of the work. So have you, if your CLI tool is any good. Let me explain what I mean.&lt;/p&gt;
&lt;h2 id="you-already-described-your-capabilities"&gt;You already described your capabilities
&lt;/h2&gt;&lt;p&gt;Stop and think for a second about what a well-built CLI tool actually is. It&amp;rsquo;s a set of named operations, each with a human-readable description, each taking a set of typed, named, documented parameters. You wrote all of that already, because a CLI without it is unusable by &lt;em&gt;people&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Now look at what an AI assistant needs in order to call a tool. A set of named operations. A description of each, so it knows when to reach for them. A typed parameter schema for each, so it knows how to call them.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s the same list! A good CLI is already, structurally, a description of a set of capabilities. The information an AI agent needs isn&amp;rsquo;t extra work you have to go and do. It&amp;rsquo;s work you finished the moment your &lt;code&gt;--help&lt;/code&gt; output was any good.&lt;/p&gt;
&lt;p&gt;The only thing missing is a translator. Something that takes &amp;ldquo;this is a CLI&amp;rdquo; and presents it as &amp;ldquo;this is a set of tools an AI can call&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="mcp-is-that-translator-and-its-a-standard"&gt;MCP is that translator, and it&amp;rsquo;s a standard
&lt;/h2&gt;&lt;p&gt;The temptation, when you want your tool to be AI-usable, is to sit down and write an integration. A little adapter for Claude Desktop. Another for Cursor. Another for whatever turns up next month. Each one a bespoke wrapper, each one a thing to maintain, and the list never stops growing because new assistants keep appearing. That&amp;rsquo;s the treadmill I was bracing for.&lt;/p&gt;
&lt;p&gt;The Model Context Protocol exists to kill that list. MCP is an open standard for how an AI assistant discovers and calls tools that other programs expose. Implement it once and your tool works with every assistant that speaks it. Write once, not once-per-client.&lt;/p&gt;
&lt;p&gt;So go-tool-base implements it once, in the framework, for everyone. (That&amp;rsquo;s rather the theme of this whole series, if you hadn&amp;rsquo;t spotted it yet&amp;hellip; do the annoying thing once, properly, in a place where every tool inherits it.)&lt;/p&gt;
&lt;h2 id="the-mcp-command-and-the-mapping-it-does-for-free"&gt;The &lt;code&gt;mcp&lt;/code&gt; command, and the mapping it does for free
&lt;/h2&gt;&lt;p&gt;Every tool built on go-tool-base inherits a built-in &lt;a class="link" href="https://gitlab.com/phpboyscout/go-tool-base/-/blob/5c78fc9/pkg/props/tool.go#L15" target="_blank" rel="noopener"
 &gt;&lt;code&gt;mcp&lt;/code&gt; command&lt;/a&gt;. Run it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mytool mcp
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and the tool starts a JSON-RPC server over standard I/O, speaking MCP. That&amp;rsquo;s the whole user-facing surface. One command.&lt;/p&gt;
&lt;p&gt;Behind it, the framework walks your Cobra command tree and maps it straight onto MCP tool definitions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each &lt;strong&gt;command&lt;/strong&gt; becomes a &lt;strong&gt;tool&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Each command&amp;rsquo;s &lt;strong&gt;short description&lt;/strong&gt; becomes the &lt;strong&gt;tool&amp;rsquo;s description&lt;/strong&gt;, the text the AI reads to decide whether this is the tool it wants.&lt;/li&gt;
&lt;li&gt;Each command&amp;rsquo;s &lt;strong&gt;flags and arguments&lt;/strong&gt; become the tool&amp;rsquo;s &lt;strong&gt;JSON Schema parameters&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&amp;rsquo;s no second schema to write and then keep in sync (and we all know how well &amp;ldquo;keep these two things aligned by hand&amp;rdquo; tends to go). The command tree &lt;em&gt;is&lt;/em&gt; the schema. Add a new command to your CLI and it&amp;rsquo;s a new tool for the agent, automatically, with the description and flags you already gave it. Nobody has to remember to update an MCP manifest, because there&amp;rsquo;s no separate MCP manifest to forget about.&lt;/p&gt;
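&lt;p&gt;The mapping in that list can be sketched without pulling in Cobra at all. Everything here is a simplified stand-in: &lt;code&gt;Command&lt;/code&gt; and &lt;code&gt;Flag&lt;/code&gt; are my own minimal types rather than Cobra&amp;rsquo;s, &lt;code&gt;toTools&lt;/code&gt; is a hypothetical helper, and joining nested command names with an underscore is an assumption, not necessarily what go-tool-base does. The &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt; and &lt;code&gt;inputSchema&lt;/code&gt; fields are the ones MCP uses for a tool definition.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Flag and Command are simplified stand-ins for a Cobra command tree.
type Flag struct {
	Name  string
	Type  string // JSON Schema type name, e.g. "string" or "boolean"
	Usage string
}

type Command struct {
	Name     string
	Short    string
	Flags    []Flag
	Children []Command
}

// ToolDef is a minimal MCP-style tool definition.
type ToolDef struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	InputSchema map[string]any `json:"inputSchema"`
}

// toTools walks the tree depth-first. Each command becomes a tool, its
// short description becomes the tool description, and its flags become
// the properties of a JSON Schema object.
func toTools(cmds []Command, prefix string) []ToolDef {
	var tools []ToolDef
	for _, cmd := range cmds {
		name := prefix + cmd.Name
		props := map[string]any{}
		for _, f := range cmd.Flags {
			props[f.Name] = map[string]any{"type": f.Type, "description": f.Usage}
		}
		tools = append(tools, ToolDef{
			Name:        name,
			Description: cmd.Short,
			InputSchema: map[string]any{"type": "object", "properties": props},
		})
		// Nested commands get a prefixed name (an assumption of this sketch).
		tools = append(tools, toTools(cmd.Children, name+"_")...)
	}
	return tools
}

func main() {
	tree := []Command{{
		Name:     "deploy",
		Short:    "Deploy the service",
		Flags:    []Flag{{Name: "env", Type: "string", Usage: "target environment"}},
		Children: []Command{{Name: "status", Short: "Show deploy status"}},
	}}
	out, err := json.MarshalIndent(toTools(tree, ""), "", "  ")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

&lt;p&gt;Because the tool list is derived from the tree on every run, adding a command to &lt;code&gt;tree&lt;/code&gt; is all it takes for a new tool to appear in the output.&lt;/p&gt;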
&lt;h2 id="configuring-an-assistant-to-use-it"&gt;Configuring an assistant to use it
&lt;/h2&gt;&lt;p&gt;On the assistant&amp;rsquo;s side it&amp;rsquo;s just as undramatic. You tell your AI client (Claude Desktop, Cursor, anything MCP-aware) to launch &lt;code&gt;mytool mcp&lt;/code&gt;. From then on the assistant:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Starts your tool in MCP mode when the assistant itself starts up.&lt;/li&gt;
&lt;li&gt;Discovers every command as a callable tool.&lt;/li&gt;
&lt;li&gt;Calls the right one, with the right parameters, when a user&amp;rsquo;s request needs it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Your CLI tool has quietly become something the AI can pick up and use, mid-conversation, on its own initiative.&lt;/p&gt;
&lt;h2 id="the-safety-property-worth-noticing"&gt;The safety property worth noticing
&lt;/h2&gt;&lt;p&gt;Now, &amp;ldquo;let an AI run things on my machine&amp;rdquo; is rightly a sentence that makes people nervous. It makes me nervous, and I built the thing. So it&amp;rsquo;s worth noticing the constraint sitting quietly in this design.&lt;/p&gt;
&lt;p&gt;The AI can only call what you defined. The tools it sees are exactly the commands in your tree, and the parameters it can pass are exactly the flags and arguments you declared, validated against the JSON Schema generated from them.&lt;/p&gt;
&lt;p&gt;It can&amp;rsquo;t invent a command. It can&amp;rsquo;t pass a parameter you never defined. The boundary of what the agent can do is the boundary of what your CLI does, and you drew that boundary already, back when you built the tool. Exposing the CLI over MCP doesn&amp;rsquo;t widen the surface one inch. It just makes the existing surface reachable. The AI isn&amp;rsquo;t running &lt;em&gt;things&lt;/em&gt;. It&amp;rsquo;s running &lt;em&gt;your commands&lt;/em&gt;, the ones you wrote, tested and shipped, and nothing else.&lt;/p&gt;
&lt;h2 id="the-gist"&gt;The gist
&lt;/h2&gt;&lt;p&gt;A CLI tool, built properly, is already a structured description of a set of capabilities: named operations, descriptions, typed parameters. Which is also exactly what an AI agent needs in order to call a tool. The gap between the two is only a translator, and writing a bespoke one per assistant is a treadmill you don&amp;rsquo;t need to step onto.&lt;/p&gt;
&lt;p&gt;go-tool-base puts the translator in the framework. Every tool gets an &lt;code&gt;mcp&lt;/code&gt; command that serves the command tree over the Model Context Protocol&amp;hellip; commands become tools, descriptions become descriptions, flags become JSON Schema parameters, with no second schema to maintain. Point any MCP-aware assistant at it and your CLI is an agent-callable tool, bounded to exactly the commands you shipped.&lt;/p&gt;
&lt;p&gt;You did the hard part when you built a good CLI. MCP just opens the door you&amp;rsquo;d already framed.&lt;/p&gt;</description></item></channel></rss>