Pioneering on PHP Boy Scout

Release trust without the framework

Sun, 05 Jul 2026 00:00:00 +0000

A few days ago I shipped afmpeg and ffmpeg-wasi, a way to run FFmpeg as a WebAssembly module straight from Go with nothing installed on the host. afmpeg fetches that wasm module at runtime and then… runs it. A lovely trick, right up until you ask the obvious question: how does afmpeg know the wasm it just pulled off the internet is the one I built, and not something swapped in on the way down?

It has to check a signature. And I already had the machinery to do exactly that… it was just bolted to the inside of go-tool-base.

So I pulled it out. Two new modules, both public: signing and signing-aws-kms.

The part I lifted out

signing is the OpenPGP/WKD signing-and-verification model the rest of my tools already lean on. It’s the same code behind gtb update when it checks its own releases, lifted out into a standalone module you can drop into any project. Verify a signed release, or sign your own, and you do it without dragging the whole go-tool-base framework (and its dependency tree, which is not small) in behind it.

That last bit is the whole reason it’s a separate module and not just a package. Go has a rule here that trips people up: when your code imports a package, you inherit that package’s entire module dependency list… the full go.mod, not just the corner you actually touched. So lifting one tidy little package out of go-tool-base into a sub-package would still have handed every consumer viper, OpenTelemetry, the whole charm stack, the lot. Only a separate module, with its own minimal go.mod, keeps that weight off your build. The core here leans on ProtonMail’s go-crypto and cockroachdb/errors, and nothing else.

The backend you inject

Signing needs a key, and keys live in awkward places: a PEM file on disk, a YubiKey, AWS KMS, GCP, Azure, a Vault. The remote ones drag in heavy SDKs. The AWS KMS client alone pulls in 57 modules, and I really didn’t fancy that sitting in the core where everyone pays for it whether they ever touch KMS or not.

So signing doesn’t implement those backends at all. It defines an interface (a thing that hands back a crypto.Signer) and lets you inject whichever backend you actually use. A light local backend (a PEM key on disk) ships in the box, as a sensible default and a worked example of the shape. The heavy ones are separate modules you opt into, one at a time.

signing-aws-kms is the first of them. It wraps a KMS-held key as a signer, so the private half never leaves AWS and every signature is a round-trip to the cloud. Blank-import it and it registers itself. GCP, Azure and Vault will follow the same pattern, and because each is its own module, your binary only ever carries the one you reached for. And if none of them fit? The interface is right there for you to write your own.

The proof, already running

This isn’t a library out looking for a user. ffmpeg-wasi signs its release assets in CI right now, with a KMS key driven through the go-tool-base CLI, and afmpeg verifies that signature against an embedded key before it hands a single byte to the runtime.

No valid signature, no run.

The thing that needed the trust and the thing that provides it are two separate projects, talking to each other through a signature.

The verifying key isn’t only baked into afmpeg, either. It’s published to WKD on my own domain, so the anchor you check against lives somewhere my git host has no say over. A signature is only ever worth as much as the key you check it against, and pinning that to my own domain rather than the platform that hosts my code is a deliberate bit of the posture, not an accident of wherever a file happened to be convenient to drop.

Where it leaves things

Two more modules out in the world, both public, both documented over at signing.phpboyscout.uk. The selfish win is that afmpeg got to trust its own downloads without me reinventing a single thing to do it. The broader one is that the signing model I keep banging on about is no longer something you have to swallow my entire framework to use. Take the part you want, and leave the rest on the shelf.

The secret that wasn't on my branch

Sat, 04 Jul 2026 00:00:00 +0000

My CI went red on a secret I’d never committed.

Not a close call, not a near-miss I’d half-forgotten about. gitleaks, the secret scanner, failed a merge request of mine on a private key that was not on my branch, was not in my change, and as far as I could tell had nothing to do with me at all. The job was adamant. I was baffled. Somewhere in between was a lesson about what a secret scanner actually scans.

Prove it isn’t yours

First rule of being accused: don’t get defensive, get evidence. The scanner pointed at a couple of commits carrying a test private key and a PEM block in a spec document. I genuinely didn’t recognise them, but “I don’t recognise it” is a feeling, not a fact, and feelings don’t reopen a pipeline.

Git will tell you the truth if you ask it precisely. The question is: are these flagged commits actually part of my branch?

git merge-base --is-ancestor <flagged-sha> HEAD

That asks “is this commit an ancestor of where I am?”. The answer came back no. The commits the scanner was choking on were not in my history. They weren’t mine.

So whose were they? A bit of digging turned them up on a completely separate, still-unmerged branch, where someone (me, a few days earlier, on a different feature) had committed a throwaway test key and a PEM example in a spec, on purpose, as fixtures. Deliberate, harmless, and nowhere near the branch under review. And yet here they were, failing a merge request that had never touched them.

What gitleaks actually scans

Here’s the bit I’d taken for granted. I assumed gitleaks detect scanned my change. It doesn’t. With no further instruction it scans the whole history reachable in the checkout it’s handed.

And the checkout it’s handed is where the second half of the surprise lives. GitLab runners default to GIT_STRATEGY: fetch, which reuses the runner’s working directory between jobs rather than cloning fresh every time. It’s faster, and most of the time you never notice. But it means a shared runner accumulates refs from every branch it has ever built. My MR’s job happened to run on a runner that had, at some point, built that other branch, so the fixtures were sitting right there in the local object store, fair game for a scanner walking the whole graph.

So gitleaks did exactly what I’d asked it to do, which was “scan everything”, and “everything” turned out to be a great deal more than my change. It walked the lot and dutifully reported fixtures from a branch I wasn’t even proposing to merge. The scanner wasn’t wrong. My idea of what it was looking at was.

Unblock now, fix properly after

Two problems on two timescales. I needed the MR to merge today, and I needed this to never happen again. Those want different fixes.

The immediate one: tell gitleaks those specific fixtures are known and intended. They’re test material, they’re meant to be there, so they go in the allowlist:

paths = [
 # ...
 '''internal/cmd/keys/keys_test\.go''',
 '''docs/development/specs/2026-06-08-keys-mint-command\.md''',
]

That unblocks the merge. It does nothing about the root cause, which is that the scan was looking at the wrong commits in the first place. Allowlisting individual false positives as they crop up is closing the stable door after the horse has bolted, one horse at a time, forever.

The real fix lives in the shared CI component, not in any one repo. Scope the scan to the commits the merge request actually introduces (cicd v0.10.3):

if [ -n "$CI_MERGE_REQUEST_DIFF_BASE_SHA" ]; then
 gitleaks detect --source . --verbose --redact \
 --log-opts="$CI_MERGE_REQUEST_DIFF_BASE_SHA..$CI_COMMIT_SHA"
else
 gitleaks detect --source . --verbose --redact
fi

--log-opts is handed straight through to git log, so that range, base-of-the-MR to tip, is the exact set of commits the merge request adds and nothing else. On a merge request the scan now sees only what you’re proposing to merge. Off a merge request (a plain branch or a tag pipeline) it falls back to the full scan, because there you genuinely do want the lot. The before-and-after in the job log tells the whole story: the entire accumulated history on one side, the handful of commits you actually wrote on the other.

Fixing the fix

There’s a tax on touching CI shell, and I paid it. The change went into the go, rust and tofu security templates. Go and tofu went green. Rust failed.

The rust template had built its optional --config flag the clever way, inline:

... $([ -n "$CONFIG" ] && echo "--config $CONFIG")

As a command argument, that substitution is harmless: its exit status is thrown away, so nobody cares that the test inside returns false when there’s no config. But when I rewrote the block and reached for the same pattern as an assignment, VAR=$(... && ...), it became a different animal. An assignment takes the exit status of the command substitution, and under set -e a non-zero status anywhere aborts the job. So on every run where the config was empty, which was most of them, the test returned false, the assignment inherited that false, and set -e killed the job stone dead. Same $(...), two completely different fates, decided entirely by whether it sat to the right of an = or got handed to a command as an argument. Go and tofu never used the assignment form, so only rust fell down the hole.

The fix was to stop being clever and write the boring if:

GITLEAKS_CONFIG=""
if [ -n "$CONFIG" ]; then
 GITLEAKS_CONFIG="--config $CONFIG"
fi

Boring shell is good shell.

What a scanner is actually looking at

The whole mess came from one unspoken assumption: that a tool called to scan “my change” was scanning my change. It was scanning a checkout, and a checkout on a shared runner is a much bigger, messier thing than the diff in front of you. None of the pieces were broken. gitleaks did its job, GIT_STRATEGY: fetch did its job, my fixtures were exactly where I’d left them. They just added up to a red pipeline that had nothing to do with the code I was trying to ship. I’d spent a good chunk of the day proving my innocence to a scanner that was only ever doing as it was told… and the one thing I’d actually got wrong was being sure I already knew what it was looking at.

I filed a feature request into my own framework

Fri, 03 Jul 2026 00:00:00 +0000

I’m building a tool called keryx, and the part of it that matters here is its studio: a browser app where the work happens, which saves everything you do into a git repository behind the scenes, the way a developer’s project lives in git with a history you can step back through.

I wanted that repository to be able to live entirely in memory. Cloned, edited, committed and pushed without ever writing a working copy out to a disk, for the times when you can’t, or would rather not, leave a checkout sitting around on the machine. It sounds exotic, but it’s something git libraries genuinely support, and it’s exactly what a browser studio running on a server somewhere wants.

Getting it working needed one small, awkward piece of plumbing in the middle. And a few lines into writing that piece, I stopped, because I realised I was writing it in the wrong repository.

The bridge I was about to vendor

Here’s the awkward bit. All of keryx’s file handling goes through afero, the standard filesystem interface in the Go world, the thing you hand your code so it neither knows nor cares whether it’s talking to a real disk, a test fake, or memory. It’s the interface go-tool-base hands you for filesystem work. But an in-memory git repository, the kind go-git gives you with its memfs, doesn’t speak afero. It speaks go-billy’s filesystem interface instead. Two perfectly good filesystem abstractions, and a worktree on the wrong side of the gap from all my code.

What I needed was an adapter: a bridge that makes a billy filesystem look like an afero.Fs, so the studio’s existing file handlers work unchanged over a repo that lives entirely in RAM. Twenty minutes of work, maybe. The obvious move was to write it inside keryx and get on with my afternoon.

And that’s the move I caught myself making. Because a billy-to-afero bridge is not a keryx thing. It’s not even a studio thing. It’s a general capability that any tool built on go-tool-base might want the moment it touches git. Vendor it in keryx and I’ve buried a reusable bit of plumbing inside one consumer, where it will drift away from the framework and get reinvented, slightly differently, in the next tool I build that needs it.

The bridge belonged in the framework. So that’s where I put it.

A feature request, against myself

I wrote the need up properly. Not a code comment, not a mental note, but an actual feature request, with a reference implementation sketched out, dropped into the go-tool-base repository as a document for the framework to act on.

There’s something slightly absurd about filing a feature request against your own project. The author and the customer are the same person. But that’s exactly what gives it its value. The most useful design input a framework gets is a real consumer hitting a real wall, and for once I was both: the person who maintains go-tool-base, and the person downstream of it who’d just discovered something it couldn’t yet do. The request wasn’t hypothetical or “wouldn’t it be nice”. It was “I am stuck on this right now, here is precisely what it can’t do yet.”

What came out the other side is pkg/vcs/repo/aferobilly, a first-class part of the framework as of v0.22.0. Its own description is the clearest summary of what it is:

// Package aferobilly adapts a go-billy/v5 Filesystem to an afero.Fs. It is the
// pure, reusable bridge behind pkg/vcs/repo's worktree-as-afero accessors, but
// works for any billy filesystem (memfs, osfs, chroot).

Alongside it, the worktree itself grew the accessors that hand you that view: WorkFS() for a live afero handle, and WithWorkFS() for an atomic sequence (worktree_fs.go, and the adapter itself). keryx then consumed it like any other framework feature, and the in-memory studio fell into place.

Two sessions, one dependency

The bit I’d actually recommend to anyone is what I did with my time while that got built.

I didn’t down tools and wait for the adapter. I handed the feature request to a separate agent session and let it build the framework feature from the spec, working in the go-tool-base repo, while my keryx session carried straight on with all the studio work that didn’t depend on the bridge. Two sessions running in parallel, deliberately sequenced around the one dependency between them: keryx needs the adapter, so the adapter session goes first, but only the last mile of keryx actually waits on it. When go-tool-base cut the release with the adapter in it, keryx pulled the new version and the final piece slotted in.

That’s a workflow the framework split makes possible. The thing that’s a shared capability gets built once, in its proper home, by one stream of work, while the thing that consumes it carries on in another. The dependency between them is real, so the order matters, but only at the very end.

The one rule that came with it

Upstreaming it also meant the tricky part got solved properly, once, with a warning attached, rather than learned the hard way in a consumer. The adapter is concurrency-safe by construction: it serialises every operation through a lock, so when that lock is the same mutex guarding the repo, a live afero handle over the worktree is genuinely safe to share. But that safety has a sharp edge, and the package says so plainly:

// A handle (and its open files) must NOT be used from inside a critical section
// that already holds the same locker (the repo mutex is non-reentrant — that
// would deadlock).

Use the handle inside a WithWorkFS callback and you’ll re-lock a non-reentrant mutex and hang yourself. That’s exactly the kind of footgun that, vendored in keryx, I’d have discovered at 11pm with a wedged process and no idea why. In the framework, it’s documented at the source, where the next consumer reads it before they trip over it.

The truest test of a framework

Building a real product on your own framework is the best test of it, and this is what that actually looks like in practice. The test is sharper than “does it work”. It’s “what does the product need that the framework doesn’t have yet”, and every real answer to that is a feature request waiting to be filed.

The discipline is filing it against the framework instead of patching around it in the app. Do that, and the awkward bridge has exactly one home, the deadlock warning gets written down once, and the next tool I build inherits all of it for free. The customer was me. The feature request was real. And go-tool-base is better for my having been stuck.

There's no AI in my photo culler

Wed, 01 Jul 2026 00:00:00 +0000

Before a wedding photographer can edit a single frame, there’s the cull: sitting down with three or four thousand photos from the day and deciding which are even worth keeping. The blurry ones, the ones where the flash fired into a mirror, the same moment shot eight times in a burst where only one frame is sharp. It’s mechanical, it’s exhausting, and it’s the first job krites does for Hailey.

Every culling tool I looked at before building it leads with the same word. AI. AI culling, AI selects, trained on millions of weddings. So when I sat down to write krites’ first pass, I assumed I’d be wiring up a model too. For the part that does the most work, it turns out, I didn’t need one.

The shipped culler doesn’t load a single weight. It’s arithmetic, the sort a calculator could do if you were patient enough, and that’s a deliberate choice rather than a corner I cut. Here’s what’s actually under it.

Blur is the variance of a Laplacian

The first question for any frame is whether it’s in focus. You can answer it without knowing anything about what’s in the photo.

A Laplacian is an edge detector. Run it over an image and it lights up wherever the brightness changes sharply, the crisp boundary between a dark suit and a white shirt, the line of an eyelash. A photo in focus is full of those sharp transitions; a soft or motion-blurred one has smeared them all into gentle gradients. So if you measure how much the edge response varies across the frame, a sharp photo gives you a big spread of values and a blurry one gives you a flat, lifeless number. That single number is the focus score.

In krites it’s a 3×3 kernel over the frame’s luma (the brightness channel, Rec. 601 weights), and the score is the variance of the response:

lap := int(luma[(y-1)*w+x]) + int(luma[(y+1)*w+x]) +
 int(luma[y*w+x-1]) + int(luma[y*w+x+1]) - lapCenter*c

Sum the responses, sum their squares, and the variance falls out as sumSq/n - mean*mean. No training data, no inference, and the same pixels always give the same answer. (quality.go.)

Exposure is a histogram

The second question is whether the exposure is salvageable. If a third of the frame is pure white, the highlights are blown and there’s no detail to bring back; if it’s mostly pure black, the shadows are crushed the same way.

That’s just counting. Walk the luma plane once, tally how many pixels sit at or above a near-white threshold and how many at or below a near-black one, divide by the total, and you’ve got two fractions: the blown-highlight proportion and the crushed-shadow proportion. A photographer cares about those two numbers directly, and a for loop produces them (quality.go).

Two photos are the same when sixty-four bits agree

Then there are the bursts. A photographer holds the shutter through the first kiss and gets twelve nearly-identical frames; you want the sharpest one and the rest out of the way. To do that the tool has to know which frames are “the same shot”, and again you don’t need to understand the photo to tell.

The trick is a perceptual hash, a difference hash to be exact. Shrink the image right down to a nine-by-eight grey thumbnail, then for each row note simply whether each cell is brighter than the one to its right. That’s sixty-four yes/no comparisons, packed into a sixty-four-bit number, a fingerprint of the picture’s broad light-and-dark structure that survives a resize, a small reframe or a touch of noise:

if grey[y*hashW+x] > grey[y*hashW+x+1] {
 h |= Hash(1) << bit
}

Two fingerprints are compared by counting the bits that differ between them, the Hamming distance, which on a 64-bit integer is one CPU instruction (bits.OnesCount64). A small distance means the frames look alike. krites only clusters consecutive frames within that distance, so a run of similar shots merges into a burst but two unrelated photos that happen to rhyme don’t (dedup.go).

Best-of-burst is then the dullest line of code in the project: keep the sharpest frame in the cluster, demote the others from keep to maybe, and write down why.

fv.Reasons = append(fv.Reasons, "near-duplicate of "+bestFrame+" (kept the sharper frame)")

Signals in, a verdict out

None of those measurements decide anything on their own. A focus score of 50 is rejectable on one shoot and fine on another, because the numbers scale with resolution and content. So the signals feed a profile, a small set of thresholds, and the profile turns them into a ruling: below the hard focus gate it’s a reject, below a softer floor it’s a maybe, blown past the exposure gates it’s a reject, otherwise it’s a keep. Every verdict carries its reasons in plain words, “out of focus (sharpness 32 below 50)”, because krites proposes and the human disposes (cull.go).

The seed thresholds for a wedding are just a starting point, written to config on krites init and tuned from there:

seedMinSharpness = 50 // below this: rejected as out of focus
seedSoftSharpness = 150 // below this (but >= min): demoted to maybe
seedMaxHighlights = 0.10
seedMaxShadows = 0.30
seedDedupDistance = 8

The thresholds are the whole point of keeping them visible. “Suitable for a wedding album” is Hailey’s definition, not mine and not a model’s, and a number in a config file is something she can move (profile.go).

Where the models do belong

I’m not claiming AI has no place in this. Some of what a wedding photographer culls on genuinely needs a model: is this person mid-blink, is anyone actually looking at the camera, is the composition any good. Those are coming, and they’ll be model-backed when they do. The deliberate bit is that they sit outside this deterministic core, behind an interface, opt-in. The maths that does the heavy lifting of the first pass never imports a model.

That separation buys three things you lose the moment a neural net touches the hot path. It’s reproducible: the same frames in the same order always cull the same way, so a verdict is debuggable and a regression is catchable. It’s quick enough to run over four thousand frames on a laptop with no GPU. And it stays honest about what it knows, because a threshold you can read is a threshold you can argue with, which a confidence score from a black box never quite is.

“AI culling” makes for a better headline. But blur really is just a number, a duplicate really is just sixty-four bits, and the grim, mechanical first pass that stands between a photographer and their best photos comes down to arithmetic.

A stack trace is not an error message

Tue, 30 Jun 2026 00:00:00 +0000

The repair agent I’ve been building into go-tool-base narrates what it’s doing as it goes. It builds, it tests, it lints, it fixes, and it logs each step so I can watch it think. Mostly that log is a calm, readable trickle: tried this, that failed, reading the file, here’s the fix. Mostly.

The moment a build or a lint step failed, the calm trickle turned into a wall of Go stack frames, the same forty lines of runtime gubbins over and over, burying the one line I actually wanted to read.

Two different things we both call “the error”

The agent’s tools wrap their failures with cockroachdb/errors, which is a lovely library: it attaches a stack trace to an error the moment you create it, so when something goes wrong deep in the weeds you can see exactly how you got there. A failed go build comes back as one of these rich, wrapped values, carrying its message and its stack.

The line that recorded the failure looked like this:

l.Warn("Tool execution failed", "tool", name, "error", err)

Looks fine. It is not fine. The logger is charmbracelet/log, and when you hand a structured logger a cockroachdb error value as a field, it renders the whole thing: message, wraps, types, and every frame of that attached stack. So every failed step, and during self-repair there are plenty, printed a full traceback at WARN. The signal, the actual build error, was in there somewhere, wearing a forty-line coat.

The thing is, a stack trace and an error message are two different objects that we lazily both call “the error”. err.Error() is the message: short, human, “lint issues found…”. The value err is the message plus the evidence of where it came from. They serve different readers. The message is for whoever’s watching the loop run. The stack is for whoever’s debugging why the loop itself is broken. Hand the wrong one to the wrong reader and you’ve got either noise or a mystery.

Pick a reader per level

The fix is to stop making one log line serve both:

l.Warn("Tool execution failed", "tool", name, "error", err.Error())
l.Debug("Tool execution failure detail", "tool", name, "error", err)

The message goes to WARN, where someone’s watching the agent work and just wants to know what failed. The full wrapped value, stack and all, goes to DEBUG, where someone’s gone looking for trouble and wants every frame. Turn the level up and the evidence is right there; leave it at the default and the loop reads like prose again.

It’s a one-line change, give or take, and it lives in the shared chat tool-dispatch path rather than in the agent, so every tool-using client gets the quieter log for free. It’s the same loop I’d just taught to respect the linter; apparently I was determined to make it both honest and readable in the same fortnight.

Where the stack belongs

The stack trace was never the thing I needed to hide. cockroachdb/errors attaching it is exactly what I want; it’s the whole reason I use the library. The mistake was where I let it surface. A trace dumped at WARN on every routine failure isn’t observability, it’s wallpaper, and wallpaper is what you stop seeing. Keep the loud version for the level where someone’s actually gone looking for it, and leave the everyday log alone. The stack was never the noise. Printing it on every line was.

A flag is not a setting

Sun, 28 Jun 2026 00:00:00 +0000

I was reviewing a change to rust-tool-base’s scaffolder when a word stopped me dead. rtb generate config-field. I couldn’t have told you why in that first second… I looked at it and just knew it was wrong.

The verb there is generate, and the verb is fine. It was the noun that grated, config-field, the name of the thing being made. Renaming it is a small change. It’s also a breaking one, and a gut feeling is no reason to break someone’s command, so before I touched it I went and worked out what the instinct was reacting to.

Accurate, and still wrong

Here’s the awkward bit: config-field is correct. The thing it makes really is a field on a config struct. If that name lived deep in a package, somewhere only another developer reading the source would ever trip over it, it’d be fine. The code’s audience is me, and “config field” is exactly what the code sees.

But it doesn’t live deep in a package. It sits right out on the command line, on the one surface a user actually types, and a name out there has a different job. It has to telegraph what it does to someone who has never read a line of the source, in words a layperson would reach for. By that test config-field fails, and not because it’s wrong. It fails because it’s right about the wrong thing. It describes the plumbing when all the user wants is to turn on the tap.

That’s the rule I keep coming back to anywhere a person actually touches the tool: accurate is the floor, not the bar.

What the noun names

The right name falls out of what the thing actually is, so I went and pinned that down. rtb’s scaffolder makes three different things, and the noun is how you pick which:

rtb generate command <name> # a new subcommand
rtb generate flag <name> # a command-line argument on a command
rtb generate setting <name> # a field on the tool's typed config

A flag and a setting sound like cousins, but they answer two different questions: where does the value come from, and how long does it live. A flag is something the user types for a single run (a clap argument, in Rust terms), like deploy --region eu or --dry-run. Transient, scoped to the one command. A setting is a typed field on the tool’s AppConfig, read from its layered config: a file, the environment, or a one-off override on the CLI. Persistent, and tool-wide. (The full contrast is its own doc now.)

Put the two side by side and the old name gives itself away. flag says what the thing is to a user. config-field said what it is to the code. One tells the truth at the surface; the other leaks an implementation detail you were never meant to care about.

Why they’re two things at all

This is the bit that makes the rename honest rather than fussy, and it’s where rust-tool-base and go-tool-base part ways.

In go-tool-base, a flag and a setting are pretty much the same object. cobra and viper (Go’s CLI and config libraries) fuse them: you bind a flag, viper reads its value from a config file or the environment, and you’re done. One persistent flag laid over a config bag. That’s no compromise, it’s an excellent convenience abstraction, gtb leans on it to the hilt, and for what it’s worth it’s the model I personally find the simpler of the two. One mechanism, one thing to keep in your head.

Rust won’t hand you that fusion, and it’s right not to. rtb’s config is a typed AppConfig (built on figment, a Rust config library), not a dynamic get_string("key") bag, so a command-line argument and a config field genuinely are different types with different lifetimes. Splitting them isn’t rtb being puritanical about it. It’s the shape Rust’s type system gives you, and the framework leans in and makes the most of it. The rtb version is, no argument, the more type-safe of the two.

So neither is better. They suit different paradigms, and both do the job beautifully. But the knock-on for naming is concrete. Once a flag and a setting really are two different things, calling one of them config-field doesn’t just expose the plumbing, it tells a small lie: it implies a setting is the same kind of object as the struct field it happens to sit in. setting tells the truth. This is the thing you configure once and the tool remembers, the sibling of flag.

(rtb has form here, mind. “flag” already pulls double duty: a runtime feature flag and a compile-time Cargo feature are two more genuinely different things the framework keeps deliberately apart. Stretch one word across that many concepts and naming each one precisely stops being pedantry and becomes the only way anyone keeps them straight.)

The change

So config-field became setting, and picked up its mirror image remove setting to round out the trio of command / flag / setting. It’s a breaking change, rtb generate config-field is gone for good, and it earned its keep. The cost is a line in a changelog. The return is a command surface that says what it means.

Name the tap

The gut reaction was right, but the gut reaction was never the point. The point is what it was reacting to: a name, out on a surface a human uses, describing the machinery instead of the job. config-field was accurate. It still made the user stop and think about a struct field when all they wanted was to set something up and get on with it.

Nobody turning on a tap wants to think about the pipework behind the wall. Name the tap.

The agent said SUCCESS. The linter disagreed.

Fri, 26 Jun 2026 00:00:00 +0000

There’s a repair agent inside go-tool-base now. When you run gtb generate command, it doesn’t just spit out a file and wish you luck. An agent takes the generated code, builds it, runs the tests, and fixes whatever it broke, looping until the thing actually works (or until it’s tried the same fix five times and admits defeat). The whole point is that the generator hands you code that’s ready, not code that’s nearly ready and quietly now your problem.

So it stung a bit when I realised the agent had been holding itself to a lower bar than I’d hold any junior to. And I was the one who’d set the bar.

What “done” meant to the agent

The agent is a loop with real tools: it can build, test, read files, write files, tidy the module, and run golangci-lint. It works through them, and when it’s happy it replies with the word “SUCCESS” and the loop stops. On the Go side, the check is exactly that blunt:

if strings.Contains(strings.ToUpper(resp), "SUCCESS") {
 return nil
}

That’s the whole gate (agent.go). There’s no clever verification on my end that the agent actually did its homework. It does the work, it tells me it’s done, and I believe it. Which is fine, as long as the agent and I agree on what “done” means.

We didn’t.

The instruction that made lint optional

The agent decides it’s finished by following a numbered list in its system prompt. Here’s the line that did the damage:

If there are lint issues, use ‘golangci_lint’.

Read that the way the agent would. “If there are lint issues”… well, how would it know? The only way to find out is to run golangci-lint. But the instruction makes running golangci-lint the thing you do once you already know there are issues. It’s a chicken with no egg. And the SUCCESS condition at the bottom of the list never mentioned lint at all:

When the project builds successfully and tests pass, reply with “SUCCESS”.

So the agent did the sensible thing, given its orders. It built the code, ran the tests, saw both go green, and declared victory. golangci-lint was sat right there in its toolbox, unused, because nothing ever told it the job wasn’t finished until lint was clean too. I’d handed it a linter and then written a prompt that let it walk straight past it.

The galling part is that the linter was never the missing piece. The golangci_lint tool had been registered the whole time, and it even runs with --fix, so it’ll quietly clear the trivial stuff and only surface what actually needs a decision. The capability was there. The instructions just never required it.

The fix was words, not code

Here’s the part I find genuinely interesting. I didn’t add a check. There is no new gate in the Go. The fix is four lines of English:

Run ‘go_build’, ‘go_test’ and ‘golangci_lint’ in the project directory… Run all three; a clean build and passing tests do not imply clean lint.

Reply with “SUCCESS” only once ‘go_build’, ‘go_test’ AND ‘golangci_lint’ all pass with no errors and no reported issues.

That’s it. Lint moves from a remediation step you reach for once you somehow already know there’s a problem, into the gate itself. “Done” now means three green lights, not two.

It nags at me a little, that one. The reliability of an agent that writes and fixes real code came down to whether one sentence of instructions was precise enough. When your success criteria are a paragraph of prose, vagueness in that paragraph is a bug, the same as a vague type or an off-by-one. The spec just happens to be written in English, and the thing reading it is a language model that will cheerfully take the cheap reading if you leave it lying around. That’s the same lesson the goblin who wouldn’t stay dead taught me from the other direction: with these tools, what you say is what you get, and what you don’t say is fair game.

Leave it better, not just building

The Boy Scout Rule is the whole reason this blog exists, and I’d quietly exempted the robot from it. “Leave the campsite cleaner than you found it” had become “leave it building”, which is not the same thing and never was. If I’m going to put an agent in the loop precisely so it tidies up after the generator, then “tidy” has to mean what it would mean for a person on my team. Build, test and lint. No walking past the bin because nobody told you to pick it up.

The cobra hook I was sure I'd enabled

Wed, 24 Jun 2026 00:00:00 +0000

It came out of an audit. I’d recently pointed a small army of review agents at the whole go-tool-base codebase, back before that became a political problem, and one of the findings was that a subcommand could quietly skip the framework’s own start-up code. My first reaction was the dangerous one: surely not… we switched that on ages ago. So I asked for a second pair of eyes on the exact line.

There was no line. I was certain I’d enabled it, but I had simply never done it.

What the start-up hook is for

go-tool-base is built on cobra, the library most Go command-line tools are built on. In cobra, a command can carry a PersistentPreRunE: a function that runs before the command itself, and that, the name strongly implies, persists down to the command’s children. Think of it as the “before you do anything, get the tool ready” step.

go-tool-base uses exactly one of them, on the root command, to do all the humdrum setup: load and merge configuration, set up logging, ask about telemetry the first time you run, wire up the telemetry collector, and check whether there’s a newer release to install. Everything the tool does afterwards leans on that having happened. By the time your actual command runs, props.Config is meant to be populated and the collector is meant to exist.

The reasonable assumption (the one I’d made, anyway) is that “persistent” means it cascades. Define it once at the root and every mytool foo bar three levels down gets it for free.

“Persistent” promises less than it says

Here is the catch, and it is a good one to file away if you ever build a command tree. cobra runs only the nearest PersistentPreRunE it finds, walking up from the command you actually invoked. If a subcommand defines its own, that one runs and the root’s does not. Not as well as. Instead of. There’s no warning and no error; the child’s hook simply wins, and the parent’s is passed over in silence.

So the moment any command below the root declared its own PersistentPreRunE, the entire start-up for that branch, the config, the logging, the telemetry, the update check, would just not happen. props.Config would be nil. The collector would be nil. The first you’d hear of it is a nil-pointer panic a long way from the cause, or, worse, no panic at all and a tool running happily unconfigured.

EnableTraverseRunHooks is cobra’s opt-in to the behaviour most people assume is already the default: run every PersistentPreRunE from the root down to the leaf, in order. I’d assumed it was the default. It is not, and I’d never turned it on.

A landmine nobody had stepped on

The saving grace was that nothing was actually broken yet. In go-tool-base’s own command tree, the root is the only command that defines a persistent pre-run, so “root to leaf” and “nearest only” happen to produce the identical result. The flag being off changed nothing I could observe.

The bug was latent. It was a trap laid for the first person to do something entirely reasonable: add a PersistentPreRunE to one of their own subcommands. go-tool-base is a framework other tools are built on, so that person was never going to be me. The instant a downstream author did the obvious thing, their config and telemetry would vanish for that branch of their tool and nothing would tell them why.

That is the kind of bug I least like shipping. It compiled. It passed the tests. It would have demoed perfectly. And it sat there waiting to hand a stranger a debugging session with no breadcrumbs, for the crime of using a standard cobra feature the obvious way.

One line, and a note for whoever’s next

The fix is the line I was so sure already existed (root.go):

// Run every parent PersistentPreRunE from root to leaf rather than only the
// closest one. Without this, a downstream subcommand that defines its own
// PersistentPreRunE silently shadows the root bootstrap (config load,
// telemetry, update check) for that subtree. With it set, the framework
// bootstrap always runs first and a child hook runs after it.
cobra.EnableTraverseRunHooks = true

With it set, cobra runs the root start-up first and then the child’s hook, in order, so a downstream command adds to the setup instead of replacing it.

I didn’t want to stop there, because the next author to add a child hook still deserves to understand the ordering. So the change also drops a one-time debug line if it spots any command in the tree carrying its own PersistentPreRunE (the same file), saying out loud what’s going on:

l.Debug("a downstream command defines its own PersistentPreRunE; " +
	"it runs AFTER the framework bootstrap (config load, telemetry, update check), not instead of it")

And, belt and braces, the collector now defaults to a no-op so the few paths that do legitimately return early, like init and help, still satisfy the “always non-nil” promise the rest of the code relies on. The whole thing shipped with a pair of regression tests that assert the bootstrap really does run when a child hook is present, and that it runs first. It’s all written up in a short spec and landed in one commit.

Trust, but grep

There are two things worth taking away. The cobra one is portable: if you rely on PersistentPreRunE cascading down a command tree, set EnableTraverseRunHooks, because “persistent” means less than it sounds and the nearest hook wins by default.

The other is the one I keep having to relearn. The settings I’m most certain about are the ones I never check, precisely because the certainty is what stops me looking. Somewhere along the line I’d promoted “I meant to” straight to “I did”, with nothing in between… and then defended it out loud before I’d even gone to look. A review agent is good at exactly that blind spot: it has no memory of intending to do something, only the code in front of it. The best thing the audit turned up wasn’t a clever bug. It was a flag that was never there. Leaving the campsite better than you found it has to include the traps nobody’s stepped on yet.

When you hand the same key to every call

Tue, 16 Jun 2026 00:00:00 +0000

I was building a tutorial, the kind where the whole point is that the reader runs every command and it just works. So I generated a fresh project with go-tool-base, added a command, then added a command underneath that command, and hit build. It didn’t.

pkg/cmd/hello/cmd.go: props.ChildCmd undefined
 (type *props.Props has no field or method ChildCmd)

My own generator, in my own framework, had just written code that referenced a thing that didn’t exist… which is a special kind of embarrassing.

A bug with a two-month alibi

git blame walked me straight to the commit that introduced the command middleware system back in March. Middleware here is the web-style idea of wrapping a command’s run function with cross-cutting behaviour, timing, auth, recovery, that sort of thing. To wire it in, the generator started emitting this for a nested command:

setup.AddCommandWithMiddleware(cmd, child.NewCmdChild(p), props.ChildCmd)

where the line before had simply been:

cmd.AddCommand(child.NewCmdChild(p))

The catch is that third argument. props.ChildCmd is meant to be one of a set of constants, but those constants are hand-declared for the framework’s built-in commands only (UpdateCmd, DocsCmd, and friends). The generator never declares one for a user’s child command, so the generated parent referenced a name that nothing had ever declared. Undefined. Won’t compile.

Here’s the part that should worry you more than the bug. It shipped in March and nobody noticed until late May. Partly because it only bites nested commands, a command under another user command; top-level commands register by a different path and were fine. But mostly because the generator’s tests checked the generated code as text, asserting that it contained the right strings, and never once ran go build on the result. CI was green for two months on code that could not compile. We were grading the essay without ever reading it aloud.

What the key actually was

Once I stopped staring at the missing name, the real problem came into focus… and it wasn’t the missing constant at all.

That third argument is a middleware lookup key. The framework keeps a table of middleware registered against each key, and the key tells it which to apply. It is not an on/off switch and it is not optional, so the generator had to supply one at every registration site. It was being asked to guess, on every call, a value it had no reliable way to produce for a user command.

And the tell was sitting right there in the same generator: everywhere else, the idiom was props.FeatureCmd("name"), a function that derives a key from a string. The nested-registration path was the one place that assumed a hand-declared constant instead. One call site out of step with all the others.

That is the actual lesson, and it has nothing to do with cobra or codegen. When you find yourself threading the same derived value through every single call site, and getting it wrong, the value is in the wrong place. The feature key was never the caller’s business. It belonged to the command.

Changing my mind about cobra

This is where I had to eat a helping of my own opinion.

go-tool-base is built on cobra, the de-facto Go library for building command trees, and I like it a great deal. I had deliberately not wrapped it. Every abstraction over a good library is a tax the reader pays, so my standing rule was: use cobra directly, don’t hide it behind something of mine.

The trouble is the middleware pattern kept growing, and the bigger it got the more plainly “don’t abstract cobra” was a position I was holding past its evidence. The very thing I’d refused to build was the thing the design had come to need. It helped that, having recently moved the project to GitLab, the version had reset to a 0.x prerelease, which makes a breaking change cheap. The window to stop patching and do it properly was open, and not for long.

Go’s composition model made it almost painless. You can embed a pointer to one struct inside another and the outer type inherits all the inner one’s methods for free, which is about as close to monkey-patching as a statically typed language gets. So a setup.Command became a cobra command plus the feature it belongs to:

type Command struct {
	*cobra.Command
	Feature props.FeatureCmd
}

func Wrap(feature props.FeatureCmd, cmd *cobra.Command) *Command {
	return &Command{Command: cmd, Feature: feature}
}

func (c *Command) Register(children ...*Command) {
	for _, child := range children {
		if child.Command == nil {
			continue
		}
		if child.RunE != nil {
			child.RunE = Chain(child.Feature, child.RunE)
		}
		c.AddCommand(child.Command)
	}
}

Because *cobra.Command is embedded, a setup.Command is a cobra command for every method cobra offers; the one place you need the raw pointer, you reach for .Command. The generated command now carries its own identity:

func NewCmdChild(props *props.Props) *setup.Command {
	cmd := &cobra.Command{Use: "child", RunE: ...}
	return setup.Wrap(props.FeatureCmd("child"), cmd)
}

and the parent registers it with nothing threaded through the call:

parent.Register(child.NewCmdChild(p))

The feature key now lives on the command, derived from the command’s own name, which is the one place the generator can always produce it correctly. The bug isn’t so much fixed as made unsayable: there’s no call site left to write the wrong thing into. The wiring got cleaner on the way past, too, each command’s run is wrapped exactly once with its own feature, instead of the old recursive pass that re-applied the parent’s feature down the whole subtree. And the old free function stays on as a deprecated shim that just calls Register, so nothing downstream breaks before v1.0.

Changing my mind about the tests

The redesign was the satisfying fix. The test was the important one.

The reason a non-compiling generator sailed through CI for two months is that its tests read the generated source as text. A generator is a program that writes a program, and we were checking that it wrote the expected words without ever asking whether the words formed a working program. So the redesign shipped with a different kind of test, one that scaffolds a real project, adds a nested command, and actually builds it. Its own comment says the quiet part out loud:

The previous test suite asserted file-content shapes but never tried to go build the generated module, so the nested-command path that referenced undefined props.<Name>Cmd symbols compiled cleanly in tests and broke only when downstream users built their tools.

It’s gated behind an integration flag, because it shells out to the Go toolchain and that’s too heavy for every unit run, but it closes the exact gap that hid the bug. The only real test of a code generator is whether its output compiles.

What it comes down to

Three times in one bug I had to change my mind. I’d decided cobra shouldn’t be abstracted; the evidence said abstract it. I’d reached for the one-line patch; the evidence said redesign. I’d trusted tests that read the generated code; the evidence said build it. None of those were comfortable, and the version reset is the only reason the timing was kind.

Both technical lessons are worth keeping. When the same derived value is threaded through every call, the abstraction is in the wrong place. And the only proof that something which writes code actually works is to compile what it writes. But the one underneath both, the one I apparently have to keep relearning, is simpler than either: don’t get so attached to an implementation that you can’t change your mind when the evidence says it doesn’t fit. The framework is better for it, and it now has a composition seam to hang the features cobra doesn’t give us natively. A nested command that wouldn’t build was just the thing that finally made me look.

The goblin that wouldn't stay dead

Fri, 12 Jun 2026 00:00:00 +0000

Turn one, the player swings, the die comes up 20, and my AI dungeon master narrates the goblin falling silent, leaving the player alone in the corridor. Good. Turn two, another roll, a 6 this time, and the same dungeon master cheerily has the goblin “dance back” out of the dark to take another swing. The goblin I’d just watched die was up and fighting again, and the model didn’t so much as blink.

I didn’t feel cheated, or even surprised. I felt the small, familiar thud of oh, yeah, I forgot that bit. Because the model hadn’t gone rogue. It had done exactly what a language model does. The gap was mine.

This was the war story behind part four of the go-tool-base tutorial, the AI dungeon master. The tutorial shows the clean, final design and quietly moves on. It doesn’t show the three different ways I got it wrong first, which is a shame, because the wrong turns are where the actual lesson is.

Why a dungeon master at all

A word on why I was even here. I was trying to prove the chat component of the framework to myself. There’s a voice that pipes up whenever I build anything in this space, “LangChain exists, who do you think you are?”, and the answer I keep landing on is that LangChain is enormous and I wanted something small enough to hold in your head. The tutorial was the test: could a newcomer wire AI into a CLI with it and come out the other side with something that actually behaves?

That last word is the whole problem. A tutorial has to leave you holding something dependable, and dependability is the one thing AI fights you on. I also wanted it to be fun, a thing someone might keep poking at after the tutorial ends, maybe even the hook that gets a person other than me to use the framework. I batted hook ideas around and liked none of them, until the obvious one landed: I run a tabletop game on the odd weekend, so make the AI the dungeon master. Gamify the thing. Then watch it raise the dead.

Strike one: nothing to enforce

The first version was the naive one. I gave the model a roll tool, because the one thing you absolutely cannot let a language model do is pick its own numbers, and otherwise let it narrate freely. The conversation history carried from turn to turn, so it remembered the fight. I assumed remembering was enough.

It isn’t. Remembering and being held to it are different things. The history told the model a goblin had died; nothing stopped it writing the goblin back in when the next turn’s narration wanted a bit of jeopardy. Memory is not a constraint. The model will happily contradict its own past if you’ve given it room to, and I had given it nothing but room.

Strike two: a tool to read the state

The obvious fix, and I do mean obvious, the kind you reach for without thinking, was to give the model a state tool so it could check who was alive before it narrated. Hand it the facts on request and surely it’ll stop making them up.

What it actually did was dither. Handed a tool it could call to look things up, it called it. And called it. And called it again, turning a turn over in its hands without ever committing to an action, burning through its step budget on lookups and leaving the player staring at nothing. I’d cured the lying by inventing paralysis. A tool the model can call is a tool it will call, often instead of doing the thing you actually wanted.

Strike three: refereeing its own dice

When I did get it reading state cleanly, the third failure crept in, and this one was subtler. Once the model could see the goblin’s hit points, it started deciding the fight. It would read that the goblin had 12 HP and just narrate a killing blow, hits and damage and all, without calling the roll or attack tools at all. Why ask the dice when you can see the board and write whatever outcome the story wants? Give a model enough context and it stops being a narrator and starts being a referee, which is precisely the job I’d built tools to keep out of its hands.

The fix was less, not more

Three failures, and notice the shape of my fixes: each one added something. More memory, then a tool, then more context. Every instinct said the model needed more to work with. Every time, the extra capability was the new way to be wrong.

So I went the other way. The truth lives in a plain Go struct that I own, not the model. There’s no state tool to dither on, because the loop simply prepends the current state to every turn’s input, fresh, so the model never has to ask and never gets to drift. The mechanics, the dice and the damage, live in Go functions the model has to call, and the system prompt says in as many words that it must not decide a hit or damage itself. The model is left with exactly one job: narrate. The prose is its to invent. The maths, the state and the shape of the result are not.

That’s the line that turned three bugs into a feature. You don’t make a language model reliable by giving it more to work with. You make it reliable by giving it less to be wrong about.

The freedom I chose not to give it

There’s a real tension in that, and I want to name it rather than pretend the boxed-in version is the only true one. At my own table the rules are guidelines, not guardrails. I ignore them, bend them, improvise, reach for the “rule of cool” when the moment’s better for it. A great AI dungeon master would have that same freedom, and a few out there genuinely do, Old Greg’s Tavern is a lovely example of how far the free-form version can go.

But that freedom costs far more than a tutorial can spend, and it buys unpredictability I was specifically trying to teach people to avoid. So I made a deliberate trade: guardrails instead of guidelines. Simple, but not so simple it’s boring. The player still gets a “not on rails” game, they can try anything and the DM copes, but every outcome that matters runs through code I trust. That’s the right shape for a tutorial, and, not by coincidence, the right shape for most AI features you’d actually ship.

What the goblin taught me

The thing I keep coming back to is that the model never misbehaved. It resurrected the goblin because I gave it the freedom to. It dithered because I gave it a button to press. It refereed because I let it see the board. Every failure was a permission I’d handed over without meaning to. The reliability didn’t come from a cleverer prompt or a bigger model, it came from working out, one dead goblin at a time, exactly how little the model needed to be trusted with.

If you want the version where it all works first time, the tutorial has it, the tool-calling and the typed turns wired up properly. This was the road there. The goblin, you’ll be glad to hear, now stays down.

A signing key that never leaves KMS

Thu, 11 Jun 2026 00:00:00 +0000

The last post in this series walked through how a tool verifies a release signature the platform can’t forge. That post had a loose end dangling off the back of it, and I knew it the whole time I was writing. Because a signature has to be produced by a private key… and a private signing key is the single worst thing in this entire story to lose. Steal it, and you sign malware that sails through every check I spent two posts building, signature and all. So where does that key live? The answer I landed on is the one this whole post is about: inside AWS KMS, and it never comes out.

The only key you can’t steal

Think about where a signing key normally ends up. A file on a build server. A secret in CI. A key on the release engineer’s laptop, “just for the release, I’ll delete it after”. Every one of those is a copy, and every copy is one more thing somebody can read, exfiltrate, or quietly clone while your back is turned. You can wrap them in passphrases and vaults and rotation policies all you like, and you’re still standing guard over a thing that exists in a place you don’t fully control.

The way out is almost annoyingly simple to state: the only key nobody can steal is the one that was never anywhere to be stolen from. So don’t hold the key at all. Let something else hold it, somewhere it has no export path, and ask that thing to sign for you.

That thing is AWS KMS. This is the infrastructure side of the question I opened the signing series with, finally answered with real Terraform.

A key that’s born in the box and stays there

The signing key is an asymmetric KMS key, and the module that provisions it is small enough to read in one sitting:

resource "aws_kms_key" "this" {
 description = var.description
 key_usage = "SIGN_VERIFY"
 customer_master_key_spec = var.key_spec # default RSA_4096

 # Asymmetric SIGN_VERIFY keys do not support KMS-managed rotation;
 # rotation is handled by minting a new key (alias = `<name>-v2`) and
 # publishing the v2 public key alongside the v1 key (dual-sign window).
 enable_key_rotation = false

 policy = data.aws_iam_policy_document.key_policy.json
}

The private half of that key is generated inside KMS and there is no API that hands it back to you. You don’t sign with it the way you’d sign with a file. You call kms:Sign: the bytes you want signed go up, a signature comes back down, and the key itself never moves. An attacker who completely owns my CI, my account, my laptop, can ask KMS to sign things for as long as their access lasts… but they can’t walk off with the key and keep signing forever. The blast radius is “while I’m compromised”, not “until I rotate a key I didn’t know had leaked three years ago”.

Why RSA-4096 and not the Ed25519 I’d normally reach for? Because KMS asymmetric signing doesn’t offer Ed25519, and OpenPGP’s packet format is tied to the algorithm that signed it, so the choice of key spec ripples all the way out to the signature on the wire. RSA-4096 is the strong option KMS does offer, so RSA-4096 is what the workflow is built around. A constraint of the box shaped the cryptography, not the other way round, and I’d rather say so than pretend I picked RSA on purpose.

Minting an OpenPGP key from a key you can’t hold

Here’s the part I find genuinely neat. OpenPGP wants a private key to self-sign its own public key when you generate it. And I don’t have a private key in any form I can hand to a library… it’s sitting in KMS, behind a door with no handle on my side. So how do you produce a valid OpenPGP public key at all?

go-tool-base leans on a small Go interface, crypto.Signer: anything that can return its public key and sign a digest. A KMS-backed signer satisfies it by turning each Sign call into a kms:Sign request. Then pkg/openpgpkey builds the OpenPGP entity around that signer:

func Entity(signer crypto.Signer, name, email string, creationTime time.Time) (*openpgp.Entity, error) {
	rsaPub, ok := signer.Public().(*rsa.PublicKey)
	// ...
	pubPkt := packet.NewRSAPublicKey(creationTime, rsaPub)
	// Construct the private-key packet directly (rather than
	// packet.NewSignerPrivateKey, which panics on opaque signers):
	// the crypto.Signer drives the actual signing, so a KMS-backed
	// signer works here.
	privPkt := &packet.PrivateKey{PublicKey: *pubPkt, PrivateKey: signer}
	// ...
}

Look at that PrivateKey packet. The field where OpenPGP expects the secret key material holds the crypto.Signer instead, which is to say, a remote handle to KMS. When the entity self-signs its public key, that self-signature is computed by KMS. gtb keys mint runs exactly this and writes out an ASCII-armored OpenPGP public key, and at no point did a single byte of private key material exist on the machine that minted it. The OpenPGP “private key” is a phone line to a vault, not a key.

That public key is what gets published off-platform over WKD and baked into the binary, the two trust anchors that post cross-checks.

Access without a human and without a standing key

A key that never leaves KMS is only as good as the rules about who may call kms:Sign. The signer role is deliberately narrow: it can call kms:Sign and kms:GetPublicKey on this one key and nothing else, and it is assumable only over OIDC from specific CI subjects, the same keyless federation the rest of the estate runs on. No human holds it. No long-lived access key sits in a CI variable waiting to leak. A release job federates in for its few minutes, signs, and the credentials evaporate with the runner.

So the chain of “who can sign a release” has no standing secret in it anywhere. Not a key file, not an access key, not a console user. Just a short-lived token, scoped to two API calls, on a key that can’t be exported.

The real cost: rotation is manual

This isn’t free, and the bit it taxes you on is rotation. KMS won’t auto-rotate an asymmetric SIGN_VERIFY key, which is why the module sets enable_key_rotation = false rather than leaving a default on. Rotating means minting a new key (a -v2 alias), publishing its public key alongside the old one, and running a dual-sign window long enough that clients have picked up the new anchor before you retire the old. It’s manual, it’s a runbook, and pretending otherwise would be the kind of thing this series exists to argue against. The trade I made was: a key with no exfiltration path, in exchange for rotation I have to do by hand. For a release-signing key, that’s the right side of the trade.

Why this is a command and not a script I hid

The origin of all this is a good deal less tidy than the result. I was working through the key-generation runbook, creating the offline rotation key with a gpg command I’d copied straight off my own page… and it just hung. No error, no prompt, just a cursor blinking while gpg waited on something it never bothered to mention.

My first instinct was the lazy one: drop the minting script into a scripts folder in my infra repo and never speak of it again. Then it nagged. That repo’s private, so the recipe would live somewhere nobody else could ever reach, and I’d already half-promised myself a tutorial walking people through this exact setup. So it shouldn’t sit in infra at all. It should be a gtb command, with a pluggable backend so anyone can swap my KMS for whatever provider they happen to run.

The deeper objection is the one that actually shaped it, though. I didn’t want to be shelling out to gpg by hand in the first place. gtb is a tool I hand to other people, and every time it drops to the shell for some gpg incantation, that’s an environment I’m asking the next person to reproduce, a dependency to install, a fiddly step to get subtly wrong, all before they can sign a single thing. The aim was to keep as much of this inside the box as I could: mint the key, build the WKD tree, produce the signature, all in pure Go, with no gpg on the path and no gpg-wks-client either.

So gtb keys mint pulls the public half out of your KMS key and frames it as OpenPGP, the trick from earlier; gtb keys wkd builds the tree ready to upload; and gtb sign produces the detached signature through that same remote round-trip. What comes out is an entirely ordinary OpenPGP signature gpg --verify is happy with, so you’re not locked into anything of mine. And none of it is just for me: build your tool on go-tool-base and the same handful of commands stands you up with this exact model, pointed at your own KMS. No cloud KMS to hand? There’s a local backend, a plain key on disk, to wire the whole thing together on your laptop first. These are commands for you, the person shipping the tool. Your users never run mytool keys mint… they just get updates that quietly check themselves, which was the whole idea two posts ago.

That setup deserves a walkthrough of its own, and it’ll get one. For now, the ergonomics were the point, not a nicety bolted on afterwards. The safest setup in the world is no use to anyone if it takes a PhD to stand up.

Where this leaves the whole story

Step back and the full loop is finally closed. The private key is born in KMS and never leaves it. Its public key is minted from it, with KMS computing its own self-signature. That public key is published off-platform and embedded in the binary. Releases are signed by KMS, reached only through short-lived OIDC federation. And the client verifies against the embedded and WKD keys cross-checked against each other. At no single point in that chain is there a thing an attacker can grab that lets them forge a release, and the most dangerous thing of all, the private key, has no theft path because it has no export path.

That’s the thread running through the whole signing series, from the very first checksum to here: the strongest control isn’t a better lock on the key. It’s arranging things so the key was never somewhere you could lose it. Nobody is coming to clean your supply chain, so the least you can do is leave it nothing worth stealing.

A signature the platform can't forge

Tue, 09 Jun 2026 00:00:00 +0000

A self-updating tool has a chicken-and-egg problem baked into it. The thing doing the updating is the thing being updated, so when it reaches out and pulls down a newer version of itself, it’s the one that has to decide whether to trust what just landed. No human in the loop, nobody to ask. I’ve been closing that gap in go-tool-base’s self-updater in two phases. The first gave it a checksum: download the new binary, hash it, compare it against the release’s checksums.txt. That catches the accidents, the truncated download, the flipped bit on a dodgy mirror. And I said at the time, plainly, that it does nothing about a determined attacker who owns the release platform… the checksums file sits right next to the binary, so whoever can swap one can swap both. I left that as an IOU. This second phase is me paying it.

The thing a checksum can’t do

A checksum is a promise that the bytes you got match the manifest. It says nothing about who wrote the manifest. So if GitLab, or my account, or a leaked CI token gets compromised, the attacker rewrites the binary and the checksums.txt in the same breath, and the hash matches perfectly, because they’re the one who computed it. It’s the same wall I keep walking into whenever I think about supply-chain trust: a checksum is only ever as good as whatever’s standing behind it, and the thing standing behind a checksum is the very platform that just handed you the file. Same hands, both times.

To get past that, you need a signature whose root of trust lives somewhere the platform can’t reach.

The crypto is the easy part

Here’s the bit that caught me slightly off guard while I was building this: the cryptography is the easy part. Verifying a detached OpenPGP signature is a library call, and go-tool-base’s TrustSet wraps it up in one method:

func (t *TrustSet) VerifyManifestSignature(manifest, signature []byte) error {
	// ...
	signer, err := openpgp.CheckArmoredDetachedSignature(
		t.entities, bytes.NewReader(manifest), bytes.NewReader(signature), nil)
	if err != nil {
		return errors.Wrap(ErrSignatureInvalid, err.Error())
	}
	if signer == nil {
		return errors.Wrap(ErrSignatureInvalid, "no signer in trust set matched")
	}
	return nil
}

Hand it the manifest, the detached signature, and a set of trusted public keys (the entities), and it tells you whether any one of them signed it. That’s the whole of the cryptography, and it’s genuinely not where the hard work lives.

The hard work is that set of trusted public keys. Where do they come from? Because if the answer is “we ship them right next to the binary”, well… you’re straight back to the checksum problem. Whoever can swap the binary can swap the key too, sign with their own, and the check waves it through none the wiser.

Pulling the two questions apart

So the design splits along exactly that seam. The verification half is fixed, and deliberately boring (the method above). The trust anchor, the actual keys, comes from a swappable KeyResolver:

// The interface separates "where the trust anchor comes from" from "how a
// signature is verified against it", so SelfUpdater can be wired with
// whichever resolver chain a tool needs without changing verification logic.
type KeyResolver interface {
	Name() string
	Resolve(ctx context.Context) (*TrustSet, error)
}

That little seam is really the whole game. Everything interesting about standing up to a compromised platform comes down to which resolver you hand the updater, and the verification code never has to know the difference.

Three answers to “where does the key live”

The first option is to embed it. Bake the public key straight into the binary at build time (NewEmbeddedResolver), so it rides along inside a release you already trusted enough to run. Tidy and self-contained. The catch is that a future malicious release could embed a different key, so on its own, embedding really just trusts whoever cut the most recent binary.

The second is WKD, the Web Key Directory. Fetch the key over HTTPS from a well-known path on a domain you control (NewWKDResolver), nothing to do with where the release itself is hosted. Now the key isn’t in the binary at all, so poisoning a release doesn’t touch it. You haven’t made the problem disappear, mind… you’ve moved the trust onto your domain’s host and its DNS. A different blast radius, but a blast radius all the same.

The third option is to do both, and make them agree. Run embedded and WKD, and insist they agree:

func (c *CompositeResolver) Resolve(ctx context.Context) (*TrustSet, error) {
	// ... run each child resolver concurrently ...
	if err := checkAgreement(successes); err != nil {
		return nil, err // ErrKeyResolverMismatch
	}
	return successes[0].ts, nil
}

Think of it as the two-key rule on a safe deposit box, or two witnesses who’ve never met telling you the same story. One source on its own you might quietly doubt. But if the key baked into the binary and the key sitting on my domain hand back the same fingerprint, that agreement is worth a great deal more than either of them alone. And if they ever come back different, that’s not a maybe, that’s an alarm: ErrKeyResolverMismatch. Poison one source and the mismatch is the thing that gives the game away.

That composite is the real answer, and it’s why the interface exists at all. There’s nothing a single attacker can get their hands on that holds the whole thing up by itself. The key is baked into a release you trusted, and fetched from a domain well off the release platform, and the two have to match before a single byte of the update is allowed through.

The separation is the whole point

It’s easy to nod along at “two sources” and miss the part that actually does the work. The agreement between the embedded key and the WKD key is only worth something if an attacker can’t reach both of them from the same place. If the key I bake into the binary and the key I serve over WKD both came out of the same release pipeline, whoever owns that pipeline swaps the pair of them, the fingerprints still match, and the cross-check happily waves the forgery through. Same hands, both times. Again.

So they don’t share a pipeline, and that’s the entire design, not an accident of how things ended up. The binary, and the key embedded in it, are built and signed in GitLab CI, which federates into AWS KMS to do the signing itself. The WKD key lives somewhere else completely: a Cloudflare Pages site serving openpgpkey.phpboyscout.uk, deployed by hand at rotation time with the Wrangler CLI and a token allowed to do nothing but edit that one Pages project. No Git integration, no webhook, nothing that lets a push to the repo or a run of the release pipeline so much as touch it. The Cloudflare account is even administered under a different email and a different second factor from the GitLab and AWS ones, so the three anchors really are independent rather than just feeling that way.

Which is what makes them fail independently, and that independence is the only thing that makes the agreement worth checking. To forge a release that survives the cross-check, an attacker doesn’t have to beat one system, they have to beat two unrelated ones, on different platforms, behind different credentials, in the same window, without either of them noticing.

There’s a quieter benefit in the cadence, too. Releases go out constantly and automatically; the WKD key changes rarely, and only ever by hand. So the busy, automated path, the one an attacker is most likely to prise open, is exactly the one with no power to rewrite the key everyone checks against.

Requiring it, without breaking everyone

Now, a check nobody ever switches on is just theatre. But switch it on before the keys are actually out there in people’s installs, and you’ve handed everyone a self-inflicted outage instead. So the default is deliberately timid. The framework ships DefaultRequireSignature = false: a tool built on go-tool-base doesn’t suddenly start rejecting its own updates the day its author bumps the framework version.

The tool author flips it to true in main(), but only after they’ve shipped a release that embeds the key, so every install out there already holds the trust anchor before the first release that insists on one. Ship the key, then turn the lock: the same leave-yourself-a-way-back discipline as any migration you’d like to still have a job after. And the end user still gets an override (update.require_signature, or an env var) for the day it all goes sideways and they need out.

What it actually buys

The first phase stopped accidents. This one stops the platform. And not because the cryptography is clever, OpenPGP checks the signature in a single call, but because the trust anchor is arranged so that nothing the attacker can actually reach holds the whole thing up on its own. A signature only ever proves the sender, never the contents. All of this is really about making “the sender” something a compromised release host can’t quietly fake its way into being.

Which leaves one last thread dangling. The verifying key gets fetched from somewhere, fine… but the signing key, the private half that actually produces these signatures, has to live somewhere the platform can’t reach either, or none of the rest holds up. That’s the capstone, and where this series ends: where that key lives, and why it never leaves the box it’s born in.

Three traps release-plz sets for a Rust workspace

Fri, 05 Jun 2026 00:00:00 +0000

I wrote up the two days I lost releasing a seventeen-crate workspace to crates.io as a war story, wrong turns and all. This is the other half: the field guide, so you don’t have to lose the same two days.

release-plz is a genuinely good tool, and none of what follows is a bug. It’s three behaviours that are entirely within its design and will still ambush you the moment you point it at a Cargo workspace rather than a single crate. Mildest first, because the third is the one that actually ate my release.

First, what release-plz is doing

In one line: it’s release-please for cargo. It keeps a Release MR open, bumps your versions and per-crate changelogs from your Conventional Commits, and when that MR merges it publishes every crate to crates.io and tags the release. On a workspace where N crates all share one version, “the release” is N publishes and N tag operations. Hold on to that N. It’s hiding behind all three traps.

Trap 1: the default tag template is built for one crate, not a workspace

You will reach for one tag per version, and for me it was more than tidiness. I wanted to ship the whole framework as a single release: one v0.5.1 covering all seventeen crates, because that was the compatibility promise I wanted to make. Use the crates that share a version and they’re guaranteed to work together. A single tag felt like the natural way to say “this is one coherent release of the whole thing” (and it didn’t hurt that the repo already had a v0.5.0 tag from before release-plz, so one unified tag also looked like continuity). So you either set this, or, worse, you leave git_tag_name unset assuming the default does something workspace-aware:

git_tag_name = "v{{ version }}"

Here’s the catch. release-plz’s default git_tag_name is v{{ version }}, and release-plz tags per crate. So the first crate publishes and creates the tag v0.5.1. The second crate publishes and tries to create v0.5.1 again:

ERROR failed to create git tag 'v0.5.1'
 "message": "Tag v0.5.1 already exists"

By the time you read that error, the first crate (and on a retry, the next, and the next) is already live on crates.io, and crates.io publishes are forever. Leaving the line out doesn’t save you, because the default is the same single-crate-shaped template. This is the trap I walked straight into on the release commit.

Trap 2: “one release for the whole workspace” isn’t a setting, it’s a category error

The natural next thought is “fine, I’ll keep one tag but configure release-plz to roll the crates into a single release.” There’s no knob for that, and chasing one is a waste of an afternoon. release-plz’s model is per-crate all the way down: per-crate tags, per-crate GitLab/GitHub releases, per-crate changelogs. “One unified release for the whole workspace” isn’t an option it withholds, it’s a shape it doesn’t have.

So you stop fighting it and set the per-crate templates explicitly:

git_tag_name = "{{ package }}-v{{ version }}"
git_release_name = "{{ package }} v{{ version }}"

Now each crate gets its own tag (rtb-assets-v0.5.1, rtb-config-v0.5.1, and so on) and its own release. It’s more objects per version than you wanted, but it’s the grain the tool works in, and once you accept that the collisions stop.

This is where I had to pull apart two things I’d quietly merged in my head: the version and the tag. The compatibility promise I cared about, that crates sharing a version work together, is carried by the version, and release-plz keeps every crate on the one workspace version no matter how it tags them. The tag is just a label pointing at a commit. I’d wanted a single tag to mean “one coherent framework release”, but the coherence was always in the shared version number, not in the tag. Once that landed, seventeen tags stopped feeling like seventeen releases of seventeen different things and started looking like what they are: seventeen labels on one versioned release. The version is not the tag. If you still want one human-facing narrative for the whole thing, keep a hand-written root CHANGELOG.md alongside the generated per-crate ones, rather than trying to make release-plz aggregate.

Trap 3: a release reads its config from the release commit, not HEAD

This is the small one, and the one that cost me the most, because it makes the fix for Trap 1 look like it isn’t working.

When release-plz runs a release, it does not read release-plz.toml from your working tree. It reads it from the release commit, the commit that first introduced the version it’s releasing. So picture the obvious recovery: you hit the tag collision, you realise your template is wrong, you fix it in a follow-up commit and push to main. Your fix is real. It’s committed. It’s on the default branch. And it is completely ignored, because the version hasn’t changed, so the release commit release-plz reads from is still the old one with the old template.

I didn’t take this on faith. With the corrected per-crate template sitting on HEAD, the CI release job still tried to create the unified tag, pinned to the old commit:

ERROR failed to create git tag 'v0.5.1' with ref 'f6de975...'
 "message": "Tag v0.5.1 already exists"

That ref is the release commit, not the HEAD that held my fix. And the cruel part: release-plz release --dry-run on your laptop reads your working-directory config, so it renders the shiny new per-crate tags and tells you you’re sorted. CI runs the real thing against the release commit and does something else entirely. Same config file, two different answers depending on who’s asking, which is why the war story has the title it does.

The operational rule that falls out of this: any release-plz config change that affects how a release behaves has to ride along with a version bump, or it does not apply. A “fix-up” commit on its own is a no-op.

If you set one thing

If you run release-plz on a multi-crate workspace and you change a single line from the defaults, make it the tag template:

git_tag_name = "{{ package }}-v{{ version }}"

And set it before your first release, not during it, so it’s already in the commit that introduces the version, because that’s the only commit a release will ever read it from. Everything else here follows from two facts: the grain is per-crate, and CI reads history while your laptop reads your working tree. Trust the history.

None of this is release-plz misbehaving. Every bit of it is documented and deliberate. It just isn’t where you’ll think to look until it has published six crates you can’t take back, which is roughly how I came to know it so well.

Telemetry that asks, and telemetry that doesn't

Thu, 04 Jun 2026 00:00:00 +0000

go-tool-base has had a thing called telemetry for a long while now. It’s the opt-in kind: the product analytics that asks a user’s permission before it phones a single byte home, sits there as a no-op until they say yes, and can be wiped on request. The whole package is built around consent.

Then the web-service series went and needed telemetry too. Not that telemetry. The other one, the one the rest of the industry means when it says the word: traces, metrics and logs of a running service. And the awkward thing about those two is that they share a name, they want to share a package, and they pull in exactly opposite directions on the one question that matters most.

This is the story of how 0.7.x grew a second telemetry without breaking the first, and where the line between them ended up getting drawn.

Why bother putting it in the framework at all

The starting point is that I could have left observability out. A reader could wire up OpenTelemetry in their own service and go about their day. But the six parts of the web-service series spent a lot of effort making the transports first-class: a gRPC server, an HTTP server, a gateway, TLS across all of them, each one a Register call against the controller. Turning a CLI into a real long-running service and then shrugging “observability is your problem” would have left a hole exactly where it hurts.

Because a service you can’t see into is a liability the moment it leaves your laptop. The series ended with a macguffin service that was typed, fast and served over TLS, and was also a black box: when it got slow, you had nowhere to look. Metrics and traces are how you get the lights on, and they deserved the same first-class treatment as the things they observe.

The other half of the reason is that the framework already had a foot in this world. The analytics package’s preferred backend speaks OTLP, the OpenTelemetry wire protocol. So OpenTelemetry was already in the building. Doing observability any other way would have meant two standards where one would do.

The catch: two telemetries, opposite instincts

Here’s where it gets interesting, and it’s the part worth slowing down on.

The analytics telemetry is about a user. It collects usage data, hashed machine id, which command ran, exit code, and the entire design assumes you have to ask first. It is off by default. The collector you get when it’s disabled is a no-op, so nothing is recorded until the user opts in, and there’s a deletion path for when they change their mind. That’s not an add-on, that’s by design.

The observability telemetry is about a service. It emits operational data, how long a request took, which span was slow, how many errored, to a collector the operator runs. And there is no user in the loop to ask. The operator deploys the service, points it at their collector, and that act is itself the consent. Asking would be nonsensical: whose permission, for data about their own service, on their own infrastructure?

So you have two things called telemetry, wanting to live in one package, with the opposite default on consent. One is off until someone says yes; the other is on the moment it’s configured. Get that wiring wrong and you fail in one of two ugly ways. Gate the operational telemetry behind the user’s analytics opt-in, and a service’s tracing silently does nothing because nobody ticked a box meant for something else. Or loosen the analytics gate to make observability flow, and you start leaking usage data the user never agreed to share. Neither is acceptable, and “just use two packages” throws away everything the two genuinely have in common.

Quite a lot, as it turns out, and all of it below the consent line.

Both ship their data over OTLP to a collector. Both need to describe who is emitting, the service name and version, the resource in OpenTelemetry’s terms. Both parse an endpoint, attach headers, decide whether the connection is plaintext. None of that has the faintest thing to do with consent. It’s just the plumbing of getting bytes to a collector, and the analytics backend already had all of it, written inline.

So the shape of the solution fell out of the problem. Lift the shared plumbing into one place, let both telemetries stand on it, and keep the consent decision firmly out of that shared layer. The structure under pkg/telemetry ended up like this:

pkg/telemetry/
 telemetry.go the analytics Collector (consent-gated)
 backend_otel.go its OTLP backend
 posthog/ datadog/ vendor analytics backends
 otelcore/ shared: OTLP endpoint, resource, telemetry.* config
 tracing/ observability signal
 metrics/ observability signal
 logs/ observability signal
 observability.go Setup: builds the enabled signals (implied consent)

The new otelcore is the keystone. It holds the three things both sides need and nothing they don’t: ParseEndpoint for the OTLP URL, Resource for the service identity, and Resolve for reading the shared telemetry.* config (a base endpoint, plus per-signal overrides, in the same cascade as the TLS config). It imports no signal exporter and knows nothing about traces, metrics, logs or analytics. It is deliberately dumb plumbing.

The refactor: making the old telemetry stand on the new core

This next part is where the old telemetry and the new one become a single thing. The analytics OTLP backend was the first user of OTLP in the framework, and it had grown its own copy of all that plumbing: a function that parsed the endpoint URL, split out the host and path, worked out the insecure flag, and built the resource from a service name. Exactly the code the three new signals were about to need.

So rather than write it a second time and let the two drift, the analytics backend was refactored onto otelcore. Its exporter builder, buildOTelExporterOpts, now calls otelcore.ParseEndpoint, the same function tracing, metrics and logs call, and the resource comes from otelcore.Resource, the same one they use. One implementation of “talk OTLP to a collector”, four callers: the analytics backend and the three observability signals. Change how the framework forms an OTLP endpoint, and every signal moves together.

The reassuring part was that the analytics tests didn’t budge. The refactor moved code without changing behaviour, and the consent machinery, the opt-in, the no-op-when-disabled, the deletion path, never came near otelcore. Which is exactly the point.

Where the line is

Because the shared core is the easy half. The half that earns its keep is the bit that isn’t shared, and it’s a single, deliberate line.

The analytics collector keeps its gate. The constructor, NewCollector, still returns a no-op the moment telemetry is disabled, so a user who hasn’t opted in gets a collector that silently discards everything. Informed consent, untouched.

Observability gets a different door entirely. Setup builds whichever signals the operator has switched on, and it is gated only by telemetry.tracing.enabled and its siblings, which the operator sets. It never consults the analytics opt-in. Turning on tracing doesn’t turn on analytics; disabling analytics doesn’t silence tracing. The two enable flags live under the same telemetry.* config root, sit next to each other, and never read each other.

So that’s the whole architecture in a sentence: one package, one OTLP export core, two consent models that share everything except the answer to “do we need to ask”. The principle underneath, the one that decided every one of these calls, is that the kind of data sets the consent model. Usage data about a person needs informed consent. Operational data about a service runs on implied consent. The CLI and the web service are just where each kind tends to live.

Where this leaves the framework

0.7.x came out the other side with both telemetries: the one that asks first, exactly as it was, and a new one that doesn’t, because it has nobody to ask. They share an export core, a config root and a name, and they part company on the only thing they were ever going to disagree about.

I’ve been careful here to describe how the two consent models are kept apart, not to argue why they have to be. That argument, that “the kind of data decides the consent model” is a line worth holding rather than a convenient bit of engineering, is a piece of its own, and it’s the one I’m writing next.

Same config, two answers

Wed, 03 Jun 2026 00:00:00 +0000

Let me confess a small heresy first, because it’s the reason any of this happened. After a career spent as a branching man, gitflow, gitlabflow, a tidy develop branch and a careful dance of merges, I’ve come round to trunk-based development. I resisted it for years. It felt like working without a net.

What changed my mind was working solo with an AI pair. The branch ceremony that earns its keep on a team of eight is just drag when it’s me and a model at two in the morning. So I’ve softened on “main is always deployable” and let the trunk act as the develop branch, with tagged releases as the actual source of truth. For compiled languages, where the artefact you ship is a built, tagged thing and not whatever’s on a server right now, that finally clicks.

I’d already rolled this out on my Go and Terraform projects with releaser-pleaser, a GitLab-native take on release-please: a bot keeps a Release MR open, and merging it cuts the tag. It’s the same model I wrote about when the infra repo moved to plan-on-merge, apply-on-tag. Lovely. Then I came to do the same for rust-tool-base, and Rust, being Rust, had opinions.

Rust brings its own toolchain

releaser-pleaser is happy to tag a repo and write a release. What it does not do is cargo publish seventeen crates to crates.io in dependency order. Rust’s release story isn’t “push a tag and let a runner build a binary”, it’s a whole publishing pipeline with a public registry at the end of it, and that registry has rules of its own. So for the Rust workspace I reached for the tool built for exactly that job: release-plz. Same Release-MR shape, but it understands cargo, versions every crate, and publishes the lot.

That was the right call. Getting it to actually do it was where I spent two days I’d quite like back.

The gauntlet before the gun

Before I got anywhere near the interesting failure, there was a run of CI papercuts, the sort where every fix politely reveals the next one. GitLab checks out a detached HEAD, and release-plz wants to be on a branch (“HEAD does not point to a branch”), so you re-attach. Then the default CI_JOB_TOKEN can’t push to a protected repo, so you point the remote at a real token. Then release-plz assumes you’re on GitHub and errors that the repo “is not hosted in GitHub”, so you tell it --forge gitlab. Then it refuses to run at all because the pages job left a public/ directory lying about and the working tree is “dirty”, so you stop pulling artefacts into the job.

Five merge requests before the thing would even start doing its actual job. You can read the scar tissue in the before_script; every line in it is a fix for something on that list. None of it was hard. It was just death by a thousand cuts, and I was feeling quite smug by the time it finally reached the publish step.

I should not have been.

“Tag v0.5.1 already exists”

My release-plz.toml asked for one tag per release:

git_tag_name = "v{{ version }}"
git_release_name = "v{{ version }}"

That felt obviously right. It matched the repo’s existing v0.5.0 tag, it’s how a single-crate project tags, and the crates all share one workspace version anyway. One version, one tag. What’s to argue with?

release-plz, that’s what. It tags per crate. So it publishes a crate, creates the tag, publishes the next crate, and tries to create the same tag again:

INFO published rtb-assets 0.5.1
ERROR failed to release package
Caused by:
 0: failed to create git tag 'v0.5.1' with ref 'f6de975a75...'
 1: Response body: { "message": "Tag v0.5.1 already exists" }
 2: HTTP status client error (400 Bad Request) ... /repository/tags

The collision is annoying. What makes it a proper trap is the half-second before it: published rtb-assets 0.5.1. That happened. On crates.io. For keeps. A crates.io publish is forever, there is no unpublish, only a yank that still leaves the name and version burned. So every time my flaky pipeline limped one crate further and then fell over on the tag, it left another crate live on the public registry that I could never take back. By the time the dust settled, six of the seventeen were out there: rtb-assets and rtb-config, then on a later retry rtb-credentials and rtb-error, then rtb-app and rtb-redact. Two more permanent crates per failed run.

I assumed the default

My first fix was the clever one, and it deserves to be on display because it’s the whole lesson in miniature. I deleted the git_tag_name line. My reasoning: per-crate tags are release-plz’s native model, so surely its default does the right thing without me spelling it out. I was confident enough to write it into the commit message: “per-crate tags/releases (release-plz defaults).”

The next run collided on v0.5.1, exactly as before.

Because release-plz’s default git_tag_name is not per-crate. It’s the unified v{{ version }}. I had deleted a line that said the wrong thing and replaced it with a default that said the same wrong thing, then congratulated myself for tidiness. If I’d spent thirty seconds on the configuration reference instead of thirty seconds being clever, I’d have read that in black and white.

Same config, two answers

So I read the manual, and set it explicitly:

git_tag_name = "{{ package }}-v{{ version }}"
git_release_name = "{{ package }} v{{ version }}"

On my laptop, a dry run rendered exactly the per-crate tags I wanted. In CI, the very next run published another crate and then created the tag v0.5.1. The unified one. The wrong one. The one I had just, demonstrably, on main, replaced. Same release-plz.toml, two completely different answers depending on who was asking.

That one took me an embarrassingly long time to see. release-plz does not read your config from the working tree when it runs a release. It reads it from the release commit, the commit that introduced the version it’s releasing. My version was still 0.5.1, set days earlier on a commit that still carried the unified template. You can see it in the failure: the tag it tries to create is pinned to ref 'f6de975...', an old commit, not the HEAD that held my fix. Every edit I made at the tip of main was real, committed, and utterly invisible to the release of 0.5.1, because no version bump had created a fresh release commit for it to read. My fix was correct and inert at the same time. The dry run read my working directory and looked perfect; CI read history and did something else.

There is no config change that rescues an in-flight release. The version was already out, half-published, tagged wrong, and pointed at a commit I couldn’t edit without bumping the version, which I couldn’t cleanly do with six crates already live.

Doing it the way I’d have done it a year ago

So I stopped. Three retries deep, each one a seventy-minute CI cycle thrown at an opaque mismatch, six crates already immovable on crates.io, and a tooling problem I now understood well enough to know the tool was never going to dig me out of this particular hole. The question quietly changed from “why is it doing this?” to “am I going to keep grinding, or finish this the way I would have before I had clever tooling?”

I went manual. cargo publish, the remaining eleven, by hand, in dependency order: the leaf crates first and the rust-tool-base umbrella dead last, because it depends on all of them. crates.io rate-limits new crate names, so after a burst it simply made me wait, a roughly half-hour pause in the middle while the registry caught its breath and I caught mine. Then one v0.5.1 tag, cut by hand, and one GitLab release to match the convention. The next CI run came up green, for the gloriously dull reason that there was nothing left to do: every crate published, the tag already there.

Stop being clever and RTFM

The tool was never broken. Every single thing it did was documented behaviour I hadn’t bothered to read: that the default tag template is unified, that the model is per-crate, that a release reads its config from the release commit and not from HEAD. I assumed my way past the manual three times in a row, and each assumption cost me real, permanent state on a public registry that doesn’t take returns.

And that’s the part that actually stung, because I should have known better than most. I wasn’t a beginner here. I knew the Release-MR pattern cold, I’d shipped it half a dozen times with releaser-pleaser on my Go and Terraform repos. That familiarity was the trap. I trusted the pattern and skipped the tool, on the lazy assumption that something I understood well in one tool would behave the same in the next. release-plz carries the same design, but it’s a different tool, with its own defaults and its own idea of where the config lives. The pattern came across fine. The mechanics didn’t, and I never thought to check.

So here’s the lesson, written down in the hope it sticks this time: no matter how familiar I am with a pattern or a design, the moment I switch the tool that implements it, reading the manual is paramount. The familiarity is exactly what tempts you to skip it, and exactly why you can’t. (The narrower, more practical one, while I’m here: a config change that affects how a release behaves has to travel with a version bump, or it sits there looking applied and doing nothing.)

release-plz is genuinely good, and every release since has gone out clean on the first try, the way the rest of the CI now does. I just had to stop being clever long enough to read how it actually works. RTFM. I’ll get it tattooed eventually.

The security service I had to switch off

Mon, 01 Jun 2026 00:00:00 +0000

A while back I wrote about hardening the account that would hold the signing key, and one line in it has aged badly. “GuardDuty is already looking,” I wrote: the account watched from day one, threat detection on before the key even arrives. Then I went to apply that baseline to a brand-new account, and GuardDuty wasn’t looking at all, because the account wouldn’t let it.

What the baseline switches on

The baseline runs a threat-detection module that turns on three AWS services: GuardDuty, Security Hub, and IAM Access Analyzer. Each sits behind an enable_* toggle, all defaulting to on, which is the right default. An account about to hold something sensitive should be watched, flagged and analysed from the start. The hardening post’s whole argument was that you fit the locks before you move the valuables in, and I stand by it. The gap I hadn’t accounted for is the one between a posture you can design and a posture a fresh account will actually run on its first day.

SubscriptionRequiredException

GuardDuty and Security Hub are first-party AWS services, but they are not on by default. They have to be activated for the account before anything can configure them. Point OpenTofu at a brand-new account that has never had them switched on and you don’t get a security baseline, you get a SubscriptionRequiredException and a failed apply. The locks you carefully specced won’t fit, because the door hasn’t been registered as a door yet.

I’d actually met this exact wall from another direction: an aws-nuke dry-run on a fresh account throws screenfuls of the same exception for every service you’ve never enabled. Same root cause, different tool. A new AWS account is a quieter, emptier place than the console makes it look.

There’s a second, duller reason too, and it belongs in the open: GuardDuty and Security Hub both cost money once their free trial lapses, and the budget for this account wasn’t in place yet. Enabling a service you can’t yet afford to keep on is its own small mistake.

Switching two watchers back off, on purpose

So I did the thing that felt wrong and was right. I switched two of them off, and wrote the reason straight into the module call:

# GuardDuty + Security Hub are deferred. The account's first-time
# service activation is pending (it returns SubscriptionRequiredException)
# and the services carry an ongoing cost beyond their free trial.
# Re-enable by flipping both to true (or deleting these lines) once
# activation clears and the budget is in place.
enable_guardduty = false
enable_securityhub = false

Two things make that a deferral rather than a retreat. First, it’s scoped: Access Analyzer needs no subscription and costs nothing, so it stays on. The account isn’t unwatched, it’s watched by the part that can watch it right now. Second, it carries its own undo. The comment is the re-enable instructions, the toggles sit right there, and flipping them back is a one-line change the day activation clears and the budget lands. A disabled control with a written path back is a deferral. A disabled control with no note is a hole someone finds in a year.

Why this lives in the module, not in a hack

I could only do this cleanly because the module exposes each service as its own toggle, and it only gained that recently; the previous version was all-or-nothing on threat detection. Granular enable_guardduty and enable_securityhub flags are exactly what let you say “this account, these two, not yet” without forking the module or commenting resources out. It’s the difference between a baseline you adapt per account and one you fight.

The honest version of “secure by default”

The hardening post wasn’t wrong. It was idealised. Secure-by-default is the right aim, and on a mature account the baseline goes on clean. On a brand-new one, reality intrudes: services that must be activated before they can be configured, costs that need a budget before they can be incurred. The honest response isn’t to pretend the posture is fully live when it isn’t. It’s to enable what the account will take, defer what it won’t, write down precisely why and how to finish, and leave Access Analyzer watching in the meantime. GuardDuty will be looking soon enough. It just wasn’t looking on day one, and saying so is better than a comment-free enable = true that quietly errored on every apply.

From allow_failure to blocking

Sat, 30 May 2026 00:00:00 +0000

There’s a special kind of CI job that everyone on a team quietly learns to ignore: the one marked allow_failure: true. It runs, it goes red, the pipeline goes green anyway, and after the third time you stop looking at it. I inherited six of those when I moved rust-tool-base’s CI to GitLab. Over a few days I turned three of them into real gates, and the interesting part was never the YAML. It was working out which ones had earned the right to block, and which hadn’t.

What allow_failure actually buys you

allow_failure: true is genuinely useful, and quietly corrosive. It lets a job report a problem without stopping the pipeline, which is exactly right for a check that’s noisy, not yet stable, or guarding against something you can’t fix this minute. The trouble is that a warning nobody is forced to act on is a warning nobody acts on. Leave a job advisory long enough and it becomes scenery: red, ignored, pointless. So an advisory check is really a promise, “I’ll make this blocking once it’s trustworthy”, and a promise you only ever mean to keep is just a lie you haven’t noticed yet.

When I migrated rust-tool-base from GitHub Actions to GitLab CI, the move landed six jobs as allow_failure: true: the macOS and Windows tests, the integration tests, cargo-audit, trivy, and coverage. That wasn’t laziness. A migration is the wrong moment to also be fighting flaky gates. But it left me holding six promises to either keep or admit I wasn’t going to.

A check earns the right to block

Here’s the rule I settled on. A check earns the right to fail your build when two things are true: it’s meaningful (a red result is a real problem, not noise) and it’s reliable (it goes red only when there genuinely is a problem, and it can actually run to completion). Flip a check to blocking before both hold and you haven’t raised the bar, you’ve taught the team to force-merge past red, which is worse than no gate at all, because now the red means nothing.

Three of my six crossed that line within a few days. Three deliberately didn’t. The reasons are the whole story.

trivy: blocked once there was nothing to block on

trivy scans the dependency tree for HIGH and CRITICAL advisories. It went across as advisory for an honest reason: the Cargo.lock at migration time already carried two known HIGH/CRITICAL advisories I hadn’t cleared yet, a path-traversal in gix-validate and a DNS-rebinding issue in rmcp. Make trivy blocking with those sitting there and the pipeline is red from day one, over problems I already knew about and was already fixing. So it stayed advisory until the dependency bumps cleared both, and then the allow_failure line came out. The gate never changed. The tree underneath it got clean enough to stand on.

integration-tests: blocked once it could actually run

The integration tests stand up a real Gitea in a Docker-in-Docker service and talk to it. They were advisory for a different reason: they couldn’t reliably run. dind needs a privileged runner, and the suite was resolving the container host with a hardcoded 127.0.0.1 that didn’t hold everywhere. Blocking a job that fails for infrastructure reasons rather than code reasons is the fastest way to make people distrust the entire pipeline. So the fix wasn’t in the YAML, it was making the thing dependable: privileged set on the runner, and the host resolved through the test library’s own get_host() instead of a hardcoded address. Once it ran the same way every time, it earned the gate.

coverage: blocked once it could run at all, then once it cleared the bar

Coverage is the two-step one, and my favourite, because it nearly didn’t make it for a thoroughly undramatic reason: it ran out of memory. cargo llvm-cov instruments every test binary, and linking hundreds of instrumented object files needs more RAM than the shared medium runner had, so the job bus-errored on the link. I tagged it onto a larger runner, and then the shared SaaS runners were switched off entirely, so the tag matched nothing and the job sat pending forever.

The fix was a self-hosted homelab runner with the RAM the instrumented link actually needs. I moved coverage there but kept it advisory for one run, to confirm the box could finish the build before I trusted it. It did, at 73.22% line coverage, so I set the gate to fail under 70% and made it blocking. Three points of headroom: enough that ordinary churn won’t trip it, tight enough that a real drop will. A coverage gate pinned to the current number is a tripwire that fires on the very next commit; set it a touch below and it catches regressions instead of normal life.

The three I left advisory, on purpose

The point was never “block everything”. Three jobs are still allow_failure in the current pipeline, deliberately. The macOS and Windows tests run on SaaS runners that bill by the minute; they’re worth running, not worth blocking every merge of a Linux-first project over a quota I’m choosing to ration. And cargo-audit stays advisory because cargo-deny already does the blocking advisory check: cargo-audit is a second opinion from a different database, and a second opinion that can veto isn’t a second opinion, it’s a duplicate gate that will eventually disagree with the first and block you on the difference.

That’s the same rule from the other side. Those three haven’t earned the right to block, because blocking them would cost more than it ever caught.

The upshot

allow_failure: true is fine as a waiting room and corrosive as a destination. Every advisory check is a promise to make it blocking once it’s both meaningful and reliable, and the job is to keep the promise or admit you won’t. trivy earned its gate when the advisories cleared, the integration tests when they ran the same way every time, coverage when it had a runner with enough memory and a threshold set just below the current mark. The three I left advisory earned that standing too, by costing more to block than they’d catch. The YAML is one deleted line per job. Knowing which line to delete, and when, is the whole skill.

Two bugs that taught me the rules

Wed, 20 May 2026 00:00:00 +0000

Some bugs are interesting because they’re subtle. These two were interesting because they were the exact opposite… in each case the tool had a hard rule I simply didn’t know about, and its error message couldn’t be bothered to tell me what that rule was. Both came out of building the infrastructure toolchain, both cost me a good deal more time than they had any right to, and both are the sort of thing that looks blindingly obvious the moment you know it and utterly baffling until you do.

So here they are, written down, partly to save you the bother and partly so I don’t go and forget them myself.

Bug one: the rule-less job that skips your merge requests

The cicd gate components, in their first cut, shipped with no rules: block. They were dead simple jobs: lint, scan, validate. No conditions, because they should just always run. Obviously.

They ran on branch pipelines. On merge requests, they didn’t run at all! The gates that were the entire point of the components were simply absent from the one place you’d most want to see them… the merge request.

The cause is a GitLab CI rule that’s remarkably easy to go years without ever learning: a job with no rules: block runs only on branch and tag pipelines. It does not run on merge-request pipelines. So “no conditions” doesn’t mean “runs everywhere” at all. It means “runs everywhere except a merge request”, which is about the least intuitive default I can think of.

The fix is faintly absurd, and that’s exactly what makes it stick. You add an unconditional rule: rules: [{ when: on_success }]. The content of that rule does precisely nothing. It always matches. What actually matters is that the job now has a rules: block at all, because merely having one is what makes a job eligible for merge-request pipelines. A rule whose content is meaningless, added solely so the block exists. That’s the fix. I’ll admit I stared at it for a moment.

Bug two: the import block that only works at the root

The second one came from terraform-aws-security-baseline. The account-hardening module needed to adopt a resource that already existed in the account, which is exactly what OpenTofu’s import {} block is for. So an import block went into the account-hardening module, right next to the resource it was adopting. The natural home for it, surely.

OpenTofu disagreed, and rejected it outright. The rule: an import block is only allowed in the root module. It can’t live inside a child module. A module that wants one of its own resources imported can’t declare that import itself… the import has to be declared up at the root, and the root caller does the adopting.

The fix was to take the import block out of the module and document caller-side adoption instead. The module describes the resource, and the root configuration that calls the module is where the import actually lives.

Two unrelated bugs, in two completely different tools, and the same shape sitting underneath both of them.

In each case the tool has a hard structural rule. Where a block is allowed to live. What makes a job eligible for a particular kind of pipeline. And in each case the error told me the tool was unhappy without telling me which rule I’d broken, so the obvious next move (debugging my own logic) was the wrong move entirely. There was nothing wrong with the logic. The thing was simply in a place the tool doesn’t allow, or missing a block the tool quietly insists on.

The lasting lesson here isn’t the two specific rules, useful as they are to know. It’s the reflex. When something that should obviously work just doesn’t, and the error is unhelpful, stop debugging your logic and start suspecting a structural rule about where something is allowed to be, or whether a thing is eligible in the first place. GitLab CI and OpenTofu both have a handful of these, and you mostly learn them the hard way, by tripping over them. Knowing the shape of the category at least means the next one costs you an hour instead of a whole afternoon.

Worth remembering

Two bugs from building the toolchain, one shape. A GitLab CI job with no rules: block runs on branches and tags but silently not on merge requests, and the fix is an unconditional rules: block whose content does nothing and whose mere existence is the entire point. An OpenTofu import block gets rejected inside a child module, because imports are only legal at the root, so the caller adopts and the module just describes.

Neither error named the rule it was enforcing, and that’s the category to watch for. When sound logic fails against an unhelpful error, suspect a structural rule about where a thing may live or whether it’s even eligible… not a bug in what you actually wrote. It’ll save you an afternoon. It certainly cost me a couple.

Reviewed, then applied

Mon, 18 May 2026 00:00:00 +0000

The genuinely dangerous moment in infrastructure-as-code isn’t the apply. It’s the gap between the plan a human read and approved, and the change that actually runs a moment later. If those two are different computations (and by default they are) then nobody really reviewed the thing that touched your account. The infra repo closes that gap from both ends.

The gap between “reviewed” and “ran”

Here’s the moment in infrastructure-as-code where things go wrong.

Someone opens a merge request. CI runs tofu plan and the output is there to review: these three resources change, this one is destroyed. A human reads it, decides it’s correct, approves, merges. Then apply runs.

The trap is in what apply actually applies. If apply does its own fresh tofu plan and then applies that, the change that runs is not necessarily the change that was reviewed. State can have moved. A provider can have drifted. Someone else can have applied something in between. The reviewed plan and the applied change are two separate computations done at two different moments, and every difference between those moments is a change nobody looked at.

infra closes that gap from both ends.

Plan as an artifact

The first end is making the reviewed plan and the applied plan the same object.

The tofu-plan component runs the plan and saves it. It writes tfplan.cache, OpenTofu’s binary plan file, as a CI artifact. It also writes tfplan.json, which GitLab renders as a plan widget right in the merge request: the add, change and destroy summary, there to review without leaving the MR.

The tofu-apply component then does not re-plan. It applies that saved tfplan.cache. And OpenTofu itself enforces the safety net: applying a stale plan file, one captured against a state that has since moved, is rejected by the tool. So what reaches the account is provably the plan that was reviewed, or it’s nothing at all. There’s no third option where something unreviewed slips through.

Applying is a human decision

The second end is when apply runs.

infra is trunk-based: it dropped the develop branch and works on main. But a naive trunk setup auto-applies every push to main, which means there’s no human gate at all, just whatever the last merge happened to contain.

So the gate is built explicitly. releaser-pleaser keeps a release merge request open against main. Ordinary merges to main run plans but apply nothing. The apply happens only when a person merges the release MR. Merging it cuts a release tag, and the tag pipeline is what runs tofu-apply, against the plan banked by the latest main pipeline.

The effect is that the act of applying to the account is the deliberate, visible act of merging the release request. Nothing reaches the account because a commit landed. It reaches the account because a person decided a release should go out and merged it. (Which, after the accidental v2.0.0 that kicked off the whole GitLab move, is a discipline I’d freshly relearned the value of.)

The guard on the gate

There’s one more piece, because a gate is only as good as its precondition.

A verify-main-plan job blocks the release MR from being mergeable unless the latest main pipeline is green. You can’t cut a release, and therefore can’t apply, on top of a main whose plan didn’t even succeed. The human gate has its own gate: the thing you’re about to merge has to be standing on a known-good plan before you’re allowed to merge it.

The bottom line

The risk in infrastructure-as-code is the gap between the plan a human reviewed and the change that runs, because a re-plan at apply time is a different computation from the one that was approved.

infra closes it twice over. tofu-plan saves the plan as a tfplan.cache artifact and renders it as a merge-request widget; tofu-apply applies that exact artifact, and OpenTofu rejects it outright if the state has moved underneath it. And applying is gated on a human merging a releaser-pleaser release request, not on a push, with a verify-main-plan check making sure that request can only be merged on top of a green plan. What gets applied is what was reviewed, when a person decided it should be.

One graph, not micro-stacks

Sun, 17 May 2026 00:00:00 +0000

Once an infrastructure repo has a few concerns in it (account hardening, the security baseline, the signing stack still to come) there’s a steady pressure to split them into separate stacks with separate state, and Terragrunt is right there to help you do it. The infra repo keeps everything in one OpenTofu graph instead. The reason comes down to who enforces your dependency ordering: the engine, or you.

The pressure to split

The infra repo’s src/ has several concerns in it, and more coming, the signing stack among them. Once a repo reaches that point, there’s a steady pressure to split: one stack per concern, each with its own state file.

It’s an appealing pressure. Separate stacks feel modular. Each apply touches less, so the blast radius of any one run is smaller. And Terragrunt exists, popular and well-regarded, precisely to orchestrate a fleet of separate stacks. The path is well trodden.

infra didn’t take it. src/ is a single OpenTofu root stack: each concern is a module block, in its own main.<concern>.tf file, all sharing one state and one graph.

What one graph gives you

The thing a single graph gives you is engine-enforced truth about ordering and data.

Inside one OpenTofu graph, the tool builds the full dependency DAG itself. When the signing stack needs a value the security baseline produced, you reference it directly, module.baseline.something, and OpenTofu guarantees two things: the baseline is created before the thing that depends on it, and the value handed across is the current one from this same apply. Ordering and data-passing aren’t things you arranged. They’re facts the engine checks and enforces, every plan, every apply.

What splitting costs

Split src/ into per-concern stacks with separate state, and that guarantee is the thing you spend.

Now one stack reads another’s outputs through terraform_remote_state. That’s a lookup of a snapshot: the other stack’s last applied state, whatever it was, whenever that was. It’s not a live edge in a graph. Ordering is no longer enforced by the engine either; it becomes something you arrange yourself, in CI stage sequencing or in Terragrunt’s own dependency blocks.

That’s the trade, stated plainly. You give up a strong, engine-checked guarantee, and you buy back a weaker, hand-arranged imitation of it. Terragrunt is a good tool for managing that weaker world tidily. But the question worth asking first is whether you should be in the weaker world at all.

When splitting is genuinely right

This isn’t an argument that splitting is always wrong. Separate states genuinely earn their place when concerns have different change cadences, different access boundaries, or different teams owning them: when you actively want an apply of one to be unable to touch another, and you want different people holding different state.

infra has none of those. It’s a single account, a single operator, one cohesive set of concerns. The only thing splitting would buy here is a smaller per-apply blast radius, and that’s better handled by reviewing the plan before it applies, which the next post is about, than by fragmenting the dependency graph. So src/ stays one graph, and Terragrunt was considered and deliberately not adopted.

If ordering between graphs is ever needed

If infra ever does genuinely need more than one stack, the plan isn’t Terragrunt. It’s to keep each stack a single strong graph internally, and to sequence the stacks with CI stages. Keep the engine-enforced guarantee where it’s strongest, inside each graph, and reach for hand-arranged ordering only at the one seam where it’s unavoidable.

Boiling it down

A multi-concern infrastructure repo feels like it should be split into per-concern stacks, and Terragrunt is right there to manage the result. infra keeps src/ as one OpenTofu graph instead.

Inside one graph, OpenTofu enforces dependency ordering and passes current values across module boundaries as checked facts. Split into separate states and that becomes a terraform_remote_state snapshot lookup plus ordering you arrange by hand: a weaker version of what you gave up. Splitting is right when concerns have different cadences, boundaries or owners; for a single-account, single-operator repo none of that applies, so the strong guarantee is worth keeping, and Terragrunt is the tool for a problem infra chose not to have.

CI you include, not copy

Sat, 16 May 2026 00:00:00 +0000

Every infrastructure repo runs the same CI: lint the OpenTofu, scan it, validate it, plan, apply. The first repo, you write that .gitlab-ci.yml by hand. The second, you copy it. By the third, you’ve got three copies of the same pipeline quietly drifting apart, which is the exact problem you’d never tolerate in application code. The cicd repo is the fix, and it’s just the library-first instinct pointed at the pipeline.

The `.gitlab-ci.yml` you keep copying

The infrastructure repos in this series all run the same CI gate jobs: format and validate the OpenTofu, lint it, scan it for security issues and secrets, and on the deploy side, plan and apply.

The first repo, you write that .gitlab-ci.yml by hand. The second repo needs the same jobs, so you copy it. The third repo, you copy it again. Now there are three copies of the same pipeline, and they do what copies always do. They drift. A fix you make in one repo’s CI doesn’t reach the other two. A tightened scan rule lands in the repo you were working in and nowhere else. It’s the copy-paste problem, exactly as it shows up in application code, just written in YAML and therefore that bit easier to pretend isn’t code.

GitLab has a feature for exactly this

GitLab CI/CD Components are the answer to that problem. A component is a reusable, versioned piece of pipeline that you publish, and other projects pull in with an include: pinned to a version:

include:
 - component: gitlab.com/phpboyscout/cicd/tofu-lint@v0.5.0

That’s a library import, for pipeline. The component has a defined interface, a version, and a home in GitLab’s CI/CD Catalog. A consuming repo includes it instead of carrying its own copy, and when the component improves, the consumer moves a version pin rather than re-copying YAML.

Why a monorepo of components

The cicd repo holds all of the components together: tofu-lint, tofu-security, tofu-validate, tofu-plan, tofu-apply, and more. One project, not one project per component.

That’s a deliberate call, and the reason is how GitLab versions things. A version is a tag, and a tag belongs to a project. A component’s version is its project’s tag. So a monorepo of components, versioned together as one tag stream, is the natural unit: a consumer pins @v0.5.0 and gets a known-good set of components that were tested together, rather than juggling a separate version for each one.

Authoring discipline

A component is a file under templates/, and it opens with a spec: inputs: block: the typed inputs, their defaults, the component’s public interface.

The discipline that keeps the library usable is that a component must be consumer-agnostic. It never hardcodes a token, and it never names a particular consumer’s variable. Inputs have sensible defaults, and a consuming repo overrides them. A component that reaches out and assumes something about the repo including it is a component that works in one repo and surprises the next. An authoring guide in the repo keeps that consistent across everyone who adds a component.

The self-test you cannot fully write

The cicd repo tests its own components with a self-test pipeline. It’s worth knowing where that self-test stops.

When a repo tests its own components by running them in child pipelines, GitLab masks $CI_PIPELINE_SOURCE as parent_pipeline. A component’s rules:, which often branch on the pipeline source to behave differently for a merge request than for a branch or a tag, therefore can’t be exercised honestly by the self-test: the source they’d branch on has been flattened. The self-test covers what it can, and the component rules: are, in the end, validated by real consumers using them for real. That’s a genuine limit, and naming it is better than pretending the self-test proves more than it does. (It’s also, not coincidentally, the exact rules: quirk that bit me in one of the two bugs I closed the series with.)

The same instinct, again

This blog keeps circling the same instinct. go-tool-base exists because the same CLI scaffolding kept getting rewritten, so it was extracted into a library. cicd is that instinct pointed at the pipeline: the same gate jobs kept getting copied between repos, so they were extracted into a versioned, included library.

Stop copy-pasting. Publish, version, include. It’s true for CLI code, and it turns out to be just as true for the YAML that builds and ships it.

The gist

Every infrastructure repo needs the same CI, and copying the .gitlab-ci.yml between them produces copies that drift apart. GitLab CI/CD Components fix it: reusable, versioned pipeline that a repo include:s and pins, instead of carrying its own copy.

cicd is a monorepo of those components, versioned together as one tag stream, because GitLab tags a project and a component’s version is its project’s tag. Components are authored consumer-agnostic, with typed spec: inputs: and no hardcoded assumptions, and their rules: are validated by real use because the self-test can’t see the pipeline source. It’s the library-first instinct, applied to CI: publish it once, include it everywhere, fix it in one place.

One image for the whole toolchain

Fri, 15 May 2026 00:00:00 +0000

Every CI gate job across the infrastructure repos reaches for the same pile of tools: OpenTofu, tflint, trivy, checkov, gitleaks, terraform-docs, the AWS CLI. Installing that pile per job is both slow and quietly dangerous, because nothing pins it consistently. infra-tools is the obvious fix (one image, one source of truth for versions), but two of its build decisions are less obvious and worth a look: it publishes with crane instead of a second build, and it deliberately lets its own vulnerability scan fail.

The same pile of tools, in every repo

Every infrastructure repo in this series runs the same CI gate jobs: format and validate the OpenTofu, lint it, scan it for security problems and secrets, check the docs. Those jobs need a specific set of tools, and it’s the same set in every repo.

Install them per job and you pay twice. You pay in time, because every pipeline downloads and installs the whole set again. And you pay in drift, because unless every repo pins every tool identically, the repos slowly diverge on which version of trivy or tflint they actually run, and a check that passes in one repo fails in another for no reason anyone can see.

One image, one source of truth

infra-tools is the answer: a single Debian-based container image with the whole toolchain baked in. Every CI job in every repo uses it with one image: line.

The real value isn’t the convenience. It’s that the image is the one place tool versions are pinned. The Go-based tools are pinned in a mise.toml. checkov, which has no mise plugin, is pinned in a requirements file installed with pipx. The AWS CLI is pinned by a build argument. Three mechanisms, because the tools come from three kinds of source, but one image, and every pin wired to Renovate so a version bump arrives as a reviewable pull request. There’s exactly one answer to “what version of trivy does the toolchain use”, and it lives here.

Publishing with crane, not a second build

A build-pipeline detail that took a real bug to discover.

The pipeline builds the image with kaniko, which builds images without a privileged Docker daemon, something that matters a great deal on shared CI runners. Then it scans the image, then it publishes it.

The obvious way to write the publish stage is “build the image and push it”. But kaniko has no mode for “just push this tarball I already built”. A second kaniko invocation re-executes the entire Dockerfile from the top, including a second mise install, which makes a fresh round of calls to GitHub’s API to fetch tools. GitHub’s anonymous API limit is low and shared by IP, so on a CI runner that second install reliably trips a 403 rate-limit. (Yes, another 403. They do get everywhere.)

So the publish stage doesn’t rebuild. It uses crane to push the exact image tarball the build stage already produced. The image is built once. And because the published bytes are the same bytes the scan stage scanned, there’s no gap between “the image we checked” and “the image we shipped”.

Soft-failing the scanner on purpose

The decision that looks wrong until you see the reasoning: the pipeline scans the image with trivy, and trivy is allowed to fail without failing the pipeline.

A vulnerability scanner that doesn’t gate the build sounds like a scanner switched off. It isn’t. It’s a scanner pointed at something it can’t helpfully gate.

The tools in the image are prebuilt Go binaries. trivy inspects them, reads the version of the Go runtime each was compiled with, and reports every known CVE in that Go runtime. Those findings are real, but they aren’t mine to fix. The only fix is the upstream tool rebuilding itself against a patched Go. With seven such tools in the image, at any given moment one of them is usually a little behind on its Go version.

A hard gate would mean the image becomes unpublishable whenever any single upstream lags, over a CVE in code I don’t own and can’t patch. That’s not a security control; it’s a way to be unable to ship. So the scan is allow_failure. The findings stay fully visible, and the residual count is genuinely useful as a metric for how far behind upstream the toolchain has drifted. It just doesn’t block shipping an image whose only “vulnerabilities” are other people’s build timelines.

What it comes down to

The infrastructure repos all run the same CI gate jobs, needing the same tools, so infra-tools bakes the whole toolchain into one image and pins every version in one place, wired to Renovate.

Two build choices are worth copying. The publish stage uses crane to push the already-built, already-scanned tarball, because a second kaniko build would re-run mise install and hit GitHub’s anonymous rate limit, and because pushing the scanned bytes means shipping exactly what was checked. And the trivy scan is deliberately allow_failure, because it reports Go-runtime CVEs in prebuilt upstream binaries that no change to this repo can fix, so a hard gate would only make the image unshippable over someone else’s lag.

A 403 you can't fix in IAM

Thu, 14 May 2026 00:00:00 +0000

The OIDC post explained the handshake that lets a GitLab pipeline deploy to AWS with no stored key. This is the story of the first time I got it wrong, and spent an afternoon fixing the wrong thing. The error was a flat 403 from AWS, and the maddening part is that no amount of editing the IAM policy was ever going to fix it.

A 403 on the first real run

The OIDC post covered the handshake: GitLab CI mints a signed token, AWS exchanges it for short-lived credentials against a role whose trust policy names the pipeline. During the GitLab migration I wired exactly that up for the infra repo, including a trust policy condition meant to let merge-request pipelines run a plan.

The first merge request that should have triggered tofu-plan didn’t run it. The job failed, and the error from AWS was a flat AccessDenied. A 403.

The instinct, and why it wastes an afternoon

The instinct on an IAM 403 is immediate and almost always right: the policy’s wrong, so go and edit the policy. Tighten the condition. Loosen the condition. Check the wildcard. Re-read the sub pattern character by character.

All of that was wasted, and it was wasted for a reason that took me far too long to see. The trust policy wasn’t matching the wrong value. It was matching a value that does not exist. No amount of editing a condition makes it match a thing that’s never present.

What is actually in the token

GitLab’s OIDC token has a sub claim that encodes the pipeline’s context, and part of that encoding is a ref_type. I’d assumed ref_type could be branch, tag, or mr, because a pipeline can certainly be a branch pipeline, a tag pipeline, or a merge-request pipeline. So the trust policy, for the plan job, matched a sub containing ref_type:mr.

That assumption was wrong. GitLab’s ref_type is branch or tag. That’s the entire set. There is no mr.

A merge-request pipeline doesn’t run against a merge-request ref. It runs against the source branch. So its token’s sub carries ref_type:branch, like any other branch pipeline. The trust policy condition asked for ref_type:mr, GitLab never puts mr in a token, the condition was therefore never true, and every merge-request pipeline got a 403. Forever, until the policy stopped asking for a claim that isn’t real.

The fix, and the lesson worth more than the fix

The fix is small once it’s visible: match ref_type:branch and narrow it down by branch name or project path instead. An afternoon of policy edits, and the actual change is one word.

The lesson is the part worth keeping. When an OIDC trust fails, the useful question is never “is my policy clever enough”. It’s “what’s actually in the token”. An OIDC trust policy can only ever match the claims the identity provider genuinely asserts, and the gap between what a provider asserts and what you assumed it asserts is precisely where this class of bug lives.

So the move, when an OIDC handshake 403s, is to get hold of a real token and decode it. Look at the actual sub, the actual claims, the actual values. Match what’s there. A 403 that survives every sensible edit to the policy is usually not a policy that’s too loose or too strict. It’s a policy matching a claim that was never going to be in the token.

The habit it left behind

I wired an OIDC trust policy to let merge-request pipelines plan, by matching a sub claim with ref_type:mr. The first real merge request got a 403, and no edit to the policy fixed it, because GitLab’s ref_type is only ever branch or tag. A merge-request pipeline runs on a branch ref, so the mr value the policy demanded was never in any token.

The fix was one word. The habit it left behind is the valuable bit: when an OIDC trust fails, stop editing the policy and go and read a real token. A trust policy can only match what the provider actually asserts, and “what I assumed it asserts” is where the 403 was hiding the whole time. (If this shape of bug feels familiar by the end of the series, that’s not an accident: I come back to it with two more from exactly the same family.)

Pure-Rust Git, no git binary

Wed, 13 May 2026 00:00:00 +0000

go-tool-base’s VCS support has two halves that get confused for one. One half talks to forge APIs (GitHub, GitLab) for releases and pull requests. The other talks to the .git directory on disk: clone, history, diff, status. This post is mostly about the second half, and specifically about a question that turns out to have three answers in Rust, only one of which I’d recommend: how do you actually do Git from inside a program?

A VCS subsystem with two halves

go-tool-base has a VCS subsystem, and it does two distinct jobs.

The first is forge APIs. GitHub and GitLab, Enterprise and nested group paths included. It authenticates, lists releases, fetches release assets, manages pull requests. The self-update machinery sits on this half, and it’s what a tool uses to ask “what’s the latest release?”

The second is local Git. go-tool-base also carries a RepoLike object, an abstraction over an actual Git repository on disk: clone it, read its commit history, diff two trees, check its status. This half doesn’t talk to a hosting service at all. It talks to the .git directory.

It would be easy to assume the second half grew out of the first. It didn’t, and where it actually came from is the part worth telling.

A capability ahead of its consumer

The RepoLike object wasn’t built for go-tool-base. It came from another project, where it had already proved itself, and it was pulled into go-tool-base on purpose, with a specific future consumer in mind: the code generator.

The plan is for the generator to use Git directly. When it scaffolds a new tool, that tool should start life as a Git repository, with a git init and an initial commit. When you later regenerate, the generator should diff the regenerated template output against your working tree to detect drift, the same idea as respecting your edits. Both of those are local Git operations, not API calls, so the generator needs a repository abstraction to call into.

That wiring isn’t finished yet. The generator doesn’t drive RepoLike today. But the capability is in place, deliberately, ahead of the consumer that will use it, because the alternative is bolting Git support on later under deadline pressure, and that’s how you end up with the wrong abstraction.

So when rust-tool-base was built, a repository abstraction was never in question. The Rust port carries the same capability for the same reason: a Repo type with init, open, clone, walk, diff, blame, status, commit, fetch and checkout, present and ready for the generator to wire into. The open question was never whether to have it. It was how to do Git from inside a Rust program, and there are three answers, only one of which is any good.

Three ways to do Git, and the one worth picking

Shell out to git. Run the git binary as a subprocess and parse its output. It works until it doesn’t. The binary might not be installed. It might be a different version with different output. Its output is formatted for humans and changes between releases, so parsing it is a standing liability. You’ve made an undeclared dependency on a program you don’t ship.

Link libgit2. libgit2 is the C library that reimplements Git as something you can call from code, and git2 is the Rust binding to it. It’s solid and widely used. But it’s a C dependency, which means a C toolchain in the build, and it’s consistently the single biggest source of cross-compilation pain in the Rust Git ecosystem. The musl builds, the Windows builds, the static linking: libgit2 is where they tend to break.

Use gix. gix is a reimplementation of Git in pure Rust. No C library, no subprocess. It’s just Rust code, and it compiles and cross-compiles like any other crate, because that’s all it is. It’s also generally faster, and being pure Rust it fits the no-unsafe-in-first-party-code story far more comfortably than dragging a C library along.

rtb-vcs is gix-first. The Repo type is built on it. There’s no git binary dependency, and there’s no libgit2 in a default build.

gix is still maturing, and a few write paths, push in particular, aren’t ready in it yet. For those, git2 (the libgit2 binding) is held in reserve as a documented fallback, to be wired behind an opt-in Cargo feature if and when a write path actually needs it. Until then a default build carries no libgit2 at all, and the common case, a tool that clones, reads history, diffs and commits, never pays its cross-compile cost. (The gix backend itself sits behind an opt-in git feature, which is exactly the feature-flag story from a couple of weeks back, doing real work.)

`Repo` is a foundation, not a façade

One design decision is worth calling out, because it came straight from a go-tool-base lesson.

It would have been easy to build Repo as a narrow façade exposing exactly what the scaffolder and the release-notes feature need today, and nothing else. That was rejected on purpose. go-tool-base’s RepoLike is itself the cautionary tale: it arrived from another project, settled into a sensible abstraction, and is already lined up to carry a consumer, the generator, that wasn’t driving its design when it was first written. A repository abstraction gets used by code that doesn’t exist yet. Build one as a narrow façade around today’s needs and you’ve guaranteed a rewrite the first time a downstream tool wants something slightly different.

So rtb-vcs’s Repo is built as a foundation: a sensible, reasonably complete vocabulary of Git operations that a tool author can compose richer behaviour on, without re-importing gix directly and re-deriving the framework’s auth and concurrency conventions. The errors back this up. gix’s error types aren’t leaked through the public API; they’re wrapped in semantic RepoError variants, so the backend could be swapped, gix to git2, or to something else entirely, without breaking a single downstream caller.

Stepping back

go-tool-base’s VCS support has two halves: forge-API calls for releases and pull requests, and a RepoLike object for local Git operations. The repo half arrived from another project and is wired in ahead of its intended consumer, the code generator, which will use it to initialise repositories for scaffolded tools and to diff regenerated output for drift.

rust-tool-base carries the same capability on purpose. Its Repo type is built on gix, a pure-Rust Git implementation, so there’s no dependency on an installed git binary and no libgit2 C library in a default build, which keeps cross-compilation clean. git2 stays an opt-in fallback for the few write paths gix can’t do yet. And Repo is built as a foundation for downstream tools, with the backend wrapped behind its own error type so it can be replaced without breaking callers.

Routing security findings without the noise

Tue, 12 May 2026 00:00:00 +0000

Turning on GuardDuty and Security Hub gives you threat detection. It also gives you a firehose. And an alert system that dutifully forwards everything in that firehose isn’t monitoring, it’s a very efficient way of training your team to ignore alerts. So the alerts module’s real job isn’t detection at all. It’s deciding what’s actually worth interrupting a human for, and the interesting part is everything it deliberately throws away.

Detection is the easy half

Switching on threat detection in an AWS account is a few resources. GuardDuty, Security Hub with its standards, IAM Access Analyzer: the security baseline does exactly that. From then on, the account is generating findings.

And it generates a lot of them. Plenty are low-severity, informational, or simply the normal texture of a cloud account. If you wire every finding to an email or a pager, you haven’t built monitoring. You’ve built noise. And noise has a specific failure mode: people stop reading it, and the one finding that genuinely mattered scrolls past unread alongside two hundred that didn’t.

So the valuable work isn’t detection. It’s routing: deciding what’s worth interrupting a human for, and letting the rest sit quietly in a console for whenever someone reviews it.

Forward the severe, leave the rest

The alerts module routes findings with EventBridge rules into an SNS topic that emails out. The rules are deliberately picky. GuardDuty findings are forwarded only at severity 7 and above. Security Hub findings are forwarded only at HIGH and CRITICAL.

Everything below those thresholds isn’t discarded. It’s still in GuardDuty and Security Hub, where someone doing a review will see it. It just doesn’t get to interrupt anyone’s day. The threshold is the line between “look at this now” and “look at this sometime”.

The duplicate you would otherwise send twice

Here’s the subtle one, and it’s the kind of thing you only find by looking closely at where findings come from.

Security Hub is an aggregator. It pulls findings in from other services, GuardDuty among them. So a single GuardDuty finding can show up in two places: in GuardDuty itself, and again in Security Hub as an aggregated copy.

A rule on GuardDuty findings and a rule on Security Hub HIGH/CRITICAL findings would therefore both fire for the same underlying GuardDuty finding. One event, two emails. Do that across an account and a meaningful fraction of your alert volume is just the same findings counted twice, which is its own kind of noise.

So the Security Hub rule explicitly excludes findings whose ProductName is GuardDuty, with an anything-but match. GuardDuty findings come through the GuardDuty rule. The Security Hub rule handles everything Security Hub adds that GuardDuty didn’t already report. One finding, one alert, regardless of how many services it passed through.

Two tripwires on the root account

Findings are about threats the detectors recognise. The module adds two alarms about something simpler: the root account doing anything at all.

One CloudWatch alarm fires on a root console sign-in. The other fires on any root API call that isn’t a console login. In a well-run AWS account, the root user does almost nothing after initial setup: day-to-day work happens through roles. So root activity isn’t a “finding” to be assessed for severity. It’s a tripwire. Any of it, in an account that should be silent, is worth an immediate look, and the two alarms say so directly.

Why a quiet alert stream matters here

This is monitoring for the account that’s going to hold the release-signing key, and that raises the stakes on getting the routing right.

If a key-bearing account ever does come under attack, the alert that says so has to be seen. An alert stream that’s mostly noise and duplicates is, functionally, no alerting at all, because the people who’d act on it have long since tuned it out. Routing the stream down to “severe, deduplicated, plus root tripwires” is what keeps it something a human will still read on the day it finally matters.

The short version

GuardDuty and Security Hub make detection easy. The hard, valuable part is routing: forwarding what deserves to interrupt someone and leaving the rest in a console.

The alerts module forwards GuardDuty at severity 7-plus and Security Hub at HIGH/CRITICAL, and it drops the duplicate that aggregation creates by excluding GuardDuty-sourced findings from the Security Hub rule, so one finding is one alert. Two CloudWatch alarms act as tripwires on root-account activity, which should be near-zero. For the account that will hold the signing key, a quiet, trustworthy alert stream isn’t a nicety. It’s the difference between monitoring and theatre.

Why go-tool-base left GitHub for GitLab

Mon, 11 May 2026 00:00:00 +0000

A botched version bump made me stop and actually look at where go-tool-base lived, and I didn’t much like what I saw. GitHub had spent months quietly falling over, and when Mitchell Hashimoto (GitHub user #1299, no less) publicly walked Ghostty off the platform, it stopped feeling like just my problem. I’ve been a GitLab fan for years, so the move was less a leap and more an overdue nudge. This is the why, not the how.

It started with a wrong number

Every migration has a trigger, and mine was embarrassingly small. A commit landed on main carrying a BREAKING CHANGE: footer it didn’t really deserve. Semantic-release did exactly what it’s told to do with that footer: it cut a major version. go-tool-base lurched from the v1 line straight to v2.0.0, and a chain of things that keyed off the version went sideways with it.

It was fixable. It wasn’t a disaster. But it was the kind of small, stupid breakage that makes you stop and actually look at your setup instead of just patching it and moving on. And when I looked, the version bump wasn’t the thing that bothered me. It was everything around it.

The platform had been quietly failing

I’d been losing time to GitHub for months. Not dramatically. No single outage you’d write home about, just a steady drip of Actions queues that wouldn’t drain, pull requests that wouldn’t merge, the occasional morning where the thing simply wasn’t there. You absorb it. You re-run the job. You make a coffee and try again. You tell yourself it’s a blip.

The trouble with a steady drip is that you stop counting it. It becomes weather.

The canary left the mine

Then, in late April, Mitchell Hashimoto (co-founder of HashiCorp, creator of Vagrant, Terraform and the Ghostty terminal) published Ghostty Is Leaving GitHub, and The Register picked it up a day later under the headline “GitHub ’no longer a place for serious work’”.

This is not a man with a casual relationship to GitHub. He’s, by his own account, user #1299, joined February 2008. He called it “the place that has made me the most happy”. And he still wrote this:

This is no longer a place for serious work if it just blocks you out for hours per day, every day.

The detail that landed hardest for me wasn’t a quote, it was a habit. He’d kept a journal for a month, marking an “X” on every day a GitHub outage had cost him working time. Almost every day had an X. Reading that, I realised I’d been having the same month. I’d just never been disciplined enough to write it down. He’d turned my vague “it’s been flaky lately” into a row of crosses on a calendar.

I want to ship software and it doesn’t want me to ship software.

When the person who’s been on the platform for eighteen years and loves it says that out loud, it stops being your private grumble. It’s the canary, and the canary has stopped singing.

Why GitLab, and not just “somewhere else”

Being annoyed at GitHub is a reason to leave. It is not, on its own, a reason to pick a destination. The destination has to be a positive choice.

For me GitLab was an easy one, because I’ve been a fan for years. Long enough, in fact, to have also been a reliable grumbler about their pricing tiers, which is how you know it’s a real relationship and not a honeymoon. What I’ve always rated is the model: GitLab treats source hosting, CI/CD, the package registry, releases and Pages as one integrated product, not a marketplace of bolted-on parts you assemble yourself.

That integration is the actual prize. On the old setup, “CI” meant a folder of separate GitHub Actions workflow files, each pinned, each its own little world. On GitLab it’s a single .gitlab-ci.yml pipeline with proper stages (lint, test, security, docs, release) and the release stage talks to the built-in package registry and Pages without me wiring up a single external credential. The CI job that builds the project can authenticate to the things the project needs because they’re the same platform.

There’s a second-order benefit too. A migration is a rare licence to fix things you’d never otherwise touch. Moving gave me the cover to reset go-tool-base’s versioning cleanly (back to a sensible v0.x line, the accidental v2.0.0 left behind as a cautionary tale) and to move the module path to its new home in one deliberate change rather than a thousand apologetic ones.

What I’m not going to claim

I’m not going to tell you GitHub is finished, or that GitLab never has a bad day, because it does, everyone does. This isn’t a teardown. GitHub gave go-tool-base a perfectly good home for its first year, and the archived mirror is still sitting there, read-only, pointing anyone who finds it at the new place.

What changed is simpler than a grand verdict. The friction crossed a line, someone I respect said the quiet part loudly enough that I couldn’t keep filing it under “weather”, and the place I’d have moved to anyway was sitting right there with a better model. Sometimes the prudent move and the move you secretly wanted turn out to be the same move, and you just need a wrong version number to give you permission.

Boiling it down

go-tool-base moved from GitHub to GitLab in May 2026. The proximate cause was a self-inflicted version-bump mess; the real cause was months of GitHub unreliability that I’d stopped consciously noticing until Mitchell Hashimoto’s very public departure named it for me. GitLab was a positive pick, not just an escape hatch: its integrated CI/CD, registry, releases and Pages are one product rather than a kit, and that integration is genuinely worth having. The migration also bought a clean versioning restart as a bonus.

If you’ve been absorbing a steady drip of friction and telling yourself it’s normal: try the calendar trick. Mark the X’s for a month. The page will tell you something you already half-know.

Why I hand-rolled every module

Sun, 10 May 2026 00:00:00 +0000

There are well-known community module libraries for AWS: Cloud Posse, the terraform-aws-modules collection, plenty more. Both terraform-aws-bootstrap and terraform-aws-security-baseline use almost none of them. Every sub-module is hand-rolled from raw AWS resources, and before you accuse me of not-invented-here syndrome (a perfectly fair first guess), hear me out, because the same evaluation kept landing the same way for a real reason.

The promise of a wrapper module

The community module ecosystem makes an appealing offer. Don’t write raw aws_s3_bucket and aws_s3_bucket_policy and aws_s3_bucket_public_access_block and the rest. Call a tested, popular module, pass it a handful of inputs, and get a correct, well-configured bucket. Less code in your repo, and the code you don’t write has been exercised by thousands of other users.

For a lot of infrastructure that’s a genuinely good deal, and I take it often. For the two infrastructure modules in this series, I took it almost never. Every sub-module is built from raw AWS resources. That wasn’t a reflex. It was the same evaluation, made over and over, landing the same way.

What kept going wrong

For each place a wrapper module could have fitted, I looked at the wrapper. And the recurring finding was one of two things. Either using the wrapper correctly, with all the overrides my posture needed, came to more configuration than the raw resources would have. Or the wrapper’s abstraction leaked the instant I needed something it hadn’t anticipated, and I was now writing code to fight it.

The CloudTrail bucket, concretely

The clearest example is the bucket that holds CloudTrail logs.

There are popular modules that set up CloudTrail and bundle an S3 bucket for the logs. Convenient. But that bundled bucket isn’t the bucket I want. It doesn’t carry lifecycle { prevent_destroy = true }, and its bucket policy is weaker than the one the state bucket taught me to want: TLS-only, SSE-KMS-only, wrong-key-denied.

So to use the wrapper I had two options. Accept a weaker audit-log bucket than the rest of the account, which rather defeats the point of an audit log. Or fight the wrapper: disable its bucket, create my own, wire it back in. Fighting the wrapper is more work than simply writing the fifty-odd lines of raw aws_s3_bucket plus policy that give me exactly the posture I’d already designed once. The wrapper didn’t save code. It added a negotiation.

A wrapper is a deal, and deals have terms

This isn’t an argument that community modules are bad. It’s an argument about when the deal is good.

A wrapper module is a good deal while its abstraction holds: while what it assumes you want matches what you want. The moment you need something it didn’t anticipate, the deal inverts. Now you’re working against the abstraction, and an abstraction you’re fighting costs more than no abstraction at all. (Regular readers will recognise that line from the LangChain argument; it’s the same principle in a very different language.)

Infrastructure that holds signing keys is precisely the case where you need to control the specifics: every encryption setting, every lifecycle rule, every line of every bucket policy. That’s a domain where wrapper abstractions leak fast, because the whole job is the details the wrapper smoothed over.

The cost, paid on purpose

Hand-rolling isn’t free. It’s more lines of HCL in the repo, up front, than a one-line module call.

What those lines buy is worth the price for this kind of infrastructure. There’s no transitive module-version churn to track. There’s no abstraction between me and the resource when something behaves oddly. And every line is one I can read, and defend, in a security review, because I wrote it and it says exactly what it does. For a foundation that will hold the most sensitive key in the system, “readable and mine” beats “short and someone else’s”.

That’s a deliberate trade, not a universal rule. For an internal tool on a deadline, reach for the wrapper. For the security-critical base of everything else, the raw resources won every time I checked.

To sum up

The community module ecosystem offers less code that more people have tested, and for plenty of infrastructure that’s the right call. For terraform-aws-bootstrap and terraform-aws-security-baseline it almost never was, because each wrapper turned out to be more configuration than the raw resources once my posture was accounted for, or it leaked the moment I needed a specific.

The CloudTrail log bucket is the pattern in miniature: the bundled bucket lacked prevent_destroy and a strong policy, so using the wrapper meant either a weaker bucket or fighting the module. A wrapper is a good deal while its abstraction holds and a bad one the moment you fight it, and security-critical foundation infrastructure is all specifics. Hand-rolling cost more lines and bought code I can read and defend. For this, that was the trade worth making.

Hardening the account that will hold the keys

Sat, 09 May 2026 00:00:00 +0000

Bootstrapping the account got it ready: somewhere to store state, an identity to deploy as, enough for the next tofu apply to run. Ready is not the same as safe. An account with no audit trail, nothing watching it, and no considered way for a human to get in is fine for experimenting and absolutely not where you put the most sensitive key in the system. So before the signing key goes anywhere near it, the account gets a security baseline.

Ready is not the same as safe

The bootstrap post ended with an account that was ready: it had somewhere to store state and a CI identity to deploy as. The next tofu apply could run.

Ready is not safe. That account still has no audit trail, so nobody could tell you afterwards what happened in it. It has no threat detection, so nothing is watching. Its defaults are AWS’s defaults, which are not a security posture. There’s no considered way for a human to get in. An account in that condition is fine for experimenting. It’s not somewhere you put the most sensitive key in the whole system.

So before the signing key is anywhere near it, the account gets a security baseline.

The baseline, in one downstream stack

terraform-aws-security-baseline is that baseline, and it’s exactly the downstream stack the bootstrap post promised: applied through the automation role bootstrap created, not bootstrapped specially.

It’s six sub-modules, each behind an enable_* toggle: account-hardening (IAM password policy, account-wide S3 public-access blocking, default EBS encryption), audit-logging (a multi-region CloudTrail with log-file validation), aws-config, threat-detection (GuardDuty, Security Hub, IAM Access Analyzer), alerts, and operator-role. Together they turn a bare account into one that records what happens, watches for trouble, and controls who gets in.

Most of those are the expected baseline. The operator role is the one worth slowing down on, because it’s built backwards from how people usually think about an admin role.

The operator role, and the inversion

InfraAdmin is the human way into the account: the role a person assumes to do operator work. Two things define it.

The trust policy decides who may assume it. It trusts only the account root principal, and it requires multi-factor authentication: the assume call must carry aws:MultiFactorAuthPresent, and aws:MultiFactorAuthAge bounds how recently that MFA was performed. No MFA, no role. So far this is a careful but ordinary admin role.

The inversion is a second, separate inline policy, and it’s almost entirely Deny. It denies, using NotAction, anything where aws:RequestedRegion falls outside an allowed set of regions. The role’s power comes from an admin grant. This inline policy fences that power.

That’s the part worth holding onto. People picture an admin role as a list of what it can do. This one is better understood by what it cannot: it cannot act outside its permitted regions, full stop. A fat-fingered command, or a compromised session, cannot quietly spin resources up in some region nobody’s watching. The fence is as much the point of the role as the grant is.

The carve-out, because honesty

There’s a fiddly detail, and it’s the kind of thing that makes the region fence real rather than theoretical.

Some AWS services are global. IAM, CloudFront, Route 53 and friends have no region, and they don’t honour aws:RequestedRegion. A naive region-deny would therefore deny calls to IAM, and you’d lock yourself out of the very service you manage access with. (A close cousin of the kind of self-inflicted lockout I’ll come back to in a later post.)

So the Deny carries explicit carve-outs for the global services. It isn’t elegant, and it can’t be: the global-versus-regional split is just a fact of AWS, and a correct region fence has to account for it. The carve-out list is the real cost of the control working.

Harden the room, then move the keys in

There’s an order to all of this, and the order is the argument.

The account that will hold the signing key has to be audited before the key arrives, so that from day one every call against it is in CloudTrail. It has to be watched before the key arrives, so GuardDuty is already looking. It has to be access-controlled before the key arrives, so the only human path in is MFA-gated and region-fenced.

You don’t move something valuable into a room and then think about locks. You build the room, fit the locks, check they work, and then move the valuable thing in. The security baseline is fitting the locks. The signing key comes later, into a room already built for it.

Worth remembering

Bootstrapping an account makes it ready for the next deploy. It does not make it safe to hold anything that matters. terraform-aws-security-baseline is the downstream stack that closes that gap: audit logging, AWS Config, threat detection, account hardening, and an operator role, applied through the CI role bootstrap created.

The operator role is the piece to study. It’s MFA-gated on the way in, and then fenced by a separate, almost-all-Deny inline policy that confines it to permitted regions, with carve-outs for the global services that have no region. An admin role defined as much by its fence as its grant. Harden the room first; the keys move in afterwards.

No access keys in CI

Fri, 08 May 2026 00:00:00 +0000

A long-lived AWS access key, sitting in a CI system, is just about the single credential I’d most like to be rid of. It’s powerful, it never expires unless someone remembers to rotate it (nobody remembers to rotate it), and it lives in one of the most attractive targets in the whole supply chain. For infrastructure that’s eventually going to hold a release-signing key, it’s exactly the wrong place to start. So the phpboyscout infrastructure has no AWS access key in CI at all. None.

The access key you don’t want

A CI pipeline that runs tofu apply against AWS needs AWS credentials. The traditional way to give it some is an IAM user with an access key pair, pasted into the CI system as a masked variable.

Look at what that key is. It’s long-lived: it works until someone remembers to rotate it, and rotating it is a chore, so mostly nobody does. It’s powerful: it can apply infrastructure, so it can do nearly anything. And it’s sitting in a CI system, which is one of the most attractive targets in your whole supply chain. You’ve taken your highest-value credential and stored a permanent copy of it in a place built for running automated jobs.

For infrastructure that’s going to hold a release-signing key, that’s precisely the wrong starting point. So the phpboyscout infrastructure has no AWS access key in CI at all. Not a well-guarded one. None.

Federation instead of a stored secret

The replacement is OIDC federation, and the shape of it is worth walking through, because it’s genuinely different from “a secret, but better”.

A modern CI platform can mint an OIDC token. GitLab does this with an id_tokens: block: at job time, GitLab issues a short-lived JSON Web Token, signed by GitLab, that asserts a set of facts. This is project X. This is pipeline Y. This is running on ref Z, of this type.

AWS can consume that. The sts:AssumeRoleWithWebIdentity call takes such a token and, if it satisfies an IAM role’s trust policy, returns short-lived AWS credentials for that role. The trust policy is where the control lives: it names GitLab as a trusted token issuer, and it constrains the token’s sub claim so that only the specific project, and the specific refs, you intend can assume the role.

Put it together: the pipeline asks GitLab for a token, hands it to AWS, and gets back credentials that last about an hour and are scoped to one role. Nothing long-lived is stored anywhere. The credential exists only for the job that needs it, and it can’t be stolen from a CI variable store, because it was never in one.

Two halves of one handshake

That handshake is built by two of the repos in this series, each owning one side.

terraform-aws-bootstrap builds the AWS half, in its automation-iam module: it registers GitLab as an OIDC identity provider in the account, and it creates the automation role with the trust policy that decides which pipelines may assume it.

The CI components build the consuming half: the id_tokens: block that asks GitLab for the JWT, and then simply letting the AWS provider’s own credential chain perform the exchange. The pipeline doesn’t call sts by hand. It presents the token; the SDK does the rest.

The gotcha: don’t set a profile

There’s one quiet way to break this, and a stack can look completely correct while doing it.

The AWS SDK finds credentials by walking a chain of sources in order. The web-identity path, the one that uses the OIDC token, is one link in that chain. It triggers off environment variables the CI sets up automatically.

But if the aws provider block has a hardcoded profile = "...", the SDK takes the profile link of the chain instead, and never reaches the web-identity link. A profile line is the sort of thing that ends up in a provider block from someone’s local development setup, where it’s exactly right. Committed and run in CI, it silently short-circuits the federation. The pipeline either fails to find credentials, or finds the wrong ones.

The rule is simple once you know it: the provider block that runs in CI must not name a profile. Leave the chain free to find the web identity. It’s the kind of bug that teaches you to be precise about which link of the credential chain you’re actually relying on.

The bottom line

Giving CI an AWS access key means storing your most powerful, longest-lived credential in one of your most exposed systems. OIDC federation removes it entirely. The CI platform mints a short-lived signed token, AWS exchanges it via AssumeRoleWithWebIdentity for hour-long credentials against a role whose trust policy names the exact pipeline, and nothing permanent is stored.

terraform-aws-bootstrap builds the AWS side, the identity provider and the trust policy; the CI components build the consuming side, the token request. The one trap is a hardcoded profile in the provider block, which short-circuits the SDK’s credential chain before it reaches the web-identity path. Get that right, and a pipeline deploys to AWS as a verifiable, short-lived identity, with no key to steal.

Two layers of tags, and which one wins

Fri, 08 May 2026 00:00:00 +0000

Tagging cloud resources is one of those jobs that’s trivial to do badly and surprisingly fiddly to do well. Everyone agrees resources should be tagged. The argument nobody quite has out loud is where the tags should come from, and getting that wrong gives you either a giant copy-pasted tag block on every resource, or a set of tags that quietly disagree with each other across the account.

Tags answer two different questions

If you look at what tags are actually for, they split cleanly into two kinds, and the split is the whole point.

Some tags are true of every single resource in the account, identically. The environment it belongs to. The fact that OpenTofu manages it. The project or owner it rolls up to for cost reporting. These are invariants: a resource that didn’t carry them would be the bug.

Other tags are specific to a particular piece of infrastructure. Which component this resource belongs to, what subsystem it’s part of. The CloudTrail bucket is part of audit logging; the Config recorder is part of aws-config. That’s a fact about the module, not about the account.

Treat those two kinds the same and you end up repeating the invariants by hand on every resource, which is exactly the copy-paste that drifts. So the infra setup gives each kind its own home.

Layer one: declared once, on the provider

The invariants live on the AWS provider itself, as default_tags, set one time in the provider block:

provider "aws" {
 # ...
 default_tags {
 tags = {
 # Environment, project, managed-by: the things true of
 # every resource in this account.
 }
 }
}

default_tags applies those tags to every taggable resource the provider creates, automatically, without a single resource having to mention them. Change the environment label once, here, and it propagates to everything on the next apply. No resource carries a copy; they all inherit the originals. The invariants are stated exactly once, in the one place that’s true for all of them.

Layer two: merged in by the module

The resource-specific tags live where the resource does: inside the module. Each module merges its own component tag over whatever tags it was handed, which you can see in the public terraform-aws-security-baseline modules:

tags = merge({ Component = "aws-config" }, var.tags)

So the aws-config module stamps Component = "aws-config" onto the things it builds, the account-hardening module stamps its own, and so on. The caller can pass extra tags down through var.tags, and because they come last in the merge, the caller can override the module’s defaults when it genuinely needs to. Module-specific knowledge stays in the module; per-call adjustments stay with the caller.

Which layer wins

Now the question that actually bites: a resource is getting tags from the provider’s default_tags and from the module’s merge. What happens when both set the same key?

The resource-level tags win. AWS’s provider treats tags set directly on a resource as an override of default_tags on a key collision, so the module’s merged tags take precedence over the account-wide defaults. That’s the right way round: the invariants are a sensible baseline, and a module that has a specific reason to set a key differently can, without having to reach up and edit the provider block that everything else depends on. Most of the time the two layers are simply disjoint, the invariants saying what account this is and the module tags saying what this resource is for, and they never collide at all. When they do, local intent beats the global default, which is the precedence you’d want.

Why bother splitting it

The payoff is that neither layer has to know about the other. The provider declares the invariants once and never thinks about components. Each module declares its component and never hard-codes the environment. Add a new module and it inherits every account-wide tag for free, while contributing its own. Change an account-wide tag and you touch one block, not two hundred resources. The tags stay consistent not because someone’s policing them, but because the place each tag is declared is the one place it can be declared.

The short version

Resource tags answer two questions, and they want two homes. Account-wide invariants (environment, ownership, managed-by) go on the provider’s default_tags, declared once and inherited by everything. Resource-specific tags go in the module, via merge({ Component = "..." }, var.tags), so each module owns its own labels and the caller can still override. On a key conflict the resource-level tag wins, which means the module’s intent beats the account default exactly when it should. Two layers, each declared in the one place it belongs, and no copy-pasted tag block anywhere in sight.

clap's global flag, except in a passthrough subtree

Thu, 07 May 2026 00:00:00 +0000

--output json worked everywhere. On the top-level command, on every ordinary subcommand, wherever the user fancied putting it. Then it stopped working in exactly one place, and of course it was the subcommand I’d been clever about.

How the global flag is meant to work

clap has a lovely feature for this. Define --output text|json once at the top, mark it global = true, and it’s reachable from every subcommand: mytool --output json widget and mytool widget --output json land the same. You stop thinking about it.

The one place it goes missing

One subcommand, credentials, is a passthrough: it sets subcommand_passthrough = true, which makes clap capture everything after the subcommand name as trailing_var_arg and hand it on, the way cargo run -- ... passes the trailing args to your program rather than to cargo. The handler then re-parses those captured tokens against its own clap definition.

The trouble is that the captured tokens include --output. clap’s global = true propagation doesn’t reach a passthrough subtree, because the post-name tokens are taken as trailing_var_arg before the outer parser ever sees them. So in this one subtree the global flag isn’t applied, and worse, when the inner parser re-parses the captured args it meets --output, which it doesn’t define, and rejects it as unknown. The code says so where it matters, in crates/rtb-cli/src/credentials.rs:

// clap's outer `global = true` propagation works for normal
// subcommands, but `subcommand_passthrough = true` captures
// post-name tokens as `trailing_var_arg`, so the global
// never reaches the outer parser for this subtree.
args = strip_global_output(args);

Parse it yourself, then strip it

The fix is two moves. First, parse --output out of the raw args by hand (there’s an OutputMode::from_args_os for exactly that), so the output mode is still honoured. Then strip --output out of the args before the inner parser runs, so the inner clap doesn’t choke on a flag it doesn’t define. strip_global_output is the second move, from crates/rtb-cli/src/render.rs:

if s.starts_with("--output=") {
 continue; // inline form: drop just this token
}
if s == "--output" {
 iter.next(); // space-separated form: drop the token and its value
 continue;
}

It handles both --output=json and --output json, and it’s idempotent, so it’s safe to call whether or not the flag is actually present.

The takeaway

global = true and trailing_var_arg are both “grab the args” features, and in a passthrough subcommand they reach for the same tokens. clap won’t arbitrate that overlap, and shouldn’t try to guess. So you arbitrate: parse the global out of the raw args yourself, strip it before you re-parse the rest, and the flag that “works everywhere” actually does.

Secrets that scrub themselves from RAM

Wed, 06 May 2026 00:00:00 +0000

A while ago I worked out where a CLI should keep your API key: env var, OS keychain, or, grudgingly, a literal in the config file. That answers where the secret lives. It says nothing about what happens to it once it’s loaded and sitting in your process memory, which is the half where secrets actually tend to leak. Rust, it turns out, can do something about that half that Go simply can’t.

What go-tool-base already settled

A while back I wrote about where a CLI should keep your API keys. The answer go-tool-base settled on was three storage modes, in a fixed precedence: an environment variable reference (the recommended default), the OS keychain (opt-in), or a literal value in the config file (legacy, and refused outright when CI=true).

rust-tool-base keeps that design unchanged. Same three modes, same precedence, same refusal of literal secrets in CI. A tool embeds a CredentialRef in its typed config, and a Resolver walks env, then keychain, then literal, then a well-known fallback variable, first hit wins. That part is a straight carry-over, because where to keep the secret was design, and design survives the port.

But storage is only half the life of a secret. The other half is what happens to it once it’s resolved and sitting in your process memory. That’s where Rust can do something Go can’t, and rust-tool-base takes the opening.

The two ways a secret leaks after you’ve loaded it

You’ve resolved the API key. It’s a value in memory now. Two very ordinary things can leak it from there, and neither involves your storage being wrong.

The first is the log line. Somewhere a developer writes a debug print of a config struct, or an error includes the struct that holds the key, or a panic dumps it. The secret is a string like any other string, so it renders like any other string, straight into a log aggregator that a lot of people can read.

The second is the leftover bytes. The key sat in a heap allocation. The variable goes out of scope, the allocation is freed, and on most runtimes “freed” just means “returned to the allocator”. The bytes are still there until something else writes over them. A core dump taken in that window contains your key. So does the next allocation that happens to land on that memory and gets logged before it’s overwritten.

A Go string can’t really defend against either. Go strings are immutable, so you can’t zero one in place; the runtime copies them freely, so you can’t even track every copy; and there’s no compile-time barrier stopping anyone printing one. You can be disciplined, but discipline is all you’ve got.

`SecretString` closes both

rust-tool-base routes every secret through secrecy::SecretString, and the crate is explicit that taking a plain &str or String for a secret is a type error, not a style preference.

For the log line, SecretString has its own Debug implementation, and it prints [REDACTED]. Always. A config struct holding a SecretString can be debug-printed, put in an error, caught in a panic, and the secret field shows up as [REDACTED] every single time. You don’t have to remember not to log it. The type already won’t.

For the leftover bytes, SecretString zeroes its memory when it’s dropped. When the value goes out of scope, before the allocation is handed back, the bytes are overwritten. The window where a freed allocation still holds your key is closed. A core dump taken afterwards finds zeroes.

There’s a third leak SecretString blocks that’s easy to miss. It deliberately doesn’t implement Serialize. You cannot serialise a SecretString. That sounds like an inconvenience until you see what it prevents: a tool that loads config, changes one setting, and writes the whole struct back would, with an ordinary string, faithfully write the resolved secret to disk in plain text. Because SecretString can’t be serialised, CredentialRef can’t be either, and that accident is structurally impossible. Writing a secret back is a deliberate, separate path, never a side effect of saving config.

When code genuinely needs the raw value, to drop it into an Authorization header, it calls expose_secret(). The name is the point. Getting at the plaintext is one explicit, greppable, reviewable call, and everywhere else the secret stays wrapped.

Discipline versus the type system

The plain framing is this. None of these leaks are exotic. Logging a struct, a core dump after a free, re-saving a config file: they’re all routine, and they’re all how real credentials end up somewhere they shouldn’t.

go-tool-base’s storage design is good, and rust-tool-base kept it. But in Go, not leaking the secret once it’s in memory comes down to every developer being careful every time. In Rust, SecretString makes the type system carry it. The redaction, the zeroing, the un-serialisability aren’t things you remember to do. They’re things the secret does to itself because of what it is. That’s the part Go structurally can’t match, and it’s why the port didn’t just copy the storage modes across, it tightened the handling underneath them.

The gist

go-tool-base settled where a CLI keeps a secret: env var, keychain, or literal, in a fixed precedence. rust-tool-base keeps that design and hardens what happens once the secret is loaded.

Every secret is a secrecy::SecretString. It debug-prints as [REDACTED], so it can’t fall into a log by accident. Its memory is zeroed on drop, so it doesn’t survive in freed heap. It isn’t serialisable, so it can’t be written back to config by a blanket save. Getting the plaintext is one explicit expose_secret() call. Go can only ask developers to be careful with a secret in memory; Rust lets the type be careful for them.

The chicken-and-egg of remote state

Wed, 06 May 2026 00:00:00 +0000

Here’s a puzzle that every infrastructure-as-code setup hits exactly once, right at the very beginning, and then never again. An OpenTofu stack stores its state in a backend. The bootstrap stack I wrote about last time has a particular job, and part of that job is to create the backend that remote state lives in. So where does the bootstrap stack store its own state, on the very first run, before it’s built the place state is supposed to go?

Where does the state of the thing that makes the state store live?

That’s the puzzle, and it’s a real ordering deadlock rather than a riddle.

An OpenTofu stack keeps a state file, and for anything shared that state file lives in a remote backend: on AWS, an S3 bucket. Fine. But the bootstrap stack has a particular job, and part of that job is to create the S3 bucket that remote state lives in.

So walk through the first run. Bootstrap has never been applied. The state bucket doesn’t exist, because creating it is what bootstrap is for. Bootstrap needs somewhere to store its own state. The only place that would make sense is the bucket it’s about to create, which isn’t there yet. The thing that builds the state store can’t store its state in the state store.

Run local, then migrate

The way out is a two-step that OpenTofu supports directly.

Bootstrap starts configured with a local backend: backend "local" {}. State is just a file on the operator’s machine. With that in place, the first tofu apply runs. It creates the S3 bucket and the KMS key, and records all of it in the local state file.

Now the bucket exists. So the backend configuration is rewritten to point at it: an s3 backend block naming the new bucket. Then tofu init -migrate-state. OpenTofu sees the backend has changed, picks up the local state file, and copies it into the S3 bucket. From that point on, bootstrap’s own state lives in the bucket that bootstrap created. The egg has laid the chicken.

The local backend was a scaffold. It existed for exactly one apply, to break the ordering deadlock, and then the state moved off it and it was never used again.

It happened twice

The infra repo actually did this migration twice, and the second time is the proof that the pattern is general rather than a one-off trick.

The first migration was the one above: local to S3, at the very start. The second came later, during the move from GitHub to GitLab. GitLab offers a managed HTTP state backend, and infra chose to use it. So the backend block was rewritten again, this time from s3 to http, and tofu init -migrate-state ran again, copying the state from the S3 bucket to GitLab’s backend.

The same move, twice, against three different backends. That’s the useful lesson hiding in the chicken-and-egg story. State is portable. The backend is just where you currently keep it, not a property of the stack itself, and moving it is a routine, supported operation rather than surgery.

Why this is the honest answer, not a hack

It’s easy to look at “apply once with a local backend, then migrate” and feel it’s a bit of a smell, a workaround for something that should have been cleaner.

It isn’t. It’s the honest answer to a real ordering problem, and the alternatives are worse.

The obvious alternative is to create the state bucket by hand, in the console, before running bootstrap at all. But then the most important bucket in the account is unmanaged. It exists outside every OpenTofu graph, nobody’s code describes it, its encryption and policy and prevent_destroy are whatever someone clicked that day, and it drifts. The local-then-migrate dance avoids exactly that. The bucket is created by bootstrap, described in code, and tracked in bootstrap’s own state from its very first apply. It’s managed from birth.

The chicken-and-egg isn’t a flaw to be embarrassed about. It’s just the shape of the problem when a stack has to build its own foundations, and OpenTofu’s -migrate-state is the supported tool for exactly that shape.

Pulling it together

Every OpenTofu stack needs a backend to store state, and the bootstrap stack’s job is to create the backend, so on its first run the bucket it needs doesn’t yet exist.

The resolution is to run bootstrap once with a local backend, let that apply create the bucket and key, then rewrite the backend configuration and tofu init -migrate-state the state into the bucket bootstrap just made. The infra repo did it twice, local to S3 and later S3 to GitLab, which shows the real point: state is portable, and the backend is just where you keep it. Doing it this way, rather than hand-creating the bucket, is what keeps that critical bucket managed in code from its very first day.

The cleanup tool that almost deleted its own hands

Tue, 05 May 2026 00:00:00 +0000

The first time I pointed aws-nuke at a real account, the dry-run printed hundreds of lines of angry red text and my stomach dropped. Then I read it properly, and two things turned out to be true at once. Almost all of that red was noise. And the one operation I genuinely should have worried about wasn’t red at all.

A tool whose whole job is destruction

aws-nuke deletes everything in an AWS account. That’s the point of it: when you spin up a throwaway account to try something, aws-nuke is how you tear it back down to nothing afterwards rather than leaving resources quietly billing forever. go-tool-base’s bootstrap renders a scoped aws-nuke config for exactly this, from a nuke-config module, so the teardown is described in code rather than typed by hand at the worst possible moment.

A tool that deletes everything is a tool you run in dry-run first, every single time, and read the output before you let it touch anything. So that’s what I did. And the output was alarming in a way that turned out to be completely meaningless, and reassuring in a way that turned out to hide the one real hazard.

The wall of red that means nothing

A fresh account threw up screen after screen of SubscriptionRequiredException. Hundreds of lines, all red, all looking like something had gone badly wrong.

They hadn’t. aws-nuke works by asking every region “do you have any of this kind of resource?”, for every kind of resource it knows about. On a brand-new account you’ve never enabled most services in most regions, so the API’s honest answer is “you’re not subscribed to that here”, which surfaces as an exception, which the tool dutifully logs in red. It isn’t a failure. It’s the sound of an empty account being asked four hundred questions and answering “nothing here” to almost all of them.

The skill, and it is a skill, is learning to read a destructive tool’s dry-run and tell the noise from the signal. SubscriptionRequiredException on a fresh account is noise. Once you know that, the wall of red stops being frightening and becomes scenery.

There’s a related trap in the same neighbourhood, and the nuke-config module’s own regions variable documents it. aws-nuke has a special global pseudo-region for things that don’t live in one place (IAM, Route 53, CloudFront), and then the actual regions for everything else. It also accepts all, meaning every enabled region. Mixing all with explicit region values scans some regions twice and muddies the output, so the module’s guidance is to pick one approach or the other. More scenery you have to learn to read before the genuinely important line will stand out.

The line that should have scared me, and didn’t look like it

Buried in that calm-looking eye of the storm, among the resources aws-nuke intended to delete, were the IAM resources granting the identity running the nuke its administrative access.

Sit with that for a second. aws-nuke runs as some principal with enough power to delete everything. To delete EVERYTHING, it has to delete IAM resources too. And if the plan deletes the very grant that gives the running identity its admin before it’s finished, the tool strands itself partway through: no permissions left to complete the teardown, and now you’ve got a half-nuked account and a principal that can’t act on it. The cleanup tool sawing off the branch it’s standing on, calmly, without a single red line to warn you, because from the API’s point of view deleting that resource is a perfectly valid request.

That’s the operation that actually mattered, and it was the quietest thing in the output.

Two ways to keep its hands attached

The fix has two halves, one explicit and one structural.

The explicit half is to preserve the privilege path. The nuke-config module passes a caller-supplied set of filters straight through into the rendered config, so you tell aws-nuke “everything except these”. You exclude the identity running the nuke, and the policy and path that grant it admin, from deletion. The tool cleans the account and leaves its own hands alone, because you told it which resources are off-limits.

The structural half is to not give it a tempting separate thing to delete in the first place. If an identity gets its admin through an IAM group it belongs to, that group is its own deletable resource, one more thing in the plan, one more way to be stranded. The automation role in terraform-aws-bootstrap instead takes its policies as direct attachments to the role itself:

resource "aws_iam_role_policy_attachment" "gitlab" {
 for_each = local.is_gitlab ? var.policy_arns : {}
 role = aws_iam_role.this.name
 policy_arn = each.value
}

No intermediary group sitting there as a separate, deletable object. Flattening the privilege onto the role makes the dependency simpler to reason about and gives the cleanup tool one fewer foot-gun to find. Belt and braces: filter the path out explicitly, and don’t build a structure that invites the problem.

What it comes down to

A destructive tool’s dry-run is the most valuable thing it produces, and reading it well is its own competence. On a fresh account, the screenfuls of red SubscriptionRequiredException are noise, the sound of an empty account answering “nothing here”, and the all-versus-global region wrinkle is more of the same. Learn to see past all of it, because the operation that can actually hurt you is rarely the one shouting. Mine was the calm, unremarkable line proposing to delete the admin grant the nuke needed to finish its own job.

Keep the cleanup tool’s hands attached: filter the privileged path out of the teardown so it’s never a candidate for deletion, and attach that privilege directly rather than through a group that’s just one more thing to delete. Then let it loose on everything else, which is, after all, what you brought it in to do.

The security finding you must not fix

Mon, 04 May 2026 00:00:00 +0000

A security scanner flagged a finding in my Terraform, and the correct response, the one I had to talk myself into, was to leave it exactly as it was. Not because the finding was wrong about what the code did. It was right. It’s that doing what it asked would have quietly bricked the account.

A finding that looks open and shut

I run checkov over the infrastructure as part of CI, and on the KMS key that protects the Terraform state bucket it raised CKV_AWS_111: a key policy that grants kms:* is overly permissive. On the face of it, unarguable. The policy says the account root can perform any KMS action on the key, with a resource of *. A wildcard action and a wildcard resource is the exact shape a scanner is built to shout about, and ninety-nine times in a hundred it’d be right to.

This was the hundredth time.

Why narrowing it bricks the key

Here’s the bit that turns the finding on its head. That kms:*-for-root statement isn’t an over-broad grant I left lying around. It’s the default key policy AWS itself applies, and it’s load-bearing in a way that’s easy to miss.

A KMS key is administered through its own key policy, and that policy is the only way in. Unlike most resources, IAM permissions elsewhere can’t grant access to a key whose policy doesn’t allow it. So if you “tighten” the key by removing root’s full control, and you don’t perfectly replace it with some other administrative principal, you can end up with a key that nobody can administer. Not you, not root, not a future you with a very good reason. KMS will not let you recover it. The key, and anything it encrypts, is stranded.

The kms:*-for-root statement is what keeps the account’s own root able to manage the key as a last resort. It’s not the vulnerability. It’s the escape hatch, and the scanner was asking me to weld it shut.

So the finding gets suppressed, out loud

The answer isn’t to silence the scanner globally, and it isn’t to obey it. It’s to suppress this finding, on this resource, with the reasoning written right there next to it, from modules/state-backend/main.tf:

data "aws_iam_policy_document" "kms" {
 # checkov:skip=CKV_AWS_111:kms:* on the CMK for the account root is the
 # AWS-documented pattern; narrowing it risks an unrecoverable lockout from
 # the key. See https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-default.html
 statement {
 sid = "AllowAccountRootAdmin"
 effect = "Allow"
 principals {
 type = "AWS"
 identifiers = ["arn:aws:iam::${var.account_id}:root"]
 }
 actions = ["kms:*"]
 resources = ["*"]
 }
}

The skip carries a reason and a link to AWS’s own documentation of why this is the recommended default. That matters more than it looks. A bare # checkov:skip with no explanation is indistinguishable from laziness, and the next person to read it (quite possibly me, a year on) has no way to tell whether it was a considered decision or someone making a red mark go away. A skip with a documented reason is a decision you can audit. The finding is still visible in the sense that the suppression is right there in the code, attached to the thing it’s about, defensible out loud.

A scanner is an argument, not an order

The wider lesson is the one worth keeping, because it generalises well past this one key. A static-analysis finding is a prompt to think, not an instruction to comply with. Most of the time thinking leads you straight to “yes, fix it”, and you should. But a scanner encodes a general rule, and general rules meet specific contexts where they’re wrong, or merely irrelevant, or, in the rare and dangerous case, actively harmful to obey. kms:* for root on a customer-managed key is that last kind: the tool’s general rule (“wildcards are bad”) collides with a hard AWS-specific fact (“root must retain control of the key or it’s gone”).

The discipline that keeps this honest is the one in the code above. You don’t get to ignore a finding. You get to suppress it, scoped to the exact resource, with a reason a reviewer can weigh. Cheap enough that you’ll do it properly, costly enough that you won’t paper over a real finding by reflex.

The bottom line

checkov was right that the state-bucket key grants the account root kms:* on *. It was wrong that I should narrow it, because that statement is AWS’s documented default and the thing that stops the key becoming permanently unadministrable. The fix was a scoped checkov:skip carrying its reasoning and a link to the AWS docs, so the decision lives next to the code and can be defended rather than merely trusted.

Treat your scanner as a sharp, tireless colleague who’s usually right and occasionally, confidently, about to lock you out of your own key. Read every finding. Obey most of them. And write down, in the open, the rare one you mustn’t.

Two telemetry events, one mangled line

Sun, 03 May 2026 00:00:00 +0000

A line in a log file that no parser would touch. Not a wrong value, not a missing field. Half of one telemetry event spliced into the middle of another, like two people typing into the same text box at once. Which, it turns out, is pretty much exactly what had happened.

A format with exactly one rule

rust-tool-base writes its telemetry to a file as JSONL: one JSON object per line, newline at the end, next object on the next line. It’s a lovely format to work with precisely because it has one rule, and the rule is simple. Every line is a complete object. Honour that and you can tail it, grep it, stream it into anything. Break it once and the whole file is suspect, because now a reader can’t trust that a line is a line.

So the one job the file sink has, beyond writing the right bytes, is to never let two events end up sharing a line.

“Appending is atomic, though, isn’t it?”

The mental model I started with, and I suspect I’m not alone, was this: open the file with O_APPEND, write the serialised event, and the operating system tacks it onto the end atomically. Two writers can’t tread on each other because each write goes to wherever the end currently is, no questions asked. I’d half-remembered O_APPEND as the thing that makes concurrent appending safe, full stop.

It’s half true, and the half that’s missing is the half that bit me.

O_APPEND does guarantee one thing: the seek-to-end and the write happen as a unit, so you never get the classic lost-update where two writers compute the same offset and clobber each other. Good. What it does not guarantee, on POSIX, is that a single write() of arbitrary size is atomic with respect to other writers. That atomicity has a ceiling, and the ceiling is PIPE_BUF: 4096 bytes on Linux. Under it, a write is all-or-nothing against other writes to the same file. Over it, the kernel is entirely within its rights to split your write into chunks, and another writer’s bytes can land in the gap between them.

The fat event that went over the edge

For a long time nothing went wrong, which is the most dangerous way for a bug like this to behave. A typical event, a command name, a duration, a status, an attribute or two, serialises to a few hundred bytes. Comfortably under four kilobytes, so comfortably inside the atomic window. Hundreds of them a day, never a problem.

Then an event came along with a lot of attributes on it, and its serialised form sailed past 4 KiB. Two of those emitted at roughly the same moment, both over the line, and O_APPEND did the only thing it had ever promised: it put each write at the end. It said nothing about not interleaving the bytes on the way, because past PIPE_BUF that was never on offer. One spliced line, one file a parser would now choke on.

The fix isn’t a bigger write, it’s a smaller gate

You can’t buy your way out of this with a bigger buffer, because there’s no buffer size that’s reliably atomic above PIPE_BUF. The fix is to stop relying on the kernel for mutual exclusion you can do yourself: serialise the events through a lock, so only one write is ever in flight at a time. The FileSink carries a mutex for exactly that, and the doc comment on it is the whole post in a paragraph, from crates/rtb-telemetry/src/sink.rs:

pub struct FileSink {
 path: PathBuf,
 // Serialises concurrent `emit` calls. Shared across `Clone`s of
 // the same `FileSink` so multiple handles to the same path also
 // serialise correctly.
 gate: Arc<tokio::sync::Mutex<()>>,
}

If you don’t write Rust day to day (the primer has the rest of the basics): tokio::sync::Mutex is an async-aware lock, .await is where a task waits its turn for that lock without blocking the whole thread, and the Arc wrapper is shared ownership. That Arc is the load-bearing bit: it means every clone of the FileSink points at the same gate, rather than each getting its own lock that guards nothing.

The detail I like is where the lock sits. The event is serialised to a string first, outside the critical section, because turning an event into JSON is the expensive part and there’s no reason to hold the gate while you do it. Only then does emit take the lock, and it holds it across the whole open-write-flush, so no other emit can interleave a single byte:

// Serialise the line outside the critical section.
let mut line = serde_json::to_string(&redacted)?;
line.push('\n');

let _guard = self.gate.lock().await;
// ...create parent dir, open with append(true)...
f.write_all(line.as_bytes()).await?;
f.flush().await?;

write_all makes sure the whole line goes out as one logical write from our side, and the gate makes sure ours is the only one happening. The 4 KiB cliff is still there in the kernel. We just never walk near it any more, because we’ve serialised the writers ourselves rather than hoping the OS would.

The bit even the lock can’t fix

There is however a genuine limit, and the comment is upfront about it. The mutex lives in the process. Two FileSinks in two different processes, both pointed at the same file, are back to relying on O_APPEND alone, and back under the 4 KiB ceiling. The lock can’t reach across a process boundary, so it doesn’t pretend to. The guidance there is the older, duller, correct one: give each process its own file and aggregate them somewhere else. Don’t have two processes fighting over one log file and expect the filesystem to referee.

What it comes down to

O_APPEND is a real guarantee, just a much smaller one than its name talks you into. It keeps your write at the end of the file, and it keeps concurrent writes from interleaving only while they stay under PIPE_BUF, which on Linux is 4096 bytes. A fat JSON event slides straight over that and takes your file’s one rule with it.

The fix was never exotic. Serialise the line, take a mutex, do the write under it, and the interleave can’t happen because there’s only ever one writer at a time. The POSIX manual had all of this written down long before I went and learned it the interesting way, which is, I’m told, how most people meet PIPE_BUF too.

A state bucket that defends itself

Sat, 02 May 2026 00:00:00 +0000

OpenTofu’s remote state file is, quietly, the most sensitive thing in an infrastructure repo. It’s a plain JSON document listing every resource you manage, every ID, and, depending on your providers, the odd secret in clear text. So the S3 bucket that holds it can’t just be a bucket. It has to actively defend itself, on three separate fronts.

The most sensitive file in the repo

OpenTofu, like Terraform, keeps a state file: a JSON document recording every resource the stack manages, its real-world ID, and its attributes. It’s how the tool knows what already exists. It’s also, quietly, the most sensitive file in the whole repo. It can hold resource identifiers an attacker would value, and depending on the providers in play it can hold secret values in clear text.

Three bad things can happen to it. It can be deleted, and now the tool has forgotten everything it manages. It can be read by someone who shouldn’t. It can be corrupted by two runs writing at once. The bucket that holds remote state has to defend against all three, and terraform-aws-bootstrap’s state-backend module is built around doing exactly that.

The DynamoDB lock table is gone

Start with the corruption problem, because the answer changed recently.

The long-standing pattern for remote state on AWS was an S3 bucket plus a DynamoDB table. S3 held the state; the DynamoDB table held a lock, so two apply runs couldn’t write at once. Everyone who’s done Terraform on AWS has provisioned that table, probably more times than they’d care to count.

OpenTofu 1.10 made it unnecessary. The S3 backend gained use_lockfile, which does the locking with a small lock object in the same bucket, using S3’s conditional-write support. No separate table. The state backend is now genuinely one bucket and one key, with the lock living beside the state. It’s one fewer resource to create, one fewer thing to pay for, and one fewer moving part to reason about. The module takes the new path, and the DynamoDB table simply isn’t there.

A bucket you can’t delete by accident

Deletion is guarded with lifecycle { prevent_destroy = true } on the bucket. With that set, OpenTofu refuses to produce a plan that would destroy the bucket. A stray tofu destroy, a refactor that drops the resource, an accidental rename: all of them fail loudly instead of quietly taking the state bucket with them.

This is also why the state-backend module is hand-rolled from raw aws_s3_bucket resources rather than wrapping a community module like terraform-aws-modules/s3-bucket. prevent_destroy has to sit on the actual resource, and a lifecycle block isn’t something you can pass into a wrapper module as an input. Hand-rolling the bucket keeps prevent_destroy somewhere you can put it and, just as importantly, somewhere the next reader can see it. (There’s a whole post coming on why I hand-rolled every module; this is one of the reasons in miniature.)

Reject anything encrypted wrong

Confidentiality is the subtle one, because the obvious control isn’t enough.

The bucket has a default encryption configuration: server-side encryption with the customer-managed KMS key. But default encryption is a default. A client making a PutObject call can override it per request, asking for plain AES256 or a different KMS key, and S3 will honour the override.

So the module doesn’t rely on the default. The bucket policy explicitly denies the upload it doesn’t want. It denies any request not over TLS. It denies any PutObject that isn’t using SSE-KMS. And it denies any PutObject that names the wrong KMS key. The default encryption config says “this is what you get if you don’t ask”; the bucket policy says “and you’re not allowed to ask for anything else”. State can only ever land encrypted, in transit and at rest, under the one key the module controls.

One small companion setting: bucket_key_enabled. With per-object SSE-KMS, every object operation is also a KMS API call, which costs money and can throttle. An S3 Bucket Key collapses those into far fewer KMS calls, cutting per-object KMS traffic by well over ninety per cent. It’s a one-line setting the module turns on and most people forget exists.

In short

Remote state is the most sensitive file an infrastructure repo has, and the bucket that holds it has to defend against deletion, disclosure and corruption.

terraform-aws-bootstrap’s state backend handles corruption with OpenTofu 1.10’s use_lockfile, dropping the old DynamoDB lock table entirely. It guards deletion with prevent_destroy, which is also why the bucket is hand-rolled rather than wrapped. And it guards confidentiality with a bucket policy that denies non-TLS traffic and denies any upload not encrypted with the right KMS key, because default encryption is only a default and a client can override it. The state bucket isn’t just a place to put state. It’s built to refuse every wrong thing that could happen to it.

Supporting a provider, or actually using it

Sat, 02 May 2026 00:00:00 +0000

If your CLI tool talks to an AI model, you don’t want to hard-wire one vendor. So you reach for a single client interface over several providers, which is the right call. The trap is the next step: build that interface on only what every provider has in common, and you quietly throw away the very features that made you want a particular provider in the first place. rust-tool-base’s rtb-ai refuses to make that trade.

The pull toward one interface

If your CLI tool talks to an AI model, hard-wiring one vendor is a poor bet. One user has an Anthropic key, another an OpenAI key. Someone’s on Gemini. Someone runs Ollama locally because their data can’t leave the building. Someone points at an OpenAI-compatible endpoint from a provider you’ve never heard of. You don’t want a separate code path for each, so you want one AiClient that all of them slot behind.

rtb-ai gets that unification from the genai crate, which already speaks to Anthropic, OpenAI, Gemini, Ollama and OpenAI-compatible endpoints. One interface, five providers, the tool author picks one in config. The Go sibling makes the same bet: go-tool-base’s chat package also unifies several providers, behind an interface deliberately kept to four methods. So far this is the obvious design, and if it were the whole design there’d be nothing to write about.

What “unified” quietly costs you

Here’s the catch in any unified interface. It can only expose what every provider behind it has in common.

The common subset is plain chat. Messages go in, text comes out, optionally streamed token by token. That’s real and it’s useful and every provider does it. But the common subset is also the floor, and the features that make a particular provider worth choosing are almost never on the floor. They’re the things only that provider does.

Anthropic is the sharp example, because it has three features that matter and not one of them is common-subset.

Prompt caching. You can mark the stable parts of a request, the system prompt and the tool list, as cacheable. The provider keeps them warm, and on the next turn you aren’t billed to re-send and re-process text that didn’t change. On a long agent loop, where the same large system prompt rides along on every single turn, that’s a substantial saving in both cost and latency.

Extended thinking. The model works through a hard problem in a visible, budgeted reasoning pass before it commits to an answer, and you can see that reasoning.

Citations. Structured references back to source material in the response.

A client built strictly on the common subset can’t express any of those. It has no field for them, because four of the five providers wouldn’t know what to do with the field. So a purely lowest-common-denominator client would “support” Anthropic and then use it badly, leaving its best features unreachable. Support as a checkbox, not as the point.

The escape hatch

rtb-ai’s answer is to not choose. It runs two implementations under one interface.

For OpenAI, Gemini, Ollama and OpenAI-compatible endpoints, calls route through genai, the unified path. For Anthropic, every method drops to a direct reqwest implementation straight against the Messages API. Same AiClient on the surface, a different implementation underneath, selected by which provider the config names.

And the request type has deliberate room for the difference:

pub struct ChatRequest {
 pub system: Option<String>,
 pub messages: Vec<Message>,
 pub temperature: Option<f32>,
 pub max_tokens: Option<u32>,
 /// Anthropic-only: enables prompt caching at every stable point.
 /// Ignored on non-Anthropic providers.
 pub cache_control: bool,
 /// Anthropic-only: extended-thinking budget. `None` disables.
 /// Ignored on non-Anthropic providers.
 pub thinking: Option<ThinkingMode>,
}

Set cache_control and the Anthropic-direct path inserts cache breakpoints at the three stable points: the system prompt, the tool list, and the first message. Set thinking and it adds the thinking block, and streaming surfaces a separate ThinkingToken event so you can show the reasoning apart from the answer. On a non-Anthropic provider, both fields are simply ignored. The interface carries them; only the implementation that understands them acts on them.

A hatch, not a leak

It’s worth being precise about why this isn’t the thing it superficially resembles, which is a leaky abstraction.

A leaky abstraction is one where implementation details bleed through that you didn’t intend and can’t reason about. The abstraction quietly fails to abstract, and you’re left guessing which provider you’re really talking to.

This is the opposite of that. The two Anthropic-only fields aren’t a leak. They’re named, documented as Anthropic-only, inert everywhere else, and right there in the public type for anyone to see. The interface is uniform for the common case and deliberately, visibly non-uniform at exactly the points where uniformity would have cost you the good features. You opt into provider-specifics by setting a field. You stay fully portable by leaving it at its default. Nothing bleeds; you decide.

The same design line explains what does stay in the unified path. Structured output, chat_structured::<T>, sends a JSON Schema derived from your Rust type with the request and validates the reply against it before handing you a typed T. That’s a portability win that costs nothing across providers, so it belongs in the common interface. The split isn’t “Anthropic versus the rest”. It’s “features that are free to unify go in the unified path; features that aren’t get a designed door”. Prompt caching and extended thinking get the door, because flattening them away would be the expensive kind of convenient.

To sum up

A CLI tool that integrates AI wants one client over several providers, and a unified interface can only expose what those providers share. The shared floor is plain chat, and the features worth choosing a provider for, like Anthropic’s prompt caching, extended thinking and citations, are never on the floor.

rtb-ai keeps both. genai provides the unified path across five providers; an Anthropic-direct reqwest path drops below the abstraction for the features genai can’t reach, and ChatRequest carries the Anthropic-only fields openly, ignored elsewhere. Uniform where uniformity is free, with a designed escape hatch where it isn’t. That’s the difference between supporting a provider and actually using it.

Errors without an error handler

Fri, 01 May 2026 00:00:00 +0000

In the porting post I said go-tool-base’s error handler was one of the bits that didn’t survive the move to Rust, and promised to come back to it. Here’s the come-back. The short version is that Rust hands you, for free, the single consistent error exit that go-tool-base had to build a whole component to get.

What go-tool-base built

A while ago I wrote about error handling in go-tool-base. The core of it: an error should carry a hint, a separate field of human guidance telling the user what to do next, kept apart from the error’s identity so code can still match on it.

The other half of that post was about consistency. Every go-tool-base command returns its errors the idiomatic Cobra way, and they all funnel into one Execute() wrapper at the root, which routes every error through one ErrorHandler. One door out. Presentation decided in exactly one place, so no command can render a failure differently from its neighbour.

That handler is a real object. It exists, it’s wired in, it’s the thing every error passes through. Building it was a deliberate piece of work, and it was the right call for Go.

When I rebuilt this in Rust, the handler didn’t survive the move. Not because consistency stopped mattering. Because Rust gives you the single exit for free, and an object to enforce it would just be re-implementing something the language already does for you.

The shape of a Rust error

Start with the type. In rust-tool-base every crate defines its own error enum, and every one of them derives two traits:

#[derive(Debug, thiserror::Error, miette::Diagnostic)]
pub enum ConfigError {
 #[error("config file not found at {path}")]
 #[diagnostic(
 code(rtb::config::not_found),
 help("run `mytool init` to create one, or set MYTOOL_CONFIG"),
 )]
 NotFound { path: PathBuf },
 // ...
}

thiserror::Error makes it a proper error type. miette::Diagnostic is the interesting one. A Diagnostic is an error that also carries the things you’d want when presenting it: a stable code, a severity, a help string, and optionally source labels pointing at spans of input. The help line is the same idea as go-tool-base’s hint, the recovery step, except here it’s an attribute on the variant rather than a field threaded through a wrapper.

So the guidance lives on the error, structured, from the moment the error is created.

There is no handler, there’s a convention

Here’s where Rust does the work go-tool-base’s handler was built to do.

A rust-tool-base main looks like this:

#[tokio::main]
async fn main() -> miette::Result<()> {
 rtb::cli::Application::builder()
 .metadata(/* ... */)
 .version(VersionInfo::from_env())
 .build()?
 .run()
 .await
}

main returns miette::Result<()>. Every command’s run returns a Result too. In between, errors propagate with the ? operator: a function that hits an error returns it upward, immediately, and the caller does the same, all the way to main. Nobody writes a “check this error” call. ? is the propagation.

And when an error reaches main and main returns it, something has to render it for the user. That something is a report hook. rust-tool-base installs one at startup, and from then on any Diagnostic that exits main is rendered through it: the code, the severity, the help text, the source labels, with colour. One renderer, installed once.

Look at what that adds up to. Every error in the program flows to one place, main. It’s rendered by one thing, the hook. Presentation is decided in exactly one location and no command can deviate from it. That’s precisely the property go-tool-base’s ErrorHandler was built to guarantee. The difference is that nobody built it. The single exit is just where ? propagation ends, and the single renderer is one hook. The language’s own convention for returning errors from main is the funnel.

Errors are values, all the way

The thing that took me a moment to fully trust is that there’s no funnel to maintain, because there’s no funnel as an object. go-tool-base’s handler is a component: it can drift, it has to be kept in the path, a command could in principle be wired to bypass it. The Rust version cannot be bypassed, because bypassing it would mean a command not returning its error, and an error you don’t return is a compile-time warning at best and dead-obvious wrong code at worst.

So the model is just: errors are values, you return them, ? carries them up, main hands the last one to the hook. The consistency isn’t enforced by a guard. It’s the only thing the shape of the language really lets you do.

go-tool-base reaches a single, consistent error exit by building one and routing everything through it. rust-tool-base reaches the same exit by having errors be ordinary return values and letting them fall out of main. Same outcome. One of them is a component you own; the other is a convention you inherit.

Worth remembering

go-tool-base funnels every error through one ErrorHandler so presentation stays consistent. That handler is a deliberately built component, and it’s the right design in Go.

rust-tool-base has no handler. Every crate’s error type derives miette::Diagnostic, carrying its code, severity and help text. Errors propagate with ? to main, which returns miette::Result, and a framework-installed hook renders whatever comes out. The single consistent exit is the end of ? propagation, and the single renderer is one hook. The funnel go-tool-base built by hand is, in Rust, just the language’s return-from-main convention.

The bootstrap that does almost nothing

Fri, 01 May 2026 00:00:00 +0000

A brand-new AWS account is a slightly nerve-wracking thing. It can do almost anything, it’s hardened against almost nothing, and the list of stuff you ought to set up before you trust it with anything real is long. The natural instinct is to write one big “set up the account” module that does the whole list in a single apply. I want to talk you out of that, because the bootstrap module I’m happiest with does almost nothing, on purpose.

The first-apply problem

A brand-new AWS account is not ready for anything serious. Before you’d responsibly run real infrastructure into it, you want an account baseline: a password policy, account-wide S3 public-access blocking, default EBS encryption, CloudTrail, AWS Config, GuardDuty, alerting, a sensible human operator role. It’s a long list, and all of it matters.

The instinct, faced with that list, is to write one big “set up the account” module and have it do everything. One tofu apply, a fully prepared account, done.

That instinct is worth resisting, and terraform-aws-bootstrap resists it deliberately.

Three things, and a hard line

terraform-aws-bootstrap does three things:

state-backend, an S3 bucket and a customer-managed KMS key to hold remote Terraform state.
automation-iam, an OIDC identity provider and an IAM role that CI assumes to apply everything else.
nuke-config, which renders an aws-nuke configuration scoped to the account, for tearing a throwaway account back down.

That’s the whole module. Account hardening, CloudTrail, AWS Config, GuardDuty, the operator role, the alerting: none of it is in here. And it’s not absent by accident. The README has a section headed “what’s deliberately NOT in scope” that lists those exclusions out loud. The boundary is written down, because the boundary is the design.

Why the line is exactly there

The reason the line sits where it does is the most useful idea in the module.

Everything bootstrap excludes belongs in a separate stack, applied through the automation role bootstrap creates. Bootstrap’s only job is to get the account to the point where the next tofu apply can run properly: somewhere to store state, and an identity to run as. Once those two things exist, hardening the account isn’t a special bootstrapping act. It’s just another apply, done the normal way: in CI, reviewed, versioned, deployed through the role.

So the account baseline doesn’t need to be bundled into the bootstrap. It needs to be downstream of it. Bootstrap builds the on-ramp; it doesn’t also have to be the motorway.

A narrow module stays re-runnable

There’s a practical payoff to the narrowness, and it’s about fear.

Bootstrap is the one stack that can’t be applied through CI, because it’s what creates the CI identity in the first place. It runs locally, by a human, rarely. That’s exactly the kind of operation you want to be small, boring, and safe to repeat.

A bootstrap module that also did account hardening would be a large, stateful thing managing dozens of resources. Re-running it would be a held-breath operation. Keeping it to three concerns keeps it the opposite: a small stack you can read top to bottom, re-run without anxiety, and reason about completely. The narrowness isn’t minimalism for its own sake. It’s what keeps the one human-applied stack trustworthy.

The boundary is the feature

It’s tempting to judge a module by how much it does. A bootstrap module is the case where that’s exactly backwards. Its value is in how cleanly it stops.

terraform-aws-bootstrap does the bare minimum to make an account ready for the next apply, writes down everything it refuses to do, and hands off to a downstream stack for all of it. The next post follows the trickiest of its three jobs: the state backend has a genuine chicken-and-egg problem, because it has to store Terraform state in a bucket Terraform hasn’t created yet.

Where this leaves us

A fresh AWS account needs a long list of things before it’s safe, and the obvious move is one big module that does the lot. terraform-aws-bootstrap deliberately does only three: a state backend, a CI identity, and an account-scrub config. Everything else is written down as out of scope.

The boundary is the design. The excluded work belongs in a downstream stack applied through the CI role bootstrap creates, so hardening is just a normal reviewed apply rather than a bootstrapping special case. And keeping the one human-run, locally-applied stack small is what keeps it safe to re-run. A bootstrap module is judged by where it stops.

Two kinds of feature flag

Thu, 30 Apr 2026 00:00:00 +0000

go-tool-base has feature flags: switches that decide which built-in commands are live in a given run. rust-tool-base has those too. But it also has a second, completely separate kind of flag, and the difference between them is one of those distinctions that’s obvious the moment you see it and dangerously easy to conflate before you do. One decides what a command does. The other decides whether a chunk of code is in the binary at all.

A workspace of crates

Before the flags, the shape that makes them possible. go-tool-base is one Go module with packages under pkg/. rust-tool-base is a Cargo workspace of seventeen crates: rtb-app, rtb-config, rtb-cli, rtb-vcs, rtb-ai, rtb-mcp, rtb-docs, rtb-telemetry, and so on, with an umbrella crate called rtb that re-exports the public surface.

That isn’t tidiness for its own sake. Each subsystem being a separately compilable crate is what gives you a unit you can include or exclude wholesale. Hold onto that, because it’s the hinge for everything below.

The flag go-tool-base already has

go-tool-base has feature flags, and I’d describe them as runtime flags. A tool built on it can enable or disable built-in commands:

props.SetFeatures(
 props.Disable(props.InitCmd),
 props.Enable(props.AiCmd),
)

At startup the framework resolves that set and decides which commands are reachable for this run. The init command might be present in the binary but switched off; the ai command might be switched on. It’s about the user-facing surface: which commands exist for someone typing --help.

rust-tool-base keeps this idea. A command carries a CommandSpec with an optional feature field, and the runtime decides whether a feature-gated command is reachable. Same purpose: shape the surface per invocation.

If that were the whole story, there’d be nothing to write. The reason there’s a post is the other kind of flag, which Rust makes available and Go really doesn’t.

The flag Rust adds

Cargo features are a compile-time mechanism. The rtb umbrella crate declares them like this:

[features]
default = ["cli", "update", "docs", "mcp", "credentials", "tui"]
cli = ["dep:rtb-cli"]
update = ["dep:rtb-update"]
ai = ["dep:rtb-ai", "rtb-docs?/ai"]
vcs = ["dep:rtb-vcs"]
telemetry = ["dep:rtb-telemetry"]
full = ["cli", "update", "docs", "mcp", "ai", "credentials", "tui", "telemetry", "vcs"]

Each subsystem is an optional crate dependency, and a feature switches it on. This is a different kind of switch entirely, and the difference is the whole point.

A runtime flag decides what a command does while the program runs. The code is in the binary either way; the flag just gates it.

A Cargo feature decides what’s in the binary in the first place. Build a tool without the vcs feature and rtb-vcs is not compiled. Its dependencies are not compiled. gix, the pure-Rust Git implementation rtb-vcs pulls in, roughly two and a half megabytes of it, is not compiled and not linked. It isn’t switched off in the binary. It was never in the binary. The compiler never even saw it.

That’s something a runtime flag cannot do, because by the time anything runs, the binary already exists with everything in it.

Two axes, kept separate

So rust-tool-base has two flag systems answering two genuinely different questions.

Cargo features answer: what is this binary made of? They’re decided when you build the tool, in Cargo.toml. They control compilation, binary size, dependency surface, and compile time. A tool that never touches Git builds without vcs and is smaller, faster to compile, and has a smaller dependency tree to audit. A tool that wants everything turns on full.

Runtime feature flags answer: what can the user do in this run? They’re decided as the program starts. They control which commands appear, which paths are reachable.

These could have been mashed into one mechanism, and it would have been a mistake. The app-context design notes are blunt about it: feature gating doesn’t belong on the per-command context object, because a feature-gated command “either exists or doesn’t” rather than changing its behaviour mid-run. Compile-time composition is one decision, made by the person building the tool. Runtime gating is another, made per invocation. Conflating them would mean you couldn’t reason cleanly about either.

The Go version of this had to be hand-built

This isn’t a thing Go simply lacks. I wrote a whole post about how go-tool-base keeps its optional keychain dependency out of binaries that don’t want it, using a blank import and the linker’s dead-code elimination. It works. But it was a piece of deliberate engineering for one dependency, and getting it right took care.

Cargo features make that same outcome a first-class, declarative thing, and not for one dependency but for every subsystem the framework has. You don’t engineer the exclusion. You name a feature and leave it off. The crate, and its whole subtree, stays out. Rust’s build system was designed for exactly this, and rust-tool-base leans on it across the entire workspace rather than hand-rolling it once.

Where this leaves us

go-tool-base has runtime feature flags: they decide, per invocation, which built-in commands are reachable. rust-tool-base keeps that, and adds a second kind that Rust makes available.

Cargo features decide what the binary is compiled from. Each of the framework’s seventeen crates is an optional dependency, and a feature switched off means that crate and its entire dependency subtree are never compiled or linked. A runtime flag gates what code does; a Cargo feature gates whether code is there at all. Two axes, two questions, deliberately kept as separate systems.

forbid means forbid, until linkme needs a word

Wed, 29 Apr 2026 00:00:00 +0000

There’s a line at the top of every production crate in rust-tool-base that I’m quietly proud of: #![forbid(unsafe_code)]. And there are a couple of files that have to say #![allow(unsafe_code)] instead. Not because I wrote anything unsafe. Because a macro did, on my behalf, and forbid doesn’t care whose unsafe it is.

Why forbid, and why it isn’t the whole story

rust-tool-base makes a bold promise: no unsafe in its own code. The strong form of that is forbid, not deny. deny(unsafe_code) makes unsafe a compile error that any module can quietly re-permit with its own #[allow]. forbid can’t be overridden from inside the crate at all. That’s the appeal: nobody gets to wave unsafe through in a hurry.

So the workspace lint sits at deny, and every production lib.rs then tightens it to forbid:

# Cargo.toml
[workspace.lints.rust]
unsafe_code = "deny"

// crates/rtb-error/src/lib.rs
#![forbid(unsafe_code)]

Why deny at the workspace but forbid in each crate? Because deny leaves an escape hatch open for the rare file that genuinely needs one, while forbid slams it shut everywhere it can. Almost every file gets forbid. A tiny number need the hatch.

The files that need the hatch

The command and provider registries use linkme’s distributed_slice so backends can register themselves at link time, without life before main. And the linkme attribute expands to code carrying a #[link_section], which the unsafe_code lint counts as unsafe. So any file using the attribute, whether it declares a slice or registers into one, can’t live under forbid.

Here’s the Gitea release backend doing exactly that, from crates/rtb-vcs/src/gitea.rs:

#![allow(unsafe_code)]
// ...
/// Link-time registration entry.
#[distributed_slice(RELEASE_PROVIDERS)]
fn __register_gitea() -> Box<dyn ProviderRegistration> {
 Box::new(RegisteredProvider { source_type: "gitea", factory: factory as ProviderFactory })
}

That #![allow(unsafe_code)] isn’t there because the backend does anything dangerous. It’s there because the registration macro emits a #[link_section], and forbid would, correctly by its own rules, refuse to compile the file.

Where that leaves the promise

The guarantee survives, with an exception you can point at. Every production crate forbids unsafe outright. The workspace sits one notch looser at deny, precisely so the handful of files that use linkme (and a couple of test files that need Rust 2024’s unsafe env mutation) can open a narrow, module-scoped #![allow(unsafe_code)] with a written reason. The absolutist rule met a macro that writes a link_section for you. The answer wasn’t to drop the rule, it was to keep forbid everywhere it can hold and clearly label the one or two spots where it can’t.

A framework that contains no unsafe

Tue, 28 Apr 2026 00:00:00 +0000

“It’s written in Rust” gets thrown around as if it were a memory-safety guarantee. It mostly isn’t. Rust is memory-safe by default, which is a wonderful thing, but the unsafe keyword exists precisely so any crate, any module, can step outside that default when it needs to. So “written in Rust” really means “mostly safe, probably”. rust-tool-base makes the stronger claim about its own code, and gets the compiler to enforce it.

Safe by default is not the same as safe

People reach for Rust because of memory safety, and the reputation is earned. Write ordinary Rust and the compiler will not let you have a use-after-free, a data race, or a buffer overrun. That’s the default, and it’s a very good default.

But it’s a default, and defaults can be turned off. Rust has an unsafe keyword precisely so that, when you genuinely need to, you can dereference a raw pointer, call into C, or tell the compiler you’ve upheld an invariant it can’t check itself. Inside an unsafe block, the guarantees are yours to maintain, not the compiler’s to enforce.

That keyword has to exist. Some of the most foundational crates in the ecosystem are built on it, carefully. But it means a fact worth being precise about: a project being “written in Rust” tells you its code is mostly safe. It does not tell you the project’s own code contains no unsafe. Those are different claims, and only the second one is a guarantee.

rust-tool-base makes the second claim about its own code, and has the compiler back it up.

`forbid`, not just `deny`

The mechanism is one line at the top of every crate:

#![forbid(unsafe_code)]

unsafe_code is a lint, and Rust lints have levels. The interesting choice is forbid rather than deny, because the two are not the same strength.

deny makes the lint an error. But it’s an error a downstream module can locally override. Anyone can write #[allow(unsafe_code)] on a function or a block and the deny is lifted right there. As a policy, deny is “don’t do this unless you really mean to”, and “unless you really mean to” is a door.

forbid is the strict one. It makes the lint an error and it makes that error impossible to override from inside the crate. A module cannot #[allow] its way back out. Once a crate root says #![forbid(unsafe_code)], there’s no unsafe anywhere in that crate, and no local exception can be carved out. The compiler simply refuses.

So every rust-tool-base crate that ships in a built tool forbids unsafe at its root. Not “discourages”. Cannot contain it.

The one subtlety

There’s a wrinkle, and it’s worth showing rather than hiding, because it’s where the design got specific.

The workspace sets unsafe_code = "deny" as the baseline for everything, including test files. But test code occasionally has a real need for unsafe. In the 2024 edition, std::env::set_var became unsafe, because mutating the process environment isn’t thread-safe, and a test that exercises environment-driven configuration has to call it.

So the split is deliberate. The workspace-wide level is deny, which a test file can locally #[allow] when it genuinely needs that one environment call. But every production lib.rs and main.rs additionally carries #![forbid(unsafe_code)], and forbid cannot be relaxed. Test scaffolding gets a controlled, visible exception for a specific standard-library call. Shipping code gets none. The guarantee that matters, “the code in the binary contains no unsafe”, holds, and the place it’s slightly loosened is exactly the place that never reaches a user.

What the guarantee is actually worth

Two things, one for users and one for reviewers.

For users: an entire family of bug is ruled out of first-party code mechanically. Use-after-free, double-free, data races on shared memory, reading off the end of a buffer. These are the classic memory-safety vulnerabilities, and in a crate that forbids unsafe they cannot originate, because the constructs that produce them cannot be written. That’s not careful coding. It’s the compiler refusing to build anything else.

For reviewers: the cost of an unsafe block is mostly the review burden it carries. Every one is a spot where a human has to check, by hand, that an invariant holds, and has to re-check it whenever nearby code changes. A crate that forbids unsafe has zero of those. There’s no unsafe block to audit, ever, because the compiler guarantees there isn’t one.

The promise has a boundary. It covers rust-tool-base’s own code; its dependencies are another matter, and some of them do contain unsafe, correctly. Keeping that side honest is a different job, done by vetting the dependency tree and gating it in CI. Within first-party code, though, the guarantee is real, and there’s no Go equivalent to it. Go has an unsafe package, but nothing that lets a codebase prove, to the compiler, that it never touches it.

The bottom line

Rust is memory-safe by default, but the unsafe keyword exists so that default can be set aside. “Written in Rust” therefore does not by itself mean a project’s own code contains no unsafe.

rust-tool-base makes that the stronger claim. Every crate root carries #![forbid(unsafe_code)], and forbid, unlike deny, cannot be overridden from inside the crate. Test files get a narrow, visible deny-level exception for the one standard-library call that needs it; shipping code gets none. The payoff is a whole class of memory-safety bug ruled out of first-party code by construction, and not one unsafe block left for a reviewer to audit.

Reloading config without a restart

Mon, 27 Apr 2026 00:00:00 +0000

A config file changes. Someone edits a setting, rotates a credential, flips a feature flag. How does the running process find out? For most processes the answer is blunt: it doesn’t, until you restart it. For a short-lived CLI that’s completely fine. For a long-running service, “just restart it” is a much bigger ask than it sounds.

The default answer is a restart

Configuration lives in a file. The file changes: someone edits a setting, rotates a credential, flips a feature flag. How does the running process find out?

Overwhelmingly, the answer is that it doesn’t. A process reads its config once, at startup, and that snapshot is frozen for the life of the process. Change the file and nothing happens until you restart, at which point a fresh process reads the fresh file.

For a short-lived CLI invocation that’s completely fine. It reads config, does its job, exits, and the next invocation reads whatever the file says then. But the same frameworks are also used to build long-running services, and for a service “just restart it” is not the small thing it sounds like.

What a restart actually costs

Restarting a long-running service means every open connection drops. Any in-flight request is lost, or has to be retried by whoever sent it. Caches that took real time to warm are cold again. There’s a window, short but real, where the service simply isn’t serving.

If the thing you changed was a log level, or a feature flag, or a timeout, you’ve paid a disruption wildly out of proportion to the change. And the calculation only gets worse as the service gets more important, because the services you least want to bounce on a whim are exactly the ones that matter most.

Hot-reload: re-read in place

Hot-reload is the alternative, and both go-tool-base and rust-tool-base support it.

The process doesn’t read config once and freeze it. It watches the config file. When the file changes, it re-reads it, re-applies it, and carries on running. No new process, no dropped connections, no cold start. The change lands in the live process.

The shape is the same in both frameworks:

A file watcher notices the config file changed. Underneath, this is the operating system’s own file-notification facility, inotify on Linux and its equivalents elsewhere. rust-tool-base reaches it through the notify crate; go-tool-base, through the watcher built into Viper.
A debounce step waits for the writes to settle. Saving a file is often several separate operations, and you don’t want to reload three times for one edit.
The config is re-parsed from disk.
The new config is swapped in atomically.
Observers are notified, so the subsystems that care can react.

Steps four and five are the ones worth slowing down on, because they’re where a naive hot-reload quietly goes wrong.

The two details that make it safe

The atomic swap. You do not mutate the live config object in place. A reader on another thread, partway through reading it, would see a torn mix of old and new values, and that’s a genuinely nasty class of bug. Instead the process builds a new, complete config value and swaps the pointer to it in a single atomic operation. Any reader sees either the entire old config or the entire new one, never a blend. rust-tool-base does this with arc-swap; go-tool-base does the equivalent. Reads stay cheap and lock-free, and an update is one pointer swap.

The observer notification. Re-reading the file isn’t the end of the job. Some subsystems have to do something when config changes: a connection pool resizes, a logger changes level, a rate limiter takes a new ceiling. So a hot-reload system has to let those subsystems subscribe. rust-tool-base hands observers a watch::Receiver, a channel that always holds the latest value; go-tool-base exposes an Observable interface. A subsystem subscribes once and reacts every time config changes, for the life of the process.

Where this earns its keep: a Kubernetes pod

Hot-reload is a nicety on a developer’s laptop. Inside a Kubernetes pod it becomes genuinely valuable, and the reason is a neat fit between how Kubernetes delivers config and how a file watcher works.

In Kubernetes you don’t usually bake configuration into the container image. It lives in ConfigMap and Secret objects, and the clean way to consume them is to mount them as volumes. Mount a ConfigMap as a volume and each key becomes a file in the pod’s filesystem.

Here’s the part that connects to everything above. When you update that ConfigMap or Secret, Kubernetes does not restart your pod. The kubelet notices the object changed and rewrites the projected files inside the still-running pod. The files on disk change underneath a process that never stopped.

That file rewrite is exactly the event a hot-reload watcher exists to catch. So the whole chain becomes:

You kubectl apply an updated ConfigMap, or rotate a Secret.
The kubelet updates the projected files inside the pod.
The framework’s file watcher sees the write.
The config is re-parsed, swapped in atomically, and observers are notified.
The new configuration is live, and the pod never cycled.

You’ve changed a running service, in a running pod, with no rollout, nothing terminated and recreated, no dropped traffic. Rotate a database credential, raise a log level to debug an incident in progress, flip a feature flag: all of it live. For a service where a restart is the very thing you’re trying hard to avoid, the kind of long-running service these frameworks are built for, that’s the difference between a config change being routine and being an event.

The caveats

Two things, so this doesn’t read as magic.

First, not everything can be hot-reloaded. Some configuration genuinely needs a restart: the port a server binds to, the size of a thread pool, anything wired up exactly once at process start. Hot-reload covers the large category of settings a subsystem can re-read and re-apply; it doesn’t abolish restarts. A config system worth its salt is clear about which settings are live and which are not.

Second, a Kubernetes gotcha that catches people out. The in-place file update happens for ConfigMaps and Secrets mounted as volumes. Consume the same ConfigMap as environment variables instead, and those are fixed when the container starts and never update, short of a restart. If you want hot-reload in a pod, mount config and secrets as files, not env vars. And even with volumes the update isn’t instant: the kubelet syncs on a period, around a minute by default, so a reload is “within a minute or so”, not “the moment you hit apply”.

What it comes down to

A config file changes, and the default way to pick it up is to restart the process. For a long-running service that restart costs dropped connections, lost work and a cold start, often for a change as small as a log level.

go-tool-base and rust-tool-base both support hot-reload instead: a file watcher catches the change, the config is re-parsed and swapped in atomically so no reader sees torn state, and observers are notified so subsystems can react, all in a live process. The setting where it pays off most is a Kubernetes pod, where ConfigMaps and Secrets mounted as volumes are rewritten in place by the kubelet and the watcher catches that write directly. Mount them as volumes rather than env vars, allow for the kubelet’s sync delay, accept that some settings still need a restart, and within those limits “the config changed” stops meaning “cycle the pod”.

A signing key needs somewhere to live

Sun, 26 Apr 2026 00:00:00 +0000

I left a door open a couple of posts ago, and it’s been quietly bothering me ever since. When I wrote about verifying your own downloads, I was honest that a checksum sitting next to the binary only catches accidents. Anyone who can compromise the release platform can swap the binary and the checksum together, and the tool will happily verify one fake against the other.

Closing that gap needs a signature. And a signature, it turns out, needs a surprising amount of infrastructure standing behind it. This is the first post about building that.

The door the last post left open

A while back I wrote about verifying your own downloads: go-tool-base’s self-update command now checks the SHA-256 of every binary it downloads against the release’s published checksums.txt before installing it.

That post was honest about its own ceiling. A checksum file hosted next to the binary it describes shares a trust root with that binary. Both come from the same release, on the same platform. Corruption, truncation, a CDN serving a stale object: a same-origin checksum catches all of those, because they’re accidents and the checksum wasn’t part of the accident. What it can’t catch is an attacker who’s compromised the release platform itself. Someone who can replace the binary can replace checksums.txt in the same breath, and the tool will cheerfully verify the malicious download against the malicious checksum and call it good.

The post named the fix and then deferred it: a signature whose trust root sits somewhere the release platform can’t reach. “That’s the next phase of this work.” This series is that phase.

What a signature actually needs

It’s worth being precise about why a signature helps where a checksum doesn’t, because it’s easy to wave the word “signature” around and assume it settles everything.

A signature closes the gap only under two conditions. The verifying key, the public half, must reach the user by a path the release platform doesn’t control. And the signing key, the private half, must live somewhere the release platform can’t reach.

The second condition is the one people skip. If the signing key sits in the same CI system that builds the release, you’ve gained almost nothing. An attacker who owns the CI owns the key, and a key they own will sign whatever they hand it. The signature verifies perfectly and means precisely nothing. A signature is only worth the distance between the signing key and the thing being signed. Put them in the same place and the distance is zero.

So the signing key has to live in a different security domain from the release pipeline. Not a different folder. A different account, with a different blast radius, that the release platform has no standing access to.

“Just sign the binary” is not a small feature

That reframes a line item that sounds tiny. “Sign the release binary” unpacks into a list:

there must be a private signing key;
it must live outside the release platform, in its own security domain;
it must be access-controlled, audited, and protected from exfiltration;
only the release pipeline may ask it to sign, and only by proving a short-lived, federated identity, never by holding a copy of the key.

That’s not a feature you bolt onto a CLI. That’s infrastructure.

The shape of it: a cloud account, with the key held in a managed key service so the private key material never exists as a file on a disk that anyone, me included, can copy. The release pipeline authenticates to that account as itself, briefly, and asks the key service to produce a signature. The key never moves.

But an account you’re going to trust with a signing key is itself something you have to get right first. An account with a weak baseline, no audit trail, and long-lived credentials lying around is not a safe home for the most security-sensitive key in the whole system. Before the key can move in, the house has to be built and the locks have to actually work.

What this series builds

So this turned into a rather longer project than “add a signature”, and the series follows it in order.

It starts with bootstrapping a fresh AWS account: the deliberately minimal first tofu apply, and the remote state backend that has a genuine chicken-and-egg problem. Then the credential question, which is the heart of it: how a CI pipeline deploys to AWS with no stored access key at all. Then hardening the account, so it’s genuinely safe to hold something valuable. Then the discipline of deploying changes to it: plans reviewed before they’re applied. Then the shared tooling that makes all of it repeatable.

Every one of those pieces exists for the same reason. The signing key needs somewhere to live, and somewhere safe is not a default you’re handed. It’s a thing you build, deliberately, before you have anything worth protecting in it.

The series ends where the verifying-downloads post pointed: a signing service whose key the release platform can’t touch, so a self-updating tool can finally verify that the binary it’s about to become is genuinely the one I published.

The upshot

go-tool-base’s self-update verifies downloads against a checksum, and a same-origin checksum stops accidents but not a compromise of the release platform. The fix is a signature, and a signature is only worth the distance between its signing key and the release pipeline.

Holding that key safely means a private key that never leaves a managed key service, in a separate cloud account, reached only by a short-lived federated identity. That’s infrastructure, and a safe account is something you build before you trust it with anything. The rest of this series builds it, piece by piece, right up to the signing service itself.

Waivers with an expiry date

Sun, 26 Apr 2026 00:00:00 +0000

A vulnerability scanner gives you a yes or a no. Is there a known advisory on a path you actually use? Yes, or no. That’s genuinely useful, and you should run one. But it’s a snapshot, taken on the day you ask, and supply-chain risk in a framework is a bigger and more ongoing thing than a single yes-or-no can capture.

So rust-tool-base treats its whole dependency tree as something to have a policy about, not something to scan and forget.

A scanner answers one question

When I had go-tool-base security-audited, part of the routine was running a vulnerability scanner over the dependencies. Go has a good one. It looks at your dependency graph, cross-references known advisories, and tells you whether any of them reach code you actually call.

That’s useful and you should do it. But notice the shape of what it gives back: essentially a yes or a no. Either there’s a known vulnerability on a reachable path or there isn’t. It answers one question, on the day you ask it.

Supply-chain risk in a framework is broader than that one question, because a framework drags its entire dependency tree into every tool built on it. rust-tool-base treats the whole tree as something to have a policy about, and the tool for that is cargo-deny.

A gate, not a scan

cargo-deny reads a deny.toml and checks the dependency graph against four kinds of rule.

Licences. There’s an allowlist: MIT, Apache-2.0, the BSD variants, ISC, a handful of others. Every transitive crate’s licence has to be on it. A dependency that pulls in something copyleft, or something with no licence at all, fails the build. You find out the first time it enters the tree, not during a release scramble when someone finally reads the legal implications.

Advisories. It checks the RustSec advisory database, and yanked crates are set to deny, so a dependency that’s been pulled from the registry stops CI.

Bans. Wildcard version requirements (version = "*") are denied outright, because a dependency that floats to whatever’s newest is a supply-chain hole by construction. Duplicate versions of the same crate get surfaced too.

Sources. Crates may only come from the official registry. An unknown registry or a stray git dependency is denied. Nothing sneaks in from a URL.

That’s a gate. It encodes, as rules in a file, what the project will and won’t accept into its dependency tree, and it enforces them on every build instead of once an audit.

The honest part is the waiver list

Here’s the thing every real project runs into. Sooner or later there’s an advisory you genuinely can’t fix this week. It’s against a crate three levels down your tree. The fix needs an upstream release that hasn’t happened. The crate is scheduled to be reworked two milestones from now anyway. The gate is going to fail, and the work to satisfy it honestly isn’t available to you yet.

The lazy response is a blanket ignore: silence the advisory, move on, forget. Now your gate has a hole in it that nobody remembers opening.

rust-tool-base’s deny.toml does something better. Every waiver in the ignore list is a documented record. Each one carries a comment that names the crate, traces the exact dependency path that reaches it, gives the reason, and names the condition that lifts it:

ignore = [
 # `instant` - reached via async-openai -> backoff -> rtb-ai (v0.3).
 "RUSTSEC-2024-0384",
 # `paste` - reached via ratatui -> rtb-docs (v0.2) / rtb-tui (v0.4).
 "RUSTSEC-2024-0436",
 # ...
]

The file states the policy out loud: “Every waiver points at a deferred stub crate that will be reworked before its ship milestone. Lift each waiver when the owning crate lands its v0.1.”

Some waivers go further and carry a structured reason field, so the why travels with the entry rather than living only in a comment above it:

{ id = "RUSTSEC-2025-0140",
 reason = "gix-date via gix is a stub dependency; rtb-vcs v0.5 will upgrade" },

Read that list and you don’t see a project that quietly stopped caring about seven advisories. You see seven advisories the project knows about, can trace, and has tied to a specific milestone. The waiver has an expiry condition. When rtb-vcs reaches v0.5, that gix entry is meant to come out, and the comment is the reminder that it should.

Why this is the bit to copy

A gate that can’t be relaxed is a gate people route around. They’ll find the broadest possible ignore and use it, because the alternative is being blocked on someone else’s release. The pressure to do that is real, and it’s not unreasonable.

So the design that actually holds up isn’t a stricter gate. It’s a gate with an honest, structured escape hatch: you can waive an advisory, but a waiver costs you a documented record with a dependency path and an expiry condition. That price is small enough that nobody routes around it, and high enough that waivers don’t accumulate silently. The ignore list stays readable, and every line in it is something you could defend out loud.

Supply-chain hygiene framed this way isn’t an audit you survive once a year. It’s bookkeeping: a ledger of what you accepted, why, and when each exception is due to close. Which, now I write it down, is just the Boy Scout rule again, pointed at a dependency tree. Leave it tidier than you found it, and write down the bits you couldn’t tidy yet.

Where this leaves us

A vulnerability scanner answers one question on one day. cargo-deny is a standing policy gate: licences against an allowlist, advisories and yanked crates denied, wildcard versions banned, sources restricted to the official registry, enforced on every build.

The part of rust-tool-base’s setup worth copying is the waiver list. Every advisory that can’t be fixed yet is recorded with its crate, its dependency path, its reason and the milestone that removes it. A waiver is a dated note, not a shrug, and that’s what keeps the gate honest enough that nobody actually wants to bypass it.

A builder that won't compile if you forget a field

Sat, 25 Apr 2026 00:00:00 +0000

go-tool-base configures things with functional options, and if you forget a required one, the best case is a runtime failure and the worst case is an empty value sailing silently into everything downstream. Most builder patterns share the same hole. rust-tool-base closes it in a way I find genuinely delightful: the .build() method simply doesn’t exist until you’ve set every required field.

When is a required field actually required

Every framework has constructors with a mix of required and optional inputs. An Application in rust-tool-base needs tool metadata and a version. It optionally takes a custom config type, extra commands, feature toggles. The metadata needs a name and a summary; a description and a help channel are optional.

The interesting question is when “required” gets enforced. There are really only two moments available: when the program runs, or when it compiles. Most APIs pick the first without ever framing it as a choice.

How go-tool-base does it

go-tool-base uses functional options, the standard Go pattern:

tool := props.New(
 props.WithName("mytool"),
 props.WithVersion(version),
)

New takes a variadic list of options and applies them. It’s flexible and it reads well. But look at what the type actually says. New accepts zero or more options. The signature is satisfied by passing nothing at all. If WithName is required, nothing in the type system knows that. Forget it and the code compiles cleanly, and you find out when the program runs, or worse, when it doesn’t visibly fail but quietly carries an empty name into everything downstream.

A plain builder is no better here. builder.name("mytool").build() and builder.build() are both perfectly valid calls as far as the compiler is concerned. The builder hopes you set the name. It can check at the end and return an error, but that check still happens at runtime.

In every one of these the required-ness of a field is a fact that lives in documentation and in the author’s head, not in the code.

Typestate: putting “required” in the type

rust-tool-base builds these with bon, and the pattern it generates is a typestate builder. The idea is that the builder’s type changes as you call it, and that type tracks which required fields you’ve set so far.

let metadata = ToolMetadata::builder()
 .name("mytool")
 .summary("my CLI tool")
 .build();

ToolMetadata::builder() returns a builder in a state that records “name not set, summary not set”. Calling .name(...) consumes that builder and returns a different type, one whose state records “name set”. Calling .summary(...) does the same for the summary.

The part that matters is .build(). It isn’t a method on the builder in general. It only exists on the builder type that represents “every required field has been set”. So this:

let metadata = ToolMetadata::builder()
 .summary("my CLI tool")
 .build();

doesn’t compile. Not because a runtime check fired, but because in the state “name not set” there’s no .build() method to call in the first place. The compiler stops you, and the error points straight at the missing .name(...).

Optional fields stay optional. You can call .description(...) or skip it, and .build() is reachable either way, because the description was never part of the state that gates it. The required and the optional are genuinely different in the type, which is exactly the distinction the functional-options version could only keep in a comment.

Application::builder() works the same way. It won’t produce an Application until it has metadata and a version, and “won’t” there means the method is absent, not that a check returns Err.

Why the moment matters

Moving the check from run time to compile time changes who finds the mistake, and when.

A runtime check finds it when that code path executes, which might be in a test, might be in CI, might be on a user’s machine at the worst possible moment. A compile-time check finds it the instant you write it, in the editor, before anything has run at all. The same mistake, caught at the cheapest possible point instead of one of the expensive ones.

It also changes what the API documents about itself. A functional-options constructor can’t tell you, from its signature alone, which options you must pass. A typestate builder can, because the set of methods available to you at each step is the documentation. You literally cannot reach .build() without having been walked past every required field on the way.

This is one of those places where Rust’s type system earns its reputation. The builder isn’t more careful than the Go version. It’s that “this field is required” stopped being a convention and became something the compiler enforces. (Another entry, if you’re keeping score from the porting post, in the column of outcomes that survived while the Go mechanism got left behind.)

The short version

Required fields have to be enforced somewhere. Functional options and ordinary builders enforce them at runtime, if at all, because .build() is always callable and the type system never learns which inputs were mandatory.

rust-tool-base uses typestate builders generated by bon. The builder’s type changes as you set fields, and .build() only exists once every required field is present. Forgetting one is a compile error that names the missing call, not a runtime surprise. The required-versus-optional distinction stops being a comment and becomes part of the type.

Process isolation won't save you from the filesystem

Sat, 25 Apr 2026 00:00:00 +0000

A test that passed every single time I ran it on its own, and failed maybe one run in five when I ran the whole suite. The failure was always the same: the self-update test downloaded a release archive, went to extract it, and found the archive corrupt. Half-written. As if something had been scribbling in the file while it read it. Something had.

The comfort I was leaning on

The self-update tests are heavier than a unit test wants to be. They stand up a fake release, download the artefact, verify its checksum, extract it, swap a binary. Real files, real I/O. So they’d been built to run as separate processes, not just separate threads, each one its own little world.

And I’d quietly filed that under “solved”. Separate processes don’t share an address space. One can’t reach into another’s memory and corrupt a value mid-read. That whole category of data race, the kind you reach for a mutex to fix, simply can’t happen across a process boundary. So I’d stopped thinking about concurrency in these tests at all, because I’d convinced myself the isolation was total.

It wasn’t total. It was isolation of memory, and I’d let myself hear it as isolation of everything.

Two processes, one path

The thing two processes very much do still share is the filesystem. And the self-update flow, sensibly, caches its download rather than re-fetching it. The default cache directory is computed from the tool’s name and the release version, in crates/rtb-update/src/flow.rs:

pub fn cache_dir_for(tool_name: &str, version: &str) -> PathBuf {
 let base = directories::ProjectDirs::from("", "", tool_name)
 .map_or_else(std::env::temp_dir, |p| p.cache_dir().to_path_buf());
 base.join("update").join(version)
}

Read that with two parallel test processes in mind. They’re testing the same tool, against the same fake release tag. So tool_name matches and version matches, which means cache_dir_for hands both of them the identical path. Two processes, isolated in every way that involves memory, both downloading and extracting into one shared directory on disk, at the same time. One writes the archive while the other is partway through reading it, and you get exactly the corrupt half-written file the test kept tripping over.

Process isolation did nothing here, because the contention was never in memory. It was on a path string that came out the same for both of them.

Once it’s framed as “they share a path”, the fix writes itself: don’t share the path. Give each invocation its own cache directory. The updater builder already had the seam for it, and the doc comment now says exactly why it’s there, in crates/rtb-update/src/updater.rs:

/// Tools call this when they want isolation per-invocation
/// (e.g. CI runners, tests with parallel processes) or to honour
/// a user-supplied `--cache-dir` flag.
pub fn cache_dir(mut self, cache_dir: impl Into<PathBuf>) -> Self {
 self.cache_dir = Some(cache_dir.into());
 self
}

Each test now builds its updater with cache_dir(its_own_tempdir), so two parallel processes land on two different directories and never meet. No lock, no serialisation, no clever cross-process file mutex. Just the realisation that the shared thing was a directory, and the cure for shared mutable state is usually to stop sharing it, not to guard it.

The fix that turned out to be a feature

The part I’m quietly pleased about is that this didn’t stay a test-only hack. The override I needed to isolate the tests is exactly the override a real tool wants for its own reasons. A CI runner doing self-update wants a writable cache path it controls, not wherever directories-rs decides the system cache lives. A user might reasonably want to point the whole thing somewhere specific. That’s a --cache-dir flag, and cache_dir() is precisely the hook you’d wire it to.

So the thing I added to stop a flaky test is the same thing a downstream tool reaches for to expose --cache-dir. The test forced the seam to exist, and the seam was worth having anyway. I’ll take that trade every time over a fix that only the test suite benefits from.

What it comes down to

I’d treated “separate processes” as a synonym for “can’t race”, and it isn’t. Processes don’t share memory, so the memory races are gone. They absolutely still share the filesystem, the network, every named resource the OS will hand to anyone who asks for it by the same name. My two test processes computed the same cache path from the same tool and tag, and raced on the files in it, and no amount of address-space isolation was ever going to touch that.

Shared mutable state on disk is still shared mutable state. The fix wasn’t a bigger hammer, it was giving each process its own directory and letting the isolation I thought I already had actually be true.

Registering commands without life before main

Fri, 24 Apr 2026 00:00:00 +0000

I ended the last post promising to show how a Rust command registers itself when the language flatly refuses to run any of your code before main(). This is that post, and it’s a lovely example of reaching the same outcome by a completely different road.

The outcome I wanted to keep is self-registration.

What self-registration buys

A command in go-tool-base lives in its own file, and that file puts the command into the framework itself. There’s no central list of commands to keep in sync. You add a file, the command appears. You delete the file, it’s gone. Nothing else changes.

That property is worth protecting. The alternative, a hand-maintained registry that every new command has to be threaded into, is exactly the sort of central file that turns into a merge-conflict magnet and quietly falls out of date. So when go-tool-base moved to Rust, self-registration was firmly in the column of things that had to survive.

The way Go did it was not.

How Go does it

A Go package can declare an init() function, and the runtime guarantees every init() runs before main() starts. A go-tool-base command file uses this to append itself to a package-level slice:

func init() {
 registry.Register(&DeployCommand{})
}

By the time main() runs, every command file’s init() has already fired and the registry slice is populated. It’s a tidy trick, and it leans entirely on a Go feature: code that executes before main().

Rust doesn’t have that

Rust has no init(). There’s no language-blessed phase that runs your code before main(). This is a deliberate decision, not an oversight. Code running before main() across many files has no well-defined order, and a startup phase whose ordering you can’t see is a classic source of subtle, miserable bugs. Rust closed that door on purpose.

Which leaves a real question. If nothing runs before main(), how does a command file insert itself into a registry without a central list editing it in?

Distributed slices

The answer is a crate called linkme, and the mechanism is the linker rather than a runtime phase.

You declare a slice the framework will collect into:

#[distributed_slice]
pub static BUILTIN_COMMANDS: [fn() -> Box<dyn Command>];

(Box<dyn Command> is just “a pointer to some value that implements the Command trait, whichever concrete type it turns out to be”; the primer covers it if that’s unfamiliar.)

A command file then contributes one entry to it:

struct Greet;

impl Command for Greet { /* ... */ }

#[distributed_slice(BUILTIN_COMMANDS)]
fn register_greet() -> Box<dyn Command> {
 Box::new(Greet)
}

Here’s the part that makes it work. The #[distributed_slice] attribute doesn’t generate any code that runs at startup. It places each entry into a dedicated section of the compiled object file. When the linker builds the final binary, it gathers everything in that section and lays it out as one contiguous array. BUILTIN_COMMANDS is that array.

So by the time the program exists as a binary on disk, the registry is already assembled. main() doesn’t build it. No init() builds it. The linker built it, statically, as part of producing the executable. At runtime the framework iterates a slice that was complete before the process ever started.

What you get from it

The outcome is the one Go’s init() gave, and then a bit more.

A command still lives in one file and still self-registers. Adding a command is still adding a file. There’s still no central list.

But there’s no startup phase to reason about, because there isn’t one. There’s no global mutable slice being appended to as init()s fire, because nothing is appended at runtime; the slice is immutable and finished. There’s no ordering question, because the linker isn’t running your code, it’s collecting data. And it costs nothing at runtime: assembling the registry happened at link time, so program start just reads it.

It’s the same idea go-tool-base had, expressed by the tool Rust actually gives you. Go reaches the registry through a controlled phase before main(). Rust reaches it without any phase at all, because the linker did the assembly while the binary was still being built. Two roads, one destination… which, if you’ve been following along, is becoming the whole theme of the Rust side of this project.

In short

Self-registration, where a command file inserts itself into the framework with no central list, is a property worth keeping. go-tool-base achieves it with a package-level init(), leaning on Go’s guarantee that such functions run before main().

Rust has no equivalent and wants none, because code running before main() has no clear ordering. rust-tool-base uses linkme distributed slices instead: each command is placed into a dedicated linker section, and the linker assembles them into one contiguous, immutable slice as it builds the binary. The registry is complete before the program runs. Same outcome as Go’s init(), with no life before main required.

Verifying your own downloads: how I solved it for self-updating CLI tools

Fri, 24 Apr 2026 00:00:00 +0000

Way back in the introduction I promised I’d come back to the self-update integrity checks. Here we are. And the starting point is a slightly uncomfortable admission: for a good long while, go-tool-base’s update command was the most trusting line of code in the entire tool.

The most trusting line of code in the tool

Self-update is a lovely feature. The user runs yourtool update, the tool fetches the latest release, swaps itself out, and they’re current. go-tool-base has had this since early on, wired to GitHub, GitLab, Bitbucket, Gitea and a few others.

But look closely at what that feature actually does. It reaches out to the internet, pulls down a file, and then replaces the executable that’s currently running with that file. The next time the user invokes the tool, they’re running whatever those bytes turned out to be.

The original implementation downloaded the release asset over HTTPS and extracted it. HTTPS gets you transport security: the bytes weren’t tampered with in flight. It tells you nothing about whether the bytes were right when they left, or whether they’re even the bytes you meant to fetch. A truncated download, a CDN cache serving a mangled object, a release asset that got swapped after the fact… HTTPS waves all of those straight through. For the one operation in the whole tool that replaces the binary, “we didn’t check” is an uncomfortable place to be sitting.

GoReleaser already does half the job

The good news is that the build side was already producing exactly what I needed. GoReleaser, which builds go-tool-base’s releases, generates a checksums.txt for every release: one SHA-256 per published artefact, the same format sha256sum emits. It was sitting right there as a release asset and nothing was reading it.

So Phase 1 of the integrity work is exactly that: read it.

When update downloads the platform binary, it now also fetches checksums.txt from the same release, looks up the entry for the asset it just pulled, and compares the SHA-256 of the downloaded bytes against the expected hash before anything gets extracted or installed. Mismatch, and the update aborts before it has so much as touched the installed binary. The hash comparison runs in constant time, which is more defence-in-depth than strictly necessary here, but it costs nothing and means every hash comparison in the codebase is the same and reassuringly audit-boring.

Fail open, or fail closed?

The interesting design question wasn’t the hashing. It was: what do you do when there is no checksums.txt?

Plenty of older releases predate this feature. A release might have been cut by hand without GoReleaser. If go-tool-base flatly refused to update whenever a manifest was missing, the very act of shipping this feature would brick the update path for every existing tool the moment they upgraded into it. That’s a cure worse than the disease.

So the default is fail-open: no manifest, log a clear warning, proceed. It matches how the existing offline-update path already behaved with its optional .sha256 sidecar, and it keeps upgrades working.

Fail-open as a default is not the same as fail-open being right for everyone, though. A security-sensitive tool should be able to say “no manifest, no update, full stop”. Two ways to get there:

Tool authors flip a compile-time switch (setup.DefaultRequireChecksum = true in main()) and their binary ships fail-closed from day one.
End users override either way through config (update.require_checksum) or an environment variable.

go-tool-base itself ships with the strict setting turned on, because a tool whose entire job is being a careful framework should hold itself to the stricter bar.

The caveat

Security features oversell themselves constantly, so here is the limit, stated plainly.

A checksum hosted next to the binary it describes protects you from accidents. Corruption, truncation, a CDN serving stale junk, a release asset that got partially clobbered. It does not protect you from a determined attacker who’s compromised the release platform itself. If someone can replace the binary, they can replace checksums.txt in the same breath, and your tool will cheerfully verify a malicious download against a malicious manifest and pronounce it good.

That’s not a flaw in the implementation. It’s the inherent ceiling of same-origin integrity: the manifest and the artefact share a trust root, so they fall together. Closing that gap needs a signature whose trust root is somewhere the release platform can’t reach, a key the attacker doesn’t have. That’s the next phase of this work, and it’s a bigger piece: GPG-signing the manifest, with the public half both embedded in the binary and published independently so a single platform compromise isn’t enough.

Phase 1 is the floor, not the ceiling. But it’s a floor worth having, because the overwhelming majority of real-world “the download was wrong” incidents are accidents, not attacks, and accidents are exactly what a same-origin checksum catches.

Pulling it together

The update command is the most trusting code in a self-updating tool: it fetches bytes from the internet and then becomes them. go-tool-base now verifies the SHA-256 of every self-update download against the release’s own checksums.txt before installing. It fails open by default so shipping the feature doesn’t strand anyone on an un-updatable version, fails closed for tool authors who ask (go-tool-base itself does), and stays honest that a same-origin checksum stops accidents, not a platform compromise.

Verifying your own downloads is a low bar. The point is that the previous height of that bar was zero.

Two API decisions that quietly contradict each other

Thu, 23 Apr 2026 00:00:00 +0000

Two design decisions on one enum, each sensible on its own, that would have quietly fought each other if I’d let them. I didn’t, but only because the second one is easy to get wrong and the compiler wouldn’t have said a word either way.

Decision one: promise the list can grow

#[non_exhaustive] on the Feature enum. It tells downstream code it can’t match the enum exhaustively, so it has to keep a wildcard arm, which in turn means adding a variant later is a non-breaking, minor-version change. Nobody’s match stops compiling just because the enum grew. The doc comment says exactly that: it “keeps variant additions a minor-version change for downstream matchers.”

Decision two: hand out the whole list

A convenience all() returning every variant, because iterating over the lot is something you genuinely want to do. The tempting signature is a fixed-size array, [Feature; 11]: you know precisely how many there are, so why not put it in the type?

Why those two can’t both be true

The catch is a quirk of Rust that often trips up people arriving from other languages: the length of a fixed-size array is part of its type. [Feature; 11], an array of exactly eleven features, and [Feature; 12], exactly twelve, are not one type holding a different number of items the way they might be elsewhere. They are two genuinely different, incompatible types, about as interchangeable as i32 and i64.

So the moment you add a twelfth variant, a fixed-size all() forces an unhappy choice, and both options are bad. Bump the array to [Feature; 12] and you break every caller who wrote the old length down. Leave it at 11 and the new variant is silently dropped, leaving you a function called all that doesn’t return all of them. Either way the #[non_exhaustive] promise (adding a variant breaks nobody) is quietly cancelled by a return type that welded today’s count into the public API.

So `all()` returns a slice

Which is exactly what it does, and the doc comment spells out why, in crates/rtb-app/src/features.rs:

#[non_exhaustive]
pub enum Feature {
 Init, Version, Update, Docs, Mcp, Doctor,
 Ai, Telemetry, Config, Changelog, Credentials,
}

pub const fn all() -> &'static [Self] {
 &[Self::Init, Self::Version, Self::Update, /* ...the rest... */]
}

A slice length is a value, not part of the type. Add a variant, the slice gets one longer, and not a single downstream signature changes. The promise holds.

The thing to watch for

#[non_exhaustive] is a promise about the future. A fixed-size array is a fact about the present. You can’t keep both at once, and nothing will warn you that you’ve contradicted yourself, because each decision is individually fine. The trap is always the second API surface that quietly re-bakes the flexibility the first one promised. When you mark a type “free to grow,” go and check that nothing in its public interface has secretly written down how big it is today.

What survives a port, and what doesn't

Thu, 23 Apr 2026 00:00:00 +0000

Rebuilding go-tool-base in Rust turned out to be the most honest design review I’ve ever sat through, and I didn’t have to do anything except keep going. Porting a framework into a language with completely different idioms forces a separation you can’t fake: the parts that survive the move are design, and the parts that don’t are just habit.

Two columns

When you port a system between languages that don’t share idioms, every piece of it sorts itself into one of two columns, without you having to make the call.

In the first column is the outcome a piece of the design produces: every command receives the framework’s services, configuration is layered with a fixed precedence, commands register themselves, errors carry guidance to the user. In the second column is the mechanism that produced that outcome in the original language.

Things in the first column survive the port. You rebuild them, differently, because the tool genuinely needs them. Things in the second column do not survive. You find their replacement, and the Go version turns out to have been one valid implementation of an idea, not the idea itself. Doing this for go-tool-base, mechanism by mechanism, was more honest about my own design than any amount of sitting and staring at it would have been.

The container

go-tool-base hands every command a Props struct. It carries the logger, the config, the assets, the filesystem handle. Some of it is reached through loosely-typed accessors. It works well, and I wrote a whole post about it.

The outcome is column one: a command should receive one object, and that object should carry the framework’s services so the command doesn’t go assembling them itself. That survived. RTB hands every command an App.

The loosely-typed accessors were column two. In Rust an App is a plain struct with concrete fields, each one an Arc<T> so a clone is a few atomic increments rather than a deep copy. Nothing is keyed by string. Nothing is fetched by name and asserted to a type. The thing the container is for survived; the way Go expressed it did not.

Registration

A go-tool-base command self-registers using a package-level init() function, which Go runs before main() and which appends the command to a global slice.

The outcome, column one, is that a command lives in its own file and inserts itself into the framework with no central list to edit. That’s genuinely worth keeping.

The init() mechanism is column two, and Rust doesn’t even offer it: Rust deliberately has no code that runs before main(). The replacement is link-time registration through distributed slices, which gets its own post next. Same outcome, no global mutable state, assembled by the linker rather than by a startup phase.

Configuration

go-tool-base layers configuration with a precedence: flags over environment over file over defaults. Some of it is read back through key lookups.

The layering and the precedence are column one. They survived exactly. RTB layers config with the same ordering.

The key lookups were column two. In Rust the merged configuration is deserialised into your own serde struct, so a config value is a typed field you access like any other field, and a typo is a compile error instead of a missing key at runtime. The precedence survived; reading values back out of a string-keyed bag did not.

The error path

go-tool-base routes every error through one handler so presentation is consistent, which I also wrote up.

One consistent exit for errors is column one. It survived. What didn’t survive was the handler: RTB has no error-handler object at all, because Rust’s own return-from-main convention plus a report hook does the job the handler was built to do. That one has its own post too.

What the exercise was actually worth

Every mechanism told the same story. The container, the registration, the config access, the error path, the cancellation signal that go-tool-base carries on a context.Context and RTB carries on a CancellationToken. In every case the thing it achieved walked across to Rust untouched, and the Go code that achieved it was left behind.

That’s the useful result. Before this port I couldn’t have told you, for any given pattern in go-tool-base, whether it was load-bearing design or just the idiomatic Go way to write it that day. Now I can, because each one was forced to prove itself by being rebuilt from nothing in a language that flatly wouldn’t accept the original. Whatever survived was real. Whatever I had to replace was always replaceable, which means it was never really the point.

The upshot

Porting a framework into a language with different idioms separates design from habit for free. The outcome a pattern produces is design, and it survives the move. The mechanism that produced it is idiom, and it gets left behind for the new language’s equivalent.

go-tool-base’s Props bag, its init() registration, its key-based config access and its error handler were all idiom. The single context object, self-registration, layered precedence and a consistent error exit were all design, and all four came through to RTB intact. The next three posts take the most interesting replacements one at a time, starting with how a Rust command registers itself when the language won’t run anything before main.

The blank import that keeps a dependency out of your binary

Wed, 22 Apr 2026 00:00:00 +0000

go-tool-base can stash your credentials in the OS keychain, which most people building on it are perfectly happy about. But some of them ship into regulated and air-gapped environments where the binary isn’t permitted to contain keychain or session-bus code at all… not dormant, not unused, simply not there.

So I had a feature most users want and a minority must be able to provably not have. The way I ended up solving it is one of my favourite little bits of honest Go.

A feature some users have to be able to not have

go-tool-base needs somewhere to keep secrets: AI provider keys, VCS tokens, the occasional app password. The best home for those on a developer’s machine is the operating system’s own keychain. macOS Keychain, GNOME Keyring or KWallet on Linux via the Secret Service, Windows Credential Manager. So I wanted go-tool-base to support all three. (This is the keychain mode I mentioned back in the credentials post, finally getting the explanation I promised it.)

The Go library for that is go-keyring, and it’s good. The catch is what it drags in behind it. On Linux it talks to the Secret Service over D-Bus, which means godbus. On Windows it pulls wincred. Perfectly reasonable dependencies for a desktop tool.

Now here’s the constraint that made this interesting. Some of the people building tools on go-tool-base don’t ship to developer laptops. They ship into regulated sectors and air-gapped deployments where a security review will scan the binary, enumerate every dependency, and ask pointed questions about anything that does inter-process communication. For those builds, “the keychain code is there but we never call it” is not an acceptable answer. The reviewer’s position, and it’s a fair one, is that code which isn’t in the binary cannot be a finding.

So I had a feature that most users want, and a minority of users must be able to provably not have. Same framework, same release.

Why I didn’t reach for a build tag

The obvious Go answer is a build tag. Compile with -tags keychain to get it, leave the tag off to not. I started down that road. I even spent a while on an inverted version, a nokeychain tag, on the theory that the regulated build should be the one that has to ask, so a forgotten flag fails safe.

It works. It also isn’t very nice. Build tags are invisible at the call site. Nothing in the source tells you that a file only exists in some builds. The two worlds drift, because the tagged-out path isn’t compiled in your normal editor session and quietly rots. And the ergonomics for a downstream consumer are poor: every tool built on go-tool-base would have to know the right magic incantation and thread it through their own release pipeline correctly, forever.

I tried a second approach too: pull the keychain backend out into a completely separate Go module. That genuinely solves the dependency question (a module you don’t require can’t contribute to your go.sum). But a separate module for one backend is clunky. Separate versioning, separate release, separate repo, all for a single file’s worth of behaviour. It felt like using a shipping container to post a letter.

The shape that actually fits: a registry and an `init()`

The version I’m happy with leans on two boring, well-worn Go mechanisms and lets them do something quietly clever together.

First, pkg/credentials defines a Backend interface and a registry. By default the registry holds a stub backend that politely returns “unsupported” for everything. The framework only ever talks to the registered backend, whatever that happens to be.

Second, the keychain implementation lives in its own package, pkg/credentials/keychain, still inside the same module, no separate release to manage. That package has an init() that registers its go-keyring-backed backend:

//nolint:gochecknoinits // registration via import is the whole point
func init() {
 credentials.RegisterBackend(Backend{})
}

And go-keyring, godbus, wincred, the whole IPC dependency chain, are only imported by that package.

Now the trick. To switch keychain support on, you import the package. You don’t have to use anything from it. A blank import is enough, because a blank import still runs the package’s init():

// cmd/gtb/keychain.go - the entire file.
package main

import _ "gitlab.com/phpboyscout/go-tool-base/pkg/credentials/keychain"

That single line is the on/off switch for the shipped gtb binary. The blank import means init() runs, the keychain backend registers itself, and credential operations start routing through the OS keychain. No flag, no tag, no config.

The part that makes it provable

Here’s why this beats the build tag, and it comes down to one guarantee in the Go toolchain: the linker only includes packages that are actually imported.

If cmd/gtb/keychain.go exists, the keychain package is in the import graph, so go-keyring, godbus and wincred are linked in. Delete that one file and rebuild, and the keychain package is no longer reachable from main. The linker performs dead-code elimination, and the entire go-keyring chain is gone. Not dormant. Not present-but-unused. Absent from the binary.

That’s the bit a regulated build needs. It isn’t a promise that the code won’t run. It’s a structural fact that the code isn’t there, and you can hand a security reviewer an SBOM that proves it. go-keyring won’t appear, because it genuinely isn’t linked.

For a downstream tool built on go-tool-base the story is the same, and just as cheap. Want keychain support? Add the one-line blank import to your own cmd package. Must ship keychain-free? Don’t. Your binary’s dependency graph follows your import graph, exactly as Go always promised it would. The default (no import) is the locked-down one, which is the right way round for a safety property.

Why I like this more than I expected to

Build tags hide a decision in the compiler invocation. This pattern puts the decision in the source, as an import, where it’s greppable, obvious in code review, and impossible to get subtly wrong. There’s a real file called keychain.go whose entire content is one import, and it reads as exactly what it is: a switch.

It’s also just honest Go. No reflection, no plugin loader, no clever runtime. A registry, an init(), and the linker doing the one job it’s always done. The cleverness, such as it is, is in the arrangement, not in any individual piece.

Stepping back

go-tool-base needed OS keychain support for the many, and a way to provably exclude it for the few. Build tags could express the toggle but hid it in the build invocation and rotted in the dark. A separate module solved the dependency question but was far too much machinery for one backend.

Putting the keychain backend in its own package, activated by a blank import _ that fires its init(), gets you both: a one-line, in-source, code-reviewable switch, and, because the linker only links what’s imported, a build with the import omitted that contains none of the keychain dependency chain. Provable absence, not promised disuse.

If you’re carrying an optional dependency that some of your users need gone rather than merely idle, this is the pattern. Let the import graph be the feature flag.

Just enough Rust to follow along

Tue, 21 Apr 2026 00:00:00 +0000

I’m about to write a run of posts about building rust-tool-base, and they lean on a handful of Rust ideas that I’d otherwise have to keep stopping to explain. So here they are, up front, in one place. You don’t need to write Rust to follow the series. You need a feel for maybe six concepts, and this is a quick, friendly tour of them. If you already write Rust, skip it with my blessing.

Ownership and borrowing

This is the one everybody mentions, and the one the whole language is built around. Every value in Rust has exactly one owner, and when the owner goes away, the value is cleaned up. No garbage collector deciding when, no manual free. If you want to let another piece of code use a value without handing over ownership, you borrow it: &thing lends it out for reading, &mut thing for writing, and the compiler enforces that you can’t, say, change something while someone else is reading it.

The payoff, and the reason people put up with the up-front fuss, is that an entire family of bug (use-after-free, data races, dangling pointers) becomes a compile error rather than a 3am one. When a post says something “moves” or is “borrowed”, that’s all this is.

Traits are Rust’s interfaces

A trait is a named set of methods a type can promise to provide, exactly like an interface in Go or Java. impl Command for Greet { ... } reads as “the Greet type fulfils the Command contract.”

Two bits of syntax show up a lot. dyn Command means “some value whose concrete type I don’t know, but which implements Command”, decided at runtime. And because the compiler needs a known size, you usually see it wrapped: Box<dyn Command> is “a pointer to some Command, whatever it turns out to be.” Whenever the series talks about a registry of Box<dyn Something>, it just means a list of different types that all satisfy the same trait.

Enums, `match`, and `#[non_exhaustive]`

A Rust enum is more than a list of named numbers; it’s a proper “one of these” type, and each variant can carry its own data. You handle one with match, which is like a switch that the compiler forces you to make complete: miss a case and it won’t build.

That completeness is usually a gift, but it’s awkward for a library, because adding a new variant would break everyone’s match. The fix is the attribute #[non_exhaustive]: it tells code outside the library “you must keep a catch-all _ => arm, because I reserve the right to add variants later.” With that in place, growing the enum is a non-breaking change. (One whole post turns on a subtle way to accidentally cancel that promise.)

The type system carries facts, not just shapes

Here’s an idea that surprises people coming from other languages: a Rust type often encodes more than “this is a number” or “this is a list.” The size of a fixed array is part of its type, so [Feature; 11] and [Feature; 12] are genuinely different, incompatible types, not one type holding a different count.

Pushed further, you can make the type track state. A “typestate” builder changes type as you call it, so .build() literally doesn’t exist as a method until every required field has been set, and forgetting one is a compile error rather than a runtime surprise. When a post says the compiler “won’t let you” do something, this is usually how: the mistake was made unrepresentable in the types.

`Result` and the `?` operator

Rust has no exceptions. A function that can fail returns a Result<T, E>: either Ok(value) or Err(problem), and you can’t use the value without acknowledging the error case. Writing that check by hand everywhere would be miserable, so there’s a shorthand: the ? operator. let x = thing()?; means “if this returned an error, return it up to my caller right now; otherwise give me the value.” Errors travel up the call stack as ordinary return values until something handles them, or until they fall out of main.

Crates, the workspace, and features

A crate is Rust’s unit of compilation, roughly “a library or binary.” A workspace is a bundle of crates built together, which is how rust-tool-base is laid out: rtb-app, rtb-cli, rtb-config and so on, each its own crate. And Cargo features are compile-time switches declared in Cargo.toml: turn a feature off and the code it guards, and any dependency it pulled in, is never compiled into your binary at all. Not disabled at runtime; simply absent. That distinction does real work in one of the posts.

That’s the toolkit

Ownership and borrowing, traits and dyn, enums and match and #[non_exhaustive], types that carry facts, Result and ?, and crates with features. Six ideas, and they’re enough to read everything else in this series without tripping over the language itself. Where a post needs a seventh thing, it’ll explain it in passing. Now, on with the actual building.

Where should a CLI keep your API keys?

Mon, 20 Apr 2026 00:00:00 +0000

Your CLI tool needs the user’s API key. It has to come from somewhere, and it has to survive between runs, so the obvious move is to ask once and write it into the config file. One tidy api_key: line. Job done.

It works beautifully on the first afternoon. And then, months later, it’s quietly become a liability nobody actually decided to create.

The config file that quietly becomes a liability

Your CLI tool needs the user’s API key. It has to come from somewhere, and it has to survive between invocations, so the obvious move is to ask once and write it into the tool’s config file. ~/.config/yourtool/config.yaml, a nice api_key: line, done.

It works on the first afternoon. It keeps working. And then, slowly, it becomes a problem nobody decided to create.

The config file gets committed to a dotfiles repo. It gets caught in a tar of someone’s home directory that lands in a backup bucket. It scrolls past in a screen share. It sits, world-readable, on a shared build box. None of these are exotic. They’re just a Tuesday. The plaintext key was fine right up until the file went somewhere the key shouldn’t, and config files go places.

I didn’t want go-tool-base handing every tool built on it that same slow-motion liability by default. So credential handling got rebuilt around a simple idea: the config file should usually hold a reference to the secret, not the secret itself.

Three modes, and which one you get

go-tool-base supports three ways to store a credential.

Environment-variable reference, the default. The config records the name of an environment variable, not its value:

anthropic:
 api:
 env: ANTHROPIC_API_KEY

The secret itself lives in your shell profile, your direnv setup, or your CI platform’s secret store, wherever you already keep that sort of thing. The config file now contains nothing sensitive at all. You can commit it, back it up, paste it into a bug report. The reference is inert on its own.

OS keychain, opt-in. The config holds a <service>/<account> reference and the actual secret goes into the operating system’s keychain: macOS Keychain, GNOME Keyring or KWallet via the Secret Service, Windows Credential Manager.

anthropic:
 api:
 keychain: mytool/anthropic.api

This one is opt-in by design, because the keychain backend carries dependencies that some deployments simply aren’t allowed to ship. (That opt-in mechanism turned out to be an interesting little problem all of its own, and it gets its own post in a couple of days.)

Literal value, legacy and grudging. The old behaviour. The secret sits in the config in plaintext:

anthropic:
 api:
 key: sk-ant-...

It still works, because breaking every existing tool’s config on an upgrade would be its own kind of vandalism. But it’s the last resort, it’s documented as the last resort, and the setup wizard puts a warning in front of you when you pick it.

The one place literal mode is not allowed

There’s a single hard “no” in all of this. If go-tool-base detects it’s running in CI (CI=true, which every major CI platform sets) the setup flow will refuse to write a literal credential, and exits non-zero.

The reasoning is that a plaintext secret written during a CI run is a plaintext secret written onto an ephemeral, often shared, frequently-logged machine, by an automated process that no human is watching. That’s the exact situation where the slow-motion liability becomes a fast one. CI environments inject secrets as environment variables already; there’s no good reason for a tool to be writing one to disk there, so go-tool-base simply won’t.

How it decides at runtime

A credential can be configured more than one way at once. You might have an env reference and an old literal key still lurking. So resolution follows a fixed precedence, highest to lowest:

The *.env reference. If that env var is set, use it.
Otherwise the *.keychain reference. If a keychain entry resolves, use it.
Otherwise the literal *.key / *.value, the legacy path.
Otherwise a well-known fallback env var (ANTHROPIC_API_KEY and friends), so a tool still picks up the ecosystem-standard variable with no config at all.

The useful property here is that adding a more secure mode transparently wins. Drop an env reference next to an old literal key and the next run uses the env var. You can migrate a credential to a better home without first removing it from its worse one, which makes the migration safe to do incrementally instead of as one nervous big-bang edit.

The tool tells on itself

A precedence rule is no use if nobody knows their config still has a plaintext key three layers down. So the built-in doctor command grew a check for exactly that. Run doctor, and if any literal credential is sitting in your config it reports a warning, names the offending keys (the key names, never the values) and points you at how to migrate.

It’s not an error. Literal mode is still legal. But the tool will quietly keep reminding you that you left the campsite messier than you could have, until you go and tidy it. (Old Scout habits die hard, and they’ve leaked all the way into the framework.)

The gist

A CLI tool that writes your API key into a plaintext config file isn’t doing anything wrong, exactly. It’s just handing you a liability that activates later, when the file travels somewhere the key shouldn’t. go-tool-base’s answer is three storage modes: an env-var reference by default, the OS keychain on request, and a plaintext literal only as a documented last resort that CI environments can’t use at all. Runtime resolution runs in a fixed precedence so a more secure mode always wins, which makes migrating a credential safe to do gradually. And doctor keeps an eye on the config so a stray plaintext secret doesn’t get to hide forever.

The secret should live in a secret store. The config file should just know its name.

A configurable AI endpoint is an attack surface

Sun, 19 Apr 2026 00:00:00 +0000

“Let users point at their own AI endpoint” is one of those config options that looks completely harmless on the way in. People want it, for perfectly good reasons. Then you sit with it for a minute and realise you’ve handed every user a loaded gun and pointed it vaguely at their own API key.

Why you offer it at all

There are real reasons to let someone set a custom base URL. They’re running a local model and want localhost:11434. They’re behind a corporate proxy that fronts the real provider. They’re on Azure’s flavour of OpenAI, which lives at a different host. They’ve a self-hosted gateway doing rate-limiting. All reasonable, all things a framework should support rather than fight.

The bit that’s a loaded gun

Here’s what the config option quietly decides: the base URL is where your credential goes. The API key rides along in an Authorization header on every request, to whatever host that URL resolves to. So the moment the endpoint is user-configurable, the destination of your secret is user-configurable too.

And users do user things. They paste a URL from a gist that turned out to be a honeypot. They leave http:// on the front, so the key crosses the wire in plaintext. They copy https://user:token@host/v1 not realising the userinfo changes who they actually authenticate to. They never edit the https://api.example.com/v1 placeholder and wonder why the key’s been posted to a domain they don’t own. None of that is malice. It’s what happens when the destination of a secret is a free-text field.

Validate before the first byte leaves

So every chat.New routes through ValidateBaseURL before the provider is built. The threat model is written at the top of pkg/chat/baseurl.go: an operator who can influence config could “redirect chat-provider traffic to an attacker-controlled HTTPS host and capture the Authorization header.” The checks run cheapest-first: a length cap, no ASCII control characters, must parse, no userinfo, https only, a host must be present, and the host mustn’t be a placeholder.

The userinfo rule is the sharp one:

if parsed.User != nil {
	// Reject any userinfo, with or without password. Never log
	// the URL itself because it contains the credential.
	return errors.WithHint(ErrInvalidBaseURL,
		"base URL must not contain credentials; use the Token field instead")
}

The placeholder check rejects example.com and friends and any subdomain of them, so the unedited https://api.example.com/v1 from a setup wizard never reaches the wire and hits some typosquatted lookalike. And the HTTP escape hatch is test-only by construction: the AllowInsecureBaseURL field that permits plain http is tagged json:"-", so a config file physically cannot set it. This all came out of the 2026-04-17 security audit, finding M-3.

rust-tool-base enforces the same at its own boundary: validate_base_url rejects userinfo, any scheme but https (bar a test-only allow_insecure), and documentation placeholder hosts like example.com.

What it can and can’t do

It won’t stop a user who deliberately points the tool at a malicious HTTPS host they genuinely chose. If someone is set on sending their own key somewhere bad, validation can’t read their mind.

What it stops is the accidents: the plaintext slip, the userinfo confusion, the placeholder nobody changed. Those aren’t theoretical, they’re the ones that happen to careful people on ordinary days. Storing the key well is one job (where a CLI keeps it), stopping it leaking through a log is another, and this is the third side of the triangle: once you’ve stored it and stopped it leaking, make sure you don’t send it somewhere daft.

Redacting the secret you didn't know was in the string

Sat, 18 Apr 2026 00:00:00 +0000

Dammit! How did that get there?

A log line that should never have existed. Not a password I’d carelessly printed, nothing as obvious as that. An upstream API handed me back an error, and it had quoted my own bearer token inside the message, and that error went straight into the logs the way errors do. I didn’t put the secret there. The error did. And I’d never have caught it by being careful, because being careful only protects you from the secrets you know you’re handling.

The easy half of redaction

Hiding the secrets you know about is the part everyone does. You’ve got an API key field, a password flag, so you mask them at the point you print them. key=****. Done, and it feels like you’ve solved redaction, when really you’ve solved the half that was never going to bite you.

The half that bites

The secrets that escape are the ones that arrive inside strings you don’t control. An upstream service echoes your token back in a 401 body. A connection string with the password in the userinfo, https://user:pass@host, lands in a debug line. A library stringifies a whole request, headers and all, for a “helpful” trace. You cannot field-mask a secret you didn’t know was in the string, because you never watched it go in.

You can’t register a value you never had, so match the shape

This is the bit I got wrong in my own head at first. I assumed redaction meant handing it the secrets I was holding so it could watch for them. But the dangerous secrets are exactly the ones I’m not holding a copy of. So pkg/redact doesn’t keep a registry of your values at all. It knows what secrets look like.

pkg/redact/redact.go carries a set of RE2 patterns: a credential in URL userinfo, an Authorization: header sitting in free text, query-string credentials, and the well-known provider prefixes:

prefixPatterns = []*regexp.Regexp{
	regexp.MustCompile(`sk-[A-Za-z0-9_\-]{16,}`), // OpenAI / Anthropic-style
	regexp.MustCompile(`ghp_[A-Za-z0-9]{30,}`), // GitHub PAT classic
	regexp.MustCompile(`github_pat_[A-Za-z0-9_]{30,}`), // GitHub fine-grained PAT
	regexp.MustCompile(`xox[baprs]-[A-Za-z0-9-]{10,}`), // Slack
	regexp.MustCompile(`AIza[A-Za-z0-9_\-]{30,}`), // Google API key
	regexp.MustCompile(`AKIA[A-Z0-9]{16}`), // AWS access key ID
}

Run any string through redact.String and an OpenAI key, a GitHub token or an AWS access key ID gets caught wherever it’s hiding, in an error you didn’t write, in a URL, in a stack trace, because each has a recognisable shape. For the secrets that don’t announce themselves with a prefix there’s a fuzzy fallback: any opaque alphanumeric run of 41 characters or more. The 41 is chosen on purpose, to clear UUIDs (36), MD5 (32) and git SHA-1 (40) without flagging them, while accepting that a SHA-256 (64) will trip it. A deliberate, documented trade rather than a magic number.

Where it runs

At the boundary where a string leaves for somewhere you can’t reach back into. The telemetry backend runs every event argument and error message through redact.String before it emits anything (pkg/telemetry/telemetry.go), and both telemetry and HTTP logging drop the value of any header redact flags as sensitive. It doesn’t matter which code path produced the string, or whether you even wrote that path; everything goes through the same gate and gets the same scrub.

rust-tool-base’s rtb-redact crate takes the same shape-matching approach: regex patterns, the same family of well-known provider prefixes, and an is_sensitive_header check for header values.

A realistic limit

It isn’t a force field. A secret with no recognisable shape, shorter than the fallback threshold, will sail through. You cannot redact what you cannot recognise. But the leak that actually keeps happening isn’t some exotic unknown, it’s a well-known token turning up in a place you didn’t expect, and a shape-matcher sitting at the edge catches exactly that, including secrets you never told it about. Which is the one thing registering your own values could never have done. Storing the key safely is a separate job, where a CLI keeps it; this is about making sure that, having stored it, it doesn’t quietly fall out through a log.

I had the framework audited: every finding was the same shape

Fri, 17 Apr 2026 00:00:00 +0000

When a real security audit lands back in your inbox, the temptation is to read it as a shopping list of unrelated mistakes. Fix one, fix the next, tick them off, move on. I did exactly that the first time. The second time, I noticed something far more useful: the findings weren’t scattered at all. They clustered. Almost every one was the same sentence with the nouns swapped out.

Findings cluster, they don’t scatter

When you get a real security audit back, the instinct is to read it as a list of unrelated mistakes. Finding 1, unrelated to Finding 2, unrelated to Finding 3. Triage each, fix each, move on.

That’s not what the go-tool-base audits looked like once I stopped reading them as a list. The findings clustered. Strip away the specifics and almost every one was the same sentence with the nouns swapped: untrusted input reaches a powerful operation, and nothing checks it in between.

That reframe is worth more than any individual fix, because it turns “we patched some bugs” into “we know where to look next time”. A framework’s attack surface isn’t spread evenly. It’s concentrated at the boundaries: the handful of points where data from outside (a config file, a command-line flag, something typed into a TUI, an HTTP response) flows into machinery that can be made to misbehave. Audit the boundaries and you’ve audited most of the risk. Three examples make the pattern obvious.

Boundary one: a regex compiler

Somewhere in the tool, a user-supplied string gets compiled into a regular expression. A search pattern typed into the docs browser, a filter from a config file. Feeding user input to regexp.Compile feels harmless. It’s just pattern matching, after all.

It isn’t quite harmless. A regular expression is a tiny program, and some tiny programs are catastrophically slow. A pattern with the wrong kind of nested repetition can take exponential time to evaluate against a modestly hostile input. That’s the class of bug known as ReDoS. A user, or something feeding the user’s config, hands you a pathological pattern and your tool wedges, burning a whole core, on what looked for all the world like a search box.

The fix isn’t to ban user-supplied regexes. It’s to stop treating “compile this string” as free. go-tool-base routes any regex whose pattern came from outside the binary through a regexutil.CompileBounded helper. It caps the pattern length and puts a hard timeout on compilation. A pattern known at build time can still use plain regexp.MustCompile, because that isn’t a boundary, it’s a constant. The discipline only applies where the input genuinely crosses in.

Boundary two: a URL opener

The tool needs to open a URL in the user’s browser, a docs link or an OAuth flow. Under the hood that’s the OS handler: xdg-open, or open, or rundll32.

Now ask where the URL came from. If any part of it is influenced by config, by a server response, by user input, then “open this URL” has quietly become “ask the operating system to do something with an attacker-influenced string”. A file:// URL. A javascript: URL. Something with control characters smuggled into it. The browser-open was never the dangerous part. The unvalidated string was.

So go-tool-base funnels every URL-open through one package, pkg/browser, and that package is a gate. It enforces an allowlist of schemes (https, http, mailto, and nothing else), bounds the length, and rejects control characters before the OS ever sees the string. The rule that makes it stick is that nothing else is allowed to call the OS handler directly. One door, and the door has a lock. A scattered capability with no chokepoint can’t be secured; a capability that has a chokepoint can. (You’ll have spotted the “one door out” idea by now… it’s the same instinct as the single error handler, pointed at security instead of consistency.)

Boundary three: a log sink

This one’s the sneakiest, because it runs the wrong way round. The first two boundaries are about dangerous input coming in. This one is about sensitive data leaking out.

The tool handles credentials. It also logs, emits telemetry, and reports errors, and all three of those are exit boundaries: places where strings leave the process for somewhere more persistent and more public, like a log aggregator, an analytics backend, an error tracker. If a token ever ends up in a string that flows to one of those, you haven’t logged an event, you’ve published a secret.

The defence is pkg/redact. Any free-form string heading for an observability surface goes through it first, and it strips the usual suspects: credentials in URL userinfo, sensitive query parameters, Authorization headers, the well-known provider key prefixes (sk-, ghp_, AIza and friends), long opaque tokens. The places most likely to leak, command arguments and error messages in telemetry, get it applied automatically rather than relying on every caller to remember.

Same pattern as the other two. A boundary, and something standing on it checking what goes through.

The grunt work

None of these fixes is clever. There’s no exploit demo, no neat trick to show off. Bound a length. Check a scheme against an allowlist. Run a string through a redactor. The work was almost entirely in noticing the boundary existed, and then making sure everything routes through the one checked path instead of dotting raw calls all over the codebase.

That’s the actual lesson of a security audit, and it’s why the cluster reframe matters. The value wasn’t the dozen-or-so individual fixes. It was learning that the next risk will be at a boundary too, the next place untrusted input meets a powerful operation with nothing in between, and that the job is to find those points and put a single, mandatory, checked door on each.

To sum up

A security audit of a CLI framework reads like a list of unrelated bugs and isn’t one. go-tool-base’s findings nearly all reduced to the same shape: untrusted input reaching a powerful operation unchecked. A regex compiler that needed a length and time bound (regexutil.CompileBounded). A URL opener that needed a scheme allowlist and a single chokepoint (pkg/browser). Log and telemetry sinks that needed credentials redacted on the way out (pkg/redact).

The fixes were structural and dull, which is exactly right. Find your boundaries (config, flags, TUI input, network responses, log and telemetry sinks), give each one a single mandatory checked path, and you’ve spent your audit effort where the risk actually lives.

A mutex on a flag nobody writes twice

Thu, 16 Apr 2026 00:00:00 +0000

“Why is there a mutex around a boolean that only ever gets set once?”

It’s a fair question, and I’d half-asked it of myself before someone asked it of me. The answer turns out to be written, in as many words, in a code comment I’ve grown rather fond of.

The registry and its one-way latch

go-tool-base keeps a feature registry: the initialisers, sub-commands, flags and checks that each feature adds to the CLI. Features register themselves into it at startup, from init(), before main runs. Once everything’s wired, the framework calls SealRegistry() and the registry latches shut. Any Register call after that point panics, on purpose, because a sub-command or flag that turns up after the CLI has parsed its arguments is a bug I want to hear about at once, not discover three releases later.

So there’s a registrySealed bool. It starts false, SealRegistry flips it to true exactly once in normal operation, nothing flips it back outside of tests, and it’s read on every registration attempt. Written once, read many. The textbook shape of “you don’t need a lock for this.”

Except the comment disagrees, on purpose

Here is the actual declaration, in pkg/setup/registry.go:

// registryMu protects globalRegistry and registrySealed. Acquired for write
// by all Register* and Reset/Seal helpers; acquired for read by all Get*
// accessors. The mutex is required for memory visibility of registrySealed
// across goroutines, not only mutual exclusion on the maps.
var (
	registryMu sync.RWMutex
	registrySealed bool
)

That last sentence is the entire post. The mutex has an obvious day job: the registry is a clutch of maps that get appended to during registration, and concurrent appends need genuine mutual exclusion. registrySealed could have just hitched a ride on that lock and nobody would have thought twice. But the comment goes out of its way to say the lock is also required for the flag, for visibility, not only exclusion.

Why a write-once bool still needs the lock

The Go memory model makes no promise that a goroutine reading registrySealed will ever see the write SealRegistry made, unless there is a happens-before relationship between them. No synchronisation, no guarantee. A reader can sit there seeing false long after the seal happened on another goroutine, because the compiler may cache the read and the CPU may serve it from a core-local view. And a concurrent read and write of the same variable, with nothing ordering them, isn’t “probably fine”; it’s a data race, which Go defines as undefined behaviour.

“But registration is single-threaded, it’s all init().” It was, right up until we wanted the tests to run in parallel. This lock exists because of a deliberate campaign to restore t.Parallel() across the codebase after a stack of races forced us to drop it (the same campaign that retired the package-level mocking hooks). Tests build, register, seal and reset this registry from parallel goroutines. The instant that’s true, the seal check has to stay correct while racing, because the very thing it guards against is concurrency. So reads take registryMu.RLock, the write takes registryMu.Lock, and now there’s a happens-before edge: anyone who acquires the lock after SealRegistry released it is guaranteed to see true.

What the lock is actually for

It isn’t there to stop two goroutines both sealing the registry. There’s only ever the one seal. It’s there so that every reader can trust what it reads. A value written exactly once is precisely the case where you’re most tempted to skip the synchronisation, and precisely the case where skipping it can leave a reader legally staring at the stale value for good. The comment spells it out so that the next person to glance at registrySealed, think “that clearly doesn’t need a lock,” and reach for the delete key, reads the sentence first.

(There’s a sibling sealed flag in the middleware registry that follows the identical pattern, for the identical reason.)

The test-mocking pattern that races

Thu, 16 Apr 2026 00:00:00 +0000

I’m going to tell you about a bug go-tool-base shipped, because it’s one of those bugs that’s so reasonable-looking you’ll find it in textbooks, conference talks, and an awful lot of otherwise excellent Go code. We had it too. It passed every test on my laptop, every single time, and then quietly fell over on CI while blaming an innocent bystander.

It’s the classic Go trick for mocking a dependency, and it races.

A pattern that looks completely reasonable

Here’s a thing you need to do constantly in Go tests: stop a function from really shelling out. It calls exec.LookPath to find a binary, or exec.Command to run one, and your test very much does not want it touching the real $PATH or spawning a real process.

The Go community has a well-worn answer. Hoist the function into a package-level variable, call that, and let tests reassign it:

// production code
var execLookPath = exec.LookPath

func findTool() (string, error) {
 return execLookPath("sometool")
}

// test
func TestFindTool(t *testing.T) {
 old := execLookPath
 defer func() { execLookPath = old }()
 execLookPath = func(string) (string, error) {
 return "/fake/path", nil
 }
 // ...assert...
}

It’s tidy. No interface to thread through, no constructor to change. You’ll find it in a great deal of Go code, including some very respectable Go code indeed. go-tool-base had it too.

And it works. It works on your machine, it works in code review, it works the first hundred times CI runs it. Which is precisely what makes it dangerous, because it’s wrong, and it’s just been biding its time.

Add one line and it detonates

Go’s t.Parallel() is more or less free performance. Mark your tests with it and the runner overlaps them instead of plodding through one at a time. On a package with a few hundred tests it’s a real, worthwhile speed-up, so naturally you reach for it.

Now picture two tests, both using the pattern above, both marked t.Parallel(). They run concurrently. Test A assigns its fake to execLookPath. Test B assigns its fake to execLookPath. Test A reads execLookPath expecting its own fake. Two goroutines, one variable, writes and reads with nothing synchronising them. That’s a textbook data race, and the textbook is right: the behaviour is undefined. Test A might see B’s fake. The deferred restore might land in the wrong order and leave the variable pointing at a fake after both tests have finished, poisoning a third one for good measure.

The truly nasty part is the intermittency. Whether the race actually bites depends on goroutine scheduling, which depends on machine load and core count. Your laptop running eight tests at once might never lose the coin-toss. A CI runner under load, scheduling differently, loses it and fails a test that has nothing obviously to do with the change in the commit. You re-run the pipeline, it passes, everyone shrugs and moves on. A test suite that fails one run in twenty trains your team to ignore it, and an ignored CI failure is worse than no CI at all.

I can tell you this one from direct, slightly embarrassed experience, because go-tool-base shipped exactly this bug and CI caught it the honest way: green on the laptop, red on the runner, with the failure cheerfully pointing at innocent bystander tests rather than the global that was actually the culprit. go test -race will name it for you if you crank the parallelism up high enough to lose the toss reliably… but you have to go looking, and you only go looking once it’s already ruined an afternoon.

The fix isn’t synchronisation, it’s structure

The instinct is to slap a mutex around the variable. Resist it. A mutex makes the race defined, but it doesn’t make the design any good. You’ve still got global mutable state, you’ve just queued the fight instead of cancelling it. And tests that serialise on a shared lock aren’t really parallel any more, so you’ve also handed back the speed-up you came for in the first place.

The real fix is to not have a shared variable at all. The dependency was always an input to the code; the package-level var was just a way of avoiding saying so out loud. So say it. Inject it.

A struct field:

type Finder struct {
 lookPath func(string) (string, error) // defaults to exec.LookPath
}

func (f *Finder) find() (string, error) {
 return f.lookPath("sometool")
}

Or a functional option, if you’d rather keep the zero value clean. Either way, each test constructs its own Finder with its own fake. There’s no shared variable, so there’s no race, and t.Parallel() is free again because the tests genuinely don’t touch each other.

go-tool-base wrote this straight into its standing rules: no package-level mocking hooks, full stop. Dependencies come in through struct fields, functional options, or config fields. (The same injection discipline that makes Props so testable, applied one rung further down.) And to stop everyone hand-rolling the same exec fakes, there’s a small internal package, internal/exectest, with ready-made LookPath and CommandContext doubles you construct per-test. The pattern is gone, and the door it came in through is shut.

The rule worth taking away

A package-level variable that tests reassign is shared mutable state. It reads as a harmless convenience because in a single-threaded test run it behaves like one. t.Parallel() is the thing that reveals it was never harmless, only unobserved.

The general lesson is older than Go: if a value is an input to your code, make it an input. Smuggling it in as a global is borrowing test-time convenience against a debt that comes due, with interest, the day someone wants their tests to run in parallel. Pay cash. Inject the dependency.

Worth remembering

Mocking via a reassignable package-level variable is a beloved Go shortcut and a latent data race. It survives because single-threaded test runs hide it; t.Parallel() exposes it as intermittent, bystander-blaming CI flake that’s miserable to trace. A mutex only makes the bad design defined. The fix is structural: inject the dependency as a struct field or functional option, so each test owns its own double and there’s no shared state to race over. go-tool-base banned the global-hook pattern outright and ships internal/exectest so nobody’s tempted back to it.

If a piece of code depends on something, let it say so in its signature. Your future self, staring at a CI failure that flatly refuses to reproduce, will thank you.

OpenSSF Scorecard graded my supply chain

Tue, 14 Apr 2026 00:00:00 +0000

I turned OpenSSF Scorecard on expecting a pat on the head. go-tool-base is a security-minded project, I’m careful, surely the robot would agree. The robot did not agree. It handed back a report card with a fair bit of red ink, and the most pointed finding on it wasn’t about my code at all. It was about me.

A linter for the things you don’t call code

Scorecard is an automated set of checks that grades a repository’s supply-chain hygiene: are your CI dependencies pinned, are your workflow tokens least-privilege, is your branch protected, do commits get reviewed. It’s a linter, but pointed at the part of the project you don’t usually think of as code, the build and release machinery and the practices around them. And like any good linter, its value is mostly in catching the things you’d swear you’d already got right.

Three of its findings were worth the price of admission on their own.

Pin the actions you don’t control

The first was about how go-tool-base’s GitHub Actions referenced other actions. Like nearly everyone, I’d written uses: actions/checkout@v6. Scorecard doesn’t like that, and it’s right not to.

@v6 is a tag, and a tag is mutable. Whoever controls that action can move v6 to point at different code tomorrow, and your CI will pick it up silently on the next run. For an action that runs in a job holding your repository token, that’s a supply-chain hole the width of a barn door: compromise the tag, compromise every pipeline that trusts it. The fix is to pin to an immutable commit SHA, with the human-readable version left as a comment, which is exactly what I changed:

-  - uses: actions/checkout@v6
-  - uses: actions/setup-go@v6
+ - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+ - uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6

Now the action is frozen at bytes I reviewed. Dependabot still bumps the SHA when a real new version lands, so I get updates as reviewable pull requests rather than as silent tag movements. The pin doesn’t stop me updating. It stops me updating without noticing.

Give the workflow token the least it can do

The second finding was about permissions. My workflow declared its token permissions at the top, once, for the whole file:

permissions:
 contents: read
 security-events: write
 id-token: write

That reads as careful, and it’s still too broad, because top-level permissions apply to every job in the workflow. A job that only needs to read the repo is now also holding id-token: write and security-events: write, for no reason other than that some other job in the same file needed them. Scorecard rejects exactly this, and the fix is to default the whole workflow to read-only and grant write narrowly, in the job that actually needs it:

permissions: read-all

Write permissions moved down into the single job that uses them. It’s the same least-privilege instinct that runs through everything else in these projects, just applied to a CI token instead of an IAM role: a credential should be able to do the one thing it’s for, and nothing else, no matter how convenient the broad grant looked.

The finding that was about me

The third one stung, because there was no YAML to fix. Scorecard’s Code-Review check scores how consistently changes are reviewed before they land, and mine scored badly for the most embarrassing possible reason: I’d set up branch protection on main, and then, being the solo maintainer in a hurry, I’d been merrily bypassing it to push straight to main whenever it suited me.

So I had a rule, written down and enforced by the platform, that I was personally and routinely ignoring. Scorecard noticed, totted up the unreviewed commits, and graded me on it. There’s something properly humbling about a robot reading your git history and pointing out that the person breaking your security policy most often is you. The fix wasn’t code. It was going through a pull request like everyone else, even when “everyone else” is just me on a different day.

The bottom line

OpenSSF Scorecard is a linter for your supply chain, and like any linter it’s most useful when it tells you something you were sure you’d already handled. It dinged go-tool-base for referencing actions by mutable tag instead of pinned SHA, for granting workflow-token write permissions at the top level where every job inherited them, and for a Code-Review score I’d earned fair and square by bypassing my own branch protection.

The first two were quick, satisfying changes with a clear security story. The third was the one that stuck, because the tool I’d added to grade the project ended up grading the maintainer, and was entirely right to. Turn it on. Brace yourself a little.

Testing code that calls an LLM: yes, you actually can

Wed, 08 Apr 2026 00:00:00 +0000

“You can’t test code that calls an AI.” I’ve heard it said with great confidence, and it’s half right, which is the most dangerous kind of right. You genuinely can’t assert on what a non-deterministic model says. But the model isn’t your code, and the bits sitting either side of it most certainly are.

“You can’t test AI code”

It’s a fair worry. Your command calls an LLM. The LLM returns something slightly different every run. A test that asserts response == "..." is broken before you’ve finished typing it. So the conclusion arrives quickly: the AI path can’t be tested, leave it uncovered.

Which is a shame, because the AI call is usually the riskiest line in the whole command.

The conclusion is also wrong. It mistakes “I can’t test the model” for “I can’t test my code”. The model is not your code. Your code is the two pieces sitting on either side of it.

Your code is a prompt and a handler

Strip the command down to what it actually does:

It builds a prompt. It assembles a system prompt, the user’s input, perhaps some context, and sends it.
The model does something. This is not your code.
It takes the response and does something with it. It parses it, branches on it, prints it, stores it.

Steps one and three are entirely yours, and entirely deterministic. The same inputs build the same prompt and handle the same response the same way, every single time. That’s testable. Step two is the only part that isn’t, and step two was never yours to test in the first place.

So the job is to pin step two to a known value, and then test one and three properly.

Test the prompt: snapshot it

Step one produces a prompt, and a prompt is just a string, which means you can pin it.

Both frameworks lean on snapshot testing here. go-tool-base uses a golden-file approach: the prompt your code generates is recorded to a file, and the test re-generates it and compares against that file. rust-tool-base does the same with insta, snapshotting the request body the client would send.

The reason this matters is that the prompt is load-bearing and quietly easy to break. You refactor how context gets assembled. Without noticing, you’ve changed the wording, or the ordering, or dropped a line the model was leaning on. Nothing fails to compile. The behaviour just drifts, silently.

A snapshot test catches exactly that. It fails, it shows you the diff between the old prompt and the new one, and it makes you stop and make a decision. Was this change intended? If yes, you accept the new snapshot and move on. If no, you’ve just caught a bug before it shipped. Either way the prompt never changes by accident, which for AI code is most of the battle.

Test the handler: mock the response

Step three needs a response to handle, and in a unit test you don’t get that response from the real model. You supply it.

go-tool-base ships generated mocks for the ChatClient interface. A test builds a mock client, tells it “when Ask is called, return this canned value”, and runs the command against it:

mockClient := mock_chat.NewMockChatClient(t)
mockClient.EXPECT().
 Ask(mock.Anything, mock.Anything, mock.AnythingOfType("*main.Analysis")).
 RunAndReturn(func(_ context.Context, _ string, target any) error {
 *(target.(*Analysis)) = Analysis{Severity: "critical"}
 return nil
 })

Because the interface is only four methods, that mock is trivial to set up and complete by construction. rust-tool-base takes the same idea one layer down: HTTP-bound tests use wiremock, which stands up a fake server returning a canned response body. The client makes a real HTTP request; it just goes to a fake endpoint the test controls.

Either way, step two is now fixed to a value you chose, which makes step three deterministic. And that unlocks the tests that actually matter: given a malformed response, does the command fail gracefully? Given a rate-limit error, an empty answer, a field missing? Those are the cases a live model almost never hands you on demand, and a mock hands you every time, on the first run.

This is, incidentally, the same discipline as the test-mocking work elsewhere in the framework: the dependency is injected, so the test gets to decide what it does.

What you deliberately don’t test

One boundary worth stating. None of this tests whether the model gives good answers. That question is real, but it’s a different activity (evaluations, run as their own suite) and not something to mix into the unit tests.

The unit suite’s job is your code: that it builds a sound prompt, and that it handles every shape of response correctly, including the ugly ones. Keep that well away from “is the model clever today”. A unit test that depends on the model being clever is a unit test that fails when the weather changes, and a flaky test just teaches people to ignore the whole suite.

What it comes down to

Code that calls an LLM is testable; the model is not, and those are different statements. Your code is a prompt builder and a response handler, both deterministic, with the model sat in between.

go-tool-base and rust-tool-base converge on the same approach. Snapshot the prompt, with golden files or insta, so a refactor can’t change what you send without a test noticing. Mock the response, with generated ChatClient mocks or a wiremock server, so tests run with no network and you can feed in the malformed and error cases a real model won’t reliably produce. Leave “are the answers any good” to a separate evaluation suite. Test the two halves you own, and the non-determinism in the middle stops being an excuse to leave the riskiest line uncovered.

The AI provider that isn't an API

Mon, 06 Apr 2026 00:00:00 +0000

go-tool-base’s chat package puts five AI providers behind one interface. Four of them are exactly what you’d guess: HTTP calls to OpenAI, Claude, Gemini, and anything OpenAI-compatible. The fifth one isn’t an API at all. It shells out to a binary.

That sounds like a slightly mad thing to want, right up until you’ve worked somewhere the network says no.

The fifth provider shells out

The chat package speaks to five providers through one ChatClient interface. Four of them are what you’d expect: HTTP requests to OpenAI, to Claude, to Gemini, to any OpenAI-compatible endpoint. The tool author picks one in config, and the rest of the code never knows the difference.

The fifth, ProviderClaudeLocal, is different in kind. It doesn’t make an HTTP request at all. It shells out. It runs the claude CLI binary as a child process, passes the prompt in, and reads the answer back from the binary’s output.

That sounds like an odd thing to want until you’ve been stuck in the environment it was built for.

Why you’d want that

Picture a corporate network with its egress locked right down. Outbound HTTPS to api.anthropic.com is blocked by policy. A tool built on go-tool-base that uses AI would simply fall over there. It tries to reach the API, there’s no route, and that’s the end of the feature.

But the developer at that machine has the claude CLI installed, and has run claude login. That binary is permitted. It’s an approved, managed tool, and it has its own sanctioned path out. The direct API call is blocked; the claude command is not.

ProviderClaudeLocal is what bridges those two facts. If your tool’s AI calls go through that already-blessed binary instead of straight at the API, they work, in an environment where the direct call cannot. That’s the whole reason the provider exists. It isn’t faster (a real API call has lower latency) and it isn’t more capable. It’s for the place where the API call simply isn’t an option, and “isn’t an option” is a surprisingly common place to find yourself inside a large organisation.

What it costs

It’s worth being straight about the trade, because ProviderClaudeLocal is the reduced-capability provider.

It doesn’t do tool calling. It doesn’t do parallel tools. It doesn’t stream. Those need a live, structured connection to the model’s API, and a subprocess that runs once and prints an answer is not that. What it does support is plain chat and structured output, the latter through the binary’s own --json-schema flag.

So the positioning, and the package’s documentation says exactly this, is: prefer the API providers when you can reach them, because they’re lower latency and feature-complete. Reach for ProviderClaudeLocal when API access is restricted. You accept the narrower capability set as the price of working at all. For a tool whose AI feature is “answer a question” or “return a structured analysis”, that price is often nothing you’d even notice. For one built on an agentic tool-calling loop, it’s a real limitation, and you’d know to expect it.

How it stays behind the same interface

Here’s the part that makes it pleasant rather than a special case to maintain. Despite being a subprocess and not an API, ProviderClaudeLocal is still a ChatClient. Your feature code calls Chat and Ask exactly the way it would for any other provider.

Everything that makes a subprocess provider awkward stays inside the provider. Spawning the binary, feeding it the prompt, parsing its output, capturing stderr and surfacing it when the binary exits non-zero, and threading multi-turn continuity through session identifiers passed back on the next call with --resume: all of that is the provider’s problem, and all of it sits behind the interface. The code in your tool that uses AI doesn’t know, and has no way to find out, that this particular provider is a child process rather than an HTTPS call.

That’s a unified interface genuinely earning its place. It’s easy to put a uniform face on four things that already work the same way underneath. The real test of the abstraction is whether something that works in a completely different way, a subprocess instead of a socket, can still slot in without the caller changing a line. Here it can. You swap one config value, and a tool that talked to an API now talks through a binary, and nothing downstream so much as blinks.

The bottom line

go-tool-base’s chat package puts five providers behind one ChatClient interface, and ProviderClaudeLocal is the one that isn’t an API. It runs the locally installed, pre-authenticated claude CLI as a subprocess.

It exists for the locked-down environment where outbound HTTPS to the AI API is blocked but the claude binary is allowed: there, AI features keep working where a direct call would fail. The trade is a narrower capability set (no tool calling, no streaming, plain chat and structured output only) so you prefer the API providers when you can reach them and fall back to this when you can’t. And because it’s still a ChatClient, all the subprocess machinery stays hidden, and your code uses it without knowing it’s there. That last part is the real test of an abstraction: a provider that works in an entirely different way still slots in unchanged.

AI conversations you can resume

Sat, 04 Apr 2026 00:00:00 +0000

An AI conversation is, fundamentally, its own history. The model’s next answer depends on everything said so far. And a CLI tool, by its very nature, forgets everything the moment it exits. Put those two facts together and you get the problem: run an AI command, exit, run it again, and you’re talking to someone who’s never met you.

A CLI forgets everything

A long-running service keeps its state in memory for as long as it runs. A CLI tool doesn’t get that luxury. It starts, does one thing, exits. The next invocation is a brand-new process with no memory of the last one.

For most commands that’s exactly right, and you wouldn’t want it any other way. But an AI conversation is a different kind of beast, because a conversation is its history. The model’s next answer depends on everything said so far. Run an AI command, exit, run it again, and you’ve started a fresh conversation with someone who’s never met you. For an interactive assistant, or any AI workflow that unfolds across several invocations, that’s plainly the wrong behaviour. The user expects to pick up where they left off.

Save and restore

The chat package handles this through a PersistentChatClient interface. Like streaming, it’s an optional capability discovered with a type assertion, sitting beside the four-method core rather than bloating it. A client that supports persistence also satisfies this interface:

if pc, ok := client.(chat.PersistentChatClient); ok {
 snapshot, err := pc.Save()
 // store the snapshot somewhere
}

A snapshot is a serialisable value that captures the conversation. You store it. Next run, you load it, Restore it onto a fresh client, re-register your tools, and call Chat again. “Where were we?” works, because the model is handed back the whole history.

A snapshot is opinionated about what it carries

The interesting part is what a snapshot does and doesn’t contain, because that’s a series of deliberate decisions.

It carries the messages, the system prompt, the model name, and tool metadata: the names, descriptions and parameter schemas of the tools that were registered.

It does not carry tool handlers. Handlers are code, not data; you can’t serialise a function meaningfully, so after a restore you re-register them with SetTools. The snapshot remembers that a tool called read_file existed and what its shape was; it doesn’t try to remember the Go function behind it.

And it does not carry API tokens. This is the one to dwell on. A snapshot is a file. A file gets synced, backed up, copied between machines, attached to a support ticket by a user trying to be helpful. A snapshot that carried the API key would be a credential leak the moment it left the laptop it was made on. So the snapshot never contains a token, at all. On restore, the client picks the credential up again the ordinary way, from the environment or the keychain. The conversation and the secret are kept in separate places on purpose, and only one of them is ever in the file.

Encrypted at rest, if you want it

The package ships a FileStore that writes snapshots as JSON files, with 0600 permissions in a 0700 directory, and it can encrypt them. Pass WithEncryption a 32-byte key and snapshots are written with AES-256-GCM.

That option exists because a conversation can hold sensitive content even when it holds no credential. The log a user pasted in for analysis, the source file they asked the model to review, the internal details tucked into their questions: none of that is an API key, and all of it might be something you’d rather not have sitting in plain JSON in a backup somewhere. Encryption at rest covers it.

The FileStore is also careful about the snapshot identifiers it’s handed. An ID has to be a canonical UUID, and the resolved file path is checked to lie inside the store directory, so a snapshot ID arriving from an untrusted source (a CLI flag, a request payload) can’t be bent into a path-traversal that reads or writes somewhere it shouldn’t. Persisting conversations adds a small filesystem surface, and the store treats it as exactly that.

The short version

A CLI tool forgets everything between invocations, which is correct for most commands and wrong for an AI conversation, because a conversation is its history.

go-tool-base’s chat package lets you persist one. PersistentChatClient saves a snapshot you can store and restore later, picking the conversation back up where it ended. The snapshot is deliberate about its contents: messages, system prompt and tool metadata yes; tool handlers no, because they’re code you re-register; API tokens never, because a snapshot is a file and a file travels. The built-in FileStore can encrypt snapshots at rest with AES-256-GCM and validates snapshot IDs against path traversal. Resumable conversations, without the conversation file turning into a place secrets leak from.

An AI agent that has to make the build pass

Thu, 02 Apr 2026 00:00:00 +0000

Most AI code generation works on a charming little principle I’ll call generate-and-hope. The model writes the code, the model stops at the closing brace, and whether the thing actually compiles is left as an exercise for you. For a snippet you paste into an editor, fine. For a whole generated command, that’s just outsourcing the disappointment.

go-tool-base does something I’m rather happier with: the AI has to make the build pass before it’s allowed to claim it’s done.

Generate and hope

The usual shape of AI code generation is this. You ask for code, the model produces it, and the model’s job ends at the closing brace. Whether it compiles, whether the tests pass, whether the imports even resolve, none of that has been checked. The model produced something that looks right. You find out whether it is right when you build it.

For a snippet you paste into an editor, that’s perfectly fine. The compiler tells you in a second. But go-tool-base’s generator, driven by gtb generate command --script or --prompt, produces a whole command: the implementation, its tests, the lot. “Generate and hope” at that scale means handing the user a project that may or may not build, and quietly making them the one who finds out which.

Drafting is only step one

So the generator doesn’t stop at drafting. Writing the first version of the implementation and its tests is step one of two. Step two is an autonomous repair agent.

Once the draft is on the filesystem, a separate agent takes over. It’s an LLM running in a loop, but a loop aimed at one narrow, checkable job: make this project build and pass its tests. It isn’t asked to be creative. It’s asked to get to green.

A fixed set of tools, and no shell

The agent is not handed a shell. It’s given a fixed, defined set of tools and nothing else. Three of them let it explore and edit the project: list_dir, read_file, write_file. Four of them let it verify the project:

go_build runs the build and captures the compiler errors.
go_test runs the tests and captures the failures.
go_get resolves a missing dependency.
golangci_lint runs the project’s linter.

That restriction is the design, not a limitation of it. The agent can’t delete arbitrary files, can’t reach the network, can’t run anything that isn’t on the list. It has exactly what it needs to make code compile and nothing it would need to do damage. Its file writes are confined to the project directory by an explicit path check, so even write_file can’t go wandering up into /etc. A coding agent you’d actually let near a filesystem is one whose abilities are an allowlist, not a denylist. (I keep coming back to that principle through this series… safety as a boundary you draw, not a behaviour you hope for.)

The loop

The repair loop is a ReAct loop, the same reason-act-observe shape as the tool-calling loop, only this time pointed at a goal:

The draft is on disk.
Verify: run go_build and go_test.
If verification failed, read the error logs, the compiler error or the failing test.
Reason about the cause: an undefined variable, a missing import, a wrong signature.
Act: call write_file to patch the code, or go_get to add the dependency.
Loop. Steps two to five repeat until the project is green, or the agent hits its bounded step limit.

What makes this work is treating the error output as feedback rather than as a failure to log and walk away from. A compiler error is the single most useful sentence you can hand a model that’s trying to fix code. It says what’s wrong, and usually where. The loop feeds it straight back in, and the model fixes against it.

Verification changes what “done” means

Here’s the real shift, and the agent’s own documentation puts it well: the agent “doesn’t just say it fixed a bug; it uses a Test tool to verify the fix before reporting success.”

A generate-and-hope model reports success when it finishes writing. It has no idea whether the code works, and it isn’t really claiming otherwise. “Done” means “I produced text”. The repair agent reports success when go_build and go_test actually pass. “Done” means “the build is green”. Those are two completely different claims, and only the second is worth anything to the person who asked for the command.

That’s the line between an AI that’s a creative writer and an AI that’s a collaborator you can hand a task to. And when the agent can’t reach green, when it spends its whole step budget and the project is still broken, the generator fails safely: it leaves the best-attempt code in place, commented out so the project still compiles, and tells the user what to finish by hand. There’s also an --agentless flag for anyone who’d rather have a plain single-shot retry than the multi-step agent. The default, though, is the agent, because the default should be code that’s been checked.

Where this leaves us

Most AI code generation generates and hopes: the model writes code and the user discovers whether it works. For a whole generated command, that pushes a may-or-may-not-build project onto the user.

go-tool-base’s generator drafts the command and then hands it to an autonomous repair agent. The agent has a fixed set of tools (explore and edit the project, build it, test it, lint it, fetch dependencies) and no shell at all, with file writes confined to the project directory. It runs a ReAct loop, reading each error and patching against it, until the build is green or it exhausts its steps. The point is what “done” comes to mean: not “the model finished writing”, but “the build passes”. Only one of those is a claim worth trusting.

Stop regex-ing the LLM's prose

Tue, 31 Mar 2026 00:00:00 +0000

Ask an LLM a question and it hands you back prose. Lovely to read, miserable to program against. You wanted the one number buried in the middle of it, and now you’re writing a regular expression to fish a word out of three well-written paragraphs that phrase themselves slightly differently every single time you run them.

There’s a much better way, and it’s the difference between forever interpreting an LLM and actually building on one.

The problem with a paragraph

You ask an LLM to analyse a log file and tell you the severity of what it found and a suggested fix. It comes back with three well-written paragraphs. Somewhere in there is the word “critical”, and somewhere is the fix.

Your program now has to extract those two facts from prose, and prose has no contract. The next run, the model phrases it differently. It leads with a caveat. It says “severe” where last time it said “critical”. It puts the fix first. Anything that worked by finding “critical” in the text is now quietly wrong, and you didn’t change a line. Parsing free text for structured facts is a game you lose slowly.

What you actually wanted was never a paragraph. It was a value: a thing with a severity field and a fix field, that you can branch on and store and pass around like any other.

Ask for the struct, not the prose

go-tool-base’s chat package draws the line with two methods. Chat gives you text. Ask gives you a struct.

You define the Go type you want back:

type Analysis struct {
 Severity string `json:"severity"`
 Fix string `json:"fix"`
}

var result Analysis
err := client.Ask(ctx, "Analyse this log file: "+logText, &result)

The framework generates a JSON Schema from that struct, sends it to the model as the required response format, and unmarshals the reply straight into result. You never lay a finger on the prose. You get result.Severity and result.Fix, typed, ready to use. If you want the model’s answer to drive a switch statement, this is the method that lets it.

The struct is the schema is the contract

The detail that makes this hold up over time: you don’t write the schema. The struct is the schema.

The framework derives the JSON Schema from your type. In go-tool-base that’s GenerateSchema[T](); in rust-tool-base the schema comes from your Rust type through schemars. (Yes, there’s a Rust sibling now. I’ll introduce it properly in a few weeks, but it keeps gatecrashing these posts because the two frameworks deliberately share ideas.) Either way there’s one definition, your type, and the schema is just a projection of it.

That matters, because otherwise two things have to agree. There’s the schema you tell the model to obey, and there’s the type you unmarshal the answer into. Hand-write the schema and those two can drift: add a field to the struct, forget to add it to the schema, and the model is never told to produce it, so it silently never appears. Deriving the schema from the type collapses the two into one. They can’t disagree, because there’s only one of them.

Both frameworks, with one extra step in Rust

go-tool-base does this with Ask and a ResponseSchema set on the client config. rust-tool-base does it with chat_structured::<T>, where T is any type that’s both deserialisable and JsonSchema.

rust-tool-base adds one step worth calling out. Before it deserialises the model’s reply into your T, it validates the raw response against the schema with a JSON Schema validator. That splits the failure into two distinct, named cases: the response didn’t match the schema, or it matched the schema but still wouldn’t deserialise. A model that returns subtly wrong JSON fails loudly and specifically, with an error that tells you which of those happened, instead of quietly handing you a zero-valued struct that you end up debugging an hour later.

When you’d reach for it

The line is simple, and it’s about who reads the answer.

If a human reads the answer, prose is right. Chat, free text, let the model write well. A summary, an explanation, an interactive reply: leave all of those as prose.

If a program consumes the answer, you want a value. Classification, extraction, a code review scored out of a hundred with a list of issues, a yes-or-no with reasons: anything where the next thing that happens is your code branching on the result. There, Ask and chat_structured turn the LLM from something you have to interpret into something that returns a value, and a typed value is a thing you can actually build on.

To sum up

An LLM returns prose by default, and prose has no contract, so a program that picks structured facts out of it breaks the moment the model rephrases.

Structured output asks for the value instead. You define a struct, the framework derives a JSON Schema from it, the model is constrained to that shape, and you get a typed result. go-tool-base’s Ask and rust-tool-base’s chat_structured both work this way, with the schema derived from your type so the schema and the type can’t drift; rust-tool-base additionally validates the response against the schema before deserialising. Use it whenever the answer feeds code rather than a human. It’s one of the four methods that make up go-tool-base’s small chat interface, and it’s the one that makes an LLM safe to program against.

Telemetry that asks first

Mon, 30 Mar 2026 00:00:00 +0000

Usage telemetry is genuinely useful. Knowing which commands people actually run, where the errors cluster, whether anyone ever touched the feature you spent a fortnight on… that’s the stuff that makes you a better maintainer. Wanting it is completely legitimate.

The trouble is that the usual way of getting it, on by default and quietly hoovering up everything, is a small betrayal of the people who installed your tool to get a job done. I wasn’t willing to build that, so go-tool-base’s telemetry starts from a different question.

The data you want, and the line you shouldn’t cross

If you maintain a tool, you want to know how it’s actually used. Which commands matter and which are dead weight. Where the error rate spikes. Whether anyone touched the feature you spent that fortnight on. That information makes you a better maintainer, and, to say it again, wanting it is completely legitimate.

The trouble is the standard way of getting it. Telemetry on by default. An opt-out buried three levels down in a settings file nobody reads. And once it’s running, it quietly collects far more than it ever admitted to: the arguments people passed, the paths they were working in, an IP address for good measure.

Every one of those is a small betrayal of someone who installed your tool to get a job done, not to become a data point. And the cost when users notice isn’t a slap on the wrist. It’s trust, and trust in a developer tool does not grow back quickly. A tool that surprises you once with what it was quietly collecting is a tool you uninstall and warn your colleagues about.

So go-tool-base’s telemetry started from a different question. Not “how do we collect the most data” but “how do we collect useful data without ever putting the user in a position they didn’t choose”.

Rule one: it is off until you say otherwise

The foundation is the simplest possible rule, and it’s absolute. Telemetry is never enabled by default. A freshly installed tool built on go-tool-base sends nothing. Not a heartbeat, not a ping, nothing at all.

It only starts collecting when the user makes an explicit, visible choice to let it. Three honest doors: they run telemetry enable, they say yes to a clear prompt during init, or they set TELEMETRY_ENABLED themselves. All three are deliberate acts. None of them is a pre-ticked box or a default they have to discover and then undo.

This is opt-in, and the distinction from a well-hidden opt-out is the entire point. Opt-out telemetry treats consent as something to be assumed and grudgingly reversed. Opt-in treats it as something that has to be given. Only one of those is actually consent.

Rule two: no personally identifiable information, full stop

Consent to “some telemetry” is not consent to “any telemetry”, so the second rule constrains what can ever be collected, even from a user who’s opted in.

No personally identifiable information. The framework does not record command arguments (they routinely contain paths, hostnames, the occasional secret someone’s pasted in). It does not record file contents. It does not record IP addresses.

It does need some notion of “distinct installations” for the numbers to mean anything, so it derives a machine ID from a handful of system signals and runs it through SHA-256. What leaves the machine is a hash. It tells you “this is the same install as last week” and tells you precisely nothing about whose install it is, and the hash can’t be walked backwards into the signals it came from.

The events themselves are deliberately thin. Which command ran, roughly how long it took, whether it errored. The shape of usage, not a transcript of it.

Rule three: the author picks the destination

Even with consent given and PII excluded, there’s a third question: where does the data actually go? go-tool-base doesn’t answer that for you, because it can’t. A corporate internal tool, an open-source CLI and an air-gapped utility have completely different right answers.

So the backend is the tool author’s choice. The framework ships several (a noop backend, stdout, a file, plain HTTP, and OpenTelemetry over OTLP) and supports custom ones. The noop backend matters more than it looks: it lets a tool wire up the whole telemetry surface, commands and all, while sending data precisely nowhere. A perfectly reasonable, fully supported configuration.

Pluggable backends also mean the data never has to touch any infrastructure I run. It goes where the tool’s author decides, on their terms. The framework provides the plumbing and stays well out of the destination.

And a way back out

One last thing, because it’s the part that makes the opt-in real rather than decorative. A user who opted in can opt straight back out, and the package includes a GDPR-aligned deletion path, so “stop, and remove what you have” is an actual supported request rather than a polite fiction.

Consent you can’t withdraw isn’t consent. It’s a one-way door with a friendly sign on it. The deletion path is what keeps the front door an actual door.

The bottom line

Telemetry is genuinely useful to a maintainer and genuinely dangerous to the trust of the people running the tool, and the usual implementation (on by default, opt-out buried, collecting everything) spends that trust recklessly. go-tool-base’s telemetry holds three lines: never enabled without an explicit user action, never collecting personally identifiable information even once enabled, and always sending data to a destination the tool’s author chose, up to and including nowhere. A real deletion path makes the opt-in something you can take back.

You can have your usage numbers. You just have to ask for them, the way you would for anything else that wasn’t yours to begin with.

Letting the AI call your Go functions

Sun, 29 Mar 2026 00:00:00 +0000

An AI that can only produce text can describe your system. An AI that can call your Go functions can actually operate it. That gap, between describing and doing, is the difference between a chatbot and something genuinely useful, and crossing it comes down to one fiddly mechanism: tool-calling, and the loop that drives it.

Talking about the system versus operating it

Wire an AI provider into a CLI command and you get something that can talk. Ask it a question, get a paragraph back. Useful, up to a point.

But notice the ceiling. An AI that can only generate text can describe things. It can tell you what it would do. What it can’t do is look at the actual current state of your system, or take a real action, because it has no hands. It’s reasoning in a vacuum about a world it can’t reach out and touch.

The thing that gives it hands is tool-calling. You hand the AI a set of functions it’s allowed to call. Now, mid-conversation, it can decide it needs to read that file before it can answer, or run that query, or check that status, and actually go and do it, and then reason about the real result. The AI stops describing your system and starts operating it.

The loop is the hard part

Tool-calling has a shape, and the shape is a loop. The literature calls it ReAct: Reason, Act, Observe.

The AI reasons about the prompt and decides whether it needs a tool.
If it does, it acts, asking for a specific tool with specific arguments.
Your code runs the tool and feeds the result back. The AI observes that result.
Round again. Reason about the new information, maybe call another tool, maybe several. Keep going until the AI has what it needs and produces a final text answer with no more tool calls.

Conceptually simple. Tedious and error-prone to implement by hand every single time: parsing the model’s tool-call requests, dispatching to the right function, marshalling arguments in and results out, feeding observations back in the exact format the provider expects, knowing when to stop, and not looping forever if the model gets itself stuck.

That orchestration is pure plumbing, and it’s identical for every tool and every command. So you can probably guess what’s coming: go-tool-base’s chat package owns it. You don’t write the loop. You write the tools.

Defining a tool

A chat.Tool is four things: a name, a description, a parameter schema, and a handler. The description is what the AI reads to decide whether to use the tool, so it’s worth writing well. The schema describes the arguments, and you don’t hand-write it. You write a tagged Go struct and let it generate:

type ReadFileParams struct {
 Path string `json:"path" jsonschema_description:"Relative path to the file"`
}

The struct is the contract. The framework derives the JSON Schema the AI is given straight from those tags, so the schema and the Go type the handler receives can’t drift apart, because they share a single source. The handler is then just an ordinary Go function that takes those parameters and returns a result.

You register your tools with SetTools, call Chat, and that’s the whole of your involvement. The framework runs the ReAct loop and Chat returns the AI’s final text answer once the loop settles.

Two details that show it was built for real use

A couple of decisions in the loop tell you it’s meant for production, not a demo.

Tool errors don’t abort the conversation. When a handler returns an error, the framework doesn’t crash the loop. It hands the error back to the AI as a string, as just another observation. That’s deliberate, and it’s right. A real agent should be able to call a tool, watch it fail, and react: try different arguments, take a different route, or tell the user it couldn’t manage it. A loop that aborted on the first tool error would be far more brittle than the model driving it.

The loop is bounded. There’s a MaxSteps limit, default 20. An AI that gets confused could otherwise call tools forever, and a CLI command that never returns is a worse failure than a wrong answer. The cap guarantees the command terminates. The agent gets room to genuinely work a problem across many steps, but not infinite room to flail about in.

There’s also parallel tool execution: when the model asks for several tools in a single step (three independent file reads, say) the framework runs them concurrently rather than one after another, because there’s no reason to make the AI sit and wait out a sequence of things that don’t depend on each other.

Boiling it down

A text-only AI can describe your system; an AI that can call your functions can operate it. Bridging that gap means tool-calling, and tool-calling means the ReAct loop (reason, act, observe, repeat) whose orchestration is fiddly, identical every time, and not a problem worth solving twice.

go-tool-base’s chat package runs the loop for you. You define chat.Tool values (name, description, a tagged parameter struct that generates its own schema, a handler), call SetTools and Chat, and get the final answer. Tool errors go back to the AI as observations so it can recover, and a MaxSteps cap guarantees the command always terminates. You write Go functions. The framework turns them into things an agent can reach for.

Nobody reads the manual

Sun, 29 Mar 2026 00:00:00 +0000

Let me describe the actual lifecycle of a user meeting your CLI tool, because it’s a bit humbling. They run it. It doesn’t quite do what they expected. They run it again with --help. They get a wall of monospaced flag descriptions, skim it, don’t find the thing they wanted, and either give up or go and ask a human who already knows.

Your documentation might be magnificent. It doesn’t matter, because the user never reached it.

The manual loses on location, not quality

That’s the lifecycle, and notice exactly where it breaks. The documentation might be excellent. It might answer their precise question in full. It doesn’t matter, because it’s on a website, in another window, behind a search box, and the user is here, in the terminal, mid-task. The docs lost not on quality but on location. They simply weren’t where the work was.

go-tool-base’s answer starts with a decision about location: the documentation gets embedded into the binary itself. Your docs/ folder ships inside the tool, the same way its default config does. Wherever the tool is installed, the docs are right there alongside it, no network, no browser. That embedding is what makes everything else possible, and there are two things built on top of it.

A browser, in the terminal

The first is the docs command, and it’s not --help with extra steps. It launches a proper Terminal User Interface, built on Bubble Tea.

It has a sidebar, structured from the project’s own mkdocs.yml, so the docs are a navigable tree rather than one flat scroll. Markdown renders with real formatting through Glamour (colour, tables, lists, headings) instead of collapsing into monospaced soup. There’s live search across every page, regex included.

Compared with man and --help, the difference isn’t a nicer coat of paint. man gives you linear scrolling and grep; this gives you a structured tree, rich rendering and real search. It’s the documentation experience a modern developer expects, except it followed the tool into the terminal instead of demanding the user leave it.

A documentation assistant that won’t make things up

The second thing built on the embedded docs is the one I find genuinely transformative: docs ask.

The user doesn’t navigate anything. They just ask:

mytool docs ask "how do I point this at a self-hosted server?"

and get a direct, specific answer. Under the hood, the framework collates the tool’s embedded markdown and hands it to the configured AI provider (Claude, OpenAI, Gemini, Claude Local, any OpenAI-compatible endpoint) as the context for the question.

Now, “an AI answers questions about my tool” should immediately make you nervous, and the correct thing to be nervous about is hallucination. An AI that confidently invents a flag that doesn’t exist, or describes behaviour the tool simply doesn’t have, is worse than no assistant at all, because the user trusts it.

This is where embedding the docs pays off a second time, and it’s why I keep stressing that the corpus is closed. The model is instructed to answer only from the tool’s actual documentation, and the context it’s handed is exactly that documentation and nothing else. It isn’t drawing on a vague memory of similar tools from its training data. It’s answering from this tool’s real, shipped, version-matched docs. The corpus is small, closed and authoritative, which is the combination that keeps the answers honest. “Zero hallucination by design” isn’t a slogan about the model. It’s a property of bounding what the model is allowed to look at, which is the same instinct I leaned on with the mcp command: the safety comes from the boundary you drew, not from trusting the AI to behave itself.

There’s a nice second-order effect, too. The answer is always about the version of the tool the user actually has, because the docs were embedded into that build. No mismatch between a website documenting the latest release and the slightly older binary sitting on the user’s machine.

The upshot

Documentation usually loses to --help not on quality but on location: it’s in a browser, and the user is in the terminal. go-tool-base embeds the docs into the binary and surfaces them two ways: a docs command that’s a real TUI browser with a sidebar, rich markdown and search, and docs ask, which answers natural-language questions using the embedded docs as context.

Because that context is the tool’s own closed, shipped documentation and the model is told to use nothing else, the assistant stays grounded, and it’s always describing the exact version the user is holding. The fix for unread documentation was never to write more of it. It was to put it where the work happens and let it answer back.

BDD where it earns its place, and nowhere else

Sat, 28 Mar 2026 00:00:00 +0000

I have a slightly complicated relationship with BDD. I’ve watched it turn a tangled test suite into something the whole team could read and reason about, and I’ve watched it turn a perfectly good unit test into a paragraph of ceremonial English that nobody benefits from. So when go-tool-base brought in Cucumber-style BDD, the interesting decision wasn’t adopting it. It was being ruthless about where not to.

Two tests that hurt for different reasons

Most of go-tool-base’s tests are ordinary table-driven Go tests, and they’re absolutely fine. A function, a slice of input/expected pairs, a loop. Nobody needs Gherkin to understand a parser test.

But two areas were genuinely painful, and they were painful in the same way: the test had become harder to understand than the thing it was testing.

The first was pkg/controls, the service-lifecycle package. It runs a small state machine (Unknown, Running, Stopping, Stopped) with signal handling, health monitoring, restart policies and graceful shutdown all woven through it. The integration tests for graceful shutdown had grown to over three hundred lines of imperative goroutine and channel coordination. They worked. But reviewing them was a slog, and a test you can’t review with confidence is a test you can’t trust when it fails. The behaviour being checked, “when a shutdown signal arrives mid-startup, the controller stops cleanly”, was a simple sentence buried under a heap of synchronisation scaffolding.

The second was the CLI itself. init, update, doctor are user workflows. “Given a config file with a custom value, when I run init, then the custom value survives the merge.” That’s already a Given/When/Then; it just happened to be written out as Go.

Godog, and the line I drew

Godog is the official Go implementation of Cucumber. You write .feature files in plain Gherkin and bind each step to a Go function. The shutdown scenario stops being three hundred lines of channels and becomes this:

Scenario: graceful shutdown completes within the deadline
 Given a controller with two registered services
 When a shutdown signal is received
 Then both services stop in registration order
 And the controller reports a clean shutdown

The goroutine choreography doesn’t vanish, of course. It moves into the step definitions, written once and reused. What changes is that the scenario is now readable by someone who’s never opened the file before, including someone from an ops team who’ll never write a line of Go but absolutely has opinions about how shutdown should behave.

Here’s the part I want to dwell on, because it’s the part most BDD adoptions get wrong. The first design decision written down for this work was: strategic, not universal. Use Godog only where BDD adds clarity. Keep table-driven Go tests as the baseline everywhere else.

That sounds obvious written down. It is not obvious in practice, because BDD has a gravitational pull. Once a team has feature files, there’s a powerful urge to express everything as feature files, for consistency. And that’s how you end up with Gherkin scenarios for a pure function (Given the number 2, When I double it, Then I get 4) which is pure ceremony. You’ve wrapped a one-line table test in a paragraph of English and a step-definition indirection, and made it actively worse.

The test for whether BDD belongs is this: is this test a narrative, or is it a matrix?

A matrix is the same logic with many input/output pairs. That’s a table-driven test, that’s most unit tests, and Gherkin actively harms them. A narrative is a sequence of steps where the ordering and the state between steps is the thing under test, and that’s where Gherkin pays for itself. Lifecycle transitions are narratives. A user running three commands in sequence is a narrative. Doubling a number is not.

go-tool-base drew that line and stuck to it. Feature files live in features/ at the project root, where a non-Go developer can find and read them. Step definitions live in test/e2e/, kept well away from the unit tests. And the unit tests stayed exactly what they were, because they were already the right tool.

Made to fit, not bolted on

A couple of smaller decisions kept the BDD layer from feeling like a foreign object.

It runs under go test. There’s no separate Cucumber runner to install or remember. A godog.TestSuite is invoked from an ordinary TestFeatures(t *testing.T), so the BDD scenarios run in the same go test ./... as everything else. CI didn’t need a new concept bolted onto it.

And the CLI end-to-end tests build the gtb binary once and reuse it across every scenario. Compiling a binary per scenario would make the suite slow enough that people would quietly start skipping it, and a test suite people skip is just decoration. Build once, test many.

Stepping back

go-tool-base brought in Godog for BDD, but the decision worth writing about is the restraint. BDD was applied to exactly two things: the service-lifecycle state machine, where a 300-line goroutine tangle became a four-line scenario anyone can review, and CLI workflows, which are Given/When/Then by their very nature. Everywhere else, table-driven Go tests remained the baseline, because wrapping a matrix test in Gherkin makes it worse, not better.

The useful rule: BDD fits a narrative, ordered steps with meaningful state in between, and fights a matrix. Adopt it as a scalpel for the narratives. Resist the pull to turn it into a religion.

An AI interface that fits on one screen

Fri, 27 Mar 2026 00:00:00 +0000

The moment you decide a CLI tool should talk to an LLM, there’s a strong gravitational pull towards reaching for LangChain, or one of its many relatives. It’s the obvious move. It’s also, for most CLI work, a bit like hiring a removals firm to carry a single box up the stairs.

Let me explain why go-tool-base went the other way, and what “the other way” actually looks like.

The instinct, and why it overshoots

When you add AI to a tool, the instinct is to reach for the big general-purpose framework. LangChain and its relatives are capable, and they exist for a real need: orchestrating complex multi-step AI applications, with retrieval pipelines, memory stores, chains of calls, whole fleets of agents.

Now look at what a CLI tool actually needs from an LLM. It needs to send a prompt and get text back. Sometimes it wants structured data back instead of prose. Sometimes it wants to let the model call a few of the tool’s own functions. That’s pretty much the whole list.

Pulling in a framework built to orchestrate retrieval and agent swarms in order to do that is a poor trade. You take on a large new vocabulary of concepts, a wide dependency surface, and a great deal of abstraction you’ll never touch, all to perform three or four operations. The framework isn’t wrong. It’s just answering a far bigger question than the one a CLI tool is asking.

What go-tool-base chose instead

go-tool-base didn’t reach for a framework. The decision is on the record in its own design notes: before a single line was written, LangChain Go, go-openai, Vercel’s AI SDK and around ten other options were evaluated, and not one of them matched what a CLI framework actually needs. So the chat package was built deliberately small.

How small? The entire core ChatClient interface is four methods:

type ChatClient interface {
 Add(ctx context.Context, prompt string) error
 Chat(ctx context.Context, prompt string) (string, error)
 Ask(ctx context.Context, question string, target any) error
 SetTools(tools []Tool) error
}

Add appends a message to the conversation. Chat sends a prompt and returns text. Ask sends a prompt and returns a typed Go struct, the model’s answer unmarshalled straight into a value you defined. SetTools hands the model a set of your own functions it’s allowed to call. That’s the whole surface. Downstream code that uses AI never holds anything larger than this, and never has to know which provider is behind it.

The package’s own documentation has a word for this: right-sized. Large enough to solve genuine provider-abstraction complexity, small enough that the full interface fits on a single screen.

“Thin” is not the same as “does little”

This is the part worth being precise about, because “four methods” can sound like “barely does anything”, and that’s the wrong read entirely.

Behind those four methods sits genuinely awkward work. Five providers (OpenAI, Claude, Gemini, a locally installed claude binary, and any OpenAI-compatible endpoint) each with a different wire API, all normalised behind the one interface. A tool-calling loop. Structured output via JSON Schema, made to behave consistently across providers that each express it differently. Error normalisation. Token chunking.

The point of a thin abstraction is not that there’s little underneath it. It’s that the interface stays small while the implementation quietly absorbs the complexity. Four methods on the surface; five provider integrations and a tool-calling loop below the waterline. The thinness is a property of what the caller sees, not of what the package does. A reach-for-LangChain decision gets that backwards: it exposes the caller to all the machinery, whether or not the caller will ever need it.

The core stays small even as features grow

There’s a neat detail in how chat keeps the interface from creeping. The package also supports streaming responses and conversation persistence, both of which are real features with real surface area. Neither of them is in the four-method core.

Instead they’re separate, optional interfaces. A streaming-capable client also satisfies StreamingChatClient; a persistable one also satisfies PersistentChatClient. Code that wants those capabilities does a type assertion to ask for them, and code that doesn’t simply never sees them. So the common path stays four methods forever. New capabilities arrive as opt-in interfaces alongside the core, not as new methods bolted onto it. The thing that fits on one screen keeps fitting on one screen.

Extensible without forking, testable without a network

Two more properties keep the package small without making it limiting.

It’s extensible. The provider list isn’t closed. A RegisterProvider call lets any package contribute a new provider, and chat.New will route to it. You add a backend without forking pkg/chat or sending a patch upstream.

And it’s testable. The package ships generated mocks. A downstream tool’s AI features can be tested against a mock ChatClient returning canned responses, with no network, no API key, and no flakiness. Because the interface is four methods, that mock is trivial to set up and complete by construction. A sprawling framework interface is a sprawling thing to fake; a four-method one is not. (I’ll come back to testing AI code properly in a later post, because it deserves a whole article of its own.)

The right size

When a CLI tool needs AI, the instinct is a large framework like LangChain. For orchestrating retrieval pipelines and agent swarms, that’s exactly the right tool. For sending a prompt, getting a struct back, and letting the model call a few functions, it’s enormous overkill.

go-tool-base’s chat package is the deliberate alternative, chosen only after LangChain Go and a dozen others were weighed up and rejected. Its core ChatClient interface is four methods. Underneath sit five normalised providers, a tool-calling loop, structured output and error handling, but the caller sees four methods and never learns which provider is active. Streaming and persistence are opt-in interfaces beside the core, not additions to it. It extends without forking and tests without a network. Right-sized: the complexity is real, but it lives under the interface rather than in it.

The config key that quietly did nothing

Fri, 27 Mar 2026 00:00:00 +0000

I once spent the better part of an hour convinced a timeout setting was broken. I’d set it in the config file, the tool ignored it, and the code that read it looked perfectly correct. The setting was tiemout. I’d typed it wrong, and not one thing in the entire stack had thought that worth mentioning.

Config loaders are too polite

Most config loaders have the same agreeable flaw: they’ll read whatever’s in the file and quietly ignore anything they weren’t expecting. Put a key the tool doesn’t know about and it sails straight past. No error, no warning, nothing. The loader assumes you meant it, or assumes some other layer will care, and neither turns out to be true.

That politeness costs you in two directions. A key you misspelled is silently dropped, so the setting you thought you’d changed keeps running on its old value. And a key you forgot leaves the field at its zero value, which you then discover at runtime, usually at the least convenient moment, when something downstream divides by a timeout of zero. The file looked fine. It parsed fine. It was just quietly wrong, and nothing was watching for that.

The struct already knows the answer

The thing is, the program already has a complete description of what valid config looks like. It’s the struct you unmarshal into. The field names, the types, which ones matter. That description exists; it’s just not being used to check anything.

go-tool-base’s config package puts it to work. You hand it a tagged struct and it derives a schema from the tags, in pkg/config/schema.go:

// WithStructSchema derives a schema from a tagged Go struct.
// Supported tags: `config:"key" validate:"required" enum:"a,b,c" default:"value"`.
func WithStructSchema(v any) SchemaOption { ... }

So a feature’s config type carries its own rules inline:

type ServerConfig struct {
 Host string `config:"host" validate:"required"`
 Port int `config:"port" validate:"required"`
 LogMode string `config:"log_mode" enum:"text,json"`
}

There’s no second artefact to keep in sync, which is the same instinct go-tool-base leans on for structured AI output: the type is the schema, and the schema is a projection of the type, so the two can’t drift apart because there’s only one of them. Each package describes its own slice of config on its own struct, and NewSchema composes them into the schema the loaded config gets checked against.

Strict mode turns the typo into an error

Deriving the schema is half of it. The half that actually catches tiemout is this one, also from schema.go:

// WithStrictMode treats unknown keys as errors instead of warnings.
func WithStrictMode() SchemaOption { ... }

By default a key the schema doesn’t recognise is a warning: surfaced, but not fatal, which is the right call when a config file might legitimately carry extra keys for tools other than yours. Turn on strict mode and an unknown key becomes an error. tiemout isn’t in the schema, so the tool refuses to start and tells me which key it didn’t recognise, instead of shrugging and using the default for an hour while I lose my mind. The validator walks every key actually present in the file and checks it against the known set, so a typo has nowhere to hide.

What it deliberately doesn’t do

There’s one decision in here I think is worth calling out, because the obvious feature is conspicuously absent. The schema knows each field’s default value. It would be the easiest thing in the world to have validation fill in missing fields from those defaults.

It doesn’t, on purpose. Validation validates. It tells you what’s wrong and what to do about it, and it stops there. Defaults are a separate job, handled by the embedded default config that every feature ships and merges in before validation ever runs. Keeping the two apart means the validator has exactly one responsibility, and the defaults live in one place rather than being half in an embedded file and half injected by a check. A field’s default tag is there for the documentation and the error hint, not as a sneaky second source of values.

Errors you can act on

The output isn’t a bare boolean. Validation returns a result that separates the fatal from the advisory: the missing required field and the wrong type are errors that stop the tool; the unrecognised-but-harmless key is a warning that informs you without blocking. And because each problem carries the offending key by name and a hint about the fix, the message tells you what to change, in the spirit of errors that tell you what to do next.

The short version

A config loader that silently ignores keys it doesn’t recognise will, sooner or later, ignore one you meant. go-tool-base derives a validation schema straight from your tagged config struct, so there’s no separate schema to maintain, and strict mode promotes an unknown key from a quiet shrug to a real error that names the typo. It validates without injecting defaults, because defaults are the embedded config’s job and a validator with one responsibility is easier to trust. Set tiemout now and the tool tells you, which is roughly fifty-nine minutes sooner than I found out.

One variadic, and I'd already spent it

Thu, 26 Mar 2026 00:00:00 +0000

I had a constructor I was rather pleased with. Hand go-tool-base’s root command its props and as many sub-commands as you like, and off it goes. Then I needed to thread some config file paths through it, reached for the obvious “just add a parameter,” and discovered I’d already spent my one variadic with no second one going spare.

The ergonomics I’d happily bought

NewCmdRoot ends in subcommands ...*cobra.Command. That trailing ... is a small luxury: callers write NewCmdRoot(props, build, deploy, status) and never have to think about slices. Variadics are lovely for exactly this, the “and as many of these as you fancy” argument.

The parameter I couldn’t add

Then config arrived, and the root command needed to know about some extra configuration file paths. The instinct is to add a parameter. The instinct is wrong, because there’s nowhere legal to put it.

You can’t write NewCmdRoot(props, configPaths ...string, subcommands ...*cobra.Command). Go allows a function exactly one variadic, and it must be the final parameter. Two variadics results in a compile error before you’ve finished the line (assuming your IDE does compile time checks for you), and fairly so: at the call site, how would Go ever know where the strings stopped and the commands began? So the variadic I’d spent on sub-commands was spent. There wasn’t another to hand.

The choices, and the one I took

You can demote the variadic. Make it subcommands []*cobra.Command and you’re free to add configPaths []string next to it. Correct, and it breaks every existing call: NewCmdRoot(props, build, deploy) becomes NewCmdRoot(props, []string{}, []*cobra.Command{build, deploy}). Uglier at every site, to solve a problem only some callers have.

You can reach for functional options, and for plenty of go-tool-base’s constructors that is exactly what happened. But the root builder is the one everybody calls first, with the simplest signature in the codebase, and I didn’t want the common case lugging option machinery around for the sake of the rare one.

What I actually did was add a second door. From pkg/cmd/root/root.go:

// NewCmdRoot creates the root command with Props wiring and optional subcommands.
func NewCmdRoot(props *p.Props, subcommands ...*cobra.Command) *cobra.Command {
	return NewCmdRootWithConfig(props, []string{}, subcommands...)
}

func NewCmdRootWithConfig(props *p.Props, configPaths []string, subcommands ...*cobra.Command) *cobra.Command {
	// ...
}

The new argument goes in as a plain []string, sat before the variadic, which is perfectly legal: one variadic, still last. Callers who care about config use NewCmdRootWithConfig explicitly, and NewCmdRoot becomes a one-line wrapper that delegates with an empty slice, so every existing caller compiles untouched and none the wiser. Two doors into the same room, granted, but the original door is exactly where everyone left it.

What it comes down to

A trailing variadic is a slot you fill once. It buys gorgeous ergonomics for the “as many as you like” argument, and in exchange it quietly forecloses on ever appending another parameter, because the next one has nowhere to stand. Once it’s spent, new arguments come in as ordinary parameters before the variadic, and the kind thing to do for your callers is to put that behind a second constructor and let the original keep delegating.

So spend the variadic deliberately. Give it to the argument that genuinely wants to be a loose list, not the first one that happens to be plural, because you don’t get a second.

Half your users don't have eyes

Wed, 25 Mar 2026 00:00:00 +0000

Run a command in your favourite CLI tool and look at what comes back. Colour. Neatly aligned columns. A friendly little summary sentence. Lovely… if you happen to be a human with eyes.

But a good half of any tool’s users aren’t people at all. They’re scripts, CI pipelines, bits of automation. And that pretty output you’re so proud of is, to them, actively hostile.

Your tool has two audiences and only serves one

I made more or less this same point about AI assistants when I argued that your CLI is already an AI tool. The machines are users too. Here it isn’t an AI doing the calling, it’s a humble shell script, but the principle is identical.

Run a CLI command and look at what comes back. Colour. Aligned columns. A friendly summary sentence. It’s designed for a person reading a terminal, and for a person reading a terminal it’s great.

Now picture the other half of your users. A deploy script that needs to know which version is installed. A CI job that runs doctor and wants to fail the build on one specific check. A bit of automation gluing your tool to three others. None of them have eyes. They have parsers.

So what do they do with your beautiful human output? They butcher it. They grep for a keyword, awk out the third field, sed off a prefix. It works in the demo. Then someone rewords a status line, or adds a column, or the colour codes shift, and every script downstream breaks at once. Silently, too, because a broken grep returns nothing rather than an error. You changed a sentence and quietly took out somebody’s pipeline without ever knowing.

The human-readable output was never the contract. It just got used as one, because it was the only output there was.

Give the machines their own channel

The fix is not to make the human output more parseable. That’s a trap. You’d be constraining prose meant for people in order to satisfy programs, and end up serving neither of them well. The fix is to give programs their own output format, declared and stable, kept well away from the prose.

So every command built with go-tool-base gets a --output flag. Leave it alone and you get the friendly human rendering. Pass --output json and you get something a parser can actually rely on.

And not just some JSON. JSON with a fixed shape.

One envelope, every command

The temptation with JSON output is to let each command emit whatever structure happens to suit it. Don’t. A consumer scripting against five of your commands then has to learn five shapes, and “where’s the actual payload?” has a different answer every single time.

go-tool-base wraps every command’s JSON in one standard Response envelope:

{
 "status": "success",
 "command": "deploy",
 "data": {
 "environment": "production",
 "version": "1.4.0",
 "replicas": 3
 }
}

status says how it went. command says what produced it. data holds the command-specific payload, and only the payload. Every built-in command (version, doctor, update, init) emits exactly this shape. So does every command you write, because pkg/output hands you the envelope rather than letting you freelance:

format, _ := cmd.Flags().GetString("output")
w := output.NewWriter(os.Stdout, output.Format(format))

return w.Write(output.Response{
 Status: output.StatusSuccess,
 Command: "deploy",
 Data: result,
})

The consumer-side payoff is the whole point. A script can check .status without ever touching .data. It can pull .data.version and know the field is there because it’s typed, not scraped. It learns the envelope once, and every command in your tool, and every tool built on the framework, honours it. The contract is explicit, versioned, and the same everywhere, which is precisely what the abused human output never was.

The human output gets to relax

There’s a quiet second benefit, and it’s my favourite kind: the sort you get for free. Once programs have their own reliable channel, the human output is freed. It no longer has to stay accidentally parseable. You can reword a status line, add colour, restructure a table, make it genuinely nicer to read, and not break a single script, because no script is reading it any more. They’re all over on --output json, where the real contract lives.

Two audiences, two formats, each one actually suited to its reader. That’s the deal a CLI tool ought to be offering, and most of them don’t.

In short

A CLI tool that only emits human-readable output is only half-built, because half its users are programs that end up grep-ing prose and shattering the moment that prose changes. go-tool-base gives every command a --output json flag and one standard Response envelope (status, command, data) used identically by every built-in command and by anything you write through pkg/output. Machines get a stable, explicit, learn-it-once contract; humans get output that’s now free to be properly readable, because nothing fragile depends on its wording any more.

If your tool will ever be called by another program (and it will), give that program a front door. Don’t make it climb in through the window.

Lifecycle management for when your CLI grows up into a service

Tue, 24 Mar 2026 00:00:00 +0000

There’s a moment in the life of a lot of CLI tools where they stop being a CLI tool. Nobody quite decides it. It just happens. Someone needs the thing to also expose a little HTTP endpoint, or poll a queue, or run a scheduler, so it grows a serve command… and the honest command-line utility you wrote is suddenly a long-running service wearing a CLI as a hat.

And a service needs a whole pile of production plumbing that a one-shot command never did.

The command that stops being a command

go-tool-base is CLI-first. It is not CLI-only, and the reason is a pattern I’ve watched play out more times than I can count.

A tool starts its life as an honest command-line utility. It runs, it does its thing, it exits. Then someone needs it to expose a small HTTP endpoint. Or poll a queue. Or run a scheduler. So it grows a serve command, or a run command, and the moment it does, the thing that was a CLI tool is now a long-running service that happens to have a CLI bolted on the front.

And a long-running service needs a whole category of plumbing a one-shot command never did. It has to start things up in a sensible order. It has to shut them down gracefully when someone sends a SIGTERM, finishing in-flight work rather than dropping it on the floor. It has to tell an orchestrator whether it’s alive, and whether it’s ready. It has to do something sensible when one of its internal services quietly falls over at 3am.

Hand-rolled, that’s a few hundred lines of goroutine choreography, channel-wrangling and signal handling that every such tool reinvents, slightly differently and slightly wrong each time. It’s the first-afternoon problem all over again, just turning up later in the project’s life. So go-tool-base ships it: pkg/controls.

A controller and the things it controls

The model is small. A Controller manages any number of services. You register each with Register(id, opts...) and describe it with functional options: WithStart takes a StartFunc, WithStop a StopFunc. An HTTP server, a background worker, a scheduler, anything with a “begin” and an “end”.

You register your services with the controller and it owns their collective lifecycle. They share a common set of channels (errors, OS signals, health, control messages) so the whole set can react together. A SIGTERM doesn’t get caught by one service off in a corner; it reaches the controller, and the controller takes everything down in order, each StopFunc handed a context with a deadline so that one sulking service can’t wedge the whole shutdown forever.

That ordering and timeout handling is the bit nobody enjoys writing and everybody needs. Centralising it means a tool that adds a second service later inherits correct coordinated shutdown for free, rather than discovering on its first production SIGTERM that it only half shuts down.

Probes, because something is usually watching

If the service ends up in Kubernetes (and a lot of them do) the orchestrator wants to ask two different questions, and they really are different questions.

Liveness: are you alive, or are you wedged and in need of a kill? Readiness: are you alive and able to take traffic right now? A service can quite easily be live but not ready… still warming a cache, still waiting on a dependency. Conflate the two and you get yourself killed during a slow startup, or sent traffic before you can actually serve it.

controls keeps them separate. You attach a WithLiveness probe and a WithReadiness probe to a service, each just a function returning a health report, and the controller exposes them. The tool answers Kubernetes honestly, in Kubernetes’ own terms, without you hand-wiring two more HTTP handlers.

Self-healing, but only if you ask

The last piece is what happens when a service fails. A worker’s StartFunc returns an error. Health checks start failing. In a hand-rolled setup this is where you either crash the whole process or write yourself a bespoke restart loop.

controls has a supervisor that can restart a failed service for you, and the important word in that sentence is can. It’s off by default. A service is only supervised if you hand it a RestartPolicy at registration:

controls.WithRestartPolicy(controls.RestartPolicy{
 MaxRestarts: 5,
 InitialBackoff: time.Second,
 MaxBackoff: 30 * time.Second,
 HealthFailureThreshold: 3,
})

With a policy in place, the controller restarts the service if its StartFunc errors out, or if it racks up more consecutive health-check failures than the threshold allows. Restarts back off exponentially, from InitialBackoff up to a MaxBackoff ceiling, so a service that’s failing because its database is down doesn’t sit there hammering that database flat with a tight restart loop. MaxRestarts caps the attempts, because a service that’s failed five times in a row is not going to be rescued by a sixth go, and at that point honest failure beats a thrashing pretence of health.

Opt-in matters here. Automatic restarts are exactly right for a resilient daemon and exactly wrong for a tool where a failure should stop the line and get a human’s attention. The framework doesn’t make that call for you. It gives you the supervisor and lets you point it at the services that genuinely want it.

The bottom line

A surprising number of CLI tools become long-running services the day they grow a serve command, and the day they do, they need coordinated startup, graceful ordered shutdown, real liveness and readiness probes, and a considered answer to a service falling over. That’s a few hundred lines of fiddly, easy-to-get-wrong plumbing.

pkg/controls provides it: a Controller over Controllable services with shared channels and deadline-bounded graceful shutdown, separate Kubernetes-style liveness and readiness probes, and an opt-in supervisor that restarts failed services with exponential backoff and a restart ceiling. Your tool can start as a command and grow into a daemon without that growth turning into a rewrite.

CLI-first, but not stuck there.

Middleware for CLI commands, not just web servers

Tue, 24 Mar 2026 00:00:00 +0000

Every CLI tool past a certain size grows a category of logic that doesn’t really belong to any one command, and yet has to happen for loads of them. Timing. An auth check. Panic recovery, so a crash becomes a clean error instead of a stack-trace all over someone’s terminal. A log line saying the command started and how it finished.

Web frameworks sorted this out years ago. CLIs, for some reason, mostly still copy-paste it around.

The logic that belongs to no single command

That category of logic doesn’t belong to any one command, yet needs to happen for many of them. Time how long the command took. Check the user is authenticated before a command that needs it. Recover from a panic so a crash becomes a clean error rather than a stack-trace vomited across the screen. Log that the command started and how it ended.

None of that is the command’s job. The deploy command’s job is to deploy. But timing and recovery and auth still have to happen around it, and around build, and around sync.

Put that logic inside each command’s RunE and you’ve copied the same six lines into thirty functions, which means thirty places to fix when the logging format changes and thirty chances to forget one of them. Cross-cutting concerns copied by hand don’t stay consistent. They drift, every time.

Web frameworks already solved this

This is not a new problem. It’s about the oldest problem in web frameworks, and they settled on an answer a long time ago: middleware. Gin has it, Echo has it, every HTTP stack you’ve ever touched has it. A middleware is a wrapper that sits around a handler, runs its cross-cutting logic, and calls through to the handler in the middle.

A CLI command is, structurally, just a handler too. So go-tool-base brings the same pattern to the Cobra command tree, with the same functional Chain shape:

type Middleware func(
 next func(cmd *cobra.Command, args []string) error,
) func(cmd *cobra.Command, args []string) error

A middleware receives the next handler in the chain and returns a new handler that wraps it. You compose a stack of them, and each command’s real RunE runs in the middle of the onion. Write the timing logic once, as one middleware, and every command in the chain is timed. Change the log format once and all thirty commands change with it, because there was only ever one copy. (The “write it once, in a place where everyone inherits it” drum again, which I will keep banging until the series runs out.)

“But Cobra already has PreRun”

It does, and this is the objection worth answering properly, because Cobra ships PersistentPreRun and PreRun hooks and they look, at a glance, like they cover this.

They don’t, and the reason is structural. A PreRun hook is a thing that happens before the command. That’s all it is. It can’t run anything after. It can’t wrap the command in a defer. It can’t catch a panic the command throws. It can’t measure how long the command took, because measuring a duration needs a start point and an end point, and the hook only owns the start.

A middleware wraps the entire execution. Because it’s a function that calls next() in its own body, it straddles the command (with the handler signature abbreviated to HandlerFunc here for readability):

func TimingMiddleware(next HandlerFunc) HandlerFunc {
 return func(cmd *cobra.Command, args []string) error {
 start := time.Now()
 err := next(cmd, args) // the command runs here
 log.Debug("command finished", "took", time.Since(start))
 return err
 }
}

Before, after, and around. A recovery middleware can put a defer recover() in place that a PreRun hook structurally cannot. An auth middleware can check a condition and return an error instead of calling next() at all, refusing to let the command run in the first place. PreRun can’t veto the command; it runs, and then the command runs regardless.

PreRun is a notification that the command is about to happen. Middleware is control over whether and how it happens. For genuine cross-cutting concerns you need the second thing, not the first.

To sum up

Timing, auth, recovery and logging are cross-cutting concerns: necessary for many commands, owned by none. Hand-copied into every RunE, they drift out of sync. Web frameworks fixed this with middleware years ago, and a CLI command is structurally just another handler.

go-tool-base brings the functional Chain middleware pattern to the Cobra command tree. A middleware wraps a command’s whole execution, so it acts before and after and can decide whether the command runs at all… strictly more than Cobra’s PreRun hooks, which only fire beforehand and can’t wrap, recover, time, or veto. Write the concern once, wrap the chain, and every command inherits it consistently.

A logging interface that doesn't leak its backend

Mon, 23 Mar 2026 00:00:00 +0000

The same tool, in two different lives, wants two completely different kinds of log.

On my laptop I want logs I can actually read: colour, alignment, friendly timestamps. The very same tool running as a daemon in a container wants none of that. It wants structured JSON, one object a line, ready for a log aggregator to swallow. And in a test I want the logger to shut up entirely. The interesting question is what it costs you to move between the three.

The same tool wants different logs

On a developer’s machine the tool is a CLI. You want logs that are pleasant to read in a terminal: colour, alignment, human-friendly timestamps. The charmbracelet logger does that beautifully.

Then the very same tool grows a serve command and gets deployed as a daemon in a container. Now coloured terminal output is worse than useless. The log aggregator wants structured JSON, one object per line, machine-parseable. slog does that.

And in tests you want neither. You want the logger to exist, satisfy the interface, and stay completely silent.

That’s three different logging backends, wanted by one tool across three different lives. So what does switching between them actually cost?

What it costs depends on what your packages imported

If your packages import a concrete logger, if pkg/config and pkg/setup and twenty others each have import "github.com/charmbracelet/log" and take a *log.Logger, then the backend is welded into the entire codebase. Switching to JSON for the container build means editing the import and the parameter type in every single one of those packages. The backend has leaked. A detail that should have been one decision has become a property of a hundred files.

go-tool-base doesn’t let it leak. Every package in the framework accepts a logger.Logger, an interface, and nothing else. No package anywhere imports a concrete logging library. A package states, in its types, “I need something I can log through”, and stops right there. It has no idea, and no way to find out, what’s actually on the other end.

// what every package depends on
type Logger interface {
 Debug(msg string, keyvals ...any)
 Info(msg string, keyvals ...any)
 Warn(msg string, keyvals ...any)
 Error(msg string, keyvals ...any)
 // ...
}

The backend gets chosen once, at the top, when the tool builds its Props. It travels down to every package as the interface, through the Props container. The packages underneath never see the concrete type, so the concrete type can change without a single one of them noticing. (There’s that “decide it once, in one place” theme again. I did warn you it runs through everything.)

Three backends, and the swap is one line

go-tool-base ships three implementations of that interface:

charmbracelet (logger.NewCharm(w, opts...)). Coloured, styled, for humans at a terminal. The CLI default.
slog JSON, a slog-backed backend emitting structured JSON, for daemons and containers feeding a log aggregator.
noop, which does precisely nothing, for tests that want a real Logger and total silence.

Switching the tool from a friendly CLI logger to container-ready JSON is a change to the one line in main() that constructs the logger. That’s the lot. pkg/config doesn’t change. pkg/setup doesn’t change. None of the twenty packages change, because none of them ever knew which backend they had. The decision was always one line; the interface is what kept it one line.

The noop backend deserves its own mention, because it’s the one people underrate. A test for a command shouldn’t be spraying log output all over the test run, but the command still needs a non-nil Logger to function. logger.NewNoop() gives you exactly that: interface satisfied, output binned, test quiet. And because it’s just another implementation of the same interface, no test needs any special logging machinery. It passes a different backend, exactly the way the container build does.

The general shape

There’s nothing exotic going on here. It’s “depend on interfaces, not implementations”, which every Go developer has had drilled into them at some point. The bit worth holding onto is where the rule actually pays out, and it’s at the seams between a stable core and a detail you know full well you’ll want to vary.

A logging backend is exactly such a detail. You will want it different in a terminal, in a container, and in a test. So the thing your code depends on has to be the interface, and the concrete backend has to be chosen at one well-known point and nowhere else. Get that boundary right and “we need JSON logs in production” is a one-line change. Get it wrong and it’s a refactor and a bad afternoon.

What it comes down to

One tool legitimately wants three different logging backends across its life: coloured output in a terminal, structured JSON in a container, silence in a test. The cost of moving between them is decided entirely by whether your packages imported a concrete logger or an interface.

go-tool-base’s packages depend only on logger.Logger, never a backend. Three implementations ship (charmbracelet, slog JSON, noop) and the backend is chosen once, in main(), then carried everywhere as the interface through Props. Switching is one line at the top, because the detail was never allowed to leak into the hundred files below it.

Errors that tell the user what to do next

Sun, 22 Mar 2026 00:00:00 +0000

Here’s an error message I’ve been on the receiving end of more times than I’d care to count:

error: failed to read config file

True. Also completely useless! I now know something is broken and I haven’t the faintest idea what to do about it. Which file? Why couldn’t it be read? Should I create it, run some init command, fix a permission, set an environment variable? The message states the problem and then abandons me at it, rather like a sat-nav cheerfully announcing “you have arrived” in the middle of a motorway.

A message is not a fix

The instinct, the moment you notice this, is to go and write a better message:

error: failed to read config file at ~/.config/mytool/config.yaml.
Run 'mytool init' to create one, or set MYTOOL_CONFIG to point at an existing file.

Better for the human, no question. But look at what you’ve just done to the error as a value. The recovery advice is now welded into the error string. Any code that wants to ask “is this the config-missing error?” is reduced to substring-matching English prose. Reword the advice and you break the check. So you’ve helped the user and quietly sabotaged the program at the same time, because you’ve made one poor little string do two completely incompatible jobs… being a stable identity for code, and being friendly guidance for people.

Why I changed error libraries

go-tool-base started out on github.com/go-errors/errors. It’s a perfectly fine library and it gave us stack traces. What it didn’t give us was any way to attach human guidance to an error without shoving it into the message string. So the codebase did exactly the daft thing I just described: multi-line suggestion text baked straight into errors.Errorf calls, user-facing content and programmatic identity all mashed into one value.

That’s the whole reason for the migration to github.com/cockroachdb/errors. Not novelty, and not because I fancied a weekend of find-and-replace. One specific capability: cockroachdb/errors lets you attach a hint to an error as a separate, structured field.

return errors.WithHint(
 errors.New("failed to read config file"),
 "Run 'mytool init' to create one, or set MYTOOL_CONFIG to point at an existing file.",
)

Now there are two things, cleanly apart. errors.New("failed to read config file") is the identity… stable, matchable, the program’s handle on the error. The hint is the guidance… for the human, and rewordable as much as you like without breaking a single check, because no check ever looks at it. errors.Is and errors.As work properly through every wrapper layer, so code matches on identity and never has to read prose.

The migration brought a few other things worth having. Stack traces print with a plain %+v instead of a type assertion. Errors can carry structured, machine-readable metadata. Multiple errors from concurrent work can be combined as a first-class value. But the hint is the one that actually changed the user’s day, because the hint is the recovery step, stored where it belongs.

One door out, and it knows where the help is

Separating the hint is only half of it. The other half is making sure those hints actually reach the user, every time, and that comes down to having a single way out.

Every go-tool-base command returns its errors the idiomatic Cobra way, through RunE. They all funnel into one Execute() wrapper at the root, which routes every error (runtime failure, flag parse error, pre-run failure) through one ErrorHandler. One door out. So error presentation gets decided in exactly one place, and no command can render an error differently from the command sat next to it.

And because there’s one handler, it can pull off something the individual commands never could. The framework knows your tool’s metadata, including its configured support channel, be it a Slack workspace or a Teams channel. So the error handler can finish a fatal error not just with the what and the recovery hint, but with where to go if the hint didn’t help:

error: failed to read config file
hint: Run 'mytool init' to create one, or set MYTOOL_CONFIG.
 Still stuck? Ask in #mytool-support on Slack.

The user is never left at a dead end. The error tells them what broke, the hint tells them the most likely fix, and if that’s still not enough the handler tells them which door to go and knock on. A failure becomes a signpost instead of a full stop.

The short version

An error that only reports what went wrong leaves the user stranded, and the obvious fix (writing the recovery advice into the message) quietly wrecks the error as a value, because now your code has to substring-match prose just to work out what it’s looking at.

go-tool-base moved from go-errors to cockroachdb/errors to get hints: a structured, separate field for human guidance that leaves the error’s identity clean for errors.Is and errors.As. Every command’s errors leave through one Execute() wrapper and one ErrorHandler, so presentation stays consistent, and because that handler knows the tool’s support channel it can point a stuck user at real help.

State the problem for the program. Give the fix to the human. And for pity’s sake, keep the two in different fields.

Many embedded filesystems, one merged view

Sat, 21 Mar 2026 00:00:00 +0000

Go’s embed package is one of those features that makes you slightly giddy the first time you use it. One //go:embed directive and your default config, your templates, your docs are all baked into the binary. The tool just works the moment it’s installed, with nothing external to lose or forget to ship.

And then you go and build something modular on top of it, and you discover the catch nobody warned you about.

`embed.FS` is an island

An embed.FS has a property that’s easy to miss until it bites: it’s local to the package that declared it. The //go:embed directive can only see files at or below its own source file. So in any project bigger than a toy, you don’t have an embedded filesystem. You have many. The root package embeds one. Each feature, each subcommand that ships its own templates or defaults, embeds another. They’re islands, one per package, and Go gives you no native way to make them behave as a whole.

For most files that’s perfectly fine. A feature’s templates can stay on the feature’s island; nothing else needs them.

It stops being fine the moment features need to contribute to something shared.

The shared-config problem

Here’s the case that forces the issue. A go-tool-base tool has a global config.yaml of defaults, embedded at the root. Now you add a feature, and that feature has its own configuration keys, with their own sensible defaults.

Where do those defaults go?

The naive answer is: edit the root config.yaml and add the feature’s section. And that’s a genuinely bad answer, because it inverts the dependency. The root config now has to know about every feature. Add a feature, edit the centre. Remove one, edit the centre again. The central file becomes a pinch point that every feature has to reach into, and a modular architecture where you can’t add a module without editing the core isn’t really modular at all… it just has more files.

What you actually want is for the feature to ship its own slice of default config, on its own island, and for the global config the tool reads to somehow already contain it. The feature contributes; the centre doesn’t budge.

`props.Assets`: merge the islands

That’s the job of props.Assets. (Yes, it lives on Props, the load-bearing container I keep going on about. Most of the good stuff does.) It’s a layer that implements the standard fs.FS interface, and into it you Register each embed.FS under a name:

// root main.go
Assets: props.NewAssets(props.AssetMap{"root": &assets}),

// a feature's command constructor
//go:embed assets/*
var assets embed.FS

func NewCmdFeature(p *props.Props) *cobra.Command {
 p.Assets.Register("feature", &assets)
 // ...
}

Now Props carries one Assets value that represents all the islands as a single filesystem. The root’s files and every registered feature’s files, addressable through one fs.FS. Each registration is named, so the islands stay individually identifiable, but they read as one.

That alone solves the addressing problem. The genuinely clever part is what happens for structured files.

Opening a file that exists in several places

When you Open a path through props.Assets and that path has a structured extension it recognises (.yaml, .yml, .json, .toml, .csv, and a few more) it doesn’t simply return the first match it stumbles across. It does this:

Discovery. It finds every instance of that path, across every registered filesystem.
Parsing. It unmarshals each one.
Merging. It deep-merges the parsed data, using mergo.
Re-serialisation. It hands you back a single fs.File whose contents are the combined, merged result.

So picture the shared-config problem again, only solved this time. The root ships a config.yaml with the base defaults. Each feature ships a config.yaml on its own island carrying only its own keys. Nobody edits anybody else’s file. When the init command opens config.yaml through props.Assets, it doesn’t get the root’s copy. It gets the deep-merge of the root’s copy and every registered feature’s copy: one config.yaml that contains every default in the tool, assembled at runtime from contributions that never knew about each other.

A feature contributes its defaults simply by existing and registering. The centre never changes. That’s the modular property the naive approach couldn’t give you, and it generalises well beyond config… the same merge applies to a shared commands.csv, or any structured file features want to add rows or keys to.

There’s also a Mount method for attaching an arbitrary fs.FS at a virtual path, which is handy for surfacing something external (a temp directory, say) as part of the same tree. But the structured merge is the feature that really earns Assets its place.

Boiling it down

embed.FS is per-package by design, so a modular CLI ends up with many embedded filesystems, one island per feature. Most of the time that’s fine. It fails specifically when features need to contribute to a shared resource like the global config.yaml, because the naive fix forces every feature to reach in and edit a central file.

props.Assets merges all the registered islands into a single fs.FS, and for structured files it goes further: opening a .yaml, .json or .csv discovers every copy across every island, deep-merges them, and returns the combined whole. A feature drops its own defaults onto its own island, registers, and the merged config the tool reads already includes them. Contribution without coupling, which is rather the whole point of being modular in the first place.

Props: the container that does the heavy lifting

Sat, 21 Mar 2026 00:00:00 +0000

I name-dropped Props back in the introduction and then rather glossed over it, which was a bit unfair of me, because it’s the single most important design decision in the whole framework. So let’s give it the attention it actually deserves.

And the best place to start, oddly enough, is the name.

Start with the name

The container at the centre of go-tool-base is called Props, and the name is doing real work, so we’ll start there.

It is not short for “properties”, though it does hold a few. A prop is the heavy timber or steel beam that stops a structure quietly collapsing in on itself. And for anyone who follows the rugby: a prop is the position in the scrum, the broad-shouldered forward whose entire job is to provide structural support so everyone else can get on with the game.

That’s the design brief, in a single word. Props is not where the clever, flashy work happens. It scores no tries. It’s the thankless, load-bearing thing that holds the framework up so that your actual command logic gets to be the interesting part. Understand the name and you understand what the struct is for.

What it carries

Props is the single object passed to every command constructor in a go-tool-base tool. It holds the dependencies a command might need:

Tool, metadata about the CLI (name, summary, release source).
Logger, the logging abstraction.
Config, the loaded configuration container.
FS, a filesystem abstraction (afero), so a command never touches the real disk directly.
Assets, the embedded-resource manager.
Version, build information.
ErrorHandler, the centralised error reporter.
Collector, the telemetry collector (always present, a no-op when telemetry is off).

A command constructor’s signature is, accordingly, boring on purpose:

func NewCmdExample(p *props.Props) *cobra.Command { ... }

One parameter. Everything the command could possibly need is reachable through it. No globals, no init()-time wiring, no twelve-argument constructor that quietly grows a thirteenth argument next month.

Why a struct, and not `context.Context`

Here’s the design decision I actually want to defend, because it’s the one Go developers tend to raise an eyebrow at. Go already has a well-known way to carry things through a call tree: context.Context. So why not just put the logger and the config in the context and pass that around?

Because context.Context carries its values as interface{}, and that’s the wrong trade for dependencies.

Pull a dependency out of a context and you get this:

l := ctx.Value("logger").(logger.Logger) // a runtime type assertion

That one line has two separate ways to hurt you. The key is a bare string, so a typo compiles perfectly happily and then fails at runtime. The type assertion is unchecked, so if the wrong thing is sitting under that key, your tool panics in front of a user. Neither failure is visible to the compiler. Neither is visible to your IDE. You find out when it breaks, which is to say at the worst possible time.

Pull the same dependency out of Props and you get this:

p.Logger.Info("starting") // a field access

p.Logger is a typed field. If it doesn’t exist, or you’ve used it wrong, the code simply doesn’t compile. Your IDE autocompletes it. Refactor the Logger interface and every misuse lights up at build time. There’s no runtime type assertion, because there’s no interface{} to assert from in the first place.

context.Context is the right tool for what it was designed for: cancellation, deadlines, request-scoped signals that genuinely cross API boundaries. It’s the wrong tool for “here are my program’s services”, because it trades away the compiler’s help for a flexibility you really don’t want here. Dependencies should be declared, somewhere the compiler checks them. Props is that somewhere.

What you get back for it

That one decision pays out in three currencies.

Testability. A command is now a pure function of its Props. To test it, you build a Props with the doubles you want (an in-memory FS instead of the real disk, a no-op Logger, a config you’ve populated by hand) and call the constructor. No global state to reset between tests, no monkey-patching, no init() order to puzzle over. The dependency is an argument, so the test just passes a different one.

Consistency. Cross-cutting changes have exactly one place to happen. When the global --debug flag flips the log level, it does so on the Logger inside Props, and because every command reads its logger from the same Props, every command gets the new level. No command can drift, because none of them owns its own copy.

Extensibility. Adding a new framework-wide service is just adding a field to one struct. Every command can immediately reach it; none of them needed touching to make it reachable.

To sum up

Props is the dependency-injection container at the heart of go-tool-base: one struct, passed to every command, holding the logger, config, filesystem, assets, error handler and tool metadata. It’s a concrete struct rather than a context.Context payload entirely on purpose, because dependencies belong somewhere the compiler can check them, not behind a string key and a hopeful runtime type assertion. That single choice buys you testability, consistency and easy extension.

The name says it best, really. Props doesn’t score the tries. It’s the broad-shouldered thing in the scrum that stops the whole framework folding, so the rest of your code is free to go and play.

Design your whole CLI in one file

Fri, 20 Mar 2026 00:00:00 +0000

Here’s a question that sounds trivial and really isn’t: where, exactly, does a CLI tool’s structure live? Not the logic of each command… the structure. Which commands exist, what they’re called, which flags they take, what’s nested under what.

I’d never properly thought to ask it until go-tool-base forced me to, and the answer turned out to be a little bit embarrassing.

Where does a CLI’s structure actually live?

Picture a CLI tool with twenty commands, some nested under others. In a typical project, where does its structure live? The answer is “smeared across the codebase”. It’s in twenty cmd.go files. It’s in the AddCommand calls that stitch them together. It’s in the flag registrations. To understand the shape of the tool you have to read all of it and assemble the picture in your head, because the picture exists nowhere as a single thing you can point at.

That’s a strange state of affairs for the single most important design fact about a CLI. The command tree is the tool’s interface, it’s the thing users actually touch, and yet it hasn’t got a home.

The manifest gives it one

go-tool-base’s generator gives that structure a home: .gtb/manifest.yaml. The manifest is a single readable file describing the command tree. Every command, its name, its short description, its flags, its place in the hierarchy, whether it carries assets or an initialiser. The shape of the whole tool, in one place you can open and read top to bottom.

And the manifest isn’t documentation about the project. It’s the thing the project’s wiring is generated from. When you run regenerate project, the generator reads the manifest and rebuilds the boilerplate to match it: the command registration, the AddCommand wiring, the flag definitions. The manifest is the source of truth, and the Go wiring is its output.

Design-first, when you want it

This unlocks a way of working that the smeared-across-the-codebase approach simply can’t offer. You can design the interface first, in the manifest, and let the code follow.

Want to rename a command? Edit one line in the manifest, run regenerate, and the rename propagates through every wiring file that ever mentioned it. Want to move a subcommand under a different parent? Change its place in the manifest hierarchy and regenerate. Want to add a flag to three related commands? Add it in the manifest, in three obvious places, and regenerate, instead of going on a little hunting expedition for three flag-registration blocks scattered across the tree.

You’re editing the tool’s interface as a design, in the file whose entire job is to hold that design, and the generator does the mechanical donkey-work of making the code reflect it. The thing you change is the thing that describes the structure. The code is downstream.

If that shape sounds familiar, it should. It’s the same instinct behind spec-driven and test-driven development: write down what the thing should be before you assemble how it works, and keep that statement of intent as a first-class, living artefact rather than a comment that quietly rots in a corner. The manifest is a spec for your command tree, and regenerate is what keeps the implementation honest to it.

It doesn’t trap you

There’s an obvious worry about any generated-from-a-manifest system: am I now locked into editing the manifest? What if I just want to open a Go file and write some Go like a normal person?

You can. The generator is careful not to own everything. It owns the wiring (the registration and the structural boilerplate) and it leaves your command logic well alone. The RunE function where your command actually does its work is yours; the manifest hasn’t got an opinion about it. And the generator tracks the files it produces by content hash, so if you do hand-edit something it generated, regeneration notices and asks before overwriting rather than steamrolling you. That mechanism turned out interesting enough to get its own post.

So the manifest is an option, not a cage. Design-first via the manifest when that suits the change. Drop into Go directly when that suits it better. The two stay in sync because regeneration reconciles them, not because one of them has been forbidden.

Pulling it together

A CLI’s command tree is its most important design surface, and in most projects it has no single home… it gets reconstructed in your head from twenty scattered files every time you need to reason about it. go-tool-base gives it one: .gtb/manifest.yaml, a readable description of the whole tree that the generator rebuilds the wiring code from. Edit the manifest, run regenerate, and the boilerplate follows.

It makes CLI structure something you design in one place, in the spirit of spec-driven development, while still leaving you free to write Go directly when that’s the better tool for the job. The manifest is the spec for your interface. The generator just keeps the code faithful to it.

Scaffolding that respects your edits

Fri, 20 Mar 2026 00:00:00 +0000

When I introduced go-tool-base I made a passing promise to come back to “the generator that won’t clobber your edits”. This is me keeping it, partly because it’s the feature I’m quietly most proud of, and partly because it took the most head-scratching of anything to get right.

The problem it solves is one that every code generator runs into eventually, usually the hard way and usually at the worst possible moment.

The generator’s awkward second act

A project generator has an easy first act. gtb generate project, and you’ve got a complete, wired, idiomatic Go CLI project. Everyone’s happy, me included.

The second act is the hard one. The framework moves on. A convention changes, a new built-in capability appears, the recommended CI shape shifts. Your project, scaffolded three months ago, is now subtly out of date, and you’d quite like the generator to drag it back up to spec.

Except by now it isn’t a fresh scaffold. It’s your project. You tuned the CI workflow. You rewrote the justfile. You added a stanza to the Dockerfile that took an afternoon and a fair bit of swearing to get right. The generated files and your edited files are one and the same files.

A naive generator handles this with breathtaking confidence: it regenerates everything from the template and overwrites the lot. Run it once, lose your afternoon. You learn that lesson exactly once and then never run regeneration again, which means the upkeep feature you were sold is dead on arrival. A scaffold you can’t safely re-run is just a one-shot cp with extra steps.

What the generator needs to know

The thing standing between “safe to overwrite” and “absolutely do not” is a single fact: has this file changed since the generator last wrote it?

If it hasn’t, the file is still pristine boilerplate and the generator owns it. Overwrite away. If it has, a human has been in there, and the generator must not touch it without asking first.

The generator can’t just eyeball that, of course. It needs a record. So every time gtb generate writes a file, it computes a SHA-256 of the content and stores it in the project’s manifest, .gtb/manifest.yaml, as a Hashes map of relative path to hash. The manifest is the generator’s memory of the exact bytes it last produced.

Regeneration becomes a three-way decision

With that record in hand, regeneration stops being “overwrite everything” and becomes a per-file decision with three branches.

The file doesn’t exist. Easy. Write it, store its hash.

The file exists and its current hash matches the manifest. It’s byte-for-byte what the generator last wrote, so nobody has touched it. The generator owns it outright, regenerates from the template and updates the stored hash. No prompt, no fuss. This is the common case, and it’s silent precisely because it’s safe.

The file exists and its hash does not match. Someone has been in there since generation. The generator stops and asks. It will not silently overwrite your hard-won afternoon. You decide: take the new version, or keep yours.

The detail I’m genuinely fond of is what happens when you decline. Declining is non-fatal. Generation carries on with the rest of the files, and the manifest keeps the file’s stored hash rather than dropping it. That matters more than it looks, because it means the file stays tracked. Next time you regenerate, the generator can still tell that file has been modified, and still asks. Skipping a file once doesn’t quietly evict it from the generator’s awareness forever. It stays a known, watched, customised file across every future run.

When you want it to stop asking

Per-file prompting is the right default, but for files you’ve permanently taken ownership of, being asked on every single regeneration is just noise. If you’ve rewritten the CI workflows wholesale and you are never, ever going back to the generated version, you don’t want a prompt. You want the generator to leave them well alone and not bring it up again.

That’s what .gtb/ignore is for. It sits next to the manifest and takes gitignore-style patterns:

# I own the CI workflows now
.github/workflows/**

# ...except the release workflow, keep that managed
!.github/workflows/release.yml

# and my build config
justfile
Dockerfile

Anything matching is skipped during regeneration with no prompt at all. Patterns evaluate top to bottom and later ones win, so the negation (!) behaves the way you’d expect from .gitignore: exclude a whole directory, then claw one file back.

It’s a deliberate escalation ladder. Unmodified files are handled silently. Modified files get a prompt. Files you’ve formally claimed get total silence. Each rung asks for less of your attention than the last, and you choose how far up to climb, file by file.

Stepping back

A generator earns its keep twice: once when it scaffolds your project, and then continuously, every time it drags that project back up to the framework’s current shape. The second job is worth nothing if regeneration flattens your customisations, because you’ll simply stop running it, and who could blame you.

go-tool-base’s generator gets around that by remembering. It hashes every file it writes into .gtb/manifest.yaml, and on regeneration it re-hashes before overwriting: unchanged files it owns and updates silently, changed files it stops and asks about, and .gtb/ignore lets you mark files as permanently yours. Skipped files stay tracked, so the generator never loses sight of what you’ve made your own.

The point of a scaffold isn’t the first five minutes. It’s that you can still run it in month three without holding your breath.

Your CLI is already an AI tool

Thu, 19 Mar 2026 00:00:00 +0000

“Make it work with AI” has become one of those requests that lands on a developer’s desk with a thud and not much further detail attached. My instinct, the first time, was to brace for a big lump of integration work… a bespoke adapter for this assistant, another for that one, a treadmill of little wrappers stretching off into the distance.

Turns out I’d already done most of the work. So have you, if your CLI tool is any good. Let me explain what I mean.

You already described your capabilities

Stop and think for a second about what a well-built CLI tool actually is. It’s a set of named operations, each with a human-readable description, each taking a set of typed, named, documented parameters. You wrote all of that already, because a CLI without it is unusable by people.

Now look at what an AI assistant needs in order to call a tool. A set of named operations. A description of each, so it knows when to reach for them. A typed parameter schema for each, so it knows how to call them.

It’s the same list! A good CLI is already, structurally, a description of a set of capabilities. The information an AI agent needs isn’t extra work you have to go and do. It’s work you finished the moment your --help output was any good.

The only thing missing is a translator. Something that takes “this is a CLI” and presents it as “this is a set of tools an AI can call”.

MCP is that translator, and it’s a standard

The temptation, when you want your tool to be AI-usable, is to sit down and write an integration. A little adapter for Claude Desktop. Another for Cursor. Another for whatever turns up next month. Each one a bespoke wrapper, each one a thing to maintain, and the list never stops growing because new assistants keep appearing. That’s the treadmill I was bracing for.

The Model Context Protocol exists to kill that list. MCP is an open standard for how an AI model discovers and calls local tools. Implement it once and your tool works with every assistant that speaks it. Write once, not once-per-client.

So go-tool-base implements it once, in the framework, for everyone. (That’s rather the theme of this whole series, if you hadn’t spotted it yet… do the annoying thing once, properly, in a place where every tool inherits it.)

The `mcp` command, and the mapping it does for free

Every tool built on go-tool-base inherits a built-in mcp command. Run it:

mytool mcp

and the tool starts a JSON-RPC server over standard I/O, speaking MCP. That’s the whole user-facing surface. One command.

Behind it, the framework walks your Cobra command tree and maps it straight onto MCP tool definitions:

Each command becomes a tool.
Each command’s short description becomes the tool’s description, the text the AI reads to decide whether this is the tool it wants.
Each command’s flags and arguments become the tool’s JSON Schema parameters.

There’s no second schema to write and then keep in sync (and we all know how well “keep these two things aligned by hand” tends to go). The command tree is the schema. Add a new command to your CLI and it’s a new tool for the agent, automatically, with the description and flags you already gave it. Nobody has to remember to update an MCP manifest, because there’s no separate MCP manifest to forget about.

Configuring an assistant to use it

On the assistant’s side it’s just as undramatic. You tell your AI client (Claude Desktop, Cursor, anything MCP-aware) to launch mytool mcp. From then on the assistant:

Starts your tool in MCP mode when it boots.
Discovers every command as a callable tool.
Calls the right one, with the right parameters, when a user’s request needs it.

Your CLI tool has quietly become something the AI can pick up and use, mid-conversation, on its own initiative.

The safety property worth noticing

Now, “let an AI run things on my machine” is rightly a sentence that makes people nervous. It makes me nervous, and I built the thing. So it’s worth noticing the constraint sitting quietly in this design.

The AI can only call what you defined. The tools it sees are exactly the commands in your tree, and the parameters it can pass are exactly the flags and arguments you declared, validated against the JSON Schema generated from them.

It can’t invent a command. It can’t pass a parameter you never defined. The boundary of what the agent can do is the boundary of what your CLI does, and you drew that boundary already, back when you built the tool. Exposing the CLI over MCP doesn’t widen the surface one inch. It just makes the existing surface reachable. The AI isn’t running things. It’s running your commands, the ones you wrote, tested and shipped, and nothing else.

The gist

A CLI tool, built properly, is already a structured description of a set of capabilities: named operations, descriptions, typed parameters. Which is also exactly what an AI agent needs in order to call a tool. The gap between the two is only a translator, and writing a bespoke one per assistant is a treadmill you don’t need to step onto.

go-tool-base puts the translator in the framework. Every tool gets an mcp command that serves the command tree over the Model Context Protocol… commands become tools, descriptions become descriptions, flags become JSON Schema parameters, with no second schema to maintain. Point any MCP-aware assistant at it and your CLI is an agent-callable tool, bounded to exactly the commands you shipped.

You did the hard part when you built a good CLI. MCP just opens the door you’d already framed.

Pre-populating Neo4J using Kubernetes Init Containers and neo4j-admin import

Wed, 15 Jul 2020 00:00:00 +0000

Recently there has been an uptake in the use of Neo4j by the Data Scientists. This is a good thing! they are wanting to use the right tool for the job. However we need to run it inside our k8s cluster as a portable readable data source that has been dynamically populated from a pile of data in a combination of PostgreSQL and MongoDB.

This isn’t a problem for them working locally, they install and spin up a local copy of Neo4j and can interact with it quite happily. They even realised that they can generate CSV’s from PostgreSQL and MongoDB and then import them, blindingly fast, into Neo4j using the neo4j-admin tool that comes bundled. Fantastic!

At least until they come to want to run their Neo instance inside our k8s cluster. That’s where I step in and turn them aside from creating their own custom neo4j image with a bespoke entry point that loads all the data for them in some crazy threaded bash scripting!

“No, No, No!” I tell them. “It’s far easier to just add an init container to your pod, that will preload the data before Neo starts up”.

Init containers, if you haven’t come across before, them are a special type of container that lives inside a k8s pod and are set to run BEFORE your main container runs. In this case it means we can easily sequence a bash script to run the neo4j-admin import before Neo4j is even started. And here is how we did it!

The script

The data scientists had been using Neo4j 3.5.x locally because they had a need for the graph algorithms plugin (https://github.com/neo4j-contrib/neo4j-graph-algorithms) which at the time they were looking didn’t support Neo4j 4.x. The plugin is now deprecated and its replacement (https://github.com/neo4j/graph-data-science) thankfully supports 3.5.x and 4.x.

As Neo4j 4.x introduces a lot of new features and improves performance so I recommended we switch to using that. This meant a refactor of their bash script for neo4j-admin there some very subtle differences and a few caveats to work with. This is what they came up with

#!/bin/bash
DBNAME="neo4j"
if [ "$#" -eq 1 ]; then
 DBNAME=$1
fi

# extract data from SQL
python3 extract_data.py

# remove old db for rebuild
rm -rf "/data/databases/$DNBAME"

neo4j-admin import \
 --database=$DBNAME \
 --delimiter="|" \
 --nodes=Protein=${NODE_DIR}/nodes_protein_header.csv,${DATA_DIR}/nodes_proteins.csv \
 --nodes=UniProtKB=${NODE_DIR}/nodes_uniprot_header.csv,${DATA_DIR}/nodes_uniprot.csv \
 --relationships=HAS_AMINO_ACID_SEQUENCE=${EDGE_DIR}/edges_protein_sequence_header.csv,${DATA_DIR}/edges_protein_sequence.csv \  
 --relationships=HAS_AMINO_ACID_SEQUENCE=${EDGE_DIR}/edges_chembl_protein_biotherapeutic_molregno_header.csv,${DATA_DIR}/edges_chembl_protein_biotherapeutic_molregno.csv \
 --skip-bad-relationships=true \
 --skip-duplicate-nodes=true

The import command here is significantly shorter for example purposes, as the original is about 120 lines long. As you can see it’s pretty straight forward, they had another script in extract_data.py, that I wont bore you with suffice to say that it pulled out all the data they wanted from PostgreSQL and MongoDB, which got saved to disk as CSV files in the relevant directories.

Great, it worked on their local version!

The Dockerfile

ROM neo4j:latest
ENV NEO4JLABS_PLUGINS ["graph-data-science"]
RUN apt update && apt install -y python3
WORKDIR /srv
COPY src /srv/src
COPY headers /srv/headers

The plan is always to keep it simple. We have one image that we can run for both the init container and the main container. This docker file gives a vanilla neo4j instance with python and our scripts for extracting the data loaded into it

The k8s Manifest

apiVersion: v1
kind: Pod
metadata:
 name: neo4j
spec:
 containers:
 - name: neo4j
 env:
 - name: NEO4J_AUTH
 value: neo4j/password
 image: registry.example.com/phpboyscout/rnd_graph:latest
 imagePullPolicy: Always
 volumeMounts:
 - mountPath: /data
 name: neo4j
 subPath: data
 initContainers:
 - name: importer
 args:
 - neo4j_import.sh
 command:
 - /bin/bash
 env:
 - name: DATA_DIR
 value: /import/data
 - name: HEADER_DIR
 value: /srv/headers
 image: registry.example.com/phpboyscout/rnd_graph:latest
 imagePullPolicy: Always
 stdin: true
 workingDir: /srv/src
 volumeMounts:
 - mountPath: /data
 name: neo4j
 subPath: data
 - mountPath: /import
 name: neo4j
 subPath: import
 - name: neo4j
 persistentVolumeClaim:
 claimName: neo4j

Now we can pull it all together with our k8s manifest. From here you can see that we have our default neo4j container that we pass in our default authentication details to and an init container that runs our import.sh script. Both containers have access to a shared volume for the /import and /data folders.

And now we get to…

Troubleshooting

So right off the bat it didn’t work! No surprises there but here are a few things that caused us some issues and how we resolved them.

Database offline

At first glance everything seemed to work. Until we tried to connect to the neo4j database with the default UI, at which point we were presented with the error message

Database "neo4j" is unavailable, its status is "offline."

This took a little sleuthing and shelling into the neo4j container to take a look at the /var/debug.log file which gives significantly more useful information about whats going on with the server. First we were getting stack traces that contained messages like

Component 'org.neo4j.kernel.impl.transaction.log.files.TransactionLogFiles@59d6a4d1'
was successfully initialized, but failed to start. Please see the attached cause 
exception "/data/transactions/neo4j/neostore.transaction.db.0"

From experience this sounded like a permissions issue and lo and behold, checking the files on the filesystem showed that because the import script was run as root the database files were owned by root. We resolved this by adding:-

chown -R neo4j:neo4j /data/

to the bottom of the import script. Next we were then presented with an error that looked like

2020-07-14 16:56:33.919+0000 WARN [o.n.k.d.Database] [neo4j] Exception occurred while
starting the database. Trying to stop already started components. Mismatching store id.

This one seems like it would be an obvious one to google and I did come up with few pages that seemed to describe what was happening to me but gave some varied solutions, from starting and stopping the sever and running neo4j-admin unbind in between to deleting various files. It seemed very strange because we did test this with the 3.5.17 version of Neo and it worked fine.

The solution we needed was to wipe the slate clean properly. The line in our script to remove the previous build of the db

# remove old db for rebuild
rm -rf "/data/databases/$DNBAME"

just didn’t cut it. It turns out that because the 4.x version of Neo4j supports multiple databases the import command writes additional information to the system database and transactions database in the form of some identifiers for each database, BUT if you don’t do something to clear that value for the database your are building it wont match up when the server starts and you get a declaration of Mismatching store id

I’m not sure if the developers are aware of this flaw, so in the mean time we have to expand our cleanup to:

# clean up for fresh import
rm -rf /data/databases/*
rm -rf /data/transactions/*

removing the neoj4, system and store_lock databases and transaction logs from the data store. This solved the problem and the server was able to start and we could connect to neo4j database successful.

Its not an ideal solution, I can foresee definite situations we will have to work around when we get to a point where multiple databases may be needed and are built separately and independently from each other. but it will suffice for now.

Malloc(): Error message goes here

Once it was up and running we noticed that we were getting lots of restarts on the main neo4j container a quick look at the stdout log and we could see each restart ending with something that looked like

malloc(): corrupted top size

instantly this looks like an issue with memory sizing inside the container for the JVM. Thankfully the team at Neo4j have accounted for this and give you a nice tool in the form of

neo4j-admin memrec

which interrogates the databases and gives some sensible values you can set in the output which in our case looked like


# Memory settings recommendation from neo4j-admin memrec:
#
# Assuming the system is dedicated to running Neo4j and has 376.6GiB of memory,
# we recommend a heap size of around 31g, and a page cache of around 331500m,
# and that about 22400m is left for the operating system, and the native memory
# needed by Lucene and Netty.
#
# Tip: If the indexing storage use is high, e.g. there are many indexes or most
# data indexed, then it might advantageous to leave more memory for the
# operating system.
#
# Tip: Depending on the workload type you may want to increase the amount
# of off-heap memory available for storing transaction state.
# For instance, in case of large write-intensive transactions
# increasing it can lower GC overhead and thus improve performance.
# On the other hand, if vast majority of transactions are small or read-only
# then you can decrease it and increase page cache instead.
#
# Tip: The more concurrent transactions your workload has and the more updates
# they do, the more heap memory you will need. However, don't allocate more
# than 31g of heap, since this will disable pointer compression, also known as
# "compressed oops", in the JVM and make less effective use of the heap.
#
# Tip: Setting the initial and the max heap size to the same value means the
# JVM will never need to change the heap size. Changing the heap size otherwise
# involves a full GC, which is desirable to avoid.
#
# Based on the above, the following memory settings are recommended:
dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g
dbms.memory.pagecache.size=331500m
#
# It is also recommended turning out-of-memory errors into full crashes,
# instead of allowing a partially crashed database to continue running:
#dbms.jvm.additional=-XX:+ExitOnOutOfMemoryError
#
# The numbers below have been derived based on your current databases located at: '/var/lib/neo4j/data/databases'.
# They can be used as an input into more detailed memory analysis.
# Total size of lucene indexes in all databases: 0k
# Total size of data and native indexes in all databases: 17300m

So how to get these values into the container… Thankfully this is handled for you in the form of Environment Variables you can pass into the docker image. A bit of a google and i found this little snippet which is a goldmine for telling us how to translate settings into environment variables.

# Env variable naming convention:
# - prefix NEO4J_
# - double underscore char '__' instead of single underscore '_' char in the setting name
# - underscore char '_' instead of dot '.' char in the setting name
# Example:
# NEO4J_dbms_tx__log_rotation_retention__policy env variable to set
# dbms.tx_log.rotation.retention_policy setting

As for getting the variables into the container, you could do this from the pod and inject it in. I this case because the data we are going to be using is reasonably stable and tested we decided to stick them into the Docker file with the ENV directive.

ENV NEO4J_dbms_memory_heap_initial__size 31g
ENV NEO4J_dbms_memory_heap_max__size 31g
ENV NEO4J_dbms_memory_pagecache_size 331500m

And so far we haven’t had a restart yet!