Featured image of post A stack trace is not an error message

A stack trace is not an error message

The repair agent I’ve been building into go-tool-base narrates what it’s doing as it goes. It builds, it tests, it lints, it fixes, and it logs each step so I can watch it think. Mostly that log is a calm, readable trickle: tried this, that failed, reading the file, here’s the fix. Mostly.

The moment a build or a lint step failed, the calm trickle turned into a wall of Go stack frames, the same forty lines of runtime gubbins over and over, burying the one line I actually wanted to read.

Two different things we both call “the error”

The agent’s tools wrap their failures with cockroachdb/errors, which is a lovely library: it attaches a stack trace to an error the moment you create it, so when something goes wrong deep in the weeds you can see exactly how you got there. A failed go build comes back as one of these rich, wrapped values, carrying its message and its stack.

The line that recorded the failure looked like this:

l.Warn("Tool execution failed", "tool", name, "error", err)

Looks fine. It is not fine. The logger is charmbracelet/log, and when you hand a structured logger a cockroachdb error value as a field, it renders the whole thing: message, wraps, types, and every frame of that attached stack. So every failed step, and during self-repair there are plenty, printed a full traceback at WARN. The signal, the actual build error, was in there somewhere, wearing a forty-line coat.

The thing is, a stack trace and an error message are two different objects that we lazily both call “the error”. err.Error() is the message: short, human, “lint issues found…”. The value err is the message plus the evidence of where it came from. They serve different readers. The message is for whoever’s watching the loop run. The stack is for whoever’s debugging why the loop itself is broken. Hand the wrong one to the wrong reader and you’ve got either noise or a mystery.

Pick a reader per level

The fix is to stop making one log line serve both:

l.Warn("Tool execution failed", "tool", name, "error", err.Error())
l.Debug("Tool execution failure detail", "tool", name, "error", err)

The message goes to WARN, where someone’s watching the agent work and just wants to know what failed. The full wrapped value, stack and all, goes to DEBUG, where someone’s gone looking for trouble and wants every frame. Turn the level up and the evidence is right there; leave it at the default and the loop reads like prose again.

It’s a one-line change, give or take, and it lives in the shared chat tool-dispatch path rather than in the agent, so every tool-using client gets the quieter log for free. It’s the same loop I’d just taught to respect the linter; apparently I was determined to make it both honest and readable in the same fortnight.

Where the stack belongs

The stack trace was never the thing I needed to hide. cockroachdb/errors attaching it is exactly what I want; it’s the whole reason I use the library. The mistake was where I let it surface. A trace dumped at WARN on every routine failure isn’t observability, it’s wallpaper, and wallpaper is what you stop seeing. Keep the loud version for the level where someone’s actually gone looking for it, and leave the everyday log alone. The stack was never the noise. Printing it on every line was.

Built with Hugo
Theme Stack designed by Jimmy