BDD where it earns its place, and nowhere else

TL;DR: go-tool-base adopted Godog, the Cucumber-style BDD library for Go, but deliberately did not adopt it everywhere. BDD earns its place for two kinds of test: the service-lifecycle state machine, where a 300-line tangle of goroutines and channels became a readable Given/When/Then; and CLI end-to-end workflows, which are narratives by nature. Everywhere else, plain table-driven Go tests stay the baseline. The interesting decision wasn’t using BDD. It was being strict about where not to.

Two tests that hurt for the same reason

Most of go-tool-base’s tests are ordinary table-driven Go tests, and they’re fine. A function, a slice of input/expected pairs, a loop. Nobody needs Gherkin to understand a parser test.

But two areas were genuinely painful, and they were painful in the same way: the test had become harder to understand than the thing it tested.

The first was pkg/controls, the service-lifecycle package. It runs a small state machine (Unknown, Running, Stopping, Stopped) with signal handling, health monitoring, restart policies and graceful shutdown all woven through it. The integration tests for graceful shutdown had grown to over three hundred lines of imperative goroutine and channel coordination. They worked. But reviewing them was a slog, and a test you can’t review with confidence is a test you can’t trust when it fails. The behaviour being checked, “when a shutdown signal arrives mid-startup, the controller stops cleanly,” was a simple sentence buried under a heap of synchronisation scaffolding.

The second was the CLI itself. Commands like init, update and doctor are user workflows. “Given a config file with a custom value, when I run init, then the custom value survives the merge.” That’s already a Given/When/Then; it just happened to be written as Go.

Godog, and the line I drew

Godog is the official Go implementation of Cucumber. You write .feature files in plain Gherkin, and bind each step to a Go function. The shutdown scenario stops being three hundred lines of channels and becomes this:

Scenario: graceful shutdown completes within the deadline
  Given a controller with two registered services
  When a shutdown signal is received
  Then both services stop in registration order
  And the controller reports a clean shutdown

The goroutine choreography doesn’t vanish, of course. It moves into the step definitions, written once and reused. What changes is that the scenario is now readable by someone who has never opened the file before, including someone from an ops team who’ll never write Go but absolutely has opinions about how shutdown should behave.
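To make "moves into the step definitions" concrete, here is a minimal sketch of what such steps could look like. It is not the project's code: the lifecycleSteps type, its method names and the goroutine standing in for the controller are all hypothetical, and it uses only the standard library. The point is the shape: each step is a small function returning an error, and the channels live in one struct that carries state from step to step. With Godog, functions of this shape are what get bound to the Gherkin step text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// lifecycleSteps carries state from one step to the next within a scenario.
// Everything here is hypothetical: a goroutine stands in for the real controller.
type lifecycleSteps struct {
	registered []string           // service names, in registration order
	cancel     context.CancelFunc // plays the role of the shutdown signal
	stopped    chan string        // service names, in the order they stopped
	done       chan struct{}      // closed once shutdown has finished
}

// Given a controller with n registered services.
func (s *lifecycleSteps) controllerWithServices(n int) error {
	if n < 0 {
		return errors.New("service count must not be negative")
	}
	ctx, cancel := context.WithCancel(context.Background())
	s.cancel = cancel
	s.stopped = make(chan string, n) // buffered so the fake controller never blocks
	s.done = make(chan struct{})
	s.registered = nil
	for i := 1; i <= n; i++ {
		s.registered = append(s.registered, fmt.Sprintf("service-%d", i))
	}
	names := append([]string(nil), s.registered...)

	// Stand-in controller: wait for the signal, then stop services in order.
	go func() {
		defer close(s.done)
		<-ctx.Done()
		for _, name := range names {
			s.stopped <- name
		}
	}()
	return nil
}

// When a shutdown signal is received.
func (s *lifecycleSteps) shutdownSignalReceived() error {
	if s.cancel == nil {
		return errors.New("no controller has been started")
	}
	s.cancel()
	return nil
}

// Then the services stop in registration order.
func (s *lifecycleSteps) servicesStopInRegistrationOrder() error {
	select {
	case <-s.done:
	case <-time.After(2 * time.Second):
		return errors.New("shutdown did not finish within 2s")
	}
	for _, want := range s.registered {
		if got := <-s.stopped; got != want {
			return fmt.Errorf("stopped %q, want %q", got, want)
		}
	}
	return nil
}
```

The timeout and the buffered channel are exactly the kind of scaffolding that used to be repeated inline. Here they are written once, and the scenario text never has to mention them.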

Here’s the part I want to dwell on, because it’s the part most BDD adoptions get wrong. The first design decision written down for this work was: strategic, not universal. Use Godog only where BDD adds clarity. Keep table-driven Go tests as the baseline everywhere else.

That sounds obvious written down. It is not obvious in practice, because BDD has a gravitational pull. Once a team has feature files, there’s a strong urge to express everything as feature files, for consistency. That’s how you end up with Gherkin scenarios for a pure function (Given the number 2, When I double it, Then I get 4), which is nothing but ceremony. You’ve wrapped a one-line table test in a paragraph of English and a step-definition indirection, and made it worse.

The honest test for whether BDD belongs is: is this test a narrative, or is it a matrix?

A matrix is the same logic run against many input/output pairs. That’s a table-driven test, that’s most unit tests, and Gherkin actively harms them. A narrative is a sequence of steps where the ordering and the state carried between steps are the thing under test, and that’s where Gherkin pays for itself. Lifecycle transitions are narratives. A user running three commands in sequence is a narrative. Doubling a number is not.
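For contrast, this is roughly what the matrix side looks like when it is left alone. The double function is a made-up stand-in for any pure function; nothing here comes from go-tool-base. Each case is one row, and adding a case means adding a line, not a scenario.

```go
package main

import "testing"

// double is a hypothetical pure function, used only to illustrate the shape.
func double(n int) int { return n * 2 }

// TestDouble is a plain table-driven test: same logic, many input/output pairs.
func TestDouble(t *testing.T) {
	cases := []struct {
		name string
		in   int
		want int
	}{
		{"zero", 0, 0},
		{"positive", 2, 4},
		{"negative", -3, -6},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := double(tc.in); got != tc.want {
				t.Errorf("double(%d) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}
```

There is no state between rows and no ordering to get wrong, which is why a Given/When/Then wrapper has nothing to add.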

go-tool-base drew that line and stuck to it. Feature files live in features/ at the project root, where a non-Go developer can find and read them. Step definitions live in test/e2e/, kept away from the unit tests. And the unit tests stayed exactly what they were, because they were already the right tool.

Made to fit, not bolted on

A couple of smaller decisions kept the BDD layer from feeling like a foreign object.

It runs under go test. There’s no separate Cucumber runner to install or remember. A godog.TestSuite is invoked from an ordinary TestFeatures(t *testing.T), so the BDD scenarios run in the same go test ./... as everything else. CI didn’t need a new concept.

And the CLI end-to-end tests build the gtb binary once and reuse it across every scenario. Compiling a binary per scenario would make the suite slow enough that people would skip it, and a test suite people skip is decoration. Build once, test many.
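One way to get build-once behaviour is a sync.Once around the compile step. This is a sketch under assumptions, not the project's implementation: the gtbBinary helper, the swappable buildFunc and the ./cmd/gtb package path are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"sync"
)

var (
	buildOnce sync.Once
	binPath   string
	buildErr  error
)

// buildFunc compiles the CLI to the given output path. It is a variable so the
// caching logic can be exercised without a real compile. The ./cmd/gtb package
// path is an assumption about the repository layout.
var buildFunc = func(out string) error {
	cmd := exec.Command("go", "build", "-o", out, "./cmd/gtb")
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

// gtbBinary returns the path to the compiled binary, building it on the first
// call only. Later calls reuse the same path (or the same build error).
func gtbBinary() (string, error) {
	buildOnce.Do(func() {
		dir, err := os.MkdirTemp("", "gtb-e2e-")
		if err != nil {
			buildErr = fmt.Errorf("creating temp dir: %w", err)
			return
		}
		out := filepath.Join(dir, "gtb") // on Windows this would need an .exe suffix
		if err := buildFunc(out); err != nil {
			buildErr = fmt.Errorf("building gtb: %w", err)
			return
		}
		binPath = out
	})
	return binPath, buildErr
}
```

Every scenario that needs the CLI calls gtbBinary() and runs the result with exec.Command. The compile cost is paid once per go test run, however many scenarios there are.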

Stepping back

go-tool-base brought in Godog for BDD, but the decision worth writing about is the restraint. BDD was applied to exactly two things: the service-lifecycle state machine, where a 300-line goroutine tangle became a four-step scenario anyone can review, and CLI workflows, which are Given/When/Then by their nature. Everywhere else, table-driven Go tests remained the baseline, because wrapping a matrix test in Gherkin makes it worse, not better.

The useful rule: BDD fits a narrative (ordered steps with meaningful state in between) and fights a matrix. Adopt it as a scalpel for the narratives. Resist the pull to make it a religion.