go-tool-base has had a thing called telemetry for a long while now. It’s the opt-in kind: the product analytics that asks a user’s permission before it phones a single byte home, sits there as a no-op until they say yes, and can be wiped on request. The whole package is built around consent.
Then the web-service series went and needed telemetry too. Not that telemetry. The other one, the one the rest of the industry means when it says the word: traces, metrics and logs of a running service. And the awkward thing about those two is that they share a name, they want to share a package, and they pull in exactly opposite directions on the one question that matters most.
This is the story of how 0.7.x grew a second telemetry without breaking the first, and where the line between them ended up getting drawn.
Why bother putting it in the framework at all
The starting point is that I could have left observability out. A reader could
wire up OpenTelemetry in their own service and go about their day. But the six
parts of the web-service series spent a lot of effort making the transports
first-class: a gRPC server, an HTTP server, a gateway, TLS across all of them,
each one a Register call against the controller. Turning a CLI into a real
long-running service and then shrugging “observability is your problem” would
have left a hole exactly where it hurts.
Because a service you can’t see into is a liability the moment it leaves your laptop. The series ended with a macguffin service that was typed, fast and served over TLS, and was also a black box: when it got slow, you had nowhere to look. Metrics and traces are how you get the lights on, and they deserved the same first-class treatment as the things they observe.
The other half of the reason is that the framework already had a foot in this world. The analytics package’s preferred backend speaks OTLP, the OpenTelemetry wire protocol. So OpenTelemetry was already in the building. Doing observability any other way would have meant two standards where one would do.
The catch: two telemetries, opposite instincts
Here’s where it gets interesting, and it’s the part worth slowing down on.
The analytics telemetry is about a user. It collects usage data (a hashed machine ID, which command ran, the exit code), and the entire design assumes you have to ask first. It is off by default. The collector you get when it’s disabled is a no-op, so nothing is recorded until the user opts in, and there’s a deletion path for when they change their mind. That’s not an add-on; it’s the design.
The observability telemetry is about a service. It emits operational data (how long a request took, which span was slow, how many requests errored) to a collector the operator runs. And there is no user in the loop to ask. The operator deploys the service, points it at their collector, and that act is itself the consent. Asking would be nonsensical: whose permission, for data about their own service, on their own infrastructure?
So you have two things called telemetry, wanting to live in one package, with the opposite default on consent. One is off until someone says yes; the other is on the moment it’s configured. Get that wiring wrong and you fail in one of two ugly ways. Gate the operational telemetry behind the user’s analytics opt-in, and a service’s tracing silently does nothing because nobody ticked a box meant for something else. Or loosen the analytics gate to make observability flow, and you start leaking usage data the user never agreed to share. Neither is acceptable, and “just use two packages” throws away everything the two genuinely have in common.
What they actually share
Quite a lot, as it turns out, and all of it below the consent line.
Both ship their data over OTLP to a collector. Both need to describe who is emitting: the service name and version, which OpenTelemetry calls the resource. Both parse an endpoint, attach headers, decide whether the connection is plaintext. None of that has the faintest thing to do with consent. It’s just the plumbing of getting bytes to a collector, and the analytics backend already had all of it, written inline.
So the shape of the solution fell out of the problem. Lift the shared plumbing
into one place, let both telemetries stand on it, and keep the consent decision
firmly out of that shared layer. The structure under pkg/telemetry ended up
like this:
pkg/telemetry/
    telemetry.go         the analytics Collector (consent-gated)
    backend_otel.go      its OTLP backend
    posthog/ datadog/    vendor analytics backends
    otelcore/            shared: OTLP endpoint, resource, telemetry.* config
    tracing/             observability signal
    metrics/             observability signal
    logs/                observability signal
    observability.go     Setup: builds the enabled signals (implied consent)
The new otelcore is the keystone. It holds the three things both sides need and
nothing they don’t: ParseEndpoint for the OTLP URL, Resource for the service
identity, and Resolve for reading the shared telemetry.* config (a base
endpoint, plus per-signal overrides, in the same cascade as the TLS config). It
imports no signal exporter and knows nothing about traces, metrics, logs or
analytics. It is deliberately dumb plumbing.
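The “base endpoint, plus per-signal overrides” cascade is easy to picture with a small sketch. This is not the framework’s Resolve: the Config and SignalConfig types, the resolveEndpoint helper and the example URLs are all hypothetical, invented only to show the fallback rule.

```go
package main

import "fmt"

// SignalConfig is a hypothetical per-signal block (think telemetry.tracing.*).
type SignalConfig struct {
	Enabled  bool
	Endpoint string // optional override; empty means "inherit the base endpoint"
}

// Config is a hypothetical in-memory view of the telemetry.* config root.
type Config struct {
	Endpoint string // base endpoint shared by every signal
	Signals  map[string]SignalConfig
}

// resolveEndpoint returns the per-signal override when one is set and
// falls back to the base endpoint otherwise.
func resolveEndpoint(cfg Config, signal string) string {
	if sc, ok := cfg.Signals[signal]; ok && sc.Endpoint != "" {
		return sc.Endpoint
	}
	return cfg.Endpoint
}

func main() {
	cfg := Config{
		Endpoint: "https://collector.example.com:4318",
		Signals: map[string]SignalConfig{
			"tracing": {Enabled: true},
			"metrics": {Enabled: true, Endpoint: "http://localhost:4318"},
		},
	}
	// tracing and logs inherit the base; metrics uses its own override.
	for _, s := range []string{"tracing", "metrics", "logs"} {
		fmt.Printf("%s -> %s\n", s, resolveEndpoint(cfg, s))
	}
}
```

Nothing in the sketch knows what a trace or a metric is, which is the property the shared layer is meant to keep.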
The refactor: making the old telemetry stand on the new core
This next part is where the old telemetry and the new one become a single thing. The analytics OTLP backend was the first user of OTLP in the framework, and it had grown its own copy of all that plumbing: a function that parsed the endpoint URL, split out the host and path, worked out the insecure flag, and built the resource from a service name. Exactly the code the three new signals were about to need.
So rather than write it a second time and let the two drift, the analytics
backend was refactored onto otelcore. Its exporter builder,
buildOTelExporterOpts, now calls otelcore.ParseEndpoint, the same function
tracing, metrics and logs call, and the resource comes from otelcore.Resource,
the same one they use. One implementation of “talk OTLP to a collector”, four
callers: the analytics backend and the three observability signals. Change how
the framework forms an OTLP endpoint, and every signal moves together.
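For a feel of what that single implementation has to do, here is a minimal sketch of the endpoint parsing: split out host and path, and work out the insecure flag from the scheme. It is an illustration under assumptions, not the framework’s code. The Endpoint type, its fields and the lowercase parseEndpoint are hypothetical, and the real otelcore.ParseEndpoint may well have a different signature and different rules.

```go
package main

import (
	"fmt"
	"net/url"
)

// Endpoint is a hypothetical result type; the real otelcore type may differ.
type Endpoint struct {
	Host     string // host[:port] to dial
	Path     string // URL path, possibly empty
	Insecure bool   // true when the scheme is plain http
}

// parseEndpoint is a stand-in for the shared parser: every caller gets the
// same host, path and plaintext decision from the same URL.
func parseEndpoint(raw string) (Endpoint, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Endpoint{}, fmt.Errorf("parse OTLP endpoint %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return Endpoint{}, fmt.Errorf("OTLP endpoint %q: scheme must be http or https", raw)
	}
	if u.Host == "" {
		return Endpoint{}, fmt.Errorf("OTLP endpoint %q: missing host", raw)
	}
	return Endpoint{Host: u.Host, Path: u.Path, Insecure: u.Scheme == "http"}, nil
}

func main() {
	for _, raw := range []string{
		"https://collector.example.com:4318/v1/traces",
		"http://localhost:4318",
		"localhost:4318", // no scheme: rejected in this sketch
	} {
		ep, err := parseEndpoint(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", ep)
	}
}
```

With one function like this behind all four callers, a change to the rules (say, how a missing scheme is treated) lands everywhere at once instead of in whichever copy someone remembered to update.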
The reassuring part was that the analytics tests didn’t budge. The refactor moved
code without changing behaviour, and the consent machinery, the opt-in, the
no-op-when-disabled, the deletion path, never came near otelcore. Which is
exactly the point.
Where the line is
Because the shared core is the easy half. The half that earns its keep is the bit that isn’t shared, and it’s a single, deliberate line.
The analytics collector keeps its gate. The constructor, NewCollector, still
returns a no-op the moment telemetry is disabled, so a user who hasn’t opted in
gets a collector that silently discards everything. Informed consent, untouched.
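The no-op-when-disabled gate is a small pattern, and a sketch shows how little room it leaves for leaks. Everything named here is hypothetical: the Collector interface is cut down to one method, and newCollector takes a bare bool, where the real NewCollector surely takes richer configuration.

```go
package main

import "fmt"

// Collector is a hypothetical, cut-down analytics interface.
type Collector interface {
	Track(event string)
}

// noopCollector discards everything. It is what a user who has not opted in gets.
type noopCollector struct{}

func (noopCollector) Track(string) {}

// recordingCollector stands in for a real backend and keeps events in memory.
type recordingCollector struct {
	events []string
}

func (c *recordingCollector) Track(event string) {
	c.events = append(c.events, event)
}

// newCollector is a stand-in for the consent gate: callers always get a usable
// Collector, but without consent it is one that cannot record anything.
func newCollector(enabled bool) Collector {
	if !enabled {
		return noopCollector{}
	}
	return &recordingCollector{}
}

func main() {
	off := newCollector(false)
	off.Track("command.run") // silently discarded

	on := newCollector(true)
	on.Track("command.run") // recorded

	fmt.Printf("disabled: %T, enabled: %T\n", off, on)
}
```

Because the decision is made once in the constructor, call sites never check consent themselves, so there is no call site that can forget to.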
Observability gets a different door entirely. Setup builds whichever signals
the operator has switched on, and it is gated only by telemetry.tracing.enabled
and its siblings, which the operator sets. It never consults the analytics
opt-in. Turning on tracing doesn’t turn on analytics; disabling analytics
doesn’t silence tracing. The two enable flags live under the same telemetry.*
config root, sit next to each other, and never read each other.
So that’s the whole architecture in a sentence: one package, one OTLP export core, two consent models that share everything except the answer to “do we need to ask”. The principle underneath, the one that decided every one of these calls, is that the kind of data sets the consent model. Usage data about a person needs informed consent. Operational data about a service runs on implied consent. The CLI and the web service are just where each kind tends to live.
Where this leaves the framework
0.7.x came out the other side with both telemetries: the one that asks first, exactly as it was, and a new one that doesn’t, because it has nobody to ask. They share an export core, a config root and a name, and they part company on the only thing they were ever going to disagree about.
I’ve been careful here to describe how the two consent models are kept apart, not to argue why they have to be. That argument, that “the kind of data decides the consent model” is a line worth holding rather than a convenient bit of engineering, is a piece of its own, and it’s the one I’m writing next.
