A CLI does its job and gets out of the way. You run it, it prints something or writes a file, the process exits, done. Then one day you want the opposite: a thing that stays running. A server answering requests, a worker chewing through a queue, something that sits there until you tell it to stop. And the moment a process is long-lived, a pile of unglamorous problems lands on your desk that a short-lived command never had to think about.
How does it shut down when Kubernetes sends it a SIGTERM, without dropping the requests it’s halfway through? How does anything outside it know it’s alive, or ready for traffic? When one part falls over at 3am, does the whole thing come down, or pick itself back up? None of that is your actual service. It’s the plumbing around it, and it’s the sort of plumbing that’s easy to write almost right and only notice the gap in during an incident.
This is a new series, a companion to the one on building a CLI. That series gets you a working command-line tool; this one turns it into a web service, a piece at a time: gRPC, HTTP, a gateway that bridges the two, TLS across all of them, and live API docs. But every one of those is a long-running process, so we start with the part they all stand on.
Same shape as before: each part stands alone. By the end of this one you’ll have a process that starts cleanly, reports its own health, and shuts down without dropping anything, and you won’t have written the lifecycle code yourself. The series is written against go-tool-base v0.6.0, the release that brings the web-service components in.
What every long-running process needs
go-tool-base’s answer to all of the above is the controls package, and its
centrepiece is the Controller. You hand it a set of services, things with a
way to start, a way to stop, and a way to report health, and it owns their
lifecycle. It starts them, watches for the operating system asking the process
to quit, drives a graceful shutdown in the right order, and keeps a running
picture of whether everything is alright.
A “service” here is deliberately loose. An HTTP server is one. A gRPC server is one. So is a background worker that wakes every few seconds, or a queue consumer. The controller doesn’t care what’s inside; it cares that it can start it, stop it, and ask after its health. That’s the whole trick: get those three verbs right once, in one place, and everything you bolt on later inherits them.
A service in thirty lines
Let’s build the smallest useful one: a heartbeat that logs a tick every second. It isn’t exciting, but it’s a real long-running service, and it shows every moving part without a transport getting in the way.
// main.go
package main

import (
	"context"
	"os"
	"sync/atomic"
	"time"

	"gitlab.com/phpboyscout/go-tool-base/pkg/controls"
	"gitlab.com/phpboyscout/go-tool-base/pkg/logger"
)

func main() {
	log := logger.NewCharm(os.Stderr, logger.WithTimestamp(true))

	// The controller owns the process lifecycle: it starts registered services,
	// watches for SIGINT/SIGTERM, and drives a graceful shutdown.
	controller := controls.NewController(context.Background(), controls.WithLogger(log))

	var beats atomic.Int64

	controller.Register("heartbeat",
		// Start launches the work. The context is cancelled when the controller
		// shuts down, so the goroutine just watches ctx.Done().
		controls.WithStart(func(ctx context.Context) error {
			go func() {
				ticker := time.NewTicker(time.Second)
				defer ticker.Stop()
				for {
					select {
					case <-ctx.Done():
						return
					case <-ticker.C:
						log.Info("beat", "count", beats.Add(1))
					}
				}
			}()
			return nil
		}),
		// Stop runs during shutdown for any explicit cleanup.
		controls.WithStop(func(_ context.Context) {
			log.Info("heartbeat stopping", "total_beats", beats.Load())
		}),
		// Status reports health. Here we're healthy as long as we're ticking.
		controls.WithStatus(func() error { return nil }),
	)

	controller.Start()
	controller.Wait()
}
A few things are earning their keep there. NewController takes a context and
some options, here just a logger. Register names a service and gives it its
three verbs through functional options: WithStart launches it (and is handed a
context that gets cancelled when the controller shuts down, which is the hook
the goroutine watches), WithStop is called during shutdown for cleanup, and
WithStatus answers “are you alright?”. Then controller.Start() launches
everything and controller.Wait() blocks until the whole thing has stopped.
The three options are all in
pkg/controls.
Build it and run it:
go run .
INFO beat count=1
INFO beat count=2
INFO beat count=3
A service, running. Now for the half that’s easy to get wrong.
Shutting down on purpose
Press Ctrl-C, or send the process a SIGTERM the way an orchestrator would, and watch what it does:
WARN [Controller] : received signal signal=terminated
WARN [Controller] : Stopping Services
INFO heartbeat stopping total_beats=3
INFO [Controller] : Stopped
Nothing in our thirty lines handled a signal. The controller registered its own
handlers for SIGINT and SIGTERM, and when one arrived it cancelled the context
that every service’s Start is watching, gave them a window to finish, ran each
Stop, and exited cleanly. That cancel-the-context-then-Stop order is the
important part: it’s exactly what stops an HTTP server from dropping requests it’s
mid-way through when the pod rolls. We’ve got it here for a heartbeat that does
nothing on the way out, and we’ll get the same order for free for every real
transport we add later.
You can tune the window with WithShutdownTimeout, and turn the signal handling
off entirely (handy in tests) with WithoutSignals. The defaults are the right
ones for a service in a container.
Health, before anything’s asking
That third verb, WithStatus, is the start of the health story, and it’s worth
seeing now even though nothing’s calling it yet. The controller can report three
separate things: overall status, liveness, and readiness. Those aren’t the same
question. Liveness is “is this process wedged and in need of a restart”;
readiness is “should traffic come here yet”. An orchestrator uses them
differently, which is why they’re kept apart, and a service can answer them
separately by adding WithLiveness and WithReadiness alongside WithStatus.
Right now nothing asks, because we’ve no transport. But this is the quiet payoff
of putting lifecycle first: when we add an HTTP server in part 3, these reports
are what back its /healthz, /livez and /readyz endpoints, and when we add
gRPC in part 2 they back the standard gRPC health service, with no re-plumbing on
our side. The controller also carries a restart policy for services that should
pick themselves back up, and standalone health checks for things like “can I
still reach the database”, but those earn their place once we’ve something worth
checking.
Where this leaves us
A few lines in, we’ve a process that starts, ticks along, answers for its own health, and stops cleanly when the platform asks it to, on a controller that the real transports will register against unchanged. The heartbeat is a stand-in. Next part we swap it for a proper gRPC service, give it TLS, and the controller barely notices the difference, which is the entire point of it.
If you want to read ahead, the controls component has the full interface, and the service-orchestration deep-dive covers how the startup ordering and shutdown actually work underneath.
