
Building a web service with go-tool-base, part 1: lifecycle and graceful shutdown

A CLI does its job and gets out of the way. You run it, it prints something or writes a file, the process exits, done. Then one day you want the opposite: a thing that stays running. A server answering requests, a worker chewing through a queue, something that sits there until you tell it to stop. And the moment a process is long-lived, a pile of unglamorous problems lands on your desk that a short-lived command never had to think about.

How does it shut down when Kubernetes sends it a SIGTERM, without dropping the requests it’s halfway through? How does anything outside it know it’s alive, or ready for traffic? When one part falls over at 3am, does the whole thing come down, or pick itself back up? None of that is your actual service. It’s the plumbing around it, and it’s the sort of plumbing that’s easy to write almost right, with the gap only showing up during an incident.

This is a new series, a companion to the one on building a CLI. That series gets you a working command-line tool; this one turns it into a web service, a piece at a time: gRPC, HTTP, a gateway that bridges the two, TLS across all of them, and live API docs. But every one of those is a long-running process, so we start with the part they all stand on.

Same shape as before: each part stands alone. By the end of this one you’ll have a process that starts cleanly, reports its own health, and shuts down without dropping anything, and you won’t have written the lifecycle code yourself. The series is written against go-tool-base v0.6.0, the release that brings the web-service components in.

What every long-running process needs

go-tool-base’s answer to all of the above is the controls package, and its centrepiece is the Controller. You hand it a set of services, things with a way to start, a way to stop, and a way to report health, and it owns their lifecycle. It starts them, watches for the operating system asking the process to quit, drives a graceful shutdown in the right order, and keeps a running picture of whether everything is alright.

A “service” here is deliberately loose. An HTTP server is one. A gRPC server is one. So is a background worker that wakes every few seconds, or a queue consumer. The controller doesn’t care what’s inside; it cares that it can start it, stop it, and ask after its health. That’s the whole trick: get those three verbs right once, in one place, and everything you bolt on later inherits them.
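To make the three verbs concrete before reaching for the library, here is a minimal standard-library sketch of the idea. Everything in it (the `service` struct, `startAll`, `overallStatus`, and the `healthy` and `failing` helpers) is a hypothetical name invented for illustration; it is not the go-tool-base API, only the shape the paragraph above describes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// service is a hypothetical stand-in for "a thing with three verbs".
// It is NOT the go-tool-base type, just an illustration of the shape.
type service struct {
	name   string
	start  func(ctx context.Context) error
	stop   func(ctx context.Context)
	status func() error
}

// startAll starts services in registration order and gives up at the first failure.
func startAll(ctx context.Context, services []service) error {
	for _, s := range services {
		if err := s.start(ctx); err != nil {
			return fmt.Errorf("start %s: %w", s.name, err)
		}
	}
	return nil
}

// overallStatus folds every service's health report into one error.
// It returns nil only when every service reports healthy.
func overallStatus(services []service) error {
	var errs []error
	for _, s := range services {
		if err := s.status(); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
		}
	}
	return errors.Join(errs...)
}

// healthy builds a do-nothing service that always reports healthy.
func healthy(name string) service {
	return service{
		name:   name,
		start:  func(context.Context) error { return nil },
		stop:   func(context.Context) {},
		status: func() error { return nil },
	}
}

// failing builds a do-nothing service whose health check always fails.
func failing(name, reason string) service {
	s := healthy(name)
	s.status = func() error { return errors.New(reason) }
	return s
}

func main() {
	services := []service{healthy("worker"), failing("consumer", "queue unreachable")}

	if err := startAll(context.Background(), services); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("status:", overallStatus(services))
}
```

The point of the sketch is that the loop over services never looks inside them. Whatever owns that loop can start, stop, and health-check anything that supplies the three functions.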

A service in thirty lines

Let’s build the smallest useful one: a heartbeat that logs a tick every second. It isn’t exciting, but it’s a real long-running service, and it shows every moving part without a transport getting in the way.

// main.go

package main

import (
	"context"
	"os"
	"sync/atomic"
	"time"

	"gitlab.com/phpboyscout/go-tool-base/pkg/controls"
	"gitlab.com/phpboyscout/go-tool-base/pkg/logger"
)

func main() {
	log := logger.NewCharm(os.Stderr, logger.WithTimestamp(true))

	// The controller owns the process lifecycle: it starts registered services,
	// watches for SIGINT/SIGTERM, and drives a graceful shutdown.
	controller := controls.NewController(context.Background(), controls.WithLogger(log))

	var beats atomic.Int64

	controller.Register("heartbeat",
		// Start launches the work. The context is cancelled when the controller
		// shuts down, so the goroutine just watches ctx.Done().
		controls.WithStart(func(ctx context.Context) error {
			go func() {
				ticker := time.NewTicker(time.Second)
				defer ticker.Stop()

				for {
					select {
					case <-ctx.Done():
						return
					case <-ticker.C:
						log.Info("beat", "count", beats.Add(1))
					}
				}
			}()

			return nil
		}),
		// Stop runs during shutdown for any explicit cleanup.
		controls.WithStop(func(_ context.Context) {
			log.Info("heartbeat stopping", "total_beats", beats.Load())
		}),
		// Status reports health. This stub always returns nil (healthy); a real
		// check would notice if the ticker had stalled.
		controls.WithStatus(func() error { return nil }),
	)

	controller.Start()
	controller.Wait()
}

A few things are earning their keep there. NewController takes a context and some options, here just a logger. Register names a service and gives it its three verbs through functional options: WithStart launches it (and is handed a context that gets cancelled when the controller shuts down, which is the hook the goroutine watches), WithStop is called during shutdown for cleanup, and WithStatus answers “are you alright?”. Then controller.Start() launches everything and controller.Wait() blocks until the whole thing has stopped. The three options are all in pkg/controls.

Build it and run it:

go run .
INFO beat count=1
INFO beat count=2
INFO beat count=3

A service, running. Now for the half that’s easy to get wrong.

Shutting down on purpose

Press Ctrl-C, or send the process a SIGTERM the way an orchestrator would, and watch what it does. This is the output for a SIGTERM:

WARN [Controller] : received signal signal=terminated
WARN [Controller] : Stopping Services
INFO heartbeat stopping total_beats=3
INFO [Controller] : Stopped

Nothing in our thirty lines handled a signal. The controller registered its own handlers for SIGINT and SIGTERM, and when one arrived it cancelled the context that every service’s Start is watching, gave them a window to finish, ran each Stop, and exited cleanly. That cancel-the-context-then-Stop order is the thing: it’s exactly what stops an HTTP server from dropping requests it’s mid-way through when the pod rolls. We’ve got it here for a heartbeat that does nothing on the way out, and we’ll get the same order for free for every real transport we add later.
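If you are curious what that order looks like when written by hand, here is a rough standard-library sketch: cancel the shared context, give the workers a bounded window to return, then run each stop function. The `worker` type, `runUntilDone`, and `demo` are hypothetical names made up for this illustration, and the timer standing in for Ctrl-C exists only so the example exits on its own. How the controller implements this internally is not shown in this post, so treat the sketch as the general technique, not its source.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// worker is a hypothetical service: run blocks until ctx is cancelled,
// stop does any explicit cleanup afterwards.
type worker struct {
	run  func(ctx context.Context)
	stop func()
}

// runUntilDone starts every worker, waits for ctx to be cancelled, gives the
// workers up to grace to return, then calls each stop in turn. It reports
// whether every worker returned inside the window.
func runUntilDone(ctx context.Context, grace time.Duration, workers []worker) bool {
	var wg sync.WaitGroup
	for _, w := range workers {
		wg.Add(1)
		go func(w worker) {
			defer wg.Done()
			w.run(ctx)
		}(w)
	}

	<-ctx.Done() // a signal (or the caller) asked us to quit

	drained := make(chan struct{})
	go func() {
		wg.Wait()
		close(drained)
	}()

	clean := true
	select {
	case <-drained:
	case <-time.After(grace):
		clean = false // the window ran out with work still in flight
	}

	for _, w := range workers {
		w.stop()
	}
	return clean
}

// demo wires one worker to SIGINT/SIGTERM and records the order of events.
func demo() (bool, []string) {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Demo only: stand in for Ctrl-C so the example exits on its own.
	time.AfterFunc(200*time.Millisecond, stop)

	var mu sync.Mutex
	var events []string
	record := func(e string) {
		mu.Lock()
		defer mu.Unlock()
		events = append(events, e)
	}

	hb := worker{
		run: func(ctx context.Context) {
			<-ctx.Done()
			record("run: context cancelled")
		},
		stop: func() { record("stop: cleanup") },
	}

	clean := runUntilDone(ctx, time.Second, []worker{hb})
	return clean, events
}

func main() {
	clean, events := demo()
	for _, e := range events {
		fmt.Println(e)
	}
	fmt.Println("clean shutdown:", clean)
}
```

Even in this toy version there are three things to get right (the signal wiring, the bounded wait, the ordering), which is a fair argument for writing them once and leaving them alone.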

You can tune the window with WithShutdownTimeout, and turn the signal handling off entirely (handy in tests) with WithoutSignals. The defaults are the right ones for a service in a container.

Health, before anything’s asking

That third verb, WithStatus, is the start of the health story, and it’s worth seeing now even though nothing’s calling it yet. The controller can report three separate things: overall status, liveness, and readiness. Those aren’t the same question. Liveness is “is this process wedged and in need of a restart”; readiness is “should traffic come here yet”. An orchestrator uses them differently, which is why they’re kept apart, and a service can answer them separately by adding WithLiveness and WithReadiness alongside WithStatus.
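The difference between the two questions is easier to see with a toy. The sketch below uses only the standard library; the `probes` type and its `wedged` and `warmed` flags are hypothetical, invented to show why one process can be alive but not yet ready. It does not use WithLiveness or WithReadiness, whose exact signatures this post doesn’t spell out.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// probes is a hypothetical holder for the two separate health questions.
type probes struct {
	wedged atomic.Bool // set when the process can no longer make progress
	warmed atomic.Bool // set once startup work (cache fill, migrations) is done
}

// liveness answers "is this process stuck and in need of a restart?".
func (p *probes) liveness() error {
	if p.wedged.Load() {
		return errors.New("wedged: restart me")
	}
	return nil
}

// readiness answers "should traffic come here yet?".
func (p *probes) readiness() error {
	if !p.warmed.Load() {
		return errors.New("still warming up: no traffic yet")
	}
	return nil
}

func main() {
	var p probes

	// Freshly started: alive, but not ready for traffic.
	fmt.Println("live:", p.liveness(), "| ready:", p.readiness())

	// Startup work finished: alive and ready.
	p.warmed.Store(true)
	fmt.Println("live:", p.liveness(), "| ready:", p.readiness())
}
```

A failing readiness check should only keep traffic away, while a failing liveness check is a request to be restarted, so collapsing them into one function would force the orchestrator to treat a slow start like a crash.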

Right now nothing asks, because we’ve no transport. But this is the quiet payoff of putting lifecycle first: when we add an HTTP server in part 3, these reports are what back its /healthz, /livez and /readyz endpoints, and when we add gRPC in part 2 they back the standard gRPC health service, with no re-plumbing on our side. The controller also carries a restart policy for services that should pick themselves back up, and standalone health checks for things like “can I still reach the database”, but those earn their place once we’ve something worth checking.
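As a preview of what “backing an endpoint” means, here is a small standard-library sketch that maps health functions onto the three paths mentioned above. The names `probeHandler`, `newHealthMux`, `probe`, `alwaysOK`, and `failing` are hypothetical, and the 200/503 mapping is an assumption borrowed from common practice, not a description of what go-tool-base returns.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probeHandler turns a health function into an HTTP handler:
// 200 when it returns nil, 503 with the error text otherwise.
// The status codes are an assumption made for this sketch.
func probeHandler(check func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if err := check(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	}
}

// newHealthMux wires one handler per question onto its own path.
func newHealthMux(overall, live, ready func() error) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/healthz", probeHandler(overall))
	mux.Handle("/livez", probeHandler(live))
	mux.Handle("/readyz", probeHandler(ready))
	return mux
}

// probe issues an in-memory GET against the mux and returns the status code.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

// alwaysOK is a health function that always reports healthy.
func alwaysOK() error { return nil }

// failing returns a health function that always fails with the given reason.
func failing(reason string) func() error {
	return func() error { return errors.New(reason) }
}

func main() {
	// Alive and broadly healthy, but not yet ready for traffic.
	mux := newHealthMux(alwaysOK, alwaysOK, failing("warming up"))

	for _, path := range []string{"/healthz", "/livez", "/readyz"} {
		fmt.Println(path, probe(mux, path))
	}
}
```

The handlers know nothing about the services behind them. They only call a function and translate the answer, which is why health functions defined now can be reused unchanged once a transport exists.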

Where this leaves us

A few lines in, we’ve a process that starts, ticks along, answers for its own health, and stops cleanly when the platform asks it to, on a controller that the real transports will register against unchanged. The heartbeat is a stand-in. Next part we swap it for a proper gRPC service, give it TLS, and the controller barely notices the difference, which is the entire point of it.

If you want to read ahead, the controls component has the full interface, and the service-orchestration deep-dive covers how the startup ordering and shutdown actually work underneath.
