TL;DR: The CI gate jobs across the infrastructure repos all need the same pile of tools: OpenTofu, tflint, trivy, checkov, gitleaks, terraform-docs, the AWS CLI. Installing them per job is slow and lets tool versions drift between repos. infra-tools is one container image with all of them, and one source of truth for their versions. Two of its build decisions are worth a look: it publishes with crane rather than a second image build, and it soft-fails its own vulnerability scan on purpose.

The same pile of tools, in every repo

Every infrastructure repo in this series runs the same CI gate jobs: format and validate the OpenTofu, lint it, scan it for security problems and secrets, check the docs. Those jobs need a specific set of tools, and it is the same set in every repo.

Install them per job and you pay twice. You pay in time, because every pipeline downloads and installs the whole set again. And you pay in drift: unless every repo pins every tool identically, the repos slowly diverge on which version of trivy or tflint they actually run, and a check that passes in one repo fails in another with no visible cause.

One image, one source of truth

infra-tools is the answer: a single Debian-based container image with the whole toolchain baked in. Every CI job in every repo uses it through one image: line in its job definition.

The real value is not the convenience. It is that the image is the one place tool versions are pinned. The Go-based tools are pinned in a mise.toml. checkov, which has no mise plugin, is pinned in a requirements file installed with pipx. The AWS CLI is pinned by a build argument. Three mechanisms, because the tools come from three kinds of source, but one image, and every pin wired to Renovate so a version bump arrives as a reviewable pull request. There is exactly one answer to “what version of trivy does the toolchain use,” and it lives here.
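A minimal Go sketch of what “one answer per tool” means in practice: it reads the three kinds of pin (the [tools] table of a mise.toml, a name==version line in a pip-style requirements file, and a build argument in a Dockerfile) into one table and refuses any tool pinned in two places. The file contents, the version numbers, and the AWSCLI_VERSION argument name are hypothetical placeholders, not the repo's real pins, and the parser handles only simple one-line entries rather than full TOML.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Hypothetical stand-ins for the repo's pin files; versions are placeholders.
const miseToml = `[tools]
opentofu = "1.8.0"
tflint = "0.50.0"
trivy = "0.50.0"
gitleaks = "8.18.0"
terraform-docs = "0.17.0"
`

const requirementsTxt = "checkov==3.2.0\n"

const dockerfile = `FROM debian:bookworm-slim
ARG AWSCLI_VERSION=2.15.0
`

// Pin records a tool's version and which of the three mechanisms pins it.
type Pin struct {
	Version string
	Source  string
}

var (
	tomlLine = regexp.MustCompile(`^([A-Za-z0-9_-]+)\s*=\s*"([^"]+)"$`)
	reqLine  = regexp.MustCompile(`^([A-Za-z0-9_.-]+)==(\S+)$`)
	argLine  = regexp.MustCompile(`^ARG\s+([A-Za-z0-9_]+)=(\S+)$`)
)

// collectPins merges the three pin sources into one table and fails if a
// tool is pinned in more than one place.
func collectPins(mise, reqs, docker string) (map[string]Pin, error) {
	pins := map[string]Pin{}
	add := func(tool, version, source string) error {
		if prev, dup := pins[tool]; dup {
			return fmt.Errorf("%s pinned twice: %s in %s, %s in %s",
				tool, prev.Version, prev.Source, version, source)
		}
		pins[tool] = Pin{Version: version, Source: source}
		return nil
	}

	inTools := false // only keys inside the [tools] table of mise.toml are tools
	for _, line := range strings.Split(mise, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "[") {
			inTools = line == "[tools]"
			continue
		}
		if m := tomlLine.FindStringSubmatch(line); inTools && m != nil {
			if err := add(m[1], m[2], "mise.toml"); err != nil {
				return nil, err
			}
		}
	}
	for _, line := range strings.Split(reqs, "\n") {
		if m := reqLine.FindStringSubmatch(strings.TrimSpace(line)); m != nil {
			if err := add(m[1], m[2], "requirements file (pipx)"); err != nil {
				return nil, err
			}
		}
	}
	for _, line := range strings.Split(docker, "\n") {
		if m := argLine.FindStringSubmatch(strings.TrimSpace(line)); m != nil && m[1] == "AWSCLI_VERSION" {
			if err := add("aws-cli", m[2], "Dockerfile build argument"); err != nil {
				return nil, err
			}
		}
	}
	return pins, nil
}

func main() {
	pins, err := collectPins(miseToml, requirementsTxt, dockerfile)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	tools := make([]string, 0, len(pins))
	for t := range pins {
		tools = append(tools, t)
	}
	sort.Strings(tools)
	for _, t := range tools {
		fmt.Printf("%-15s %-10s %s\n", t, pins[t].Version, pins[t].Source)
	}
}
```

The duplicate check is an extra guard the sketch adds to make “exactly one answer” enforceable; the post itself only says every pin lives in this repo and is kept current by Renovate.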

Publishing with crane, not a second build

A build-pipeline detail that took a real bug to discover.

The pipeline builds the image with kaniko, which needs no privileged Docker daemon, something that matters a great deal on shared CI runners. Then it scans the image, and then it publishes it.

The obvious way to write the publish stage is “build the image and push it.” But kaniko has no mode for “just push this tarball I already built.” A second kaniko invocation re-executes the entire Dockerfile from the top, including a second mise install, which makes a fresh round of calls to GitHub’s API to fetch tools. GitHub’s limit for unauthenticated API calls is low, 60 requests per hour, and it is counted per originating IP, so on a shared CI runner that second install reliably trips a 403 rate-limit error.
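To see why it is the second install that fails, here is a small Go simulation. GitHub documents the unauthenticated REST limit as 60 requests per hour per originating IP, and signals exhaustion with a 403 (or 429) response whose X-RateLimit-Remaining header is 0. The fake server below enforces one shared budget that way; the number of API calls one mise install makes, the calls already spent by other jobs on the runner, and the endpoint path are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// GitHub's documented limit for unauthenticated REST calls, per originating IP.
const anonymousHourlyLimit = 60

// fakeGitHub stands in for api.github.com. All callers share one budget, as
// jobs on the same runner IP do; once it is spent it answers 403 with
// X-RateLimit-Remaining: 0.
type fakeGitHub struct {
	mu   sync.Mutex
	used int
}

func (f *fakeGitHub) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.used >= anonymousHourlyLimit {
		w.Header().Set("X-RateLimit-Remaining", "0")
		w.WriteHeader(http.StatusForbidden)
		return
	}
	f.used++
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(anonymousHourlyLimit-f.used))
	w.WriteHeader(http.StatusOK)
}

// isRateLimited recognises a primary rate-limit rejection.
func isRateLimited(resp *http.Response) bool {
	return (resp.StatusCode == http.StatusForbidden || resp.StatusCode == http.StatusTooManyRequests) &&
		resp.Header.Get("X-RateLimit-Remaining") == "0"
}

// install models one mise install pass as a fixed number of API calls.
func install(client *http.Client, baseURL string, calls int) error {
	for i := 0; i < calls; i++ {
		resp, err := client.Get(baseURL + "/repos/example/tool/releases") // placeholder path
		if err != nil {
			return err
		}
		resp.Body.Close()
		if isRateLimited(resp) {
			return fmt.Errorf("403 rate-limited after %d of %d calls", i, calls)
		}
	}
	return nil
}

// buildTwice runs a build-stage install and then a rebuild's install against
// one shared budget, of which preUsed calls are already spent.
func buildTwice(preUsed, callsPerInstall int) (first, second error) {
	srv := httptest.NewServer(&fakeGitHub{used: preUsed})
	defer srv.Close()
	first = install(srv.Client(), srv.URL, callsPerInstall)
	second = install(srv.Client(), srv.URL, callsPerInstall)
	return first, second
}

func main() {
	// Hypothetical numbers: other jobs on the runner IP already spent 45
	// calls, and one install costs 10.
	first, second := buildTwice(45, 10)
	fmt.Println("build stage:  ", first)
	fmt.Println("publish stage:", second)
}
```

With those numbers the build-stage install fits in what is left of the budget and the rebuild runs out partway through. Real counts will differ; the point is that a rebuild doubles the calls drawn from a budget other jobs on the same IP are also spending.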

So the publish stage does not rebuild. It uses crane to push the exact image tarball the build stage already produced. The image is built once. And because the published bytes are the same bytes the scan stage scanned, there is no gap between “the image we checked” and “the image we shipped.”
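In the pipeline described here, passing the one tarball from build to scan to publish is what guarantees the match, and crane's push command accepts a local tarball path and a registry reference. The Go sketch below makes that invariant explicit as a pre-push check; the check, the stand-in file, and the idea of recording a digest at scan time are assumptions for illustration, not part of the post's pipeline. It hashes the tarball file, which is not the same thing as the manifest digest a registry reports.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileDigest returns the sha256 of a file's bytes as "sha256:<hex>".
func fileDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil)), nil
}

// verifyBeforePush refuses to publish a tarball whose bytes differ from the
// ones the scan stage hashed.
func verifyBeforePush(tarball, scannedDigest string) error {
	got, err := fileDigest(tarball)
	if err != nil {
		return err
	}
	if got != scannedDigest {
		return fmt.Errorf("refusing to push %s: %s is not the scanned %s", tarball, got, scannedDigest)
	}
	return nil
}

// demo hashes a stand-in tarball as a scan stage might, then verifies it
// unchanged and again after its bytes are replaced, as a rebuild could do.
func demo() (unchanged, afterRebuild, err error) {
	tmp, err := os.CreateTemp("", "image-*.tar")
	if err != nil {
		return nil, nil, err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString("stand-in image bytes"); err != nil {
		tmp.Close()
		return nil, nil, err
	}
	if err := tmp.Close(); err != nil {
		return nil, nil, err
	}

	scanned, err := fileDigest(tmp.Name()) // recorded at scan time
	if err != nil {
		return nil, nil, err
	}
	unchanged = verifyBeforePush(tmp.Name(), scanned)

	if err := os.WriteFile(tmp.Name(), []byte("rebuilt image bytes"), 0o644); err != nil {
		return nil, nil, err
	}
	afterRebuild = verifyBeforePush(tmp.Name(), scanned)
	return unchanged, afterRebuild, nil
}

func main() {
	unchanged, afterRebuild, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("same tarball:   ", unchanged)
	fmt.Println("rebuilt tarball:", afterRebuild)
}
```

A rebuild is not guaranteed to produce identical bytes, which is why pushing the already-scanned artifact, rather than rebuilding it, closes the gap.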

Soft-failing the scanner on purpose

The decision that looks wrong until you see the reasoning: the pipeline scans the image with trivy, and trivy is allowed to fail without failing the pipeline.

A vulnerability scanner that does not gate the build sounds like a scanner switched off. It is not. It is a scanner pointed at something it cannot helpfully gate.

The tools in the image are prebuilt Go binaries. trivy inspects them, reads the version of the Go runtime each was compiled with, and reports every known CVE in that Go runtime. Those findings are real, but they are not mine to fix. The only fix is the upstream tool rebuilding itself against a patched Go. With seven such tools in the image, at any given moment one of them is usually a little behind on its Go version.
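Go binaries carry their build metadata, including the toolchain version they were compiled with, and Go's standard library reads it back with debug/buildinfo. The sketch below prints that version for each binary and flags the ones older than a chosen patched release. It shows the kind of data a scanner works from; it is not trivy's implementation. The binary paths and the minimum version are hypothetical, and a single threshold is a simplification, since Go ships security fixes on each supported minor line separately.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// goVersionOf returns the Go toolchain version recorded in a Go binary's
// embedded build information, e.g. "go1.22.5".
func goVersionOf(path string) (string, error) {
	info, err := buildinfo.ReadFile(path)
	if err != nil {
		return "", err
	}
	return info.GoVersion, nil
}

// parseGo turns "go1.22.5" into [1 22 5], ignoring any trailing " X:..."
// experiment tags. A missing patch number, as in "go1.20", counts as 0;
// pre-release versions such as "go1.23rc1" are rejected.
func parseGo(v string) ([3]int, error) {
	var out [3]int
	fields := strings.Fields(v)
	if len(fields) == 0 {
		return out, fmt.Errorf("empty Go version")
	}
	parts := strings.Split(strings.TrimPrefix(fields[0], "go"), ".")
	if len(parts) < 2 || len(parts) > 3 {
		return out, fmt.Errorf("unrecognised Go version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unrecognised Go version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// behind reports whether version is older than minimum.
func behind(version, minimum string) (bool, error) {
	a, err := parseGo(version)
	if err != nil {
		return false, err
	}
	b, err := parseGo(minimum)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i], nil
		}
	}
	return false, nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: goversions MIN_GO_VERSION BINARY...")
		os.Exit(2)
	}
	minimum := os.Args[1]
	for _, bin := range os.Args[2:] {
		v, err := goVersionOf(bin)
		if err != nil {
			fmt.Printf("%s: %v\n", bin, err)
			continue
		}
		old, err := behind(v, minimum)
		if err != nil {
			fmt.Printf("%s: %v\n", bin, err)
			continue
		}
		fmt.Printf("%-40s %-12s behind %s: %v\n", bin, v, minimum, old)
	}
}
```

Run against the image's binaries (for example go run . go1.22.5 /usr/local/bin/tflint /usr/local/bin/trivy, with hypothetical paths), it typically shows the situation described above: most tools current, one or two compiled with a slightly older Go.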

A hard gate would mean the image becomes unpublishable whenever any single upstream lags, over a CVE in code I do not own and cannot patch. That is not a security control; it is a way to be unable to ship. So the scan is allow_failure. The findings stay fully visible, and the residual count is genuinely useful as a metric for how far behind upstream the toolchain has drifted. It just does not block shipping an image whose only “vulnerabilities” are other people’s build timelines.
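The residual count can be pulled out of trivy's JSON report. This Go sketch assumes a report written with trivy's --format json option, reads only the Results, Target, Vulnerabilities, and PkgName fields, and assumes Go-runtime findings are reported under the package name stdlib; the report file name is hypothetical. It exits 0 whenever it can read the report, so it reports the number without gating on it, in the same spirit as allow_failure.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// report holds only the parts of a trivy JSON report this sketch reads.
type report struct {
	Results []struct {
		Target          string `json:"Target"`
		Vulnerabilities []struct {
			VulnerabilityID string `json:"VulnerabilityID"`
			PkgName         string `json:"PkgName"`
		} `json:"Vulnerabilities"`
	} `json:"Results"`
}

// residual is the metric: total open findings, how many are in the Go
// runtime ("stdlib"), and the count per scanned target.
type residual struct {
	Total     int
	Stdlib    int
	PerTarget map[string]int
}

func summarize(data []byte) (residual, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return residual{}, err
	}
	s := residual{PerTarget: map[string]int{}}
	for _, res := range r.Results {
		for _, v := range res.Vulnerabilities {
			s.Total++
			s.PerTarget[res.Target]++
			if v.PkgName == "stdlib" {
				s.Stdlib++
			}
		}
	}
	return s, nil
}

func main() {
	data, err := os.ReadFile("trivy-report.json") // hypothetical report path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	s, err := summarize(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	targets := make([]string, 0, len(s.PerTarget))
	for t := range s.PerTarget {
		targets = append(targets, t)
	}
	sort.Strings(targets)
	for _, t := range targets {
		fmt.Printf("%-45s %d\n", t, s.PerTarget[t])
	}
	fmt.Printf("residual findings: %d (%d in the Go runtime)\n", s.Total, s.Stdlib)
	// Deliberately no non-zero exit on findings: the count is reported, not gated on.
}
```

Tracking the stdlib share over time shows how much of the residual is upstream Go lag rather than something this repo could change.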

What it comes down to

The infrastructure repos all run the same CI gate jobs, needing the same tools, so infra-tools bakes the whole toolchain into one image and pins every version in one place, wired to Renovate.

Two build choices are worth copying. The publish stage uses crane to push the already-built, already-scanned tarball, because a second kaniko build would re-run mise install and hit GitHub’s anonymous rate limit, and because pushing the scanned bytes means shipping exactly what was checked. And the trivy scan is deliberately allow_failure, because it reports Go-runtime CVEs in prebuilt upstream binaries that no change to this repo can fix, so a hard gate would only make the image unshippable over someone else’s lag.