An AI client that doesn't flatten its providers

TL;DR: A CLI tool that integrates AI shouldn’t hard-wire one vendor, so it wants a single client interface over several providers. But a client built on only what every provider shares throws away the features actually worth having: prompt caching, extended thinking, citations. rust-tool-base’s rtb-ai refuses to choose. The genai crate gives a unified path across five providers, and an Anthropic-direct path drops below the abstraction for the features genai can’t reach. The escape hatch is designed in, not leaked.

The pull toward one interface

If your CLI tool talks to an AI model, hard-wiring one vendor is a poor bet. One user has an Anthropic key, another an OpenAI key. Someone is on Gemini. Someone runs Ollama locally because their data can’t leave the building. Someone points at an OpenAI-compatible endpoint from a provider you’ve never heard of. You don’t want a separate code path for each, so you want one AiClient that all of them slot behind.

rtb-ai gets that unification from the genai crate, which already speaks to Anthropic, OpenAI, Gemini, Ollama and OpenAI-compatible endpoints. One interface, five providers, the tool author picks one in config. The Go sibling makes the same bet: go-tool-base’s chat package also unifies several providers, behind an interface deliberately kept to four methods. So far this is the obvious design, and if it were the whole design there’d be nothing to write about.

What “unified” quietly costs you

Here’s the catch in any unified interface. It can only expose what every provider behind it has in common.

The common subset is plain chat. Messages go in, text comes out, optionally streamed token by token. That’s real and it’s useful and every provider does it. But the common subset is also the floor, and the features that make a particular provider worth choosing are almost never on the floor. They’re the things only that provider does.

Anthropic is the sharp example, because it has three features that matter and none of them is in the common subset.

Prompt caching. You can mark the stable parts of a request, the system prompt and the tool list, as cacheable. The provider keeps them warm, and on the next turn you aren’t billed to re-send and re-process text that didn’t change. On a long agent loop, where the same large system prompt rides along on every single turn, that’s a substantial saving in both cost and latency.

Extended thinking. The model works through a hard problem in a visible, budgeted reasoning pass before it commits to an answer, and you can see that reasoning.

Citations. Structured references back to source material in the response.

A client built strictly on the common subset cannot express any of those. It has no field for them, because four of the five providers wouldn’t know what to do with the field. So a purely lowest-common-denominator client would “support” Anthropic and then use it badly, leaving its best features unreachable. Support as a checkbox, not as the point.

The escape hatch

rtb-ai’s answer is to not choose. It runs two implementations under one interface.

For OpenAI, Gemini, Ollama and OpenAI-compatible endpoints, calls route through genai, the unified path. For Anthropic, every method drops to a direct reqwest implementation straight against the Messages API. Same AiClient on the surface, a different implementation underneath, selected by which provider the config names.
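Selecting an implementation by configured provider can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not rtb-ai's actual code: the `Provider` enum, the single-method trait, both struct names and `client_for` are hypothetical, and the methods return a label instead of making a network call.

```rust
/// Hypothetical provider selector, as it might be parsed from config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    OpenAi,
    Gemini,
    Ollama,
    OpenAiCompatible,
}

/// Hypothetical common interface; the real AiClient surface is not shown
/// in the article and will have chat methods rather than this label.
pub trait AiClient {
    /// Names the implementation that would serve a request.
    fn backend(&self) -> &'static str;
}

/// Stand-in for the unified, genai-backed path.
pub struct UnifiedClient;

/// Stand-in for the direct Messages API path.
pub struct AnthropicDirectClient;

impl AiClient for UnifiedClient {
    fn backend(&self) -> &'static str {
        "genai"
    }
}

impl AiClient for AnthropicDirectClient {
    fn backend(&self) -> &'static str {
        "anthropic-direct"
    }
}

/// Picks the implementation from the configured provider.
/// Callers only ever see `dyn AiClient`.
pub fn client_for(provider: Provider) -> Box<dyn AiClient> {
    match provider {
        Provider::Anthropic => Box::new(AnthropicDirectClient),
        Provider::OpenAi
        | Provider::Gemini
        | Provider::Ollama
        | Provider::OpenAiCompatible => Box::new(UnifiedClient),
    }
}
```

The match is written out exhaustively rather than with a wildcard arm, so adding a sixth provider forces a decision about which path it takes.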

And the request type has deliberate room for the difference:

pub struct ChatRequest {
    pub system: Option<String>,
    pub messages: Vec<Message>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    /// Anthropic-only: enables prompt caching at every stable point.
    /// Ignored on non-Anthropic providers.
    pub cache_control: bool,
    /// Anthropic-only: extended-thinking budget. `None` disables.
    /// Ignored on non-Anthropic providers.
    pub thinking: Option<ThinkingMode>,
}

Set cache_control and the Anthropic-direct path inserts cache breakpoints at the three stable points: the system prompt, the tool list, and the first message. Set thinking and it adds the thinking block, and streaming surfaces a separate ThinkingToken event so you can show the reasoning apart from the answer. On a non-Anthropic provider, both fields are simply ignored. The interface carries them; only the implementation that understands them acts on them.
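On the consuming side, a separate event for reasoning lets a caller keep the two streams apart. A minimal sketch, assuming a hypothetical `StreamEvent` enum: only the `ThinkingToken` name comes from the description above, while the `Token` variant and the `split_stream` helper are invented for illustration.

```rust
/// Hypothetical stream event type. Only `ThinkingToken` is named in the
/// article; `Token` is an assumed name for ordinary answer text.
pub enum StreamEvent {
    Token(String),
    ThinkingToken(String),
}

/// Collects a stream into (reasoning, answer) so a UI can render the
/// reasoning apart from the final text.
pub fn split_stream<I>(events: I) -> (String, String)
where
    I: IntoIterator<Item = StreamEvent>,
{
    let mut thinking = String::new();
    let mut answer = String::new();
    for event in events {
        match event {
            StreamEvent::ThinkingToken(t) => thinking.push_str(&t),
            StreamEvent::Token(t) => answer.push_str(&t),
        }
    }
    (thinking, answer)
}
```

On a provider that never emits thinking events, the first string simply stays empty and the caller needs no special case.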

A hatch, not a leak

It’s worth being precise about why this isn’t the thing it superficially resembles, which is a leaky abstraction.

A leaky abstraction is one where implementation details bleed through that you didn’t intend and can’t reason about. The abstraction quietly fails to abstract, and you’re left guessing which provider you’re really talking to.

This is the opposite of that. The two Anthropic-only fields are not a leak. They are named, documented as Anthropic-only, inert everywhere else, and right there in the public type for anyone to see. The interface is uniform for the common case and deliberately, visibly non-uniform at exactly the points where uniformity would have cost you the good features. You opt into provider-specifics by setting a field. You stay fully portable by leaving it at its default. Nothing bleeds; you decide.
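The opt-in behaviour can be modelled in a few lines. This is a sketch under assumptions, not the crate's implementation: `RequestSketch` is a cut-down stand-in for `ChatRequest`, `thinking_budget: Option<u32>` replaces the real `Option<ThinkingMode>` (whose shape the article doesn't show), and `Extras`, `anthropic_extras` and `unified_extras` are hypothetical names.

```rust
/// Cut-down stand-in for ChatRequest: only the two provider-specific
/// fields. `thinking_budget` is an assumed simplification of ThinkingMode.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RequestSketch {
    pub cache_control: bool,
    pub thinking_budget: Option<u32>,
}

/// What a backend would add to the outgoing request beyond plain chat.
#[derive(Debug, Default, PartialEq)]
pub struct Extras {
    pub cache_breakpoints: Vec<&'static str>,
    pub thinking_budget: Option<u32>,
}

/// Anthropic-direct path: acts on both fields.
pub fn anthropic_extras(req: &RequestSketch) -> Extras {
    let cache_breakpoints = if req.cache_control {
        // The three stable points named in the article.
        vec!["system", "tools", "first_message"]
    } else {
        Vec::new()
    };
    Extras {
        cache_breakpoints,
        thinking_budget: req.thinking_budget,
    }
}

/// Unified path: both fields are inert, whatever their values.
pub fn unified_extras(_req: &RequestSketch) -> Extras {
    Extras::default()
}
```

A request left at `RequestSketch::default()` produces no extras on either path, which is the portable case. Setting the fields changes only what the Anthropic path sends.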

The same design line explains what does stay in the unified path. Structured output, chat_structured::<T>, sends a JSON Schema derived from your Rust type with the request and validates the reply against it before handing you a typed T. That’s a portability win that costs nothing across providers, so it belongs in the common interface. The split isn’t “Anthropic versus the rest.” It’s “features that are free to unify go in the unified path; features that aren’t get a designed door.” Prompt caching and extended thinking get the door because flattening them away would be the expensive kind of convenient.
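The schema-then-validate flow can be illustrated without any provider at all. What follows is a toy, standard-library-only sketch: the `Structured` trait, the `Count` type and `chat_structured_sketch` are all hypothetical, the schema is written by hand rather than derived, and the validation is a plain integer parse standing in for real JSON Schema checking.

```rust
/// Hypothetical trait: a type that can describe itself as a JSON Schema
/// and check a raw reply against that description.
pub trait Structured: Sized {
    fn json_schema() -> &'static str;
    fn validate(reply: &str) -> Result<Self, String>;
}

/// Toy structured type: a single non-negative integer.
#[derive(Debug, PartialEq)]
pub struct Count(pub u32);

impl Structured for Count {
    fn json_schema() -> &'static str {
        r#"{"type":"integer","minimum":0}"#
    }

    fn validate(reply: &str) -> Result<Self, String> {
        // Hand-written check, stricter and simpler than a real validator.
        reply
            .trim()
            .parse::<u32>()
            .map(Count)
            .map_err(|e| format!("reply does not match schema: {}", e))
    }
}

/// Sends the schema along with the request (here: hands it to `send`,
/// a stand-in for the provider call), then validates before returning T.
pub fn chat_structured_sketch<T, F>(send: F) -> Result<T, String>
where
    T: Structured,
    F: FnOnce(&str) -> String,
{
    let raw = send(T::json_schema());
    T::validate(&raw)
}
```

Nothing in this flow depends on which provider produced the reply, which is why it can sit in the unified path.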

To sum up

A CLI tool that integrates AI wants one client over several providers, and a unified interface can only expose what those providers share. The shared floor is plain chat, and the features worth choosing a provider for, like Anthropic’s prompt caching, extended thinking and citations, are almost never on the floor.

rtb-ai keeps both. genai provides the unified path across five providers; an Anthropic-direct reqwest path drops below the abstraction for the features genai can’t reach, and ChatRequest carries the Anthropic-only fields openly, ignored elsewhere. Uniform where uniformity is free, with a designed escape hatch where it isn’t. That’s the difference between supporting a provider and actually using it.