Building a CLI with go-tool-base, part 4: an AI dungeon master

I run a Dungeons & Dragons game on the odd weekend, so when I sat down to put an AI feature inside a CLI, my first instinct wasn’t a chatbot. It was: could the tool run a little adventure, with an AI as the dungeon master? It turns out that’s a near-perfect way to learn the chat client, because the thing that makes a game trustworthy (rules the players can’t break) is exactly the thing that makes any AI feature trustworthy. So this part builds mytool adventure: a tiny dungeon you play in your terminal, narrated by an AI that is firmly on a leash.

Part 3 pointed AI at your CLI from the outside (an agent driving your commands over MCP). This part goes the other way: AI inside your tool, as a feature you write. The worry everyone has about that is fair: AI output is unpredictable, and a CLI is meant to be dependable. The whole lesson here is how you square those two: you don’t hope the model behaves, you box it in with rules it can’t escape and mechanics it doesn’t get to invent.

As before, this is written against go-tool-base v0.6.0 (run gtb version to check yours).

Behind the DM screen

A turn of our game looks like this: the player types what they want to do, the AI dungeon master narrates what happens and offers a few choices, and round it goes until the adventure reaches an end. The trick is where the truth lives. The model’s job is the prose, and only the prose. Everything else is yours:

  • The rules live in the system prompt: what the DM may and may not do.
  • The mechanics live in Go functions the model calls as tools (dice, combat). It never makes a number up.
  • The state lives in a Go struct you hand the model fresh every turn, so it never has to remember, and can’t quietly rewrite history.
  • The shape of each turn is a typed Go struct the model fills in, so your code always gets back something it can render, never a wall of prose to parse.

Two go-tool-base capabilities do the heavy lifting: the AI calling your Go functions, and the AI handing back a typed struct instead of text you have to regex. The game is just a fun excuse to use both at once.

Wiring a provider

The chat client (pkg/chat) is a library you import; you don’t need any special feature flag for it. It does need an API key, and it’ll find one from a few places. The simplest, for now, is the well-known environment variable for your provider:

export ANTHROPIC_API_KEY="sk-ant-..."   # or GEMINI_API_KEY, OPENAI_API_KEY

That’s the bottom of the client’s lookup chain, which is fine for playing locally. For a tool you actually ship, go-tool-base has the ai feature and its mytool init wizard (the same initialiser system from part 2) to store the key properly, and there’s a whole post on where a CLI should keep your keys. For learning the client, an env var is plenty.

Scaffold the command

You know this step from part 1:

gtb generate command --name adventure --short "Play a dungeon adventure"

Everything below goes in the RunAdventure function the generator left you in pkg/cmd/adventure/main.go, plus a couple of types and helpers in the same package.

The state is yours, not the model’s

Start with the truth. The game state is a plain Go struct that you own. The model never holds it; instead you hand it the current state at the top of every turn (more on that in the loop). This is the part to grow: start small, then add rooms, items, NPCs, quest flags, whatever your adventure needs. Nothing else in the design has to change when you do.

// GameState is the single source of truth for the game. Extend it freely.
type GameState struct {
	PlayerHP  int
	Location  string
	Inventory []string
	Foes      map[string]int // foe name -> remaining hit points
}

// summary renders the state into a line the model is given each turn.
func (g *GameState) summary() string {
	foes := make([]string, 0, len(g.Foes))
	for name, hp := range g.Foes {
		foes = append(foes, fmt.Sprintf("%s (%d HP)", name, hp))
	}
	return fmt.Sprintf("You have %d HP, at %s, carrying %s. Foes: %s.",
		g.PlayerHP, g.Location, strings.Join(g.Inventory, ", "), strings.Join(foes, ", "))
}
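
One thing to watch in summary: Go randomises map iteration order, so with more than one foe the line can come out in a different order from turn to turn, and an empty inventory renders as "carrying ." A minimal sketch of a steadier variant, assuming you are happy to sort foes by name. The names stableSummary and orNothing are mine, not part of go-tool-base.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// GameState mirrors the struct from the post so this sketch compiles on its own.
type GameState struct {
	PlayerHP  int
	Location  string
	Inventory []string
	Foes      map[string]int // foe name -> remaining hit points
}

// orNothing joins items with commas, or says "nothing" when the list is empty.
func orNothing(items []string) string {
	if len(items) == 0 {
		return "nothing"
	}
	return strings.Join(items, ", ")
}

// stableSummary is a hypothetical variant of summary that sorts foes by name,
// so the same state always renders as the same line.
func stableSummary(g *GameState) string {
	names := make([]string, 0, len(g.Foes))
	for name := range g.Foes {
		names = append(names, name)
	}
	sort.Strings(names)

	foes := make([]string, 0, len(names))
	for _, name := range names {
		foes = append(foes, fmt.Sprintf("%s (%d HP)", name, g.Foes[name]))
	}
	return fmt.Sprintf("You have %d HP, at %s, carrying %s. Foes: %s.",
		g.PlayerHP, g.Location, orNothing(g.Inventory), orNothing(foes))
}

func main() {
	g := &GameState{
		PlayerHP: 20,
		Location: "the mouth of a damp cave",
		Foes:     map[string]int{"goblin": 12, "bat": 3},
	}
	fmt.Println(stableSummary(g))
}
```

A stable line also makes the state easy to assert on in a unit test, which the map-ordered version is not.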

And the shape of a turn, the thing the model has to produce:

type Turn struct {
	Narration string   `json:"narration"`
	Choices   []string `json:"choices"`
	GameOver  bool     `json:"game_over"`
}

The dungeon master’s tools

A tool in pkg/chat is a chat.Tool: a name, a description the model reads to decide when to use it, a parameter schema, and a handler. The handler gets the model’s arguments as raw JSON and returns any value (which the framework JSON-encodes back to the model) or an error.

The simplest possible one is a die roll. This is the canonical “give the model something it’s bad at” tool, because language models cannot be trusted to roll fairly or even add up:

func rollTool() chat.Tool {
	return chat.Tool{
		Name:        "roll",
		Description: "Roll a die with the given number of sides; returns 1..sides.",
		// Use an anonymous struct so the schema's properties sit at the top level,
		// which is where SetTools looks. A named type would hide them behind a $ref.
		Parameters: jsonschema.Reflect(struct {
			Sides int `json:"sides" jsonschema:"description=number of sides on the die"`
		}{}),
		Handler: func(_ context.Context, args json.RawMessage) (any, error) {
			var a struct {
				Sides int `json:"sides"`
			}
			if err := json.Unmarshal(args, &a); err != nil {
				return nil, err
			}
			if a.Sides <= 0 {
				a.Sides = 20
			}
			return rand.Intn(a.Sides) + 1, nil
		},
	}
}

That comment about the anonymous struct matters, by the way. Reflect a named type and jsonschema emits a top-level reference with the real fields tucked inside, and the tool ships with no parameters at all. An anonymous struct inlines them where the framework expects. It’s the one sharp edge in the whole exercise.

Combat is where state actually changes, so combat is a tool too. Note it takes the foe by name and looks it up in Foes, so it works for the goblin and for any creature you add later, without touching this function:

func attackTool(game *GameState) chat.Tool {
	return chat.Tool{
		Name:        "attack",
		Description: "Resolve the player's attack on a named foe. Rolls to hit, applies damage.",
		Parameters: jsonschema.Reflect(struct {
			Target string `json:"target" jsonschema:"description=the name of the foe being attacked"`
		}{}),
		Handler: func(_ context.Context, args json.RawMessage) (any, error) {
			var a struct {
				Target string `json:"target"`
			}
			if err := json.Unmarshal(args, &a); err != nil {
				return nil, err
			}
			hp, ok := game.Foes[a.Target]
			if !ok {
				return map[string]any{"error": "no such foe: " + a.Target}, nil
			}
			if rand.Intn(20)+1 < 10 {
				return map[string]any{"hit": false, "foe": a.Target}, nil
			}
			dmg := rand.Intn(6) + 1
			hp -= dmg
			if hp < 0 {
				hp = 0
			}
			game.Foes[a.Target] = hp
			return map[string]any{
				"hit": true, "foe": a.Target, "damage": dmg,
				"foe_hp": hp, "defeated": hp == 0,
			}, nil
		},
	}
}

A bad target comes back as an ordinary result with an error field rather than as a Go error, so it reaches the model like any other tool output and the model can recover (apologise, pick a real foe) rather than crash the turn.
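
It helps to see what the model actually receives from the attack tool. Assuming the framework encodes a handler's return value the way encoding/json would (my assumption; the only given is that results are JSON-encoded), the three outcomes look like this. The encode helper is hypothetical and stands in for the framework.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode stands in for whatever the framework does with a handler's return
// value. Assumption: it behaves like encoding/json.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "encode error: " + err.Error()
	}
	return string(b)
}

func main() {
	miss := map[string]any{"hit": false, "foe": "goblin"}
	hit := map[string]any{
		"hit": true, "foe": "goblin", "damage": 4,
		"foe_hp": 8, "defeated": false,
	}
	bad := map[string]any{"error": "no such foe: dragon"}

	// encoding/json writes map keys in sorted order, so these lines are stable.
	fmt.Println(encode(miss)) // {"foe":"goblin","hit":false}
	fmt.Println(encode(hit))  // {"damage":4,"defeated":false,"foe":"goblin","foe_hp":8,"hit":true}
	fmt.Println(encode(bad))  // {"error":"no such foe: dragon"}
}
```

Every number the model might want to narrate is already in the payload, which is the point: it reports the damage, it doesn't pick it.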

That’s the whole tool set, and there’s deliberately nothing here for reading the state. The model never fetches it. Instead the loop hands it the current state at the top of every turn, which we wire up shortly. A language model has no memory you can rely on, so rather than trust it to remember the fight, you give it the truth each time.

The turn is a tool too

Here’s the neat part. The chat client won’t let a single call both run tools and return a typed struct; they’re separate modes. So instead of asking for the struct afterwards, we make submitting the turn into a tool of its own. The dungeon master ends its turn by calling submit_turn, and its handler captures the typed Turn into a variable we hold:

func submitTurnTool(out *Turn) chat.Tool {
	return chat.Tool{
		Name:        "submit_turn",
		Description: "End your turn. Call this exactly once, last, with the turn's outcome.",
		Parameters: jsonschema.Reflect(struct {
			Narration string   `json:"narration" jsonschema:"description=two-sentence narration of what just happened"`
			Choices   []string `json:"choices" jsonschema:"description=the actions the player may take next"`
			GameOver  bool     `json:"game_over" jsonschema:"description=true only if the game has ended"`
		}{}),
		Handler: func(_ context.Context, args json.RawMessage) (any, error) {
			if err := json.Unmarshal(args, out); err != nil {
				return nil, err
			}
			return "turn recorded", nil
		},
	}
}

So the turn’s structure is enforced by a schema, same as any other tool’s parameters. Your loop gets a populated Turn every round, never prose.
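
A schema pins down the shape, not the content: a turn with an empty narration, or one that carries on the game with no choices, still decodes cleanly. Here is a minimal sketch of a content check you could run inside the submit_turn handler after unmarshalling. validateTurn is a hypothetical helper, and whether returning its error from the handler prompts the model to try again is an assumption about the framework, so check that before relying on it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Turn mirrors the struct from the post.
type Turn struct {
	Narration string   `json:"narration"`
	Choices   []string `json:"choices"`
	GameOver  bool     `json:"game_over"`
}

// validateTurn checks the parts of a turn that a JSON schema alone won't catch.
func validateTurn(t Turn) error {
	if strings.TrimSpace(t.Narration) == "" {
		return errors.New("narration is empty")
	}
	// A finished game needs no choices; a continuing one needs at least one.
	if !t.GameOver && len(t.Choices) == 0 {
		return errors.New("a turn that continues the game needs at least one choice")
	}
	return nil
}

func main() {
	turns := []Turn{
		{Narration: "You slip past the goblin.", Choices: []string{"Press on", "Turn back"}},
		{Narration: "You collapse. The cave goes dark.", GameOver: true},
		{Narration: "The goblin stares at you."}, // continues, but offers nothing
	}
	for _, t := range turns {
		if err := validateTurn(t); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", t.Narration)
	}
}
```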

The rules

This is where you bound the model. The system prompt is the rulebook, and it leans hard on the tools so the DM has no room to freelance the mechanics:

const dmRules = `You are the dungeon master of a short fantasy adventure. Each turn
you are given the current game state and the player's action.

Resolve the action and end the turn:
- If the player attacks, you MUST call the attack tool with the foe's name to
  resolve it. Do not decide the hit or the damage yourself.
- For any other chance event, call the roll tool and use its result.
- For simple actions, just narrate them.
- Then call submit_turn exactly once: a two-sentence narration, two or three
  choices, and game_over.

Trust the state you are given; never contradict it. A foe at 0 hit points is dead
and stays dead. The game ends when the player's hit points reach 0 (they lose), or
when the player reaches a satisfying ending. When it ends, set game_over and narrate
the finish.

Keep the tone light and quick.`

Two of those lines carry the weight. “Trust the state you are given; never contradict it” is what keeps the world consistent: the state is handed in fresh every turn (the next section), so the model works from the truth instead of from a memory it does not reliably have. And “you MUST call the attack tool” is what stops it quietly deciding hits and damage itself when it would rather just narrate. Those two are the difference between a game with rules and a model telling a story.

The loop

Now stitch it together. Create the client with the rules as its system prompt, register the tools once, and run a turn each time the player acts:

func RunAdventure(ctx context.Context, props *props.Props, opts *AdventureOptions, args []string) error {
	game := &GameState{
		PlayerHP:  20,
		Location:  "the mouth of a damp cave",
		Inventory: []string{"a short sword", "a guttering torch"},
		Foes:      map[string]int{"goblin": 12},
	}
	var turn Turn

	client, err := chat.New(ctx, props, chat.Config{
		SystemPrompt: dmRules,
		MaxSteps:     8, // roll/attack, then submit_turn
	})
	if err != nil {
		return err
	}

	if err := client.SetTools([]chat.Tool{
		rollTool(),
		attackTool(game),
		submitTurnTool(&turn),
	}); err != nil {
		return err
	}

	action := "I step into the cave."
	for {
		turn = Turn{}
		// Hand the model the current truth, then the player's action.
		input := fmt.Sprintf("State: %s\nThe player: %s", game.summary(), action)
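		if _, err := client.Chat(ctx, input); err != nil {
			return err
		}
		// If the model never called submit_turn (for example, it ran out of
		// steps), turn is still empty; stop rather than render a blank turn.
		if turn.Narration == "" {
			return fmt.Errorf("the dungeon master did not submit a turn")
		}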

		fmt.Println("\n" + turn.Narration)
		if turn.GameOver {
			return nil
		}
		action, err = chooseAction(turn.Choices)
		if err != nil {
			return err
		}
	}
}

The same client runs every turn, so the conversation and the tools carry through the whole game; and the State: line you prepend is always current, because the attack tool mutated game last turn. The model is never trusted to remember, only to narrate.

Let the player off the menu

The one helper I glossed over is chooseAction. A bare fmt.Scanln would do, but we can do much better with almost no effort, and make a point while we’re at it. The framework already leans on Charm’s huh for its init wizard (you met it in part 2), so we’ll use the same library for a proper menu, with one deliberate addition:

func chooseAction(choices []string) (string, error) {
	const other = "__other__"

	opts := make([]huh.Option[string], 0, len(choices)+1)
	for _, c := range choices {
		opts = append(opts, huh.NewOption(c, c))
	}
	opts = append(opts, huh.NewOption("Something else...", other))

	var pick, custom string
	form := huh.NewForm(
		huh.NewGroup(
			huh.NewSelect[string]().
				Title("What do you do?").
				Options(opts...).
				Value(&pick),
		),
		// A second step that only appears when the player chose "Something else".
		huh.NewGroup(
			huh.NewInput().
				Title("Describe your action").
				Value(&custom),
		).WithHideFunc(func() bool { return pick != other }),
	)
	if err := form.Run(); err != nil {
		return "", err
	}

	if pick == other {
		return custom, nil
	}
	return pick, nil
}

The select gives the player a tidy arrow-key menu instead of typing a number, but the addition that earns its keep is the last option. “Something else…” is always there, and choosing it unfolds a second step (huh shows or hides a group with WithHideFunc) where the player types whatever they actually want to do. That free text goes straight to the dungeon master as the next turn’s input, and because the DM is an AI bound by the rules rather than a switch statement over three fixed choices, it just copes. Bargain with the goblin, search your pockets, set the cave alight: the model narrates it within the rules you gave it, rolling and applying damage through the same tools. That is the agency a scripted game can’t offer, and it’s the natural place to start building your own richer interactivity on top.
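
For comparison, here is the bare standard-library version of the same idea, which is handy where there is no terminal for huh to draw on (a pipe, a test). One detail: fmt.Scanln splits on spaces, so free text needs a line reader instead. This is a sketch under my own naming; parseAction is hypothetical and not part of the framework.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseAction turns one line of input into an action. A number in range picks
// that choice; anything else is passed through as a free-text action.
func parseAction(line string, choices []string) (string, error) {
	line = strings.TrimSpace(line)
	if line == "" {
		return "", errors.New("no action entered")
	}
	if n, err := strconv.Atoi(line); err == nil && n >= 1 && n <= len(choices) {
		return choices[n-1], nil
	}
	return line, nil
}

func main() {
	choices := []string{"Attack the goblin", "Retreat to the entrance"}
	for i, c := range choices {
		fmt.Printf("%d. %s\n", i+1, c)
	}
	fmt.Print("> ")

	// In a real loop, create this reader once and reuse it every turn, so
	// input it has buffered ahead is not lost between reads.
	in := bufio.NewReader(os.Stdin)
	line, err := in.ReadString('\n')
	if err != nil && line == "" {
		fmt.Println("no input:", err)
		return
	}

	action, err := parseAction(line, choices)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("You chose:", action)
}
```

Typing 2 picks the second choice, and typing "bargain with the goblin" goes through verbatim, which is the same off-menu escape hatch as “Something else…” without the separate step.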

Play it

Set your key, build, and go:

export ANTHROPIC_API_KEY="sk-ant-..."
just build
./bin/mytool adventure

A turn looks like this (your wording will differ every time; the mechanics won’t):

You swing your short sword at the goblin, the blade whistling through the damp cave
air. The creature snarls as it tries to dodge your blow.

What do you do?
> Attack the goblin again
  Try to push deeper into the cave
  Retreat to the entrance
  Something else...

Your blade whistles through the air, but the nimble goblin dances back just in
time. It lunges forward with a rusty dagger in return, yet its clumsy strike only
finds empty air.

What do you do?
> Swing your sword again!
  Try to intimidate the creature
  Retreat from the cave
  Something else...

Behind that, the dungeon master called attack each turn (a hit, then a miss), the goblin’s hit points changed in the GameState you own, and the next turn handed that updated state straight back to the model. The prose is the model’s; every number is yours.

The pattern under the game

Strip the dungeon away and you’re left with the thing worth keeping. An AI feature you can ship is one where you’ve kept the model away from everything that has to be right: the rules live in the system prompt, the mechanics in typed Go tools the model must call, the state in a struct you hand it fresh each turn rather than trust it to remember, and the output in a struct it fills in rather than free text. Do that and the model’s unpredictability is confined to exactly where you want it (the wording) and walled out of everywhere you don’t (the maths, the state, the shape of the result).

Two honest limits are worth knowing. There’s no temperature dial on the client (the setting that would let you turn the model’s randomness down), so you can’t make the prose reproducible; you make the mechanics reproducible instead, which for most features is what you actually needed. And a tool-calling loop is several round-trips to the model per turn, so it’s not free: keep MaxSteps tight for anything interactive.
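
If you want the mechanics reproducible in the strict sense (the same fight every run, for a test or a bug report), one way is to pull the attack maths out of the handler and pass the random source in. This is a hypothetical refactor rather than how the post's attackTool is written; resolveAttack and playFight are my names, and the seed is arbitrary.

```go
package main

import (
	"fmt"
	"math/rand"
)

// resolveAttack is the attack tool's mechanics with the random source injected.
// Same rules as the post: a d20 roll under 10 misses, a hit deals d6 damage.
func resolveAttack(rng *rand.Rand, foes map[string]int, target string) map[string]any {
	hp, ok := foes[target]
	if !ok {
		return map[string]any{"error": "no such foe: " + target}
	}
	if rng.Intn(20)+1 < 10 {
		return map[string]any{"hit": false, "foe": target}
	}
	dmg := rng.Intn(6) + 1
	hp -= dmg
	if hp < 0 {
		hp = 0
	}
	foes[target] = hp
	return map[string]any{
		"hit": true, "foe": target, "damage": dmg,
		"foe_hp": hp, "defeated": hp == 0,
	}
}

// playFight attacks a 12 HP goblin for the given number of rounds and returns
// the goblin's hit points after each round.
func playFight(seed int64, rounds int) []int {
	rng := rand.New(rand.NewSource(seed))
	foes := map[string]int{"goblin": 12}
	hps := make([]int, 0, rounds)
	for i := 0; i < rounds; i++ {
		resolveAttack(rng, foes, "goblin")
		hps = append(hps, foes["goblin"])
	}
	return hps
}

func main() {
	// Same seed, same fight; a different seed gives a different one.
	fmt.Println(playFight(42, 5))
	fmt.Println(playFight(42, 5))
}
```

In the tool itself you would build the handler around resolveAttack and seed the source from the clock for real play, keeping the fixed seed for tests.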

That’s the foundation, and the state struct is already sized for more than one fight: it carries a location, an inventory and a map of foes you’ve barely touched. Add a move tool that updates Location, a use_item tool that reaches into Inventory, a second creature in Foes, even a give_quest flag, and the adventure grows without the architecture changing. The model just gets more tools to call and more truth to read. Saved games come nearly free, too: the client can snapshot and resume a conversation. Next part leaves AI behind and gets the tool ready to look after itself: shipping signed self-updates, so a new release reaches your users safely. Until then, go explore the cave.
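
The conversation snapshot is the client's side of a saved game, and I won't guess at its API here. The state half is yours and is plain encoding/json. A minimal sketch, assuming you are content with Go's default field names in the save file; encodeSave and decodeSave are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GameState mirrors the struct from the post.
type GameState struct {
	PlayerHP  int
	Location  string
	Inventory []string
	Foes      map[string]int // foe name -> remaining hit points
}

// encodeSave renders the state as indented JSON, ready to write to a file.
func encodeSave(g *GameState) ([]byte, error) {
	return json.MarshalIndent(g, "", "  ")
}

// decodeSave rebuilds a GameState from previously saved JSON.
func decodeSave(data []byte) (*GameState, error) {
	var g GameState
	if err := json.Unmarshal(data, &g); err != nil {
		return nil, fmt.Errorf("decode save: %w", err)
	}
	return &g, nil
}

func main() {
	game := &GameState{
		PlayerHP:  14,
		Location:  "a narrow tunnel",
		Inventory: []string{"a short sword", "a guttering torch"},
		Foes:      map[string]int{"goblin": 5},
	}

	data, err := encodeSave(game)
	if err != nil {
		fmt.Println("save failed:", err)
		return
	}
	fmt.Println(string(data))

	loaded, err := decodeSave(data)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println(loaded.Location, loaded.Foes["goblin"])
}
```

Write the bytes with os.WriteFile and read them back with os.ReadFile, and the game picks up from the same truth it left off with.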
