
Toolkit — Adaptation decision tree

Gate: G3 Route (level-picking). Category: routing substrate.

What problem it solves

The pair worksheet settles the "what kind of AI?" question: classical ML or the LLM family. Inside the LLM family, more choices remain: raw prompting, retrieval-augmented generation (RAG), fine-tuning, single agent, tool-using agent, multi-agent. Each level adds capability and cost. Routing to the wrong level means either failing on capability or paying multiples for capability that goes unused. The adaptation decision tree walks the team through a sequence of questions that route a piece to the lowest capability level that is sufficient. It exists because capability-level creep is the most common misroute in 2025-2026: teams pick agents when RAG would do, or RAG when prompting would do.

How it is used

A 30–45 minute G3 conversation per piece, run after the pair worksheet has landed the piece in the LLM family. The chair walks the team down the tree, answering each question yes/no based on the piece's actual requirements. Each leaf of the tree corresponds to a capability level. The walk ends with the level and a paragraph explaining why lower levels were ruled out.

Inputs

  • A piece routed to the LLM family by the pair worksheet.
  • A clear specification of what the piece must produce: input shape, output shape, knowledge sources, action requirements.
  • The AI canvas and ML canvas (for comparison metrics and failure modes).

Outputs

  • A routed capability level — prompting / RAG / fine-tuning / single agent / tool-using agent / multi-agent.
  • A walk-through record: for each question in the tree, the answer and the reasoning.
  • Flagged upgrades: conditions under which the piece would need to move to a higher level; these become review-cadence items.
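One way to file every walk the same way is a small typed record holding the three outputs above. This is a minimal sketch, assuming Python 3.9+. The names `Level`, `QuestionAnswer`, `WalkRecord`, and `is_complete` are hypothetical and not part of the toolkit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Level(Enum):
    # The capability levels named in the Outputs list (hypothetical encoding).
    PROMPTING = "prompting"
    RAG = "RAG"
    FINE_TUNING = "fine-tuning"
    SINGLE_AGENT = "single agent"
    TOOL_USING_AGENT = "tool-using agent"
    MULTI_AGENT = "multi-agent"


@dataclass
class QuestionAnswer:
    question: str   # e.g. "Q1 external knowledge?"
    answer: str     # "yes", "no", or "skipped"
    reasoning: str  # why the team answered this way


@dataclass
class WalkRecord:
    piece: str
    answers: List[QuestionAnswer] = field(default_factory=list)
    routed_level: Optional[Level] = None
    why_not_lower: str = ""  # paragraph explaining why lower levels were ruled out
    flagged_upgrades: List[str] = field(default_factory=list)  # become review-cadence items

    def is_complete(self) -> bool:
        # Ready to file only with a routed level, at least one answered question,
        # and a non-empty paragraph ruling out the lower levels.
        return (
            self.routed_level is not None
            and any(a.answer in ("yes", "no") for a in self.answers)
            and bool(self.why_not_lower.strip())
        )


# Illustrative record for a piece routed to prompting.
piece_a = WalkRecord(
    piece="dispatcher shift-handover summariser",
    answers=[
        QuestionAnswer("Q1 external knowledge?", "no", "handover notes are the only input"),
        QuestionAnswer("Q2 stable across inputs?", "yes", "notes follow a loose template"),
    ],
    routed_level=Level.PROMPTING,
    why_not_lower="Prompting is already the lowest level; nothing lower to rule out.",
    flagged_upgrades=["missed items reported: add few-shot examples before fine-tuning"],
)
```

An incomplete record, such as one with no reasoning paragraph, fails `is_complete()` and should not be filed.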

Visualisation

[Figure: Adaptation decision tree. A sequence of yes/no questions routes a piece to prompting, prompting + few-shot examples, fine-tuning, RAG, a tool-using agent (single loop + tools), or multi-agent. Walk top-to-bottom; each yes/no narrows to the lowest level that meets requirements.]

A yes/no walk. Each node asks a specific capability question; the terminal leaves are the routed levels. Default bias is to the left (lower capability).

Anatomy

Question 1 — external knowledge. Does the piece need information that isn't in the LLM's training data? Pricing, inventory, internal docs, and anything time-sensitive mean yes. General-knowledge tasks the model already covers from its training data mean no. Yes routes to RAG-or-above; no routes to prompting-or-fine-tuning.

Question 2 — stability across inputs. For prompting-candidate pieces, does the same prompt shape work across the input distribution? If yes, prompting is sufficient. If no (outputs need strong adaptation to inputs), add few-shot examples first, then go to Question 3 before considering fine-tuning.

Question 3 — accuracy plateau. For prompting + examples, does offline accuracy plateau below the required floor? If yes, fine-tuning is the upgrade. Fine-tuning is the most expensive step on this branch; do not go there without evidence that prompting has plateaued.
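"Prompting has plateaued" can be made concrete from offline-evaluation results. This is a minimal sketch, assuming the team records accuracy for successive prompting variants in the order tried (for example zero-shot, then 2, 4, and 8 examples). The function name `plateaued_below_floor` and the defaults `min_gain=0.01` and `window=2` are illustrative assumptions, not toolkit values.

```python
from typing import Sequence


def plateaued_below_floor(
    accuracies: Sequence[float], floor: float, min_gain: float = 0.01, window: int = 2
) -> bool:
    """True if offline accuracy has stopped improving while still below `floor`.

    accuracies: offline-eval accuracy for successive prompting variants,
    in the order they were tried.
    """
    if len(accuracies) < window + 1:
        # Too few attempts to call it a plateau: this is "we haven't tried prompting".
        return False
    if max(accuracies) >= floor:
        # Prompting already meets the floor, so fine-tuning is not justified.
        return False
    recent = list(accuracies[-(window + 1):])
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    # Plateau: each of the last `window` steps improved by less than `min_gain`.
    return all(g < min_gain for g in gains)
```

A `True` result is the kind of evidence Question 3 asks for. A `False` result caused by too few attempts means prompting has not yet been tried enough to rule it out.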

Question 4 — multi-step actions. For RAG-or-above pieces, must the piece take actions rather than only produce text? If no, plain RAG. If yes, the piece is an agent candidate and goes to Question 5.

Question 5 — coordinating agents. For agent pieces, can a single loop with tools handle the workload, or are genuinely independent sub-agents needed? Default to a single loop; multi-agent is over-engineered for most use cases in 2025-2026.
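The five questions can be read as a small routing function that asks only the questions on the path taken, so skipped questions stay skipped. This is a minimal sketch, assuming answers are supplied as a mapping from question id to `True`/`False`. The function `walk_tree` and the ids `"Q1"` to `"Q5"` are hypothetical. The agent leaf here is the tree's single-loop tool-using agent.

```python
from typing import Callable, List, Tuple


def walk_tree(ask: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Walk the adaptation decision tree; return (routed leaf, questions asked in order)."""
    asked: List[str] = []

    def q(qid: str) -> bool:
        asked.append(qid)
        return ask(qid)

    if not q("Q1"):  # needs external knowledge beyond training?
        if q("Q2"):  # output stable across inputs?
            return "prompting", asked
        if not q("Q3"):  # accuracy plateaus below the floor with few-shot examples?
            return "prompting + few-shot", asked
        return "fine-tuning", asked
    if not q("Q4"):  # must take multi-step actions?
        return "RAG", asked
    if not q("Q5"):  # needs multiple coordinating agents?
        return "tool-using agent", asked
    return "multi-agent", asked


# The two pieces from the Example section, answered only where the walk reaches.
print(walk_tree({"Q1": False, "Q2": True}.__getitem__))  # ('prompting', ['Q1', 'Q2'])
print(walk_tree({"Q1": True, "Q4": False}.__getitem__))  # ('RAG', ['Q1', 'Q4'])
```

Passing a dict's `__getitem__` means that reaching an unanswered question raises `KeyError` rather than silently defaulting. This keeps "maybe" answers from slipping through.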

Default bias left. At every branch, the lower-capability leaf is preferred unless evidence shows it cannot meet requirements. "We haven't tried prompting" is not evidence; offline evaluation showing prompting plateaus is.
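The default-left rule can be enforced mechanically: an answer that moves the walk toward a higher-capability leaf stands only when evidence is attached. This is a minimal sketch. The names `ESCALATING_ANSWER`, `ProposedAnswer`, and `accepted_answer` are hypothetical. The mapping of which answer escalates is read off the tree: yes on Q1, Q3, Q4, and Q5; no on Q2.

```python
from dataclasses import dataclass
from typing import Optional

# For each question, the answer that moves the walk toward a higher-capability leaf.
ESCALATING_ANSWER = {"Q1": True, "Q2": False, "Q3": True, "Q4": True, "Q5": True}


@dataclass
class ProposedAnswer:
    question: str
    answer: bool
    evidence: Optional[str] = None  # e.g. a reference to an offline-eval run


def accepted_answer(p: ProposedAnswer) -> bool:
    """Apply the default-left bias: an escalating answer stands only with evidence."""
    escalates = p.answer == ESCALATING_ANSWER[p.question]
    has_evidence = bool(p.evidence and p.evidence.strip())
    if escalates and not has_evidence:
        # No evidence: fall back to the answer that keeps the lower-capability leaf.
        return not p.answer
    return p.answer
```

Non-escalating answers pass through unchanged; only moves to the right need support.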

Example

Paper trail — two pieces walked through the tree

G3 level-picking sessions for two pieces at the freight operator, W12 of 2026. Chair: Ada. Team: Priya, Alex, Raj, Amira.

Piece A: dispatcher shift-handover summariser. Routed to LLM family by the pair worksheet (unstructured text summarisation, no structured features).

  • Q1 external knowledge? No. The handover notes are the input; the task is compressing them. No retrieval needed.
  • Q2 stable across inputs? Amira: "shift notes follow a loose template; the summariser should work across all shifts with the same prompt." Yes.
  • Routed leaf: Prompting. System prompt + template. No few-shot, no RAG, no agent.
  • Flagged upgrades: if dispatchers complain about specific missed items ("the truck-mechanical notes aren't carried forward"), add few-shot examples before jumping to fine-tuning.

Piece B: carrier-relationship assistant for account managers. Routed to LLM family (free-text query, pulls from carrier history).

  • Q1 external knowledge? Yes — carrier history, contract terms, recent complaints. All internal and time-sensitive.
  • Q2 and Q3 skipped: Q1 already routed the piece past the prompting branch.
  • Q4 multi-step actions? Ada asks carefully: "does the assistant need to do things, or just answer questions?" Priya (with Amira): "at launch, answer questions only. Drafting a reply email is phase 2."
  • Routed leaf: RAG. Retrieve carrier records; generate a grounded answer.
  • Flagged upgrades: if phase 2 (draft replies) ships, re-walk the tree. Drafting adds an action, which routes to a tool-using agent (with an email-draft tool and an account manager in the loop). Do not ship phase 2 on RAG alone.

Paper trail. Two pieces, two walks: one routed to prompting, one to RAG, and no agents in this engagement. Each walk produced a reasoning paragraph filed with the ML canvas. The flagged upgrade conditions went into the review-cadence matrix at G5.

Pitfalls

Skipping to agents. "This is complicated, so it's an agent problem." Most pieces that feel like agents are actually RAG + a small post-processing step. The tree forces the simpler routing to be ruled out first, with evidence.

Running the tree without offline evidence. Answering "prompting plateaus" without having tried it is an assertion, not evidence. Each question that rules out a lower level needs a piece of evidence; otherwise the tree becomes self-justification for a pre-decided routing.

Treating the tree as a checklist. The walk should generate argument, not box-checking. If the walk took 8 minutes and produced no disagreement, the chair has probably not pushed hard enough on the default-bias-left discipline.

Missing the stable-input question. A piece that needs wildly different outputs across inputs can masquerade as prompting-candidate until it's in production and failing on edge cases. Sampling representative inputs before routing catches this.
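Sampling representative inputs before routing can be as simple as drawing a few examples from every input type, so rare types are seen before Question 2 is answered. This is a minimal sketch using Python's standard library. The function `stratified_sample`, the choice of stratum key (shift type in the example), and `per_stratum` are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")


def stratified_sample(
    inputs: List[T], stratum_of: Callable[[T], Hashable], per_stratum: int, seed: int = 0
) -> List[T]:
    """Draw up to `per_stratum` inputs from each stratum so rare input types are not missed."""
    rng = random.Random(seed)  # fixed seed makes the review sample reproducible
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for item in inputs:
        groups[stratum_of(item)].append(item)
    sample: List[T] = []
    for key in sorted(groups, key=str):
        items = groups[key]
        sample.extend(rng.sample(items, min(per_stratum, len(items))))
    return sample


# Hypothetical shift notes: night shifts are rare, but all of them make the sample.
notes = [{"shift": "day", "id": i} for i in range(100)]
notes += [{"shift": "night", "id": 100 + i} for i in range(3)]
review_set = stratified_sample(notes, lambda n: n["shift"], per_stratum=5)
```

A uniform random sample of eight notes from this set would often contain no night shifts at all; stratifying guarantees each type is reviewed.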

Confusing fine-tuning with RAG. They solve different problems. Fine-tuning changes the model's default behaviour on seen patterns; RAG provides information the model didn't have. Teams who think they need fine-tuning often need RAG, and vice versa.

Locking the routing. The tree output is a commitment for now, not forever. Explicit upgrade conditions (e.g., "if accuracy plateaus below 85% with few-shot, consider fine-tuning") go into the commitment artefact so re-routing is planned, not reactive.
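Upgrade conditions are easier to act on at review time when each one carries a measurable trigger. This is a minimal sketch. The names `UpgradeCondition` and `due_rewalks` and the metric name `few_shot_accuracy` are hypothetical. The 85% threshold is the illustrative figure from the text, not a recommended default.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UpgradeCondition:
    description: str  # as written in the commitment artefact
    triggered: Callable[[Dict[str, float]], bool]  # reads the latest review metrics
    next_step: str  # what re-routing to plan if the condition fires


def due_rewalks(conditions: List[UpgradeCondition], metrics: Dict[str, float]) -> List[str]:
    """List the upgrade conditions that fire on this review cycle's metrics."""
    # A missing metric raises KeyError, surfacing a gap in monitoring instead of hiding it.
    return [f"{c.description} -> {c.next_step}" for c in conditions if c.triggered(metrics)]


conditions = [
    UpgradeCondition(
        "few-shot accuracy plateaus below 85%",
        lambda m: m["few_shot_accuracy"] < 0.85,
        "consider fine-tuning",
    )
]
```

At each review, an empty list means the committed routing still holds. Each string returned is a planned re-walk of the tree.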

When not to use

  • Pieces that haven't passed the pair worksheet. The tree presumes LLM-family routing; ML-routed pieces don't use this.
  • Pieces where the requirements are genuinely unclear (walks produce "maybe" at most nodes). The upstream specification needs more work before the tree can resolve.
  • Research or prototype work where the intent is to explore capability levels, not commit to one. The tree is for production-candidate routing.

Provenance

The adaptation-decision-tree pattern as applied to LLM-family routing is an engineering-practice adaptation. It consolidates practitioner guidance from Anthropic [1] and OpenAI, Lewis et al.'s RAG paper [2], and the emerging agent-engineering literature, with Wooldridge's An Introduction to MultiAgent Systems providing background framing [3]. The default-left bias (prefer the lowest capability level) is informed by OpenAI's Practices for Governing Agentic AI Systems [4].

  • Pair worksheet. Upstream; lands the piece in the LLM family.
  • Retrievable-quality test. Applied at the RAG-or-above branch; tests whether RAG's retrieval is useful before committing.
  • Total-cost-of-ownership ladder. Used to confirm the level's cost is sustainable.

Verification

[1] Anthropic. Building effective agents. Engineering blog. 2024. [verified] Practitioner guidance on when to use agents vs. simpler patterns.

[2] Lewis P, Perez E, Piktus A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS. 2020. [verified] The foundational RAG paper.

[3] Wooldridge M. An Introduction to MultiAgent Systems. 2nd ed. Wiley; 2009. [verified] Academic background on multi-agent systems, preceding the LLM-agent wave.

[4] Shavit Y, Agarwal S, Brundage M, et al. Practices for governing agentic AI systems. OpenAI research paper. 2023. [verified] Governance discipline for agentic pieces; informs the default-left bias.