
How PromptPotter works.


PromptPotter treats your prompt as a program: an eight-field config with tunable axes, evolved by a critique-guided loop. Here's the format it works in and the ideas behind it.

What it does

One prompt in. A population out.

Each round, PromptPotter mutates one axis at a time, citing a reason for every change rather than guessing at random, and breeds a population of candidates to test.
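A minimal Python sketch of a single-axis mutation that carries its reason, assuming a config is a flat dict mapping axis name to text. The `Mutation` record, `apply_mutation`, `breed`, and the critique strings are hypothetical illustrations, not PromptPotter's API.

```python
from dataclasses import dataclass

Config = dict[str, str]

@dataclass(frozen=True)
class Mutation:
    """Hypothetical record of one proposed change and why it was made."""
    axis: str    # which config field to change, e.g. "persona"
    value: str   # the replacement text for that field
    reason: str  # the cited evidence, e.g. a finding from the last critique

def apply_mutation(parent: Config, m: Mutation) -> Config:
    """Return a child config that changes only the axis named in `m`."""
    if m.axis not in parent:
        raise KeyError(f"unknown axis: {m.axis}")
    if not m.reason.strip():
        raise ValueError("every mutation must cite a reason")
    child = dict(parent)  # copy, so the parent stays intact
    child[m.axis] = m.value
    return child

def breed(parent: Config, mutations: list[Mutation]) -> list[Config]:
    """One child per mutation: a population of single-axis candidates."""
    return [apply_mutation(parent, m) for m in mutations]

origin = {
    "persona": "You are solving a grade-school math word problem.",
    "thinking_style": "Deliberate chain-of-thought.",
}
# Hypothetical critique findings used as the cited reasons.
population = breed(origin, [
    Mutation("persona", "You are a patient math tutor who shows all work.",
             "critique: several answers skipped intermediate steps"),
    Mutation("thinking_style",
             "Work backwards from what the question asks to what you need to compute.",
             "critique: solver computed quantities the question never asked for"),
])
```

Rejecting a mutation with an empty reason is one simple way to enforce "cited, never random" at the data-structure level.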

origin prompt — GSM8K
{
  "persona": "You are solving a grade-school math word problem.",
  "task_intent": "Read the problem carefully and produce one exact numeric answer.",
  "problem_description": "",
  "instruction": "Work through the problem step-by-step. Track quantities and units explicitly. Double-check each arithmetic operation before continuing.",
  "thinking_style": "Deliberate chain-of-thought. Do not skip intermediate computations; verify each against the problem statement."
}
variants the optimizer evolves
{
  "thinking_style": [
    "Think step-by-step: break the problem into smaller arithmetic steps.",
    "Work backwards from what the question asks to what you need to compute.",
    "Write out each calculation explicitly before giving the final answer.",
    "Double-check your arithmetic at each step before proceeding."
  ],
  "instruction": [
    "Solve the math problem step by step. End your answer with #### followed by the numeric answer.",
    "Show your work clearly. The final answer must be a single number after ####.",
    "Break this into steps. After solving, verify by re-reading the question."
  ],
  "persona": [
    "You are a patient math tutor who shows all work.",
    "You are a careful problem solver who double-checks calculations."
  ]
}

A real config for the GSM8K dataset, taken from the PromptPotter repo. The origin is the conservative floor; each variant changes exactly one axis relative to it.
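Read as data, the origin and variant lists above expand into a candidate pool. This Python sketch assumes both are plain dicts keyed by axis name and shortens the texts; `expand` and `changed_axes` are hypothetical helpers that check the one-axis rule, not PromptPotter code.

```python
# Abbreviated origin and variants from the GSM8K example
# (assumption: plain dicts keyed by axis name; texts shortened).
ORIGIN = {
    "persona": "You are solving a grade-school math word problem.",
    "instruction": "Work through the problem step-by-step.",
    "thinking_style": "Deliberate chain-of-thought.",
}
VARIANTS = {
    "thinking_style": [
        "Think step-by-step: break the problem into smaller arithmetic steps.",
        "Work backwards from what the question asks to what you need to compute.",
    ],
    "persona": [
        "You are a patient math tutor who shows all work.",
        "You are a careful problem solver who double-checks calculations.",
    ],
}

def expand(origin, variants):
    """Return (axis, candidate) pairs, one candidate per variant text."""
    pairs = []
    for axis, options in variants.items():
        for text in options:
            # Copy the origin and overwrite just this one axis.
            pairs.append((axis, {**origin, axis: text}))
    return pairs

def changed_axes(a, b):
    """Names of the fields whose values differ between configs a and b."""
    return [k for k in a if a[k] != b.get(k)]

candidates = expand(ORIGIN, VARIANTS)
# Keep the origin in the pool as the conservative floor.
pool = [ORIGIN] + [cand for _, cand in candidates]
```

Because every candidate differs from the origin on a single axis, any score change can be attributed to that axis.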

The ideas

Three ideas, one loop.

01 · The loop

Critique-guided generation

Each round, an LLM proposes a population of candidates; the candidates are scored on your real dataset, and the LLM critiques the results. The next round builds on that evidence instead of a random guess.
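A minimal sketch of the loop's shape in Python. The `propose`, `score`, and `critique` callables stand in for the LLM and your dataset; the keyword-counting toys below are hypothetical, chosen only so the example runs deterministically.

```python
from typing import Callable

Config = dict[str, str]

def optimize(origin: Config,
             propose: Callable[[Config, list[str]], list[Config]],
             score: Callable[[Config], float],
             critique: Callable[[Config, float], str],
             rounds: int = 3) -> tuple[Config, float]:
    """Critique-guided loop: propose, score, critique, repeat."""
    best, best_score = origin, score(origin)
    notes: list[str] = []  # critiques carried into later rounds
    for _ in range(rounds):
        population = propose(best, notes)
        scored = [(score(c), c) for c in population]
        top_score, top = max(scored, key=lambda pair: pair[0])
        notes.append(critique(top, top_score))
        if top_score > best_score:  # keep the best candidate seen so far
            best, best_score = top, top_score
    return best, best_score

# Toy stand-ins for the LLM and the dataset (hypothetical, for illustration).
KEYWORDS = ["step", "verify", "####"]

def toy_score(c: Config) -> float:
    return sum(k in c["instruction"] for k in KEYWORDS) / len(KEYWORDS)

def toy_critique(c: Config, s: float) -> str:
    missing = [k for k in KEYWORDS if k not in c["instruction"]]
    return missing[0] if missing else ""

def toy_propose(parent: Config, notes: list[str]) -> list[Config]:
    hint = notes[-1] if notes else "step"  # act on the latest critique
    return [{"instruction": f'{parent["instruction"]} {hint}'.strip()},
            dict(parent)]

best, best_score = optimize({"instruction": "Solve the problem."},
                            toy_propose, toy_score, toy_critique)
```

Passing the stages in as callables keeps the loop itself independent of any particular model or dataset.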

02 · Escalation

Three nested layers

L1 generates candidates every round. When L1 stalls, L2 refines how the task is framed; if that stalls too, L3 replans the whole strategy. L2 and L3 fire only when the scores show a real stall.
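One way to express that escalation policy, sketched in Python. The `Escalator` class, the `patience` threshold, and the fall-back to L1 alone after progress are all assumptions; the page does not say how PromptPotter defines a stall.

```python
from dataclasses import dataclass

@dataclass
class Escalator:
    patience: int = 2           # hypothetical: flat rounds that count as a stall
    best: float = float("-inf")
    flat_rounds: int = 0
    tier: int = 1               # highest layer reached so far (1, 2 or 3)

    def observe(self, round_best: float) -> list[str]:
        """Record one round's best score; return the layers to fire next round."""
        if round_best > self.best:
            # Real progress: reset the stall counter and fall back to L1 alone.
            self.best, self.flat_rounds, self.tier = round_best, 0, 1
            return ["L1"]
        self.flat_rounds += 1
        if self.flat_rounds < self.patience:
            return ["L1"]
        # Stalled: escalate one layer (capped at L3) and restart the count,
        # so the new layer gets `patience` rounds before escalating again.
        self.flat_rounds = 0
        self.tier = min(self.tier + 1, 3)
        return ["L1", f"L{self.tier}"]  # L1 still generates every round

esc = Escalator(patience=2)
plan = [esc.observe(s) for s in [0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.7]]
```

With these scores, L2 fires after the first two flat rounds, L3 after two more, and the jump to 0.7 drops the plan back to L1 alone.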

03 · Backends

Pluggable and read-only

Any backend that publishes a pipeline definition can be optimized, including its model, retrieval, and parameters. PromptPotter never edits the backend itself; it tunes a per-call overlay on top of it.
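A sketch of the overlay idea in Python, assuming the pipeline definition and the overlay are nested dicts. The `PIPELINE` contents, the model name, and the recursive-merge rule in `resolve` are hypothetical; the point is that each call gets a fresh merged config while the published definition stays untouched.

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType

# Hypothetical pipeline definition published by a backend. MappingProxyType
# makes the top level read-only; resolve() below never writes to it anyway.
PIPELINE = MappingProxyType({
    "model": "example-model",
    "retrieval": {"top_k": 5},
    "params": {"temperature": 0.2},
    "prompt": {"persona": "You are solving a grade-school math word problem."},
})

def resolve(base: Mapping, overlay: Mapping) -> dict:
    """Return a new config: `base` deep-copied, with `overlay` merged on top.

    Nested mappings merge key by key; any other overlay value replaces the
    base value. Neither argument is modified.
    """
    merged = copy.deepcopy(dict(base))
    for key, value in overlay.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = resolve(merged[key], value)
        else:
            merged[key] = value
    return merged

# One per-call overlay: the optimizer's tuned values for this call only.
overlay = {
    "params": {"temperature": 0.0},
    "prompt": {"persona": "You are a patient math tutor who shows all work."},
}
call_config = resolve(PIPELINE, overlay)
```

Deep-copying the base means even nested settings such as `retrieval` in `call_config` are independent objects, so a caller cannot accidentally write back into the backend's definition.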

BYO model & backend
Groq · OpenAI · Anthropic · OpenRouter · Langfuse · Python 3.13 · TermNorm