PromptPotter treats your prompt as a program — an eight-field config with tunable axes — and evolves it with a critique-guided loop. Here's the format it works in, and the ideas behind it.
Each round it mutates one axis at a time, always with a cited reason rather than a random guess, and breeds a population of candidates to test.
{
  "persona": "You are solving a grade-school math word problem.",
  "task_intent": "Read the problem carefully and produce one exact numeric answer.",
  "problem_description": "",
  "instruction": "Work through the problem step-by-step. Track quantities and units explicitly. Double-check each arithmetic operation before continuing.",
  "thinking_style": "Deliberate chain-of-thought. Do not skip intermediate computations; verify each against the problem statement."
}

{
  "thinking_style": [
    "Think step-by-step: break the problem into smaller arithmetic steps.",
    "Work backwards from what the question asks to what you need to compute.",
    "Write out each calculation explicitly before giving the final answer.",
    "Double-check your arithmetic at each step before proceeding."
  ],
  "instruction": [
    "Solve the math problem step by step. End your answer with #### followed by the numeric answer.",
    "Show your work clearly. The final answer must be a single number after ####.",
    "Break this into steps. After solving, verify by re-reading the question."
  ],
  "persona": [
    "You are a patient math tutor who shows all work.",
    "You are a careful problem solver who double-checks calculations."
  ]
}

Real config from the GSM8K dataset in the PromptPotter repo. The origin is the conservative floor; every variant changes exactly one axis.
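The "exactly one axis" rule can be sketched in a few lines of Python. This is only an illustration, not PromptPotter's own code: the function name `one_axis_variants` is hypothetical, and the `origin` and `axes` dictionaries are abbreviated copies of the config above.

```python
# Abbreviated, illustrative copies of the origin and axes shown above.
origin = {
    "persona": "You are solving a grade-school math word problem.",
    "instruction": "Work through the problem step-by-step.",
    "thinking_style": "Deliberate chain-of-thought.",
}
axes = {
    "thinking_style": [
        "Think step-by-step: break the problem into smaller arithmetic steps.",
        "Work backwards from what the question asks to what you need to compute.",
    ],
    "persona": [
        "You are a patient math tutor who shows all work.",
    ],
}


def one_axis_variants(origin, axes):
    """Return (axis, candidate) pairs; each candidate differs from origin in one field.

    Hypothetical helper, written for illustration only.
    """
    variants = []
    for axis, options in axes.items():
        for option in options:
            if option == origin.get(axis):
                continue  # same as the origin, so not a real mutation
            candidate = dict(origin)  # shallow copy is enough: all values are strings
            candidate[axis] = option
            variants.append((axis, candidate))
    return variants


variants = one_axis_variants(origin, axes)
for axis, candidate in variants:
    print(axis, "->", candidate[axis])
```

Because every candidate differs from the origin in a single field, a score change can be attributed to that field alone.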
Each round an LLM proposes a population of candidates, scores them on your real dataset, and critiques the result. The next round builds on that evidence — never a random guess.
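A rough sketch of that propose, score, critique cycle, under heavy assumptions: the proposer, scorer, and critic here are toy stand-ins (plain functions over strings), where the real system would call an LLM and evaluate on your dataset. All names (`evolve`, `toy_propose`, `toy_score`, `toy_critique`, `HINTS`) are hypothetical.

```python
HINTS = ["Show your work.", "Verify the answer.", "Track units."]  # toy search space


def toy_propose(parent, notes):
    """Stand-in for the LLM proposer: add one hint the parent prompt lacks."""
    return [parent + " " + h for h in HINTS if h not in parent]


def toy_score(prompt):
    """Stand-in for dataset evaluation: fraction of hints present in the prompt."""
    return sum(h in prompt for h in HINTS) / len(HINTS)


def toy_critique(scored):
    """Stand-in for the LLM critique: summarize the round's best candidate."""
    top_score, top = max(scored, key=lambda pair: pair[0])
    return f"best scored {top_score:.2f}: {top!r}"


def evolve(origin, propose, score, critique, rounds):
    """Run the loop; the critique from each round is fed to the next proposal."""
    best, best_score = origin, score(origin)
    notes = ""      # evidence carried forward between rounds
    history = []    # best score after each round
    for _ in range(rounds):
        population = propose(best, notes)
        if not population:
            break   # nothing left to try in this toy search space
        scored = [(score(c), c) for c in population]
        notes = critique(scored)
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:
            best, best_score = top, top_score
        history.append(best_score)
    return best, best_score, history


best, best_score, history = evolve(
    "Solve the problem.", toy_propose, toy_score, toy_critique, rounds=5
)
print(best_score, history)
```

The point of the sketch is the data flow: the critique produced in one round is an input to the next round's proposals, so the search is steered by evidence instead of sampling blindly.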
The loop has three layers. L1 generates candidates every round. When it stalls, L2 refines how the task is framed; if that stalls too, L3 replans the whole strategy. Each layer fires only on real evidence.
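One way to picture the escalation is a small stall counter. The sketch below is an assumption about how such a controller could work, not a description of PromptPotter's internals: the `Escalation` class, the `patience` threshold, and the rule that an improvement drops back to L1 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Escalation:
    """Hypothetical controller: escalate L1 -> L2 -> L3 after repeated stalls."""

    patience: int = 2             # assumed: non-improving rounds before escalating
    best: float = float("-inf")   # best score seen so far
    stalled: int = 0              # consecutive rounds without improvement
    level: int = 1                # highest layer currently engaged

    def observe(self, score: float) -> List[str]:
        """Record a round's best score; return the layers that fire next round."""
        fired = ["L1"]  # L1 runs every round
        if score > self.best:
            # Real improvement: reset and fall back to L1 only.
            self.best, self.stalled, self.level = score, 0, 1
            return fired
        self.stalled += 1
        if self.stalled >= self.patience and self.level < 3:
            self.level += 1
            self.stalled = 0
            fired.append(f"L{self.level}")
        return fired


controller = Escalation(patience=2)
trace = [controller.observe(s) for s in [0.5, 0.5, 0.5, 0.5, 0.5, 0.6]]
for step in trace:
    print(step)
```

In this sketch a higher layer fires only after the layer below it has failed to improve the score for `patience` rounds, which is one reading of "fires only on real evidence".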
Any backend that publishes a pipeline definition is optimizable — model, retrieval, params and all. PromptPotter never edits the backend; it tunes a per-call overlay.
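The overlay idea can be illustrated with a non-mutating merge. This is a minimal sketch under assumptions: the pipeline fields (`model`, `retrieval`, `params`) and their values are invented for the example, and `apply_overlay` is a hypothetical helper, not a PromptPotter API.

```python
import copy


def apply_overlay(pipeline, overlay):
    """Return a new config with overlay values merged over the pipeline.

    Neither input is modified, mirroring the idea that the backend's own
    definition is never edited. Hypothetical helper for illustration.
    """
    merged = copy.deepcopy(pipeline)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)               # replace leaf values
    return merged


# Hypothetical pipeline definition published by a backend.
pipeline = {
    "model": "model-a",
    "retrieval": {"top_k": 5, "reranker": "none"},
    "params": {"temperature": 0.7},
}

# Hypothetical per-call overlay chosen by the optimizer.
overlay = {"retrieval": {"top_k": 8}, "params": {"temperature": 0.2}}

effective = apply_overlay(pipeline, overlay)
print(effective)
```

Keeping the overlay separate means each call can be tuned independently while the published definition stays the single source of truth.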