guides
Cost control
Pipemason is BYO Anthropic — every token you spend lands on your bill, not ours. Three levers keep spend predictable: model tiering, cost ceilings, and dry-run estimation.
Model tiering
Configure which model each agent runs on in .pipeline/config.yml. The default profile pins cheap-and-fast Haiku for low-risk work and reserves Sonnet / Opus for high-stakes phases (architect, reviewer, security auditors).
models:
default: claude-haiku-4-5
by_phase:
analyze: claude-sonnet-4-6
plan: claude-sonnet-4-6
contracts: claude-sonnet-4-6
impl: claude-haiku-4-5
review: claude-sonnet-4-6
security: claude-sonnet-4-6
by_risk:
high: claude-opus-4-7Stories tagged risk: high during program planning upgrade to the by_risk.high model regardless of phase.
Cost ceilings
Set a per-program ceiling and pipemason will pause + escalate before crossing it:
limits: cost_ceiling_usd: 25.00 # per program retry_budget_per_phase: 3 # how many retries before escalation
Hitting the ceiling escalates rather than aborts — you can review what was spent, lift the ceiling, and resume.
Dry-run estimation
Before kicking off a program, estimate its cost:
pipemason cost-estimate "<your intent>" --mode greenfield-system
The estimator runs program_plan only (cheap) and projects total cost based on the planned story count, sizes, risks, and your model profile. It's a rough number — actual spend depends on how many retries each phase needs — but it's the right order of magnitude.
Cost levers ranked by impact
- Sharpen the spec. The most expensive line item on any program is retries caused by ambiguous specs. A 10-minute spec edit can save half the token spend.
- Right-size the story. An XL story uses more iterations than two L stories. Decompose ruthlessly during
program_plan. - Tier the model. Haiku 4.5 is ~5× cheaper than Sonnet 4.6 for impl work. Most impl phases don't need a smart model — they need a fast one.
- Tighten retry budgets. Default 3 retries per phase; if a phase usually succeeds in 1, lowering to 2 caps the worst case without changing the median.
Where the spend lives
Every run's cost_usd_estimate, total_tokens_in, and total_tokens_out are recorded server-side and shown on the dashboard. The dashboard metric strip surfaces a 30-day rollup; the per-run page shows the breakdown by phase.
Note