guides

Cost control

Pipemason is BYO Anthropic — every token you spend lands on your bill, not ours. Three levers keep spend predictable: model tiering, cost ceilings, and dry-run estimation.

Model tiering

Configure which model each agent runs on in .pipeline/config.yml. The default profile pins cheap-and-fast Haiku for low-risk work and reserves Sonnet / Opus for high-stakes phases (architect, reviewer, security auditors).

models:
  default: claude-haiku-4-5
  by_phase:
    analyze: claude-sonnet-4-6
    plan: claude-sonnet-4-6
    contracts: claude-sonnet-4-6
    impl: claude-haiku-4-5
    review: claude-sonnet-4-6
    security: claude-sonnet-4-6
  by_risk:
    high: claude-opus-4-7

Stories tagged risk: high during program planning upgrade to the by_risk.high model regardless of phase.

Cost ceilings

Set a per-program ceiling and pipemason will pause + escalate before crossing it:

limits:
  cost_ceiling_usd: 25.00       # per program
  retry_budget_per_phase: 3     # how many retries before escalation

Hitting the ceiling escalates rather than aborts — you can review what was spent, lift the ceiling, and resume.

Dry-run estimation

Before kicking off a program, estimate its cost:

pipemason cost-estimate "<your intent>" --mode greenfield-system

The estimator runs program_plan only (cheap) and projects total cost based on the planned story count, sizes, risks, and your model profile. It's a rough number — actual spend depends on how many retries each phase needs — but it's the right order of magnitude.

Cost levers ranked by impact

Sharpen the spec. The most expensive line item on any program is retries caused by ambiguous specs. A 10-minute spec edit can save half the token spend.
Right-size the story. An XL story uses more iterations than two L stories. Decompose ruthlessly during program_plan.
Tier the model. Haiku 4.5 is ~5× cheaper than Sonnet 4.6 for impl work. Most impl phases don't need a smart model — they need a fast one.
Tighten retry budgets. Default 3 retries per phase; if a phase usually succeeds in 1, lowering to 2 caps the worst case without changing the median.

Where the spend lives

Every run's cost_usd_estimate, total_tokens_in, and total_tokens_out are recorded server-side and shown on the dashboard. The dashboard metric strip surfaces a 30-day rollup; the per-run page shows the breakdown by phase.

Note

These are Anthropic API costs only. They don't include the pipemason platform fee (which is per-seat, not per-token).