Agent Spec Builder

Turn an agent idea into an implementable Markdown spec (no backend - stays in your browser).

1) Fill a few fields
2) Get structured SPEC.md output
3) Export pack + prompt tests

Inputs

Keep it rough. The output is meant to be iterated with your team.

Spec completeness

0% (just started)

Sections tracked: Objective, Problem / Context, Primary Users, Tools, Data Sources, Constraints, Success Metrics, Risks / Open Questions, Cost / Latency Budget, Eval Cases

Next up: Add a clear objective describing the outcome.
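If you want to reproduce this meter in your own tooling, here is a minimal sketch. It assumes each of the ten tracked sections counts equally (so each is worth 10%) and that "Next up" points at the first empty section in display order. The builder's actual weighting isn't documented here, and `completeness` is a hypothetical helper.

```python
from __future__ import annotations

# The ten sections tracked by the meter, in display order.
SECTIONS = [
    "Objective",
    "Problem / Context",
    "Primary Users",
    "Tools",
    "Data Sources",
    "Constraints",
    "Success Metrics",
    "Risks / Open Questions",
    "Cost / Latency Budget",
    "Eval Cases",
]


def completeness(fields: dict[str, str]) -> tuple[int, str | None]:
    """Return (percent complete, first empty section or None if all are filled)."""
    # Whitespace-only input counts as empty.
    filled = {name for name in SECTIONS if fields.get(name, "").strip()}
    percent = round(100 * len(filled) / len(SECTIONS))
    next_up = next((name for name in SECTIONS if name not in filled), None)
    return percent, next_up
```

With nothing filled in, this returns `(0, "Objective")`, which matches the state shown above.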

App / Spec name

Use a descriptive name your team will recognize - this becomes the doc title.

Objective

Good: "Reduce avg support response time from 4h to <15min for Tier 1 tickets." Avoid vague goals like "improve customer experience."

Problem / Context

Describe the trigger event, current pain point, and where this runs (Slack bot, API endpoint, cron job, etc.).

Primary users

Be specific: roles, team sizes, or downstream systems (e.g., L1 support reps, 12-person team).

Tools (one per line)

Include the permission level for each tool, e.g., read-only, requires human approval, or fully autonomous.

Tool Contracts (structured)

Add structured contracts for each tool (auth, rate limits, error modes, PII, idempotency).
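One way to keep contracts structured is a small record per tool that renders into the spec's "Tool Contracts" subsection. This is a minimal sketch. The `ToolContract` fields mirror the hint above (auth, rate limits, error modes, PII, idempotency) plus the permission level, and the class name, field names, and example values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolContract:
    """Hypothetical per-tool contract; field names are illustrative only."""

    name: str
    permission: str  # e.g. "read-only", "requires human approval", "fully autonomous"
    auth: str  # e.g. "service account token"
    rate_limit_per_min: int | None = None
    error_modes: list[str] = field(default_factory=list)
    handles_pii: bool = False
    idempotent: bool = False

    def to_markdown(self) -> str:
        """Render the contract as a Markdown block for the spec."""
        limit = f"{self.rate_limit_per_min}/min" if self.rate_limit_per_min is not None else "(TBD)"
        errors = ", ".join(self.error_modes) or "(TBD)"
        return "\n".join(
            [
                f"#### {self.name}",
                f"- **Permission:** {self.permission}",
                f"- **Auth:** {self.auth}",
                f"- **Rate limit:** {limit}",
                f"- **Error modes:** {errors}",
                f"- **PII:** {'yes' if self.handles_pii else 'no'}",
                f"- **Idempotent:** {'yes' if self.idempotent else 'no'}",
            ]
        )
```

Unknown values render as `(TBD)`, the same placeholder the generated spec uses, so gaps stay visible in review.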

Data sources (one per line)

List every data source the agent reads from. Note access patterns (real-time vs. batch, read vs. write).

Constraints (one per line)

Think: compliance, security, rate limits, cost caps, and human-in-the-loop requirements.

Cost / latency budget (optional)

p95 latency

95th percentile: 95% of requests should finish within this time; only the slowest 5% may exceed it.
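To check measured latencies against this budget, one common definition is the nearest-rank percentile: sort the samples and take the value at rank ⌈0.95 × n⌉. This is a minimal sketch, assuming latencies are collected as a list of milliseconds. Interpolating definitions, such as NumPy's default, can give slightly different numbers.

```python
from __future__ import annotations


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latency samples."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]
```

With 20 samples, the rank is 19, so only the single slowest request falls outside the p95.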

Max cost/day

Include LLM tokens, API calls, and infrastructure.

Max retries

How many times the agent retries before giving up or escalating.

Degrade to

What happens when the agent fails: silent fallback, alert, or queue for a human?
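Max retries and Degrade to work together. The agent retries a failing step a bounded number of times, then hands off to the degraded path instead of failing silently. This is a minimal sketch, assuming exponential backoff between attempts. `run_with_budget` and its parameters are hypothetical, and a real agent would catch narrower exception types.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_budget(
    task: Callable[[], T],
    max_retries: int,
    degrade: Callable[[Exception], T],
    backoff_s: float = 0.5,
) -> T:
    """Run task, retrying up to max_retries times, then hand off to degrade."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Exception | None = None
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception as exc:  # a real agent would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(backoff_s * 2**attempt)  # exponential backoff
    assert last_error is not None
    return degrade(last_error)
```

The `degrade` callback is where "alert" or "queue for a human" would plug in.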

Success metrics (one per line)

Make metrics measurable: "Reduce manual triage from 200 tickets/day to <20" beats "improve efficiency."

Non-goals (one per line)

Explicitly scoping out features prevents scope creep and sets expectations with stakeholders.

Risks / Open questions (one per line)

Flag unknowns early: model accuracy gaps, missing data, regulatory gray areas, or untested edge cases.

Eval Rubric

Define test cases before building - what does "working correctly" look like?
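A rubric can be as simple as a list of inputs, each paired with a check for what "working correctly" means, run against the agent before it ships. This is a minimal sketch. `EvalCase`, `run_evals`, and `pass_rate` are hypothetical placeholders for your own setup.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One rubric entry: an input plus a pass/fail check on the agent's output."""

    name: str
    prompt: str
    check: Callable[[str], bool]


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the agent and record pass/fail by case name."""
    return {case.name: case.check(agent(case.prompt)) for case in cases}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty run)."""
    return sum(results.values()) / len(results) if results else 0.0
```

Writing the checks first means they describe the intended behavior, not whatever the first prototype happens to do.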

Generated Spec

Copy/paste into a repo, doc, ticket, or PRD.

Spec lint

5 suggestions

Missing objective - Write one sentence describing the outcome this agent delivers.
No tools listed - If the agent does real work, list the APIs/systems it can call (even if approvals are required).
No data sources listed - List systems of record (docs, CRM, ticketing, runbooks, etc.).
No constraints listed - Add a few hard constraints (approvals, privacy, retention, access controls, etc.).
No success metrics listed - Add measurable outcomes (e.g., time saved/week, accuracy, CSAT, escalation rate).
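These five checks amount to "flag every required field that is empty." This is a minimal sketch that reproduces the messages above. It assumes the form values arrive as a dict keyed by the hypothetical names `objective`, `tools`, `data_sources`, `constraints`, and `success_metrics`.

```python
from __future__ import annotations

# (field key, suggestion) pairs; messages copied from the lint panel.
LINT_RULES = [
    ("objective", "Missing objective - Write one sentence describing the outcome this agent delivers."),
    ("tools", "No tools listed - If the agent does real work, list the APIs/systems it can call (even if approvals are required)."),
    ("data_sources", "No data sources listed - List systems of record (docs, CRM, ticketing, runbooks, etc.)."),
    ("constraints", "No constraints listed - Add a few hard constraints (approvals, privacy, retention, access controls, etc.)."),
    ("success_metrics", "No success metrics listed - Add measurable outcomes (e.g., time saved/week, accuracy, CSAT, escalation rate)."),
]


def lint_spec(fields: dict[str, str]) -> list[str]:
    """Return one suggestion per required field that is missing or whitespace-only."""
    return [message for key, message in LINT_RULES if not fields.get(key, "").strip()]
```

An empty form yields all five suggestions, matching the count shown above.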

# Agent Spec

**Generated:** 2026-03-08T18:49:47.287Z

**Objective:** (TBD)

---

## Problem / Context

What situation is this agent operating in? What triggers its use?


## Primary Users

Who relies on it day-to-day?


## Success Metrics

- Define measurable outcomes


## Constraints

- List hard constraints


## Cost / Latency Budget

- **p95 latency:** (TBD)
- **Max cost/day:** (TBD)
- **Max retries:** (TBD)
- **Degrade to:** (TBD - e.g., human handoff, safer mode, or partial output)


## Non-goals

- Explicitly exclude out-of-scope items


## High-level Architecture

- **UI / Entry point:** (web app, Slack bot, API, etc.)
- **Orchestrator:** agent runtime / workflow engine
- **Tools:** external actions (APIs, DB, tickets, email)
- **Knowledge:** docs / policies / context retrieval (if applicable)
- **Observability:** logs, traces, human review hooks


## Tools

- List the tools the agent can call

### Tool Contracts

(None yet)


## Data Sources

- List the data sources / systems of record

**Data handling notes:**
- PII? (yes/no)
- Retention: (TBD)
- Access controls: (TBD)


## Evaluation Plan (MVP)

### Offline evaluation
- Create 10–30 realistic test cases (inputs + expected outputs).
- Score output on: correctness, completeness, policy compliance, and action safety.

### Online / pilot
- Start with human-in-the-loop approvals for tool actions.
- Track: task success rate, time saved, escalation rate, and user satisfaction.


## Guardrails

- Define what the agent **must never do** (e.g., send email without approval).
- Require confirmations for destructive actions.
- Log every tool call with inputs/outputs for auditability.
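The last two bullets can be enforced in a single wrapper around every tool call. Destructive calls wait for an approval callback, and every attempt, including rejected or failed ones, is written to an audit log. This is a minimal sketch. `call_tool`, the `approve` callback, and the in-memory `AUDIT_LOG` are hypothetical stand-ins for your runtime's approval flow and log sink.

```python
from __future__ import annotations

import json
import time
from typing import Any, Callable

AUDIT_LOG: list[str] = []  # stand-in for a durable, append-only log sink


def call_tool(
    name: str,
    fn: Callable[..., Any],
    args: dict[str, Any],
    *,
    destructive: bool,
    approve: Callable[[str, dict[str, Any]], bool],
) -> Any:
    """Gate destructive calls behind approval; log inputs/outputs of every call."""
    entry: dict[str, Any] = {"ts": time.time(), "tool": name, "args": args}
    try:
        if destructive and not approve(name, args):
            entry["result"] = {"status": "rejected"}  # never execute without approval
        else:
            entry["result"] = fn(**args)
        return entry["result"]
    except Exception as exc:
        entry["error"] = repr(exc)  # record the failure, then let the caller handle it
        raise
    finally:
        AUDIT_LOG.append(json.dumps(entry, default=str))
```

Logging in `finally` means an audit entry is written whether the call succeeds, is rejected, or raises.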


## Risks / Open Questions

- List known risks + unknowns


---

## Implementation Notes (for builders)

- Start with the smallest end-to-end slice that proves value.
- Add one tool at a time; ship with strong logging and safe defaults.

MVP note: this tool intentionally avoids calling an LLM. It encodes a good structure so humans (or your own model setup) can fill in the details.
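Because no LLM is involved, generation can be plain templating. Empty fields fall back to placeholders such as `(TBD)`, and one-per-line inputs become bullets. This is a minimal sketch, assuming the hypothetical field keys `name` and `objective` and the hypothetical helpers `render_header` and `render_list`.

```python
from __future__ import annotations

from datetime import datetime, timezone


def render_header(fields: dict[str, str], now: datetime | None = None) -> str:
    """Title, timestamp, and objective, with fallbacks when fields are empty."""
    title = fields.get("name", "").strip() or "Agent Spec"
    objective = fields.get("objective", "").strip() or "(TBD)"
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="milliseconds")
    stamp = stamp.replace("+00:00", "Z")  # UTC "Z" suffix, as in the Generated line above
    return f"# {title}\n\n**Generated:** {stamp}\n\n**Objective:** {objective}\n"


def render_list(heading: str, text: str, placeholder: str) -> str:
    """Turn a one-per-line field into a bulleted section, or a placeholder bullet."""
    items = [line.strip() for line in text.splitlines() if line.strip()]
    bullets = "\n".join(f"- {item}" for item in (items or [placeholder]))
    return f"## {heading}\n\n{bullets}\n"
```

Blank lines in a one-per-line field are skipped, so stray whitespace does not produce empty bullets.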
