## Inputs

Keep it rough. The output is meant to be iterated on with your team.

### Tool Contracts (structured)

Add a structured contract for each tool (auth, rate limits, error modes, PII, idempotency).
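
For illustration, one possible shape for such a contract, sketched as a TypeScript interface (the field names are assumptions for this sketch, not a fixed schema):

```typescript
// Illustrative shape for a structured tool contract; every field name is
// an assumption for this sketch, not a fixed schema.
interface ToolContract {
  name: string;                        // e.g. "create_ticket"
  auth: "none" | "api_key" | "oauth";  // how calls are authenticated
  rateLimit: { maxCalls: number; perSeconds: number };
  errorModes: string[];                // e.g. ["timeout", "403", "validation_error"]
  handlesPII: boolean;                 // does the tool touch personal data?
  idempotent: boolean;                 // safe to retry without side effects?
  requiresApproval: boolean;           // human sign-off before execution?
}
```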
### Cost / latency budget (optional)
## Generated Spec

Copy/paste it into a repo, doc, ticket, or PRD.

## Spec lint

5 suggestions:
- Missing objective — Write one sentence describing the outcome this agent delivers.
- No tools listed — If the agent does real work, list the APIs/systems it can call (even if approvals are required).
- No data sources listed — List systems of record (docs, CRM, ticketing, runbooks, etc.).
- No constraints listed — Add a few hard constraints (approvals, privacy, retention, access controls, etc.).
- No success metrics listed — Add measurable outcomes (e.g., time saved/week, accuracy, CSAT, escalation rate).
# Agent Spec
**Generated:** 2026-02-14T09:08:50.029Z
**Objective:** (TBD)
---
## Problem / Context
What situation is this agent operating in? What triggers its use?
## Primary Users
Who relies on it day-to-day?
## Success Metrics
- Define measurable outcomes
## Constraints
- List hard constraints
## Cost / Latency Budget
- **p95 latency:** (TBD)
- **Max cost/day:** (TBD)
- **Max retries:** (TBD)
- **Degrade to:** (TBD — e.g., human handoff, safer mode, or partial output)
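
A minimal sketch of how a budget like this could be checked at runtime (the numbers and the degrade trigger are placeholders, not recommendations):

```typescript
// Placeholder budget values; tune these per deployment.
const budget = { p95LatencyMs: 5_000, maxCostPerDayUsd: 20, maxRetries: 2 };

// Decide whether to fall back to the degrade path (human handoff,
// safer mode, or partial output) instead of retrying again.
function shouldDegrade(spentTodayUsd: number, attempt: number): boolean {
  return spentTodayUsd >= budget.maxCostPerDayUsd || attempt > budget.maxRetries;
}
```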
## Non-goals
- Explicitly exclude out-of-scope items
## High-level Architecture
- **UI / Entry point:** (web app, Slack bot, API, etc.)
- **Orchestrator:** agent runtime / workflow engine
- **Tools:** external actions (APIs, DB, tickets, email)
- **Knowledge:** docs / policies / context retrieval (if applicable)
- **Observability:** logs, traces, human review hooks
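
To make the wiring concrete, here is a minimal orchestrator loop under the assumptions above (all names are placeholders; a real runtime adds planning state, retries, and error handling):

```typescript
// Every name here is a placeholder; this only shows how the pieces connect.
type Tool = { name: string; run: (input: string) => Promise<string> };

async function orchestrate(
  request: string,                         // arrives via the UI / entry point
  plan: (req: string) => Promise<Tool[]>,  // the runtime picks which tools to call
  log: (event: string) => void,            // observability hook (logs / traces)
): Promise<string[]> {
  const tools = await plan(request);
  const results: string[] = [];
  for (const tool of tools) {
    log(`calling ${tool.name}`);           // trace every tool call
    results.push(await tool.run(request));
  }
  return results;
}
```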
## Tools
- List the tools the agent can call
### Tool Contracts
(None yet; the sketch below shows the expected shape.)
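
Once tools are listed, a filled-in contract might look like this hypothetical example (the tool and its limits are invented; the shape follows the sketch under Inputs):

```typescript
// Hypothetical contract for an invented "create_ticket" tool, following the
// contract shape sketched under Inputs above.
const createTicketContract = {
  name: "create_ticket",
  auth: "api_key",
  rateLimit: { maxCalls: 30, perSeconds: 60 },
  errorModes: ["timeout", "403", "validation_error"],
  handlesPII: true,        // ticket bodies may contain customer details
  idempotent: false,       // blind retries would create duplicate tickets
  requiresApproval: true,  // human sign-off before a ticket is filed
};
```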
## Data Sources
- List the data sources / systems of record
**Data handling notes:**
- PII? (yes/no)
- Retention: (TBD)
- Access controls: (TBD)
## Evaluation Plan (MVP)
### Offline evaluation
- Create 10–30 realistic test cases (inputs + expected outputs).
- Score outputs on correctness, completeness, policy compliance, and action safety.
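
A sketch of what such an offline harness could look like (the agent function and scorers are stand-ins you would supply):

```typescript
interface TestCase {
  input: string;
  expected: string;
}

// Each scorer returns a value in [0, 1] for one dimension
// (correctness, completeness, policy compliance, action safety).
type Scorer = (output: string, tc: TestCase) => number;

async function runOfflineEval(
  cases: TestCase[],
  agent: (input: string) => Promise<string>,
  scorers: Record<string, Scorer>,
): Promise<void> {
  // Run the agent once per case, then score along every dimension.
  const outputs = await Promise.all(cases.map((tc) => agent(tc.input)));
  for (const [name, score] of Object.entries(scorers)) {
    const avg =
      cases.reduce((sum, tc, i) => sum + score(outputs[i], tc), 0) / cases.length;
    console.log(`${name}: ${avg.toFixed(2)}`);
  }
}
```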
### Online / pilot
- Start with human-in-the-loop approvals for tool actions.
- Track: task success rate, time saved, escalation rate, and user satisfaction.
## Guardrails
- Define what the agent **must never do** (e.g., send email without approval).
- Require confirmations for destructive actions.
- Log every tool call with inputs/outputs for auditability.
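
As a sketch, the guardrails above might be enforced in a single wrapper around every tool call (the names and the approval hook are hypothetical):

```typescript
type ToolFn = (input: unknown) => Promise<unknown>;

async function guardedCall(
  toolName: string,
  tool: ToolFn,
  input: unknown,
  opts: {
    destructive: boolean;
    approve: (message: string) => Promise<boolean>; // human-in-the-loop hook
  },
): Promise<unknown> {
  // Require explicit confirmation before any destructive action.
  if (opts.destructive && !(await opts.approve(`Allow ${toolName}?`))) {
    throw new Error(`Approval denied for ${toolName}`);
  }
  const output = await tool(input);
  // Log every tool call with inputs and outputs for auditability.
  console.log(
    JSON.stringify({ ts: new Date().toISOString(), toolName, input, output }),
  );
  return output;
}
```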
## Risks / Open Questions
- List known risks + unknowns
---
## Implementation Notes (for builders)
- Start with the smallest end-to-end slice that proves value.
- Add one tool at a time; ship with strong logging and safe defaults.
**MVP note:** This tool intentionally avoids calling an LLM. It encodes a good structure so humans (or your own model setup) can fill in the details.