Turn an agent idea into an implementable Markdown spec (no backend - stays in your browser).
Keep it rough. The output is meant to be iterated with your team.
Add structured contracts for each tool (auth, rate limits, error modes, PII, idempotency).
Define test cases before building - what does "working correctly" look like?
Copy/paste into a repo, doc, ticket, or PRD.
# Agent Spec
**Generated:** 2026-03-08T18:49:47.287Z
**Objective:** (TBD)
---
## Problem / Context
What situation is this agent operating in? What triggers its use?
## Primary Users
Who relies on it day-to-day?
## Success Metrics
- Define measurable outcomes
## Constraints
- List hard constraints
## Cost / Latency Budget
- **p95 latency:** (TBD)
- **Max cost/day:** (TBD)
- **Max retries:** (TBD)
- **Degrade to:** (TBD - e.g., human handoff, safer mode, or partial output)
## Non-goals
- Explicitly exclude out-of-scope items
## High-level Architecture
- **UI / Entry point:** (web app, Slack bot, API, etc.)
- **Orchestrator:** agent runtime / workflow engine
- **Tools:** external actions (APIs, DB, tickets, email)
- **Knowledge:** docs / policies / context retrieval (if applicable)
- **Observability:** logs, traces, human review hooks
## Tools
- List the tools the agent can call
### Tool Contracts
(None yet)
## Data Sources
- List the data sources / systems of record
**Data handling notes:**
- PII? (yes/no)
- Retention: (TBD)
- Access controls: (TBD)
## Evaluation Plan (MVP)
### Offline evaluation
- Create 10–30 realistic test cases (inputs + expected outputs).
- Score output on: correctness, completeness, policy compliance, and action safety.
### Online / pilot
- Start with human-in-the-loop approvals for tool actions.
- Track: task success rate, time saved, escalation rate, and user satisfaction.
## Guardrails
- Define what the agent **must never do** (e.g., send email without approval).
- Require confirmations for destructive actions.
- Log every tool call with inputs/outputs for auditability.
## Risks / Open Questions
- List known risks + unknowns
---
## Implementation Notes (for builders)
- Start with the smallest end-to-end slice that proves value.
- Add one tool at a time; ship with strong logging and safe defaults.