## Inputs

Keep it rough. The output is meant to be iterated on with your team.

### Tool Contracts (structured)

Add a structured contract for each tool (auth, rate limits, error modes, PII, idempotency).
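
For illustration, one possible shape for such a contract, sketched as a TypeScript interface (the field names are assumptions for this sketch, not a fixed schema):

```typescript
// Illustrative shape for a structured tool contract; every field name is
// an assumption for this sketch, not a fixed schema.
interface ToolContract {
  name: string;                        // e.g. "create_ticket"
  auth: "none" | "api_key" | "oauth";  // how calls are authenticated
  rateLimit: { maxCalls: number; perSeconds: number };
  errorModes: string[];                // e.g. ["timeout", "403", "validation_error"]
  handlesPII: boolean;                 // does the tool touch personal data?
  idempotent: boolean;                 // safe to retry without side effects?
  requiresApproval: boolean;           // human sign-off before execution?
}
```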
### Cost / latency budget (optional)
## Generated Spec

Copy/paste it into a repo, doc, ticket, or PRD.

## Spec lint

5 suggestions:
- Missing objective — Write one sentence describing the outcome this agent delivers.
- No tools listed — If the agent does real work, list the APIs/systems it can call (even if approvals are required).
- No data sources listed — List systems of record (docs, CRM, ticketing, runbooks, etc.).
- No constraints listed — Add a few hard constraints (approvals, privacy, retention, access controls, etc.).
- No success metrics listed — Add measurable outcomes (e.g., time saved/week, accuracy, CSAT, escalation rate).
# Agent Spec
**Generated:** 2026-02-14T09:08:50.029Z
**Objective:** (TBD)
---
## Problem / Context
What situation is this agent operating in? What triggers its use?
## Primary Users
Who relies on it day-to-day?
## Success Metrics
- Define measurable outcomes
## Constraints
- List hard constraints
## Cost / Latency Budget
- **p95 latency:** (TBD)
- **Max cost/day:** (TBD)
- **Max retries:** (TBD)
- **Degrade to:** (TBD — e.g., human handoff, safer mode, or partial output)
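
A minimal sketch of how a budget like this could be checked at runtime (the numbers and the degrade trigger are placeholders, not recommendations):

```typescript
// Placeholder budget values; tune these per deployment.
const budget = { p95LatencyMs: 5_000, maxCostPerDayUsd: 20, maxRetries: 2 };

// Decide whether to fall back to the degrade path (human handoff,
// safer mode, or partial output) instead of retrying again.
function shouldDegrade(spentTodayUsd: number, attempt: number): boolean {
  return spentTodayUsd >= budget.maxCostPerDayUsd || attempt > budget.maxRetries;
}
```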
## Non-goals
- Explicitly exclude out-of-scope items
## High-level Architecture
- **UI / Entry point:** (web app, Slack bot, API, etc.)
- **Orchestrator:** agent runtime / workflow engine
- **Tools:** external actions (APIs, DB, tickets, email)
- **Knowledge:** docs / policies / context retrieval (if applicable)
- **Observability:** logs, traces, human review hooks
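
To make the wiring concrete, here is a minimal orchestrator loop under the assumptions above (all names are placeholders; a real runtime adds planning state, retries, and error handling):

```typescript
// Every name here is a placeholder; this only shows how the pieces connect.
type Tool = { name: string; run: (input: string) => Promise<string> };

async function orchestrate(
  request: string,                         // arrives via the UI / entry point
  plan: (req: string) => Promise<Tool[]>,  // the runtime picks which tools to call
  log: (event: string) => void,            // observability hook (logs / traces)
): Promise<string[]> {
  const tools = await plan(request);
  const results: string[] = [];
  for (const tool of tools) {
    log(`calling ${tool.name}`);           // trace every tool call
    results.push(await tool.run(request));
  }
  return results;
}
```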
## Tools
- List the tools the agent can call
### Tool Contracts
(None yet; the sketch below shows the expected shape.)
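
Once tools are listed, a filled-in contract might look like this hypothetical example (the tool and its limits are invented; the shape follows the sketch under Inputs):

```typescript
// Hypothetical contract for an invented "create_ticket" tool, following the
// contract shape sketched under Inputs above.
const createTicketContract = {
  name: "create_ticket",
  auth: "api_key",
  rateLimit: { maxCalls: 30, perSeconds: 60 },
  errorModes: ["timeout", "403", "validation_error"],
  handlesPII: true,        // ticket bodies may contain customer details
  idempotent: false,       // blind retries would create duplicate tickets
  requiresApproval: true,  // human sign-off before a ticket is filed
};
```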
## Data Sources
- List the data sources / systems of record
**Data handling notes:**
- PII? (yes/no)
- Retention: (TBD)
- Access controls: (TBD)
## Evaluation Plan (MVP)
### Offline evaluation
- Create 10–30 realistic test cases (inputs + expected outputs).
- Score outputs on correctness, completeness, policy compliance, and action safety.
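
A sketch of what such an offline harness could look like (the agent function and scorers are stand-ins you would supply):

```typescript
interface TestCase {
  input: string;
  expected: string;
}

// Each scorer returns a value in [0, 1] for one dimension
// (correctness, completeness, policy compliance, action safety).
type Scorer = (output: string, tc: TestCase) => number;

async function runOfflineEval(
  cases: TestCase[],
  agent: (input: string) => Promise<string>,
  scorers: Record<string, Scorer>,
): Promise<void> {
  // Run the agent once per case, then score along every dimension.
  const outputs = await Promise.all(cases.map((tc) => agent(tc.input)));
  for (const [name, score] of Object.entries(scorers)) {
    const avg =
      cases.reduce((sum, tc, i) => sum + score(outputs[i], tc), 0) / cases.length;
    console.log(`${name}: ${avg.toFixed(2)}`);
  }
}
```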
### Online / pilot
- Start with human-in-the-loop approvals for tool actions.
- Track: task success rate, time saved, escalation rate, and user satisfaction.
## Guardrails
- Define what the agent **must never do** (e.g., send email without approval).
- Require confirmations for destructive actions.
- Log every tool call with inputs/outputs for auditability.
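
As a sketch, the guardrails above might be enforced in a single wrapper around every tool call (the names and the approval hook are hypothetical):

```typescript
type ToolFn = (input: unknown) => Promise<unknown>;

async function guardedCall(
  toolName: string,
  tool: ToolFn,
  input: unknown,
  opts: {
    destructive: boolean;
    approve: (message: string) => Promise<boolean>; // human-in-the-loop hook
  },
): Promise<unknown> {
  // Require explicit confirmation before any destructive action.
  if (opts.destructive && !(await opts.approve(`Allow ${toolName}?`))) {
    throw new Error(`Approval denied for ${toolName}`);
  }
  const output = await tool(input);
  // Log every tool call with inputs and outputs for auditability.
  console.log(
    JSON.stringify({ ts: new Date().toISOString(), toolName, input, output }),
  );
  return output;
}
```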
## Risks / Open Questions
- List known risks + unknowns
---
## Implementation Notes (for builders)
- Start with the smallest end-to-end slice that proves value.
- Add one tool at a time; ship with strong logging and safe defaults.
**MVP note:** This tool intentionally avoids calling an LLM. It encodes a good structure so humans (or your own model setup) can fill in the details.