
Specifying AI Agent Behaviors: A PM's Guide to Agent Design

How to spec AI agent behaviors, constraints, and success criteria as a product manager. Covers goal definition, guardrails, tool use policies, and writing agent specs your engineering team can build from.

By Tim Adair · Published 2026-02-09

Quick Answer (TL;DR)

AI agents are systems that take autonomous actions to achieve goals, rather than simply generating text in response to a prompt. As a PM, specifying agent behavior requires a fundamentally different approach than specifying traditional features. You need to define goals (not just tasks), set behavioral constraints (not just UI rules), specify tool-use policies (what the agent can and cannot do), and establish success criteria that account for the probabilistic nature of agent behavior. The spec is not a PRD in the traditional sense. It is closer to a set of operating rules for an autonomous system.

Summary: Specifying AI agents requires defining goals, constraints, tool policies, and success criteria that govern autonomous behavior rather than deterministic feature flows.

Key Steps:

  • Define the agent's goal, scope, and autonomy level for each task it can perform
  • Write explicit behavioral constraints covering safety, escalation, and tool use
  • Create evaluation criteria that measure goal achievement, not just output quality
    Time Required: 3-5 days for a comprehensive agent spec; ongoing refinement

    Best For: PMs building products with autonomous AI capabilities (coding assistants, research agents, workflow automation)


    Table of Contents

  • What Makes Agents Different from Prompts
  • The Agent Spec Document
  • Defining Agent Goals
  • Setting Autonomy Levels
  • Behavioral Constraints and Guardrails
  • Tool Use Policies
  • Escalation and Handoff Rules
  • Memory and Context Management
  • Success Criteria and Evaluation
  • The Agent Spec Template
  • Common Mistakes
  • Getting Started Checklist
  • Key Takeaways

    What Makes Agents Different from Prompts

    A prompt-based AI feature is reactive: user sends input, model generates output, done. An AI agent is proactive: it receives a goal, plans a sequence of actions, executes those actions using tools, observes results, and iterates until the goal is achieved or it determines it cannot proceed.
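A minimal sketch of that loop helps make the difference concrete. Everything below is illustrative: the planner, tools, and escalation handler are hypothetical callables you would supply, not a specific framework's API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Action:
    name: str           # tool to call, or the special values "finish" / "escalate"
    arguments: dict
    result: Any = None  # populated by the planner when name == "finish"

def run_agent(goal: str,
              tools: dict[str, Callable[..., Any]],
              plan_next_action: Callable[[str, list], Action],
              escalate: Callable[[str, list], str],
              max_steps: int = 10) -> str:
    """Plan, act, observe, and repeat until the goal is met or the agent stops."""
    observations: list[tuple[Action, Any]] = []
    for _ in range(max_steps):
        action = plan_next_action(goal, observations)    # typically an LLM call
        if action.name == "finish":
            return action.result                         # goal achieved (or best effort)
        if action.name == "escalate":
            return escalate(goal, observations)          # hand off to a human
        result = tools[action.name](**action.arguments)  # execute the chosen tool
        observations.append((action, result))            # observe, then iterate
    return escalate(goal, observations)                  # step budget exhausted
```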

    This distinction changes everything about how you spec the feature:

    From Inputs/Outputs to Goals/Actions

    Prompt-based feature: "Given a customer support ticket, generate a draft response."

    Agent-based feature: "Resolve customer support tickets by researching the issue in our knowledge base, drafting a response, checking it against our quality standards, and sending it if confidence is high enough, or escalating to a human if not."

    From Deterministic to Probabilistic

    Traditional features follow predictable paths. An agent might take 3 steps to resolve one ticket and 12 steps for another. It might succeed on the first try or need to backtrack and try a different approach. Your spec must account for this variability.

    From User-Triggered to Autonomous

    Agents can act without user input at each step. This creates new categories of risk. A traditional feature cannot accidentally delete a customer's data because every action requires user confirmation. An agent with file system access could, unless you explicitly constrain it.


    The Agent Spec Document

    An agent spec is not a traditional PRD. It is a hybrid of product requirements, operating procedures, and safety policies. It needs to answer three fundamental questions:

  • What is the agent trying to achieve? (Goals and scope)
  • What is the agent allowed to do? (Capabilities, tools, and autonomy)
  • What must the agent never do? (Constraints, safety, and escalation)
    Who Reads the Agent Spec

    Your agent spec serves multiple audiences:

  • Engineering: Implements the agent's reasoning loop, tool integrations, and constraint enforcement
  • Trust and Safety: Reviews the constraint and escalation sections for risk
  • QA: Uses the success criteria to build test scenarios
  • Legal and Compliance: Reviews the scope and constraint sections for regulatory alignment
  • The agent itself: Parts of the spec become the agent's system prompt and behavioral instructions

    Defining Agent Goals

    Goal Structure

    Every agent needs a clearly defined goal hierarchy:

    Primary goal: The overarching objective the agent is trying to achieve. This should be stated in terms of user or business outcomes, not agent actions.

    Good: "Resolve customer support tickets so the customer's issue is fixed and they are satisfied."

    Bad: "Generate responses to customer support tickets."

    Sub-goals: The intermediate objectives the agent pursues to achieve the primary goal. These represent the steps in the agent's typical workflow.

    Example sub-goals for a support agent:

  • Understand the customer's issue from the ticket content and history
  • Research the solution using internal knowledge base and documentation
  • Draft a response that addresses the specific issue
  • Verify the response is accurate and complete
  • Send the response or escalate to a human agent
    Constraints on goals: Boundaries on how the agent pursues its goals.

    "Resolve the issue in the fewest steps possible. Never ask the customer for information that is already available in their account data. Prioritize accuracy over speed."

    Goal Clarity Checklist

    For each goal, verify:

  • Is the goal stated in terms of outcomes, not actions?
  • Is it measurable? How will you know if the agent achieved the goal?
  • Is it bounded? Are there clear conditions under which the agent should stop pursuing the goal?
  • Is it prioritized? If goals conflict, which one wins?

    Setting Autonomy Levels

    Not every agent action should have the same level of autonomy. The PM's job is to define which actions the agent can take independently, which require user confirmation, and which are strictly off-limits.

    The Autonomy Spectrum

    Level 1 - Suggest: Agent recommends an action but takes no action. The user must explicitly approve.

    Example: "Suggest a response to this support ticket for the human agent to review and send."

    Level 2 - Act then notify: Agent takes the action and informs the user after the fact. The user can undo.

    Example: "Categorize and route incoming tickets automatically. Show the agent's categorization decision in the ticket metadata."

    Level 3 - Act silently: Agent takes the action without notification. Used only for low-risk, high-confidence actions.

    Example: "Update the ticket's internal priority score based on sentiment analysis."

    Level 4 - Act and prevent undo: Agent takes irreversible actions. This level should be extremely rare and heavily constrained.

    Example: Almost never appropriate for customer-facing actions. Might apply to internal cleanup tasks.

    Assigning Autonomy Levels

    For each action the agent can take, assign an autonomy level based on the factors below (a sketch of the resulting autonomy matrix follows the table):

    | Factor | Lower Autonomy | Higher Autonomy |
    | --- | --- | --- |
    | Reversibility | Action is irreversible | Action is easily undone |
    | Impact | Affects customer data or experience | Only affects internal systems |
    | Confidence | Agent is uncertain | Agent has high confidence |
    | Frequency | Rare occurrence | Routine, repetitive task |
    | Stakes | Financial or legal implications | No external consequences |
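In practice this table often becomes an autonomy matrix that the agent runtime consults before executing any action. A minimal sketch, with hypothetical action names from the support example:

```python
from enum import IntEnum

class Autonomy(IntEnum):
    SUGGEST = 1           # recommend only; user must approve
    ACT_THEN_NOTIFY = 2   # act, inform the user, allow undo
    ACT_SILENTLY = 3      # act without notification (low-risk only)
    ACT_IRREVERSIBLY = 4  # irreversible; almost never appropriate

# Autonomy matrix: every action the agent can take gets an explicit level.
AUTONOMY_MATRIX = {
    "draft_reply":       Autonomy.SUGGEST,
    "categorize_ticket": Autonomy.ACT_THEN_NOTIFY,
    "update_priority":   Autonomy.ACT_SILENTLY,
    "issue_refund":      Autonomy.SUGGEST,   # financial: keep a human in the loop
}

def allowed_without_approval(action: str) -> bool:
    # Unknown actions default to the most conservative level.
    return AUTONOMY_MATRIX.get(action, Autonomy.SUGGEST) >= Autonomy.ACT_THEN_NOTIFY
```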

    Behavioral Constraints and Guardrails

    Constraints are the most important part of your agent spec. Goals tell the agent what to achieve. Constraints tell it how to behave while achieving those goals.

    Types of Constraints

    Hard constraints (never violate):

  • Never share customer data with other customers
  • Never make promises about features that do not exist
  • Never take financial actions (refunds, charges) without human approval
  • Never modify production systems without authorization
    Soft constraints (prefer but can flex):

  • Prefer concise responses under 200 words (but go longer if the issue is complex)
  • Prefer to resolve in a single interaction (but ask clarifying questions if needed)
  • Prefer to use the knowledge base (but reason from general knowledge if KB does not cover the topic)
    Writing Effective Constraints

    Constraints must be specific and testable. For each constraint, you should be able to write a test case that verifies the agent respects it.

    Bad constraint: "Be safe."

    Good constraint: "Never execute code that deletes, modifies, or overwrites files outside the designated workspace directory. If the user requests a file operation outside the workspace, explain the restriction and suggest an alternative."

    Bad constraint: "Do not be biased."

    Good constraint: "When comparing products or recommending solutions, present at least two options with pros and cons for each. Never recommend a single option without acknowledging its limitations."

    The Constraint Hierarchy

    When constraints conflict, the agent needs a priority order (a small enforcement sketch follows this list):

  • Safety constraints (protect users, data, and systems) - never override
  • Legal and compliance constraints (regulatory requirements) - never override
  • Product policy constraints (brand voice, content standards) - rarely override
  • Quality constraints (accuracy, completeness) - may flex based on context
  • Efficiency constraints (speed, cost) - most flexible
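One way to encode the hierarchy so the enforcement layer (and the eval harness) can resolve conflicts deterministically is a simple ranked lookup. The categories mirror the list above; the data structure itself is just an illustration:

```python
# Constraint priority: lower number wins when two constraints conflict.
CONSTRAINT_PRIORITY = {
    "safety": 1,       # protect users, data, systems - never override
    "legal": 2,        # regulatory requirements - never override
    "policy": 3,       # brand voice, content standards - rarely override
    "quality": 4,      # accuracy, completeness - may flex
    "efficiency": 5,   # speed, cost - most flexible
}

def resolve_conflict(constraint_a: dict, constraint_b: dict) -> dict:
    """Return the constraint that must be preserved when two conflict."""
    return min(constraint_a, constraint_b,
               key=lambda c: CONSTRAINT_PRIORITY[c["category"]])

# Example: "respond fast" (efficiency) conflicts with "verify the answer against
# the knowledge base" (quality) -> quality wins.
winner = resolve_conflict({"category": "efficiency", "rule": "respond fast"},
                          {"category": "quality", "rule": "verify first"})
assert winner["category"] == "quality"
```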

    Tool Use Policies

    Agents achieve goals by using tools: APIs, databases, file systems, web browsers, code interpreters. Your spec must define exactly which tools the agent has access to and the rules for using each one.

    Tool Inventory

    For each tool the agent can use, document the following (a schema sketch follows this list):

  • Tool name and description: What it does in plain language
  • When to use it: The conditions under which this tool is appropriate
  • When not to use it: Explicit exclusions
  • Rate limits: How often the agent can call this tool (per minute, per task, per session)
  • Fallback behavior: What to do if the tool fails or is unavailable
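A tool inventory entry can be captured as a small record the spec and the runtime share. The field names below are illustrative, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class ToolSpec:
    """Illustrative tool-inventory entry; field names are not a standard."""
    name: str
    description: str           # what it does, in plain language
    use_when: str              # conditions under which the tool is appropriate
    do_not_use_when: str       # explicit exclusions
    rate_limit_per_task: int   # maximum calls per task
    fallback: str              # behavior if the tool fails or is unavailable

knowledge_base_search = ToolSpec(
    name="kb_search",
    description="Full-text search over the internal knowledge base",
    use_when="Researching a solution to the customer's issue",
    do_not_use_when="The question is about the customer's own account data",
    rate_limit_per_task=5,
    fallback="Escalate with a note that the knowledge base was unavailable",
)
```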
    Tool Use Rules

    Least privilege principle: Give the agent access to the minimum set of tools required for its job. A support agent does not need access to the deployment pipeline. A coding assistant does not need access to the billing system.

    Read before write: When the agent needs to modify something (database, file, configuration), it should read the current state first, confirm the modification is correct, and then write. This prevents accidental overwrites.

    Confirm before destructive actions: Any tool call that deletes data, sends communications to customers, or modifies billing should require explicit confirmation (from the user or from a separate approval system).

    Log everything: Every tool call the agent makes should be logged with the input, output, and reasoning. This creates an audit trail for debugging and compliance.
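These rules are typically enforced in a thin wrapper around every tool call rather than left to the model's discretion. A minimal sketch, assuming hypothetical tool names and an external approval flag:

```python
import json
import logging
import time

logger = logging.getLogger("agent.tools")
DESTRUCTIVE_TOOLS = {"delete_record", "send_email", "update_billing"}  # hypothetical

def call_tool(tool_name: str, func, arguments: dict, reasoning: str,
              approved: bool = False):
    """Enforce confirm-before-destructive and log-everything on each tool call."""
    if tool_name in DESTRUCTIVE_TOOLS and not approved:
        raise PermissionError(f"{tool_name} requires explicit approval")
    result = func(**arguments)           # the actual tool call
    # Log input, output, and the agent's stated reasoning for the audit trail.
    logger.info(json.dumps({
        "timestamp": time.time(),
        "tool": tool_name,
        "arguments": arguments,
        "reasoning": reasoning,
        "result": str(result)[:500],     # truncate large outputs
    }))
    return result
```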


    Escalation and Handoff Rules

    Every agent must know when to stop acting autonomously and involve a human. Poorly defined escalation rules are the most common cause of agent failures in production.

    Escalation Triggers

    Define specific conditions that trigger escalation:

    Confidence-based: "If the agent's confidence in its response is below 0.7, escalate to a human reviewer."

    Content-based: "If the customer mentions legal action, regulatory complaints, or media attention, immediately escalate to a senior support manager."

    Complexity-based: "If the agent has taken more than 10 actions without resolving the issue, escalate with a summary of what has been tried."

    Error-based: "If a tool call fails twice in succession, escalate with the error details."

    User-requested: "If the user asks to speak with a human at any point, immediately comply."
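The trigger list above translates almost directly into checks the runtime evaluates after every step. The thresholds and state fields below are examples drawn from the triggers above, not recommendations:

```python
ESCALATION_KEYWORDS = {"lawsuit", "legal action", "regulator", "press", "journalist"}

def should_escalate(state: dict) -> str | None:
    """Return the escalation reason, or None if the agent may continue."""
    if state["confidence"] < 0.7:
        return "low_confidence"
    if any(kw in state["last_user_message"].lower() for kw in ESCALATION_KEYWORDS):
        return "sensitive_content"
    if state["steps_taken"] > 10:
        return "too_many_steps"
    if state["consecutive_tool_failures"] >= 2:
        return "tool_failure"
    if state["user_requested_human"]:
        return "user_requested"
    return None
```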

    The Handoff Protocol

    When escalating, the agent should do the following (a sketch of the handoff payload appears after this list):

  • Summarize what happened: What the user asked for, what the agent tried, and what the current state is
  • Provide context: All relevant data, conversation history, and tool call logs
  • Suggest next steps: What the agent thinks the human should try next
  • Notify the user: Let the user know they are being connected to a human, with an estimated wait time if available
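The handoff itself stays consistent if it is a structured packet the human reviewer receives. The fields below are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class HandoffPacket:
    """Illustrative escalation payload for the human taking over."""
    summary: str                  # what the user asked, what was tried, current state
    conversation: list[str]       # full conversation history
    tool_log: list[dict]          # every tool call with input, output, reasoning
    suggested_next_steps: list[str] = field(default_factory=list)
    user_notice: str = "Connecting you with a member of our team now."
```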

    Memory and Context Management

    Agents that operate across multiple interactions or sessions need memory management policies.

    Short-Term Memory (Within a Session)

    Define what the agent retains during a single interaction:

  • The full conversation history
  • Tool call results and observations
  • Its current plan and progress toward the goal
  • Any user preferences expressed during the conversation
    Long-Term Memory (Across Sessions)

    Define what persists across interactions:

  • User preferences and past interactions (with consent)
  • Resolved issues and their solutions (for learning)
  • User-specific context (role, plan, account status)
    What the Agent Must Forget

    Equally important is defining what the agent must not retain (a small filtering sketch follows this list):

  • Sensitive information shared during support interactions (passwords, personal data) should not persist beyond the session
  • One user's data should never leak into another user's context
  • Outdated information should be expired and refreshed
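Forgetting rules are usually implemented as a filter that runs before anything is written to long-term storage. A minimal sketch with hypothetical fields, an in-memory store, and a deliberately naive redaction pattern:

```python
import re
from datetime import datetime, timedelta, timezone

SENSITIVE_PATTERN = re.compile(r"(password|ssn|credit card)[:\s]*\S+", re.IGNORECASE)
RETENTION = timedelta(days=90)   # example expiry window, not a recommendation

def persist_memory(user_id: str, items: list[dict], store: dict) -> None:
    """Write only redacted, user-scoped, expiring items to long-term memory."""
    now = datetime.now(timezone.utc)
    for item in items:
        text = SENSITIVE_PATTERN.sub("[REDACTED]", item["text"])   # naive redaction
        # Keying by user_id keeps one user's data out of another user's context.
        store.setdefault(user_id, []).append(
            {"text": text, "expires_at": now + RETENTION}
        )

def load_memory(user_id: str, store: dict) -> list[str]:
    """Return only items that have not yet expired."""
    now = datetime.now(timezone.utc)
    return [m["text"] for m in store.get(user_id, []) if m["expires_at"] > now]
```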

    Success Criteria and Evaluation

    Measuring Agent Performance

    Agent evaluation is fundamentally different from prompt evaluation. You are not just scoring individual outputs; you are assessing whether the agent achieved its goal through a sequence of actions.

    Goal achievement rate: What percentage of tasks does the agent complete successfully without human intervention?

    Efficiency: How many steps does the agent take on average? How long does it take? What is the cost per task?

    Constraint compliance: How often does the agent violate its constraints? Zero tolerance for hard constraint violations.

    User satisfaction: When the agent resolves an issue, how does the user rate the experience?

    Escalation quality: When the agent escalates, does it provide sufficient context for the human to take over smoothly?

    The Agent Scorecard

    Track these metrics weekly (a sketch of the aggregation follows the table):

    | Metric | Target | Current | Trend |
    | --- | --- | --- | --- |
    | Goal achievement rate | > 85% | - | - |
    | Avg steps per task | < 8 | - | - |
    | Hard constraint violations | 0 | - | - |
    | Soft constraint compliance | > 95% | - | - |
    | User satisfaction (CSAT) | > 4.2/5 | - | - |
    | Escalation rate | 10-20% | - | - |
    | Avg resolution time | < 5 min | - | - |
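Most of these metrics fall out of per-task records that your eval harness or production logging already produces. A sketch of the aggregation, assuming illustrative field names:

```python
def scorecard(task_records: list[dict]) -> dict:
    """Aggregate per-task records (illustrative fields) into the weekly scorecard."""
    n = len(task_records) or 1   # avoid division by zero on an empty week
    return {
        "goal_achievement_rate": sum(t["goal_achieved"] for t in task_records) / n,
        "avg_steps_per_task": sum(t["steps"] for t in task_records) / n,
        "hard_constraint_violations": sum(t["hard_violations"] for t in task_records),
        "escalation_rate": sum(t["escalated"] for t in task_records) / n,
        "avg_resolution_seconds": sum(t["seconds"] for t in task_records) / n,
    }

weekly = scorecard([
    {"goal_achieved": True,  "steps": 6,  "hard_violations": 0, "escalated": False, "seconds": 180},
    {"goal_achieved": False, "steps": 11, "hard_violations": 0, "escalated": True,  "seconds": 420},
])
```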

    The Agent Spec Template

    Use this template as the starting point for any agent spec:

    Section 1: Agent Overview

  • Agent name and version
  • Primary goal (outcome-oriented)
  • Target users and use cases
  • Scope boundaries (what the agent does and does not do)
    Section 2: Goal Hierarchy

  • Primary goal
  • Sub-goals (ordered)
  • Goal prioritization rules
    Section 3: Autonomy Matrix

  • For each action: action name, autonomy level, conditions for escalation
    Section 4: Tool Inventory

  • For each tool: name, purpose, usage rules, rate limits, fallback behavior
    Section 5: Behavioral Constraints

  • Hard constraints (categorized by safety, legal, product policy)
  • Soft constraints with flexibility conditions
  • Constraint priority hierarchy
    Section 6: Escalation Protocol

  • Escalation triggers (confidence, content, complexity, error, user-requested)
  • Handoff procedure
  • Notification templates
    Section 7: Memory Policies

  • Short-term retention rules
  • Long-term persistence rules
  • Data expiration and deletion rules
    Section 8: Success Criteria

  • Goal achievement targets
  • Efficiency targets
  • Constraint compliance targets
  • User satisfaction targets
    Section 9: Evaluation Plan

  • Eval dataset description
  • Eval frequency
  • Metric tracking and reporting

    Common Mistakes

    Mistake 1: Specifying actions instead of goals

    Instead: Define what the agent should achieve, not the exact steps it should take. Let the agent reason about the best path to the goal.

    Why: Overly prescriptive specs make agents brittle. They cannot adapt to unexpected situations because they are following a script, not pursuing a goal.

    Mistake 2: Missing constraint prioritization

    Instead: Explicitly rank your constraints so the agent knows which to preserve when they conflict.

    Why: Without prioritization, the agent makes arbitrary choices when constraints conflict, leading to inconsistent and sometimes dangerous behavior.

    Mistake 3: Setting autonomy too high at launch

    Instead: Start with Level 1 (suggest only) for all actions. Increase autonomy gradually based on performance data.

    Why: You can always give an agent more autonomy later. Taking autonomy away after a visible failure is a trust-destroying event for users.

    Mistake 4: No escalation path

    Instead: Define explicit escalation triggers and handoff protocols for every agent.

    Why: An agent without an escalation path will either fail silently (bad UX) or keep trying increasingly creative solutions (dangerous).

    Mistake 5: Ignoring tool interaction failures

    Instead: Define fallback behavior for every tool the agent uses. What happens when the API times out? When the database returns unexpected data?

    Why: In production, tools fail regularly. An agent without fallback behavior will either crash or hallucinate responses.


    Getting Started Checklist

    Week 1: Foundation

  • Identify the agent's primary goal and target users
  • Map the agent's workflow: what tasks does it perform and in what order?
  • List all tools the agent will need access to
  • Draft initial autonomy levels (start conservative with Level 1)
    Week 2: Constraints and Safety

  • Write all hard constraints (safety, legal, compliance)
  • Write soft constraints with flexibility conditions
  • Define escalation triggers and handoff protocol
  • Review constraints with trust/safety and legal teams
    Week 3: Evaluation

  • Define success criteria and target metrics
  • Build an eval dataset with 50+ scenarios (including adversarial ones)
  • Run initial eval and establish baseline performance
  • Identify gaps in the spec based on eval failures
    Week 4: Iteration

  • Refine the spec based on eval results
  • Conduct a tabletop exercise: walk through 10 realistic scenarios with the team
  • Document known limitations and edge cases
  • Publish the final spec for engineering implementation

    Key Takeaways

  • Agent specs are fundamentally different from traditional PRDs. You are defining goals and constraints for an autonomous system, not a deterministic feature flow.
  • Define goals in terms of outcomes, not actions. Let the agent reason about how to achieve the goal.
  • Assign explicit autonomy levels to every action. Start conservative and increase autonomy based on data.
  • Constraints are the most important part of the spec. Hard constraints must never be violated. Soft constraints can flex based on context.
  • Define tool use policies with the least privilege principle. Every tool call should be logged.
  • Every agent needs explicit escalation triggers and a clear handoff protocol to human operators.
  • Measure agent success by goal achievement rate, efficiency, constraint compliance, and user satisfaction.
    Next Steps:

  • Pick one agent or autonomous feature in your product and write a goal hierarchy for it
  • Map every action the agent takes and assign autonomy levels
  • Write the hard constraints section and review it with your trust/safety team

    Related Guides

  • How to Run LLM Evals
  • Prompt Engineering for Product Managers
  • Red Teaming AI Products
  • AI Product Monitoring and Observability

    About This Guide

    Last Updated: February 9, 2026

    Reading Time: 14 minutes

    Expertise Level: Intermediate to Advanced

    Citation: Adair, Tim. "Specifying AI Agent Behaviors: A PM's Guide to Agent Design." IdeaPlan, 2026. https://ideaplan.io/guides/specifying-ai-agent-behaviors
