Skip to main content
New: Deck Doctor. Upload your deck, get CPO-level feedback. 7-day free trial.
TemplateFREE⏱️ 15 minutes

AI Safety Plan Template for AI Products

A template for planning AI safety measures, covering threat modeling, red team testing, guardrails design, content filtering, prompt injection defense,...

Updated 2026-03-05
AI Safety Plan
#1
#2
#3
#4
#5

Edit the values above to try it with your own data. Your changes are saved locally.

Get this template

Choose your preferred format. Google Sheets and Notion are free, no account needed.

Frequently Asked Questions

How many red team test cases do we need before launch?+
For a Tier 2 (medium-risk) AI feature, aim for at least 100-200 test cases across all categories, with at least 20 per high-priority threat category. For Tier 3 (high-risk) features, double that number and include tests from external security researchers if possible. The [AI Eval Scorecard](/tools/ai-eval-scorecard) can help you structure your evaluation criteria.
Should we build our own content safety classifier or use a third-party service?+
For most teams, start with a third-party classifier (OpenAI Moderation API, Google Cloud Natural Language, Perspective API) and layer your own domain-specific rules on top. Building a custom classifier from scratch requires significant labeled data and ongoing maintenance. Use a third-party service as the base layer and add custom rules for your specific content policies.
How do we balance safety with user experience?+
Overly aggressive guardrails create false positives that frustrate users. Track the false positive rate of each guardrail and tune thresholds based on user feedback. A good target is less than 1% false positive rate for input filters and less than 0.5% for output filters. When a guardrail triggers, the refusal message should be helpful, not generic.
What is the difference between prompt injection and jailbreaking?+
Prompt injection involves inserting instructions into user input (or retrieved content) that override the system prompt. Jailbreaking involves social engineering the model into ignoring its safety training through persuasion or role-play scenarios. Both are threats, but they require different defenses. Input sanitization helps with injection; system prompt hardening and safety classifiers help with jailbreaking. The [prompt engineering glossary entry](/glossary/prompt-engineering) covers defensive prompting techniques.
How often should we update our safety plan?+
Review monthly for the first six months after launch, then quarterly. Update immediately after any safety incident. Update the red team test suite whenever a new attack vector is published in the AI safety research community. Model updates from providers (new versions, capability changes) should trigger a full safety re-evaluation.

Explore More Templates

Browse our full library of PM templates, or generate a custom version with AI.