Quick Answer (TL;DR)
Token Cost per Interaction measures the average spend, in tokens and dollars, for each user interaction with an AI feature. The formula is (Total tokens consumed x Price per token) / Total interactions. Industry benchmarks: simple chat, $0.001-0.01; complex analysis, $0.05-0.30; RAG workflows, $0.02-0.15 per interaction. Track this metric from day one of any AI feature launch to keep unit economics under control.
What Is Token Cost per Interaction?
Token Cost per Interaction captures the direct inference cost of serving each user request through an AI model. Every prompt and response consumes tokens --- the fundamental billing unit for LLM APIs --- and those tokens have a price. This metric translates raw token consumption into a per-interaction cost that product managers can map to business models and pricing.
This metric is critical because AI features have variable marginal costs, unlike traditional software where the cost of serving one more request is near zero. A single complex interaction can cost 100x more than a simple one. Without tracking cost per interaction, you cannot build sustainable pricing, set usage limits, or forecast infrastructure spend.
Understanding token cost also reveals optimization opportunities. A prompt that uses 2,000 tokens of system instructions for a task that only needs 500 tokens is wasting money on every call. Breaking down costs by component --- system prompt, user input, retrieved context, model output --- shows exactly where the spend goes and where cuts are possible without degrading quality.
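The component breakdown described above can be sketched in a few lines. This is a minimal illustration, not a real billing integration: the token counts, component names, and the blended $0.008-per-1K rate are all assumed for the example.

```python
# Hypothetical per-interaction token log, broken down by component.
interaction_tokens = {
    "system_prompt": 2_000,
    "user_input": 150,
    "retrieved_context": 1_200,
    "model_output": 400,
}

PRICE_PER_1K = 0.008  # assumed blended input/output rate, dollars per 1,000 tokens

def cost_breakdown(tokens_by_component: dict, price_per_1k: float) -> dict:
    """Dollar cost of each component, to show where the spend concentrates."""
    return {name: tokens * price_per_1k / 1_000
            for name, tokens in tokens_by_component.items()}

for name, dollars in cost_breakdown(interaction_tokens, PRICE_PER_1K).items():
    print(f"{name}: ${dollars:.4f}")
```

Here the system prompt alone accounts for over half the cost of the call, which is exactly the kind of finding this breakdown is meant to surface.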
The Formula
(Total tokens consumed x Price per token) / Total interactions
How to Calculate It
Suppose your AI feature processed 50,000 interactions in a month, consuming 150 million tokens at a blended rate of $0.008 per 1,000 tokens:
Token Cost per Interaction = (150,000,000 x $0.000008 per token) / 50,000 = $1,200 / 50,000 = $0.024 (note that $0.008 per 1,000 tokens is $0.000008 per token)
This tells you each interaction costs about 2.4 cents. If your subscription price implies a per-user budget of $2/month and users average 100 interactions, you are spending $2.40 per user --- already over budget before accounting for infrastructure, storage, and other costs.
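The calculation above is simple enough to wrap in a helper you can reuse across features. A minimal sketch, using the numbers from the worked example:

```python
def cost_per_interaction(total_tokens: int, price_per_1k: float,
                         interactions: int) -> float:
    """Average inference cost in dollars for one interaction.

    total_tokens: tokens consumed over the period (input + output)
    price_per_1k: blended price in dollars per 1,000 tokens
    interactions: number of interactions in the same period
    """
    total_cost = (total_tokens / 1_000) * price_per_1k
    return total_cost / interactions

# The worked example: 150M tokens, $0.008 per 1K tokens, 50K interactions.
print(round(cost_per_interaction(150_000_000, 0.008, 50_000), 4))  # 0.024
```

In practice you would compute this per feature and per model rather than as one blended number, since the blend hides which features are over budget.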
Industry Benchmarks
| Context | Range |
|---|---|
| Simple chatbot (short Q&A) | $0.001-0.01 per interaction |
| Complex analysis or summarization | $0.05-0.30 per interaction |
| RAG-augmented workflows | $0.02-0.15 per interaction |
| Code generation (multi-file context) | $0.10-0.50 per interaction |
How to Improve Token Cost per Interaction
Trim System Prompts
System prompts are the hidden cost driver in most AI features. Audit every system prompt for redundant instructions, verbose formatting, and context that does not improve output quality. A 40% reduction in system prompt length cuts input-token spend by roughly that proportion on every single call.
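A quick way to start that audit is to measure what share of your input tokens the system prompt consumes. The sketch below uses a rough characters-per-token heuristic (~4 characters per token for English text, an approximation only); for exact counts, use your provider's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    This is a heuristic; use your provider's tokenizer for exact counts."""
    return max(1, len(text) // 4)

def system_prompt_share(system_prompt: str, avg_other_input_tokens: int) -> float:
    """Fraction of input tokens consumed by the system prompt alone."""
    sys_tokens = estimate_tokens(system_prompt)
    return sys_tokens / (sys_tokens + avg_other_input_tokens)

# Example: an 8,000-character system prompt (~2,000 tokens) against an
# average of 500 tokens of user input and context per call.
share = system_prompt_share("x" * 8_000, avg_other_input_tokens=500)
print(f"System prompt is {share:.0%} of input tokens")
```

If that share is high and the task is simple, the system prompt is the first place to cut.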
Implement Smart Context Windows
Not every interaction needs the full conversation history or every retrieved document. Build logic to select only the most relevant context for each query. Techniques like conversation summarization and selective retrieval can cut context tokens by 50-70%.
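One simple version of this logic is a recency-based context window: keep only the most recent messages that fit within a token budget. This sketch uses the same rough length heuristic as above; real selection logic might also score messages for relevance rather than relying on recency alone.

```python
def select_context(messages: list[str], budget_tokens: int,
                   estimate=lambda t: max(1, len(t) // 4)) -> list[str]:
    """Keep the most recent messages that fit within the token budget,
    preserving chronological order. `estimate` is a rough token counter."""
    selected, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate(msg)
        if used + cost > budget_tokens:
            break
        selected.append(msg)
        used += cost
    return list(reversed(selected))  # restore chronological order

# Ten 40-character messages (~10 tokens each) against a 35-token budget:
history = ["message " + "x" * 32 for _ in range(10)]
print(len(select_context(history, budget_tokens=35)))  # keeps the 3 newest
```

Summarizing dropped messages into a single short synopsis, instead of discarding them, is a common refinement of the same idea.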
Route to Cheaper Models
Use smaller, cheaper models for tasks that do not require the largest model's capabilities. Classification, entity extraction, and simple Q&A can run on models that cost a tenth to a twentieth as much per token. Save the expensive model for complex reasoning and generation.
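The routing itself can be as simple as a lookup on task type. This is a sketch only; the task categories and model names are placeholders, not real model identifiers, and production routers often classify the request itself rather than relying on a pre-labeled task type.

```python
# Task types that a smaller, cheaper model handles well (assumed categories).
CHEAP_TASKS = {"classification", "entity_extraction", "simple_qa"}

def pick_model(task_type: str) -> str:
    """Route routine tasks to a small model; reserve the large model for
    complex reasoning. Model names here are illustrative placeholders."""
    if task_type in CHEAP_TASKS:
        return "small-fast-model"
    return "large-reasoning-model"

print(pick_model("classification"))       # routes to the cheap model
print(pick_model("multi_step_analysis"))  # routes to the expensive model
```

Even a coarse router like this moves a large fraction of traffic onto the cheaper tier; the payoff depends on how much of your volume is routine.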
Cache and Pre-Compute
Identify interactions with predictable outputs and serve cached responses instead of making fresh API calls. Semantic caching, pre-computed embeddings, and result reuse can eliminate 15-30% of API calls entirely.
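A minimal version of this idea is an exact-match cache on a normalized prompt. A true semantic cache would match on embedding similarity so that differently worded but equivalent prompts hit the same entry; the sketch below shows only the simpler normalized-exact-match variant, with `call_api` standing in for whatever function makes the real API call.

```python
import hashlib

_cache: dict[str, str] = {}

def cached_call(prompt: str, call_api) -> str:
    """Serve repeated prompts from cache instead of paying for a fresh call.
    Normalization (strip + lowercase) lets trivially different phrasings
    share one cache entry; a semantic cache would go further."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(prompt)  # cache miss: pay for one real call
    return _cache[key]
```

Cache entries for time-sensitive content need an expiry policy, which this sketch omits.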
Set Token Budgets per Feature
Assign each AI feature a maximum token budget per interaction. Implement hard limits on input context length and output generation length. This prevents runaway costs from edge cases --- a single user pasting a 50,000-word document should not cost $5 per interaction.
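Enforcement can live in a small guard that runs before every API call: reject oversized inputs outright and cap the output length via the request parameters. The limits below are assumed example values, and `max_tokens` stands in for whatever output-cap parameter your provider's API exposes.

```python
MAX_INPUT_TOKENS = 4_000   # assumed per-feature input budget
MAX_OUTPUT_TOKENS = 800    # assumed cap on generated tokens

def enforce_budget(input_tokens: int) -> dict:
    """Reject oversized inputs and return the output cap to pass to the
    API call, so a single edge case cannot blow the per-interaction budget."""
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"Input of {input_tokens} tokens exceeds the "
            f"{MAX_INPUT_TOKENS}-token budget for this feature"
        )
    return {"max_tokens": MAX_OUTPUT_TOKENS}
```

Rejected requests should get a clear user-facing message (e.g. "document too long, please shorten it") rather than a silent failure, and the rejection rate itself is worth monitoring: a high rate means the budget, not the user, is the problem.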