
Token

Definition

A token is the fundamental unit of text that large language models process. Before an LLM can read or generate text, the input must be broken into tokens by a tokenizer. Each model family uses its own tokenizer: GPT-4 uses cl100k_base, Claude uses a custom BPE tokenizer, and Llama uses SentencePiece. These tokenizers split text into subword units that balance vocabulary size with the ability to represent any text.

Common English words are usually single tokens ("the", "product", "feature"). Longer or less common words get split into multiple tokens ("tokenization" might become "token" + "ization"). Numbers, code, and non-English text often require more tokens per character. OpenAI's tiktoken library and Anthropic's token-counting API are useful for measuring token counts, and therefore estimating costs, before building features.
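For quick back-of-envelope planning before reaching for a real tokenizer, the common "about 4 characters per token" rule of thumb for English can be sketched as a tiny helper. This is a heuristic only, not a real tokenizer; actual counts vary by model and must come from the model's own tokenizer (e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of
    thumb for English text. A planning heuristic only; real counts come
    from the model's own tokenizer."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("The product feature shipped on time."))  # 9
```

A heuristic like this is fine for sizing a feature on a whiteboard; for anything that feeds pricing or limits, count with the actual tokenizer.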

Tokens define two critical constraints for AI products: the context window (maximum tokens a model can process in one request) and the cost (API pricing is per-token). Understanding tokens helps PMs set usage limits, estimate infrastructure costs, and design prompts that maximize value within constraints. You can model how token costs affect product economics using the LLM Cost Estimator.

Why It Matters for Product Managers

Tokens are the unit of cost for AI products, much like compute hours for cloud infrastructure or API calls for integration-heavy products. Every PM building with LLMs needs to understand tokens because they directly control contribution margin. A feature that sends 10,000 tokens per request costs 10x more than one that achieves the same result in 1,000 tokens.

Token economics also shape product design. If your context window is 128K tokens, you can include a user's entire document history in each request. If it is 8K tokens, you need a retrieval strategy (like RAG) to select only the most relevant context. These are product architecture decisions that affect user experience, cost, and latency.
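The context-window decision above reduces to a budget check: does history plus prompt plus the output reservation fit in the window? A minimal sketch (function name and numbers are illustrative):

```python
def fits_context(history_tokens: int, prompt_tokens: int,
                 max_output_tokens: int, context_window: int) -> bool:
    """True if the full history, the prompt, and the reserved output
    budget all fit within the model's context window."""
    return history_tokens + prompt_tokens + max_output_tokens <= context_window

# A 90K-token document history fits a 128K window outright; at 8K it
# does not, so a retrieval step (e.g. RAG) must select a relevant subset:
print(fits_context(90_000, 1_500, 2_000, 128_000))  # True
print(fits_context(90_000, 1_500, 2_000, 8_000))    # False
```

Reserving output tokens in the check matters: a prompt that exactly fills the window leaves no room for the model to respond.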

How to Apply It

Build token awareness into your product development process from day one. Treating tokens as an afterthought leads to cost surprises that force painful feature rollbacks or pricing changes.

Practical steps for PMs:

  • Estimate token usage per feature before building (input prompt + expected output + conversation context)
  • Set up token usage monitoring and dashboards segmented by feature and user tier
  • Implement prompt engineering best practices to reduce token waste in system prompts
  • Choose model tiers based on task complexity (cheap models for routing, expensive models for generation)
  • Set max_tokens limits on outputs to prevent runaway costs from verbose responses
  • Build token budgets into your pricing model so heavy users pay proportionally
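The monitoring bullet above can be sketched as a minimal usage meter segmented by feature and user tier. All names here are illustrative, not a real library API; in production this would feed a metrics pipeline rather than an in-memory dict:

```python
from collections import defaultdict

class TokenMeter:
    """Minimal token-usage meter, segmented by (feature, user tier)."""

    def __init__(self) -> None:
        self.usage: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, feature: str, tier: str,
               input_tokens: int, output_tokens: int) -> None:
        """Accumulate both input and output tokens for one request."""
        self.usage[(feature, tier)] += input_tokens + output_tokens

    def total(self, feature: str, tier: str) -> int:
        return self.usage[(feature, tier)]

meter = TokenMeter()
meter.record("summarize", "free", 2_000, 500)
meter.record("summarize", "free", 1_800, 450)
print(meter.total("summarize", "free"))  # 4750
```

Segmenting by tier is what makes the last bullet actionable: once heavy-user totals are visible per tier, token budgets can be priced in rather than absorbed.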

Frequently Asked Questions

How big is a token?
A token is roughly 3-4 characters or about 0.75 words in English. 'Product manager' is 2 tokens. 'Antidisestablishmentarianism' might be 4-5 tokens because the tokenizer splits long, uncommon words into smaller pieces. Numbers are often split: '2026' might become two tokens ('20' and '26'). Whitespace and punctuation consume tokens too. A 1,000-word document is approximately 1,300-1,500 tokens. Non-English languages typically require more tokens per word.
Why do tokens matter for product costs?
API-based LLMs charge per token for both input and output. GPT-4o charges roughly $2.50 per million input tokens and $10 per million output tokens. Claude charges similar rates at scale. If your product sends a 2,000-token prompt and receives a 500-token response, that is 2,500 tokens per request. At 10,000 requests per day, you are processing 25 million tokens daily. Token costs are the primary variable cost for AI-powered products and directly affect contribution margin.
How do PMs optimize token usage?
There are three main levers. First, reduce input tokens by shortening system prompts, summarizing conversation history instead of passing the full thread, and caching common responses. Second, limit output tokens by setting max_tokens parameters and designing prompts that request concise responses. Third, choose the right model for each task: use a smaller, cheaper model (GPT-4o-mini, Haiku) for simple classification or extraction, and reserve expensive models for tasks that genuinely need them.
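The third lever, routing by task complexity, can be sketched as a simple dispatch table. Model names, prices, and task labels here are illustrative assumptions, not real pricing:

```python
# Hypothetical per-million-token input prices for illustration only;
# always check the provider's current pricing page.
MODEL_PRICES = {"small": 0.15, "large": 2.50}

def pick_model(task: str) -> str:
    """Route simple classification/extraction tasks to the cheap model
    and reserve the expensive model for everything else."""
    simple_tasks = {"classify", "extract", "route"}
    return "small" if task in simple_tasks else "large"

print(pick_model("classify"))     # small
print(pick_model("draft_email"))  # large
```

Real routers often add a confidence check so that low-confidence outputs from the small model escalate to the large one, but the cost logic is the same.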
