Quick Answer (TL;DR)
The AI Build vs. Buy framework helps product managers choose among three approaches: Use an API (fastest, cheapest, least differentiated), Fine-tune a model (moderate investment, good customization, requires domain data), or Train from scratch (highest investment, maximum control, requires ML team and infrastructure). The decision depends on five factors: how differentiated the AI needs to be, how much proprietary data you have, your team's ML expertise, your timeline, and your long-term cost tolerance. Most products should start with APIs, graduate to fine-tuning as they find product-market fit, and train from scratch only when AI is a core differentiator.
What Is the AI Build vs. Buy Framework?
Every product team adding AI capabilities faces a fundamental sourcing question: should we use an existing service, adapt one, or build our own? This question has existed in software for decades ("buy vs. build"), but AI introduces new dimensions that make the traditional framework insufficient.
In traditional software, the buy-vs-build decision is primarily about engineering effort versus customization. In AI, the decision also involves data strategy, model differentiation, ongoing maintenance costs, and a spectrum of options rather than a binary choice. You're not choosing between "build it ourselves" and "buy a tool" -- you're choosing a position on a continuum from fully outsourced to fully owned.
This framework became essential as AI capabilities commoditized rapidly. In 2020, building a text classification system required an ML team. By 2023, you could call an API. By 2025, you could fine-tune a frontier model for your specific domain in an afternoon. The options multiplied, and so did the confusion about which approach is right for which situation.
The framework helps PMs make this decision systematically by evaluating their specific context against five decision factors, then mapping to one of three primary approaches: API integration, model fine-tuning, or custom model training.
The Framework in Detail
The Three Approaches
Approach 1: Use a Third-Party API
You integrate a hosted AI service (OpenAI, Anthropic, Google Cloud AI, AWS AI services, etc.) into your product via API calls. The provider handles model training, infrastructure, scaling, and updates.
| Aspect | Detail |
|---|---|
| Time to value | Days to weeks |
| Upfront cost | Near zero |
| Ongoing cost | Per-API-call pricing (scales with usage) |
| ML expertise required | Minimal -- prompt engineering and API integration |
| Differentiation | Low -- competitors can use the same API |
| Data requirement | None for basic usage; some for prompt optimization |
| Control | Low -- provider controls model behavior, updates, and availability |
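To make Approach 1 concrete, here is a minimal sketch assuming the OpenAI Python SDK; the model name and classification prompt are placeholder assumptions, and other hosted providers follow a similar request/response pattern.

```python
# Approach 1 sketch: call a hosted model via the OpenAI Python SDK.
# Model name and prompt are placeholders; expects OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use whatever your provider offers
    messages=[
        {"role": "system", "content": "Classify the support ticket as billing, bug, or other."},
        {"role": "user", "content": "I was charged twice for my subscription this month."},
    ],
)
print(response.choices[0].message.content)
```

The entire integration is prompt design plus an API call; the provider owns everything behind that call, which is exactly the trade described in the table above.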
Approach 2: Fine-Tune an Existing Model
You take a pre-trained foundation model (open-source like Llama, Mistral, or a commercial model via fine-tuning APIs) and train it further on your proprietary data to specialize it for your use case.
| Aspect | Detail |
|---|---|
| Time to value | Weeks to months |
| Upfront cost | Moderate (compute for fine-tuning, data preparation) |
| Ongoing cost | Hosting/inference costs (self-hosted or managed) |
| ML expertise required | Moderate -- fine-tuning, evaluation, deployment |
| Differentiation | Moderate to high -- model reflects your unique data |
| Data requirement | Hundreds to thousands of high-quality domain-specific examples |
| Control | Moderate -- you control specialization; base model is provider's |
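As an illustration of Approach 2, the sketch below fine-tunes an open-source causal LM with LoRA adapters via Hugging Face transformers and peft. The base model name, hyperparameters, and the two training strings are assumptions; in practice you would supply the hundreds to thousands of domain examples the table calls for.

```python
# Approach 2 sketch: LoRA fine-tuning of an open-source model (transformers + peft).
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # assumption: any causal LM with a chat/completion head

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Hypothetical domain examples; real fine-tuning needs hundreds to thousands of these.
examples = [
    "Clause: Either party may terminate with 30 days' notice. Key term: termination for convenience",
    "Clause: Receiving Party shall keep Confidential Information secret. Key term: confidentiality obligation",
]
dataset = Dataset.from_dict({"text": examples}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=512)
)

# LoRA freezes the base weights and trains small adapter matrices on top.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("ft-out/adapter")  # you own the adapter; the base model is the provider's
```

The last line captures the control trade-off: your specialization is a small, portable artifact layered on someone else's base model.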
Approach 3: Train a Custom Model from Scratch
You design the model architecture, collect and curate training data, train the model on your infrastructure, and manage the full ML lifecycle.
| Aspect | Detail |
|---|---|
| Time to value | Months to years |
| Upfront cost | High (ML team, compute infrastructure, data acquisition) |
| Ongoing cost | Infrastructure, team, and retraining costs |
| ML expertise required | High -- full ML engineering and research team |
| Differentiation | Maximum -- model is entirely proprietary |
| Data requirement | Large (thousands to millions of examples, depending on complexity) |
| Control | Full -- you own every aspect of the model |
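For scale, here is Approach 3 in miniature: a toy PyTorch model and training loop showing the pieces you own outright (architecture, data pipeline, optimization, retraining). Real from-scratch training involves vastly more data, compute, and engineering; every name and number below is a placeholder.

```python
# Approach 3 in miniature: you own the architecture, the data, and the training loop.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_classes: int, dim: int = 64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # the architecture is yours to design
        self.head = nn.Linear(dim, num_classes)

    def forward(self, token_ids):
        return self.head(self.embed(token_ids))

model = TinyClassifier(vocab_size=10_000, num_classes=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder data: in practice, thousands to millions of curated, labeled examples.
tokens = torch.randint(0, 10_000, (32, 20))
labels = torch.randint(0, 3, (32,))

for epoch in range(5):  # you also own retraining, monitoring, and evaluation
    optimizer.zero_grad()
    loss = loss_fn(model(tokens), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```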
The Five Decision Factors
Factor 1: Differentiation Requirement
Key question: Is the AI capability a core differentiator, or is it a commodity feature?
If the AI is your product's primary value proposition -- the reason customers choose you over competitors -- you need more control and uniqueness than an API provides. If the AI is a supporting feature (spell check, basic classification, standard OCR), an API is likely sufficient.
Differentiation Spectrum:
- Commodity capability (spell check, basic classification, standard OCR) -> an API is sufficient
- Valuable but not the primary reason customers buy -> API with prompt optimization, or fine-tuning
- Core value proposition (the reason customers choose you over competitors) -> fine-tuning or custom training
Factor 2: Data Assets
Key question: Do you have proprietary data that would make a custom model significantly better than a generic one?
Your data advantage is the most important factor in the build-vs-buy decision. If you have unique, high-quality, domain-specific data that no API provider has access to, fine-tuning or custom training can create a model that outperforms any generic offering.
Data Assessment Matrix:
| Your Data Situation | Recommended Approach |
|---|---|
| No proprietary data, generic task | API |
| Small proprietary dataset (hundreds of examples) | API with few-shot examples, or light fine-tuning |
| Medium proprietary dataset (thousands of examples) | Fine-tuning |
| Large proprietary dataset (hundreds of thousands+) | Fine-tuning or custom training, depending on task complexity |
| Unique data that creates a competitive moat | Custom training to maximize the moat |
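Where the matrix says "API with few-shot examples," the small proprietary dataset goes into the prompt rather than into training. A minimal sketch, with hypothetical clause/term pairs:

```python
# Few-shot prompting: embed a few proprietary examples directly in the prompt
# instead of fine-tuning. The example pairs below are hypothetical placeholders.
few_shot_examples = [
    ("The Receiving Party shall hold Confidential Information in strict confidence.",
     "confidentiality obligation"),
    ("Either party may terminate this Agreement with 30 days' written notice.",
     "termination for convenience"),
]

def build_prompt(clause: str) -> str:
    shots = "\n\n".join(f"Clause: {c}\nKey term: {t}" for c, t in few_shot_examples)
    return f"{shots}\n\nClause: {clause}\nKey term:"

print(build_prompt("Licensee shall indemnify Licensor against third-party claims."))
```

This is often the cheapest way to get value out of a small dataset while you collect enough examples to justify fine-tuning.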
Factor 3: Team Expertise
Key question: Does your team have the ML engineering skills to build and maintain a custom model?
Be honest about this one. Fine-tuning requires ML engineers who understand training dynamics, evaluation methodology, and deployment infrastructure. Custom model training requires ML researchers who can design architectures, debug training instabilities, and optimize for your specific domain. Hiring these people takes months and costs significantly more than software engineers.
| Team Composition | Realistic Approach |
|---|---|
| No ML engineers | API only |
| 1-2 ML engineers | Fine-tuning with managed infrastructure |
| ML team (5+) with infrastructure | Fine-tuning or custom training |
| ML research team with compute budget | Any approach, including custom training |
Factor 4: Timeline
Key question: When does the AI feature need to be in production?
| Timeline | Realistic Approach |
|---|---|
| 1-4 weeks | API integration |
| 1-3 months | API integration or fine-tuning (if data is ready) |
| 3-6 months | Fine-tuning with data collection |
| 6-12+ months | Custom model training |
A common mistake is committing to a long custom-training effort when an API could validate the product concept in weeks. Validate first, invest later.
Factor 5: Long-Term Cost
Key question: What will this cost at your target scale?
API pricing often looks cheap at prototype scale and becomes expensive at production scale. Custom models have high upfront costs but lower marginal costs at scale.
Cost Crossover Analysis:
Run a simple cost comparison at your projected scale:
- API: price per call x expected monthly call volume
- Fine-tuned/self-hosted: fixed hosting and inference costs + amortized fine-tuning and maintenance
- The crossover is the monthly volume at which the self-hosted line drops below the API line (a minimal sketch of this calculation follows the next paragraph)
For many products, the crossover point where self-hosted fine-tuned models become cheaper than APIs is between 100,000 and 1,000,000 monthly predictions -- but this varies enormously by use case and provider pricing.
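A minimal sketch of that crossover calculation; the per-call price, hosting cost, and amortization window are assumptions, not real provider pricing.

```python
# Rough cost-crossover sketch; all prices are hypothetical placeholders.

def api_monthly_cost(calls: int, price_per_call: float) -> float:
    """API cost scales linearly with usage."""
    return calls * price_per_call

def self_hosted_monthly_cost(gpu_hosting: float, fine_tune_cost: float,
                             amortize_months: int = 12) -> float:
    """Self-hosted cost is mostly fixed: hosting plus amortized fine-tuning."""
    return gpu_hosting + fine_tune_cost / amortize_months

# Find the monthly volume where self-hosting becomes cheaper.
for calls in range(0, 2_000_001, 50_000):
    api = api_monthly_cost(calls, price_per_call=0.01)           # assumed $0.01 per call
    hosted = self_hosted_monthly_cost(gpu_hosting=3_500,         # figures echo the example below
                                      fine_tune_cost=5_000)
    if api > hosted:
        print(f"Crossover near {calls:,} calls/month: API ${api:,.0f} vs self-hosted ${hosted:,.0f}")
        break
```

Rerun the comparison whenever provider pricing, hosting costs, or your volume projections change; the crossover point moves with all three.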
The Decision Tree
Follow this simplified decision tree to reach an initial recommendation:
1. Is the AI capability a core differentiator for your product (Factor 1)?
   - No -> Use an API. Move to step 5.
   - Yes -> Continue to step 2.
2. Do you have proprietary, domain-specific data that generic models lack (Factor 2)?
   - No -> Use an API now; collect data for future fine-tuning. Move to step 5.
   - Yes -> Continue to step 3.
3. Does your team have the ML expertise and infrastructure to fine-tune and host models itself (Factor 3)?
   - No -> Fine-tune using a managed service (lower expertise requirement). Move to step 5.
   - Yes -> Continue to step 4.
4. Do you need control or performance beyond what fine-tuning a foundation model can deliver, and can you absorb the timeline and cost of custom training (Factors 4 and 5)?
   - No -> Fine-tune an existing foundation model.
   - Yes -> Train a custom model from scratch.
5. Whichever approach you land on, re-evaluate as usage, data assets, and accuracy requirements grow.
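The same tree, encoded as a small function; the parameter names map to Factors 1-4, and the threshold judgments behind each boolean remain yours to make.

```python
def recommend_approach(core_differentiator: bool,
                       has_proprietary_data: bool,
                       has_ml_team: bool,
                       needs_beyond_fine_tuning: bool) -> str:
    """Encodes the decision tree above; inputs mirror Factors 1-4."""
    if not core_differentiator:
        return "Use an API"
    if not has_proprietary_data:
        return "Use an API now; collect data for future fine-tuning"
    if not has_ml_team:
        return "Fine-tune via a managed service"
    if not needs_beyond_fine_tuning:
        return "Fine-tune an existing foundation model"
    return "Train a custom model from scratch"

# Example: differentiated feature, proprietary data, small ML team, fine-tuning suffices.
print(recommend_approach(True, True, True, False))
# -> "Fine-tune an existing foundation model"
```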
When to Use This Framework
When NOT to Use It
Real-World Example
Scenario: A legal tech startup is building a contract analysis product that extracts key terms, identifies risks, and suggests edits for commercial agreements.
Phase 1 -- API (Months 1-3):
The team starts by integrating a frontier LLM API to analyze contracts. Results are promising but inconsistent: the model handles standard NDA clauses well but misinterprets industry-specific terms and occasionally hallucinates contract provisions. Accuracy on their evaluation set is 74%.
Cost: $2,000/month for API calls during beta with 50 users.
Phase 2 -- Fine-tuning (Months 4-8):
After collecting 3,000 annotated contract examples from their beta users (with permission), the team fine-tunes an open-source LLM. The fine-tuned model achieves 89% accuracy on their evaluation set. Industry-specific terminology is handled correctly, and hallucinations drop significantly.
Cost: $5,000 for fine-tuning compute, $3,500/month for GPU hosting. But they now serve 500 users, and equivalent API costs would be $18,000/month.
Phase 3 -- Evaluation of custom training (Month 9):
The PM runs the build-vs-buy analysis for the next phase. Custom model training would cost $200K+ and require hiring two ML researchers. The fine-tuned model at 89% accuracy already exceeds user expectations. Decision: continue fine-tuning with incremental improvements. Revisit custom training if accuracy requirements increase or the fine-tuned approach plateaus.
Key insight: The team saved 6 months and $150K+ by starting with an API, validating product-market fit, and collecting proprietary data before investing in fine-tuning. They would have wasted both time and money training a custom model for a product that hadn't yet proven market demand.
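Accuracy figures like the 74% and 89% above imply an evaluation set of annotated contracts and a harness that scores model output against it. A minimal sketch, with a hypothetical predict() stand-in and placeholder examples:

```python
# Minimal evaluation-harness sketch behind accuracy numbers like 74% / 89%.
# The eval_set entries and predict() are hypothetical placeholders.

def predict(contract_clause: str) -> str:
    """Stand-in for a call to the API or the fine-tuned model."""
    return "confidentiality obligation"

eval_set = [
    {"clause": "The Receiving Party shall hold Confidential Information in strict confidence.",
     "expected": "confidentiality obligation"},
    {"clause": "Either party may terminate this Agreement with 30 days' written notice.",
     "expected": "termination for convenience"},
]

correct = sum(predict(row["clause"]) == row["expected"] for row in eval_set)
print(f"accuracy: {correct / len(eval_set):.0%}")
```

Keeping the same evaluation set across phases is what makes the 74% vs. 89% comparison meaningful.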
Common Pitfalls
AI Build vs. Buy vs. Other Decision Frameworks
| Framework | Scope | Unique Contribution |
|---|---|---|
| This framework | AI-specific sourcing decision with three options | Addresses the fine-tuning middle ground unique to AI; includes data asset assessment |
| Traditional Build vs. Buy | General software sourcing | Binary decision that misses the fine-tuning option central to modern AI |
| Wardley Mapping | Strategic component positioning | Helps identify which AI components are commodities vs. differentiators, but doesn't prescribe the sourcing approach |
| Make-Buy-Ally | Enterprise sourcing with partnerships | Adds partnership option (valuable for AI when a data-sharing agreement with a vendor creates unique capability) |
| Total Cost of Ownership (TCO) | Financial comparison | Important input to this framework's cost factor, but doesn't address differentiation, data, or expertise considerations |
The AI Build vs. Buy framework should be used in conjunction with TCO analysis for the cost factor and Wardley Mapping for the strategic differentiation factor. Together, they give you both the strategic and financial clarity to make a sound sourcing decision.