Quick Answer (TL;DR)
The AI Build vs. Buy framework helps product managers choose among three approaches: Use an API (fastest, cheapest, least differentiated), Fine-tune a model (moderate investment, good customization, requires domain data), or Train from scratch (highest investment, maximum control, requires ML team and infrastructure). The decision depends on five factors: how differentiated the AI needs to be, how much proprietary data you have, your team's ML expertise, your timeline, and your long-term cost tolerance. Most products should start with APIs, graduate to fine-tuning as they find product-market fit, and train from scratch only when AI is a core differentiator.
What Is the AI Build vs. Buy Framework?
Every product team adding AI capabilities faces a fundamental sourcing question: should we use an existing service, adapt one, or build our own? This question has existed in software for decades ("buy vs. build"), but AI introduces new dimensions that make the traditional framework insufficient.
In traditional software, the buy-vs-build decision is primarily about engineering effort versus customization. In AI, the decision also involves data strategy, model differentiation, ongoing maintenance costs, and a spectrum of options rather than a binary choice. You're not choosing between "build it ourselves" and "buy a tool" -- you're choosing a position on a continuum from fully outsourced to fully owned.
This framework became essential as AI capabilities commoditized rapidly. In 2020, building a text classification system required an ML team. By 2023, you could call an API. By 2025, you could fine-tune a frontier model for your specific domain in an afternoon. The options multiplied, and so did the confusion about which approach is right for which situation.
The framework helps PMs make this decision systematically by evaluating their specific context against five decision factors, then mapping to one of three primary approaches: API integration, model fine-tuning, or custom model training.
The Framework in Detail
The Three Approaches
Approach 1: Use a Third-Party API
You integrate a hosted AI service (OpenAI, Anthropic, Google Cloud AI, AWS AI services, etc.) into your product via API calls. The provider handles model training, infrastructure, scaling, and updates.
| Aspect | Detail |
|---|---|
| Time to value | Days to weeks |
| Upfront cost | Near zero |
| Ongoing cost | Per-API-call pricing (scales with usage) |
| ML expertise required | Minimal -- prompt engineering and API integration |
| Differentiation | Low -- competitors can use the same API |
| Data requirement | None for basic usage; some for prompt optimization |
| Control | Low -- provider controls model behavior, updates, and availability |
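To make Approach 1 concrete, here is a minimal sketch assuming the OpenAI Python SDK; the model name and classification prompt are placeholder assumptions, and other hosted providers follow a similar request/response pattern.

```python
# Approach 1 sketch: call a hosted model via the OpenAI Python SDK.
# Model name and prompt are placeholders; expects OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use whatever your provider offers
    messages=[
        {"role": "system", "content": "Classify the support ticket as billing, bug, or other."},
        {"role": "user", "content": "I was charged twice for my subscription this month."},
    ],
)
print(response.choices[0].message.content)
```

The entire integration is prompt design plus an API call; the provider owns everything behind that call, which is exactly the trade described in the table above.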
Approach 2: Fine-Tune an Existing Model
You take a pre-trained foundation model (open-source like Llama, Mistral, or a commercial model via fine-tuning APIs) and train it further on your proprietary data to specialize it for your use case.
| Aspect | Detail |
|---|---|
| Time to value | Weeks to months |
| Upfront cost | Moderate (compute for fine-tuning, data preparation) |
| Ongoing cost | Hosting/inference costs (self-hosted or managed) |
| ML expertise required | Moderate -- fine-tuning, evaluation, deployment |
| Differentiation | Moderate to high -- model reflects your unique data |
| Data requirement | Hundreds to thousands of high-quality domain-specific examples |
| Control | Moderate -- you control specialization; base model is provider's |
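As an illustration of Approach 2, the sketch below fine-tunes an open-source causal LM with LoRA adapters via Hugging Face transformers and peft. The base model name, hyperparameters, and the two training strings are assumptions; in practice you would supply the hundreds to thousands of domain examples the table calls for.

```python
# Approach 2 sketch: LoRA fine-tuning of an open-source model (transformers + peft).
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # assumption: any causal LM with a chat/completion head

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Hypothetical domain examples; real fine-tuning needs hundreds to thousands of these.
examples = [
    "Clause: Either party may terminate with 30 days' notice. Key term: termination for convenience",
    "Clause: Receiving Party shall keep Confidential Information secret. Key term: confidentiality obligation",
]
dataset = Dataset.from_dict({"text": examples}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=512)
)

# LoRA freezes the base weights and trains small adapter matrices on top.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("ft-out/adapter")  # you own the adapter; the base model is the provider's
```

The last line captures the control trade-off: your specialization is a small, portable artifact layered on someone else's base model.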
Approach 3: Train a Custom Model from Scratch
You design the model architecture, collect and curate training data, train the model on your infrastructure, and manage the full ML lifecycle.
| Aspect | Detail |
|---|---|
| Time to value | Months to years |
| Upfront cost | High (ML team, compute infrastructure, data acquisition) |
| Ongoing cost | Infrastructure, team, and retraining costs |
| ML expertise required | High -- full ML engineering and research team |
| Differentiation | Maximum -- model is entirely proprietary |
| Data requirement | Large (thousands to millions of examples, depending on complexity) |
| Control | Full -- you own every aspect of the model |
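For scale, here is Approach 3 in miniature: a toy PyTorch model and training loop showing the pieces you own outright (architecture, data pipeline, optimization, retraining). Real from-scratch training involves vastly more data, compute, and engineering; every name and number below is a placeholder.

```python
# Approach 3 in miniature: you own the architecture, the data, and the training loop.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_classes: int, dim: int = 64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # the architecture is yours to design
        self.head = nn.Linear(dim, num_classes)

    def forward(self, token_ids):
        return self.head(self.embed(token_ids))

model = TinyClassifier(vocab_size=10_000, num_classes=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder data: in practice, thousands to millions of curated, labeled examples.
tokens = torch.randint(0, 10_000, (32, 20))
labels = torch.randint(0, 3, (32,))

for epoch in range(5):  # you also own retraining, monitoring, and evaluation
    optimizer.zero_grad()
    loss = loss_fn(model(tokens), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```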
The Five Decision Factors
Factor 1: Differentiation Requirement
Key question: Is the AI capability a core differentiator, or is it a commodity feature?
If the AI is your product's primary value proposition -- the reason customers choose you over competitors -- you need more control and uniqueness than an API provides. If the AI is a supporting feature (spell check, basic classification, standard OCR), an API is likely sufficient.
Differentiation Spectrum:
- Commodity capability (spell check, basic classification, standard OCR) -> an API is sufficient
- Valuable but not the primary reason customers buy -> API with prompt optimization, or fine-tuning
- Core value proposition (the reason customers choose you over competitors) -> fine-tuning or custom training
Factor 2: Data Assets
Key question: Do you have proprietary data that would make a custom model significantly better than a generic one?
Your data advantage is the most important factor in the build-vs-buy decision. If you have unique, high-quality, domain-specific data that no API provider has access to, fine-tuning or custom training can create a model that outperforms any generic offering.
Data Assessment Matrix:
| Your Data Situation | Recommended Approach |
|---|---|
| No proprietary data, generic task | API |
| Small proprietary dataset (hundreds of examples) | API with few-shot examples, or light fine-tuning |
| Medium proprietary dataset (thousands of examples) | Fine-tuning |
| Large proprietary dataset (hundreds of thousands+) | Fine-tuning or custom training, depending on task complexity |
| Unique data that creates a competitive moat | Custom training to maximize the moat |
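Where the matrix says "API with few-shot examples," the small proprietary dataset goes into the prompt rather than into training. A minimal sketch, with hypothetical clause/term pairs:

```python
# Few-shot prompting: embed a few proprietary examples directly in the prompt
# instead of fine-tuning. The example pairs below are hypothetical placeholders.
few_shot_examples = [
    ("The Receiving Party shall hold Confidential Information in strict confidence.",
     "confidentiality obligation"),
    ("Either party may terminate this Agreement with 30 days' written notice.",
     "termination for convenience"),
]

def build_prompt(clause: str) -> str:
    shots = "\n\n".join(f"Clause: {c}\nKey term: {t}" for c, t in few_shot_examples)
    return f"{shots}\n\nClause: {clause}\nKey term:"

print(build_prompt("Licensee shall indemnify Licensor against third-party claims."))
```

This is often the cheapest way to get value out of a small dataset while you collect enough examples to justify fine-tuning.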
Factor 3: Team Expertise
Key question: Does your team have the ML engineering skills to build and maintain a custom model?
Be honest about this one. Fine-tuning requires ML engineers who understand training dynamics, evaluation methodology, and deployment infrastructure. Custom model training requires ML researchers who can design architectures, debug training instabilities, and optimize for your specific domain. Hiring these people takes months and costs significantly more than software engineers.
| Team Composition | Realistic Approach |
|---|---|
| No ML engineers | API only |
| 1-2 ML engineers | Fine-tuning with managed infrastructure |
| ML team (5+) with infrastructure | Fine-tuning or custom training |
| ML research team with compute budget | Any approach, including custom training |
Factor 4: Timeline
Key question: When does the AI feature need to be in production?
| Timeline | Realistic Approach |
|---|---|
| 1-4 weeks | API integration |
| 1-3 months | API integration or fine-tuning (if data is ready) |
| 3-6 months | Fine-tuning with data collection |
| 6-12+ months | Custom model training |
A common mistake is committing to a long custom-training effort when an API could validate the product concept in weeks. Validate first, invest later.
Factor 5: Long-Term Cost
Key question: What will this cost at your target scale?
API pricing often looks cheap at prototype scale and becomes expensive at production scale. Custom models have high upfront costs but lower marginal costs at scale.
Cost Crossover Analysis:
Run a simple cost comparison at your projected scale:
- API: price per call x expected monthly call volume
- Fine-tuned/self-hosted: fixed hosting and inference costs + amortized fine-tuning and maintenance
- The crossover is the monthly volume at which the self-hosted line drops below the API line (a minimal sketch of this calculation follows the next paragraph)
For many products, the crossover point where self-hosted fine-tuned models become cheaper than APIs is between 100,000 and 1,000,000 monthly predictions -- but this varies enormously by use case and provider pricing.
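A minimal sketch of that crossover calculation; the per-call price, hosting cost, and amortization window are assumptions, not real provider pricing.

```python
# Rough cost-crossover sketch; all prices are hypothetical placeholders.

def api_monthly_cost(calls: int, price_per_call: float) -> float:
    """API cost scales linearly with usage."""
    return calls * price_per_call

def self_hosted_monthly_cost(gpu_hosting: float, fine_tune_cost: float,
                             amortize_months: int = 12) -> float:
    """Self-hosted cost is mostly fixed: hosting plus amortized fine-tuning."""
    return gpu_hosting + fine_tune_cost / amortize_months

# Find the monthly volume where self-hosting becomes cheaper.
for calls in range(0, 2_000_001, 50_000):
    api = api_monthly_cost(calls, price_per_call=0.01)           # assumed $0.01 per call
    hosted = self_hosted_monthly_cost(gpu_hosting=3_500,         # figures echo the example below
                                      fine_tune_cost=5_000)
    if api > hosted:
        print(f"Crossover near {calls:,} calls/month: API ${api:,.0f} vs self-hosted ${hosted:,.0f}")
        break
```

Rerun the comparison whenever provider pricing, hosting costs, or your volume projections change; the crossover point moves with all three.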
The Decision Tree
Follow this simplified decision tree to reach an initial recommendation:
1. Is the AI capability a core differentiator for your product (Factor 1)?
   - No -> Use an API. Move to step 5.
   - Yes -> Continue to step 2.
2. Do you have proprietary, domain-specific data that generic models lack (Factor 2)?
   - No -> Use an API now; collect data for future fine-tuning. Move to step 5.
   - Yes -> Continue to step 3.
3. Does your team have the ML expertise and infrastructure to fine-tune and host models itself (Factor 3)?
   - No -> Fine-tune using a managed service (lower expertise requirement). Move to step 5.
   - Yes -> Continue to step 4.
4. Do you need control or performance beyond what fine-tuning a foundation model can deliver, and can you absorb the timeline and cost of custom training (Factors 4 and 5)?
   - No -> Fine-tune an existing foundation model.
   - Yes -> Train a custom model from scratch.
5. Whichever approach you land on, re-evaluate as usage, data assets, and accuracy requirements grow.
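The same tree, encoded as a small function; the parameter names map to Factors 1-4, and the threshold judgments behind each boolean remain yours to make.

```python
def recommend_approach(core_differentiator: bool,
                       has_proprietary_data: bool,
                       has_ml_team: bool,
                       needs_beyond_fine_tuning: bool) -> str:
    """Encodes the decision tree above; inputs mirror Factors 1-4."""
    if not core_differentiator:
        return "Use an API"
    if not has_proprietary_data:
        return "Use an API now; collect data for future fine-tuning"
    if not has_ml_team:
        return "Fine-tune via a managed service"
    if not needs_beyond_fine_tuning:
        return "Fine-tune an existing foundation model"
    return "Train a custom model from scratch"

# Example: differentiated feature, proprietary data, small ML team, fine-tuning suffices.
print(recommend_approach(True, True, True, False))
# -> "Fine-tune an existing foundation model"
```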
When to Use This Framework
When NOT to Use It
Real-World Example
Scenario: A legal tech startup is building a contract analysis product that extracts key terms, identifies risks, and suggests edits for commercial agreements.
Phase 1 -- API (Months 1-3):
The team starts by integrating a frontier LLM API to analyze contracts. Results are promising but inconsistent: the model handles standard NDA clauses well but misinterprets industry-specific terms and occasionally hallucinates contract provisions. Accuracy on their evaluation set is 74%.
Cost: $2,000/month for API calls during beta with 50 users.
Phase 2 -- Fine-tuning (Months 4-8):
After collecting 3,000 annotated contract examples from their beta users (with permission), the team fine-tunes an open-source LLM. The fine-tuned model achieves 89% accuracy on their evaluation set. Industry-specific terminology is handled correctly, and hallucinations drop significantly.
Cost: $5,000 for fine-tuning compute, $3,500/month for GPU hosting. But they now serve 500 users, and equivalent API costs would be $18,000/month.
Phase 3 -- Evaluation of custom training (Month 9):
The PM runs the build-vs-buy analysis for the next phase. Custom model training would cost $200K+ and require hiring two ML researchers. The fine-tuned model at 89% accuracy already exceeds user expectations. Decision: continue fine-tuning with incremental improvements. Revisit custom training if accuracy requirements increase or the fine-tuned approach plateaus.
Key insight: The team saved 6 months and $150K+ by starting with an API, validating product-market fit, and collecting proprietary data before investing in fine-tuning. They would have wasted both time and money training a custom model for a product that hadn't yet proven market demand.
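Accuracy figures like the 74% and 89% above imply an evaluation set of annotated contracts and a harness that scores model output against it. A minimal sketch, with a hypothetical predict() stand-in and placeholder examples:

```python
# Minimal evaluation-harness sketch behind accuracy numbers like 74% / 89%.
# The eval_set entries and predict() are hypothetical placeholders.

def predict(contract_clause: str) -> str:
    """Stand-in for a call to the API or the fine-tuned model."""
    return "confidentiality obligation"

eval_set = [
    {"clause": "The Receiving Party shall hold Confidential Information in strict confidence.",
     "expected": "confidentiality obligation"},
    {"clause": "Either party may terminate this Agreement with 30 days' written notice.",
     "expected": "termination for convenience"},
]

correct = sum(predict(row["clause"]) == row["expected"] for row in eval_set)
print(f"accuracy: {correct / len(eval_set):.0%}")
```

Keeping the same evaluation set across phases is what makes the 74% vs. 89% comparison meaningful.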
Common Pitfalls
AI Build vs. Buy vs. Other Decision Frameworks
| Framework | Scope | Unique Contribution |
|---|---|---|
| This framework | AI-specific sourcing decision with three options | Addresses the fine-tuning middle ground unique to AI; includes data asset assessment |
| Traditional Build vs. Buy | General software sourcing | Binary decision that misses the fine-tuning option central to modern AI |
| Wardley Mapping | Strategic component positioning | Helps identify which AI components are commodities vs. differentiators, but doesn't prescribe the sourcing approach |
| Make-Buy-Ally | Enterprise sourcing with partnerships | Adds partnership option (valuable for AI when a data-sharing agreement with a vendor creates unique capability) |
| Total Cost of Ownership (TCO) | Financial comparison | Important input to this framework's cost factor, but doesn't address differentiation, data, or expertise considerations |
The AI Build vs. Buy framework should be used in conjunction with TCO analysis for the cost factor and Wardley Mapping for the strategic differentiation factor. Together, they give you both the strategic and financial clarity to make a sound sourcing decision.