Millennial AI

Build vs. buy for AI: how mid-market companies should decide

Tarun Sharma · March 28, 2026 · 18 min read

TL;DR

  • Build when the workflow is your moat. Buy when it is commodity infrastructure.
  • Vendor lock-in risk is highest in data pipelines, lowest in general-purpose APIs.
  • Total cost of ownership for custom builds typically runs 2-4x the initial estimate.
  • The hybrid approach (buy the base, build the differentiation layer) works for most mid-market companies.

Why this question is harder than it seems

Build-vs-buy for AI is different from traditional software procurement. The technology shifts every six months. A vendor that is state-of-the-art in January may be commoditized by July. A custom model trained on proprietary data may outperform any SaaS product, but only if you have the engineering bench to maintain it.

Mid-market companies feel this tension more than most. They are large enough to have unique workflows worth automating, but rarely large enough to staff a dedicated ML engineering team. The right answer depends less on what is technically possible and more on where differentiation lives.

The stakes are asymmetric. A bad buy decision wastes a year of subscription fees and integration effort, but you can walk away. A bad build decision can consume 18 months of engineering capacity, produce a system that is expensive to maintain, and leave you with a codebase too costly to abandon and too brittle to extend. We see this regularly: a company builds a custom NLP pipeline, realizes six months in that the maintenance burden is unsustainable for their team size, and ends up buying a vendor product anyway. The build was not wrong in theory. It was wrong for their resource constraints.

The inverse mistake is less visible but just as costly. A company buys a vendor tool that handles 80% of their use case, then spends two years trying to get the last 20% through feature requests, workarounds, and custom integrations. That last 20% was their differentiation, and they gave up control of it.

The moat test

Ask one question: does this workflow generate outsized value because of how we do it, or is it a cost center we need to run efficiently? If the former, build. If the latter, buy.

Take a specialty insurance firm that underwrites niche risk categories. Their underwriting logic is their moat. A custom AI model trained on their proprietary loss data will outperform any generic tool. But their HR onboarding process? Commodity workflow. Buy the best SaaS tool and move on.

The error most companies make is treating build-vs-buy as a single, company-wide decision. McKinsey's build-vs-buy principles for AI point the same way: a portfolio approach (build for moat-critical workflows, buy for everything else) tends to deliver better outcomes than either extreme.

A useful diagnostic: ask each department head, "What do you do differently from competitors that directly affects revenue or margin?" Specific, defensible answers point to build candidates. Vague answers that could apply to any competitor in your space point to buy candidates. In our experience, only 15-20% of a company's workflows are genuinely differentiated. The rest are operational infrastructure.

Custom AI vs off-the-shelf: a category-by-category breakdown

The moat test gives you a framework. What follows is the practical application across five AI tool categories we see mid-market companies evaluate most frequently. For each, we give the honest answer on whether custom AI development makes sense or whether off-the-shelf tools get you there faster and cheaper.

Document processing. Off-the-shelf document extraction tools (think Nanonets, Rossum, AWS Textract) handle invoices, receipts, and standard business forms well. They are trained on millions of documents and will outperform anything you build in-house for common formats. Where they fall short: industry-specific documents with non-standard layouts. A mortgage servicer processing proprietary loan modification forms, a logistics company extracting data from customs declarations across 30 countries, a healthcare insurer parsing provider contracts with idiosyncratic clause structures. The deciding factor is volume. If you process more than 10,000 specialized documents per month, custom builds pay back within 12-18 months. Below that threshold, manual review plus an off-the-shelf tool is usually more cost-effective.
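The volume threshold is payback arithmetic. A back-of-the-envelope sketch, where every per-document cost is an assumption chosen for illustration; substitute your own numbers:

```python
# Hypothetical payback check for the ~10,000-docs/month threshold above.
# All figures here are assumptions, not benchmarks.
docs_per_month = 10_000
manual_cost_per_doc = 1.50    # assumed: off-the-shelf tool plus human review
custom_cost_per_doc = 0.25    # assumed: marginal cost once a custom build exists
build_cost = 150_000          # assumed one-time custom build

monthly_savings = docs_per_month * (manual_cost_per_doc - custom_cost_per_doc)
payback_months = build_cost / monthly_savings
print(f"Payback: {payback_months:.0f} months")  # Payback: 12 months
```

At these assumed costs the build pays back in 12 months; halve the volume and payback stretches to 24 months, past the point where buying plus manual review wins.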

Customer support. Off-the-shelf chatbots (Intercom, Zendesk AI, Ada) are genuinely good at tier-1 support, handling password resets, order status, and FAQ-style questions with 70-85% resolution rates out of the box. The gap emerges in domain-specific triage. A medical device company needs its support bot to understand product model numbers, failure modes, and regulatory escalation paths. A B2B software company with a deeply technical product needs AI that can parse error logs and suggest fixes specific to their architecture. The hybrid play works well here: use an off-the-shelf platform for the conversational interface and routing, then plug in custom classification and retrieval models for the domain-specific logic.
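The hybrid play can be as thin as a webhook: the vendor platform owns the conversation and calls a custom hook for triage. A minimal sketch, in which all names are hypothetical and the keyword rules stand in for a classifier trained on your own ticket history:

```python
# Hybrid support sketch: the off-the-shelf platform handles the conversation;
# only the domain-specific triage below is custom. Names are hypothetical.
DOMAIN_ROUTES = {
    "device_failure": "field-engineering",
    "regulatory": "compliance-escalation",
    "general": "tier1-bot",   # stays with the vendor's built-in flows
}

def classify(message: str) -> str:
    # Stand-in for a custom model trained on your product's failure modes.
    text = message.lower()
    if "error code" in text or "fault" in text:
        return "device_failure"
    if "recall" in text or "adverse event" in text:
        return "regulatory"
    return "general"

def route_ticket(message: str) -> str:
    # Exposed as a webhook; the platform routes on the returned queue name.
    return DOMAIN_ROUTES[classify(message)]

print(route_ticket("Unit shows error code E-42 after reboot"))  # field-engineering
print(route_ticket("How do I reset my password?"))              # tier1-bot
```

The platform keeps doing what it is good at (conversation, handoff, analytics); the custom piece is confined to the one function that encodes your domain knowledge.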

Data analytics and business intelligence. This category has the clearest hybrid winner. Off-the-shelf BI dashboards (Tableau, Looker, Power BI) are mature, well-supported, and handle 90% of reporting needs. Where custom AI adds real value: predictive models trained on your specific data. A retailer building demand forecasting models on their sales history, a manufacturer predicting equipment failures using their sensor data, a financial services firm scoring credit risk using proprietary signals. Buy the visualization and reporting layer. Build the ML models that feed it. The models depreciate faster than most teams expect, so budget for quarterly retraining cycles from the start.

Content generation. Off-the-shelf LLM APIs (OpenAI, Anthropic, Google) are commoditizing fast. Raw generation capability is no longer a differentiator. The custom layer that matters: prompt pipelines, RAG systems pulling from your proprietary content, and fine-tuning for brand voice or domain terminology. A law firm generating first drafts of contract clauses needs retrieval from their precedent library. A manufacturing company generating maintenance procedures needs grounding in their equipment manuals. This is one area where the build-vs-buy calculus shifts quickly. Six months ago, fine-tuning was necessary for quality. Today, well-designed RAG systems with strong retrieval often match fine-tuned model quality at a fraction of the cost. Re-evaluate your approach every quarter.
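A stripped-down sketch of the RAG pattern: retrieve the most relevant proprietary document, then ground the prompt in it. The embedding function here is a toy stand-in (a real system would call whichever provider you rent); the corpus and the retrieval logic are the parts you own:

```python
# Minimal RAG sketch. embed() is a toy letter-frequency vector so the example
# runs standalone; swap it for a provider's embedding API in practice.
import math

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Your proprietary corpus: the defensible asset.
library = [
    "Clause 4.2: indemnification limited to direct damages",
    "Maintenance procedure for hydraulic press model HP-300",
]
index = [(doc, embed(doc)) for doc in library]

def retrieve(query: str) -> str:
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

# Ground the (rented, swappable) LLM in retrieved context instead of fine-tuning.
context = retrieve("indemnification terms")
prompt = f"Using only this context:\n{context}\n\nDraft the clause summary."
print(context)  # Clause 4.2: indemnification limited to direct damages
```

Because the model only ever sees a prompt string, the LLM behind it can be replaced without touching the corpus or the retrieval code.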

Sales automation. Mostly an off-the-shelf category. CRM-integrated tools (Salesforce Einstein, HubSpot AI, Gong, Clari) are well-established and improving rapidly. Custom builds only make sense for proprietary scoring models where your sales process involves signals that commercial tools cannot capture: proprietary intent data, industry-specific buying patterns, relationship graphs from your unique market position. For the vast majority of mid-market sales teams, buying the platform and configuring it well delivers 90% of the value at 20% of the cost of building custom.

Across all five categories, one pattern repeats: the off-the-shelf tool handles the general case well, and custom AI development only justifies itself when your specific use case diverges meaningfully from the general case. Be honest about how unique your requirements actually are. Most companies overestimate their uniqueness by a wide margin.

Category | Off-the-Shelf | Custom Build | Recommendation | Cost Range
Document Processing | Standard docs | Industry-specific formats | Hybrid | $500–$2K/mo vs $80K–$200K build
Customer Support | Tier-1 chatbot | Domain-specific triage | Start off-the-shelf | $1K–$5K/mo vs $60K–$150K build
Data Analytics/BI | Dashboards | Custom ML models | Hybrid | $20–$75/user + $50K–$180K build
Content Generation | LLM APIs | RAG pipelines + fine-tuning | Hybrid | $500–$3K/mo + $40K–$120K build
Sales Automation | CRM tools | Proprietary scoring | Mostly buy | $50–$200/user vs $40K–$100K build

Vendor lock-in: where the risk is

Gartner's build-vs-buy framework for software highlights lock-in as the most underestimated risk in AI procurement, and our experience confirms it. Lock-in risk varies sharply across the AI stack. At the API layer (LLM calls, vision models, speech-to-text), switching costs are relatively low. Interfaces are converging, and most applications can swap providers with modest refactoring.

The risk concentrates in data pipelines. Once your operational data flows through a vendor's ingestion and transformation layer, migration gets expensive. Embeddings stored in a proprietary vector database, fine-tuned models hosted on a single cloud, ETL pipelines built on vendor-specific connectors. These dependencies accumulate quietly.

Our guidance: own your data layer, rent your model layer. Keep embeddings portable, maintain export capabilities for all training data, and insist on API-based integrations over platform-native ones wherever you can.
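In code, "own your data layer" can be as simple as keeping embeddings in a plain structure you control, tagged with provenance, with a one-call export. A sketch under that principle; class and field names are illustrative:

```python
# Owned data layer: plain records, no vendor-proprietary index format.
import json

class PortableEmbeddingStore:
    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, doc_id: str, text: str, vector: list[float],
            provider: str, model: str) -> None:
        self.records.append({
            "id": doc_id,
            "text": text,          # keep source text so you can re-embed anywhere
            "vector": vector,
            "provider": provider,  # provenance makes stale vectors detectable
            "model": model,
        })

    def export_json(self) -> str:
        # Everything needed to rebuild the index in any vector database.
        return json.dumps(self.records, indent=2)

store = PortableEmbeddingStore()
store.add("doc-1", "Q3 loss ratios by region", [0.12, -0.4, 0.88],
          provider="openai", model="text-embedding-3-small")
backup = store.export_json()   # write this to storage you control
```

If the vendor's vector database disappears tomorrow, the export plus the stored source text is enough to re-embed and rebuild elsewhere.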

To make this concrete, consider three product categories. AI-powered document processing platforms often store your extracted data, learned templates, and custom extraction rules in proprietary formats. After 12 months of feeding documents through the system, you have a library of trained extraction patterns that do not transfer to a competitor. The switching cost is not the subscription fee. It is 6-8 months of retraining a new system on your document types.

Customer data platforms with AI features present a different lock-in pattern. They unify your customer data from multiple sources, build identity graphs, and train predictive models on the combined dataset. The data itself may be exportable (most vendors offer CSV exports), but the identity resolution logic, the predictive models, and the integration configurations are not. You can get your data out. You cannot get your intelligence out.

Conversational AI platforms accumulate lock-in through dialogue flows, training utterances, and integration hooks into your backend systems. A company that has spent 18 months refining a customer-facing chatbot with hundreds of intent patterns, entity definitions, and escalation rules faces a near-complete rebuild when switching providers. The data is technically portable. The architecture is not.

The common thread: lock-in is less about data portability and more about accumulated configuration, training, and workflow logic inside the vendor's system. When evaluating any AI vendor, ask: "If we leave in 18 months, what can we take with us and what do we lose?" If the vendor cannot give a straight answer, that tells you something.

What custom builds cost

Custom AI builds cost more than the initial estimate. This is predictable: systems that interact with production data always surface surprises. Data quality issues appear after deployment. Edge cases multiply. Users request changes that seem minor but require architectural rework.

Deloitte's analysis of custom AI development costs confirms what we see in practice: a realistic cost model includes initial development (the number everyone quotes), data preparation and cleaning (typically 30-40% of build cost), ongoing model maintenance and retraining (15-25% of initial cost annually), and integration upkeep as upstream systems evolve.

Companies that account for these costs upfront make better build-vs-buy decisions than those comparing the build estimate against the annual SaaS subscription.

The table below breaks down a realistic three-year TCO starting from a $150,000 initial build estimate. As HBR's research on the real cost of building AI products documents, the result is a 2-4x multiplier over the initial estimate. Teams do not estimate poorly — the initial estimate only covers the first phase of a multi-phase commitment. Data preparation costs surface mid-project, maintenance grows as the system takes on edge cases, and integration upkeep accumulates as connected systems evolve.

Compare that to a SaaS product at $3,000/month ($108,000 over three years). The SaaS option looks cheaper on paper, but only if it solves your problem. If it gets you 70% of the way and you spend the remaining 30% on workarounds, manual processes, and frustration, the math changes. The fair comparison: full TCO of the custom build versus full TCO of the SaaS product plus the cost of living with its limitations.

Cost Component | Amount | When It Hits
Initial build estimate | $150,000 | Project start
Data preparation | $45K–$60K | Mid-project
Year 1 maintenance | $22.5K–$37.5K | Post-launch
Year 2 maintenance | $22.5K–$37.5K | Ongoing
Year 3 maintenance | $22.5K–$37.5K | Ongoing
Integration upkeep (3 yrs) | $45K–$75K | Ongoing
3-year TCO | $307.5K–$397.5K |
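A quick sanity check on the arithmetic, using the percentages from the cost model above (data prep at 30-40% of build, maintenance at 15-25% of build per year): the line items sum to roughly 2x-2.7x the initial estimate, inside the 2-4x range cited earlier.

```python
# Three-year TCO assembled from the cost model in the text. Integer math
# keeps the totals exact.
build = 150_000

def three_year_tco(prep_pct: int, annual_maint_pct: int, integration: int) -> int:
    prep = build * prep_pct // 100              # data preparation and cleaning
    maintenance = 3 * build * annual_maint_pct // 100  # retraining, model upkeep
    return build + prep + maintenance + integration

low = three_year_tco(30, 15, 45_000)
high = three_year_tco(40, 25, 75_000)
print(f"3-year TCO: ${low:,} - ${high:,}")   # 3-year TCO: $307,500 - $397,500
print(f"Multiplier over initial estimate: {low / build:.2f}x - {high / build:.2f}x")
```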

Buy the base, build the differentiation

For most mid-market companies, the answer is neither pure build nor pure buy. It is a layered approach: buy the foundational infrastructure (cloud hosting, base models, general-purpose APIs) and build the differentiation layer on top.

In practice: use off-the-shelf LLMs but wrap them in custom prompting pipelines tuned to your domain. Buy a CRM but build custom lead-scoring models that reflect your specific market signals. Use open-source frameworks but train on proprietary data.

Forrester's 2025 framework for enterprise AI build-vs-buy decisions reaches a similar conclusion: the hybrid approach requires more architectural discipline than either extreme, but it balances speed-to-market against long-term defensibility. For companies with 50-500 employees, it is usually the right call.

The key is a strict boundary between the bought layer and the built layer. This boundary should be an API or a well-defined data interface, not a tangled integration. When the boundary is clean, you can swap out the bought layer without rebuilding the custom layer on top.

A few scenarios. A mid-market logistics company needs route optimization. Buy a mapping and geocoding API (commodity infrastructure), build a custom optimization layer that accounts for your specific constraints (driver preferences, customer time windows, vehicle capacity rules unique to your fleet). The mapping API is interchangeable. Your optimization logic is your edge.

A professional services firm needs proposal generation. Use a commercial LLM for language generation, build a retrieval layer that pulls from your past proposals, win/loss data, and pricing history. The LLM is swappable (and will be swapped as better ones emerge). Your proprietary data and the retrieval logic around it are the defensible asset.

A recurring mistake: building the custom layer too tightly coupled to the bought layer's specific features or data formats. If your custom scoring model assumes a specific CRM's data schema, switching CRMs means rebuilding the model. Build an abstraction layer between them. It costs an extra week upfront and saves months later.
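That abstraction layer can be a single neutral record type plus one thin adapter per vendor. A sketch in which the field names and scoring logic are hypothetical:

```python
# The custom scoring model consumes a neutral Lead record it owns; each CRM
# gets an adapter. Swapping CRMs means writing a new adapter, not rebuilding
# the model. All field names are illustrative.
from dataclasses import dataclass

@dataclass
class Lead:                       # your schema, independent of any CRM
    company_size: int
    industry: str
    days_since_contact: int

def from_vendor_a(raw: dict) -> Lead:
    # Adapter for one (hypothetical) CRM's export format.
    return Lead(raw["num_employees"], raw["vertical"], raw["last_touch_days"])

def from_vendor_b(raw: dict) -> Lead:
    return Lead(raw["employeeCount"], raw["industryCode"], raw["daysInactive"])

def score(lead: Lead) -> float:
    # Stand-in for the custom model; it never sees vendor field names.
    base = 0.5 if lead.industry == "logistics" else 0.3
    recency = max(0.0, 1.0 - lead.days_since_contact / 90)
    return round(base * recency, 3)

lead = from_vendor_a(
    {"num_employees": 200, "vertical": "logistics", "last_touch_days": 9})
print(score(lead))  # 0.45
```

The extra week is spent writing `from_vendor_a`; the months saved come from never having to touch `score` when the CRM changes.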

One more point: document your architecture decisions and why you made them. In 18 months, someone will ask why you built the scoring layer custom instead of using the CRM's built-in scoring. If the reasoning is not written down, the team may revisit the decision unnecessarily, or worse, rip out the custom layer without understanding why it exists. A one-page architecture decision record for each build-vs-buy choice pays for itself many times over.

Tarun Sharma

Partner, Engineering

IIT Kanpur, Jaguar Land Rover, a published paper in Elsevier, and his own company (Twinity Labs) building digital twins. Deepest technologist on the team. He decides what gets built and how.
