Why Indian mid-market businesses don't need a data warehouse

AI Analytics Fundamentals | By Maharshi Saparia | Reviewed
SHORT ANSWER

Data warehouses (Snowflake, Databricks) need ETL pipelines, dedicated data engineers, and 6-18 months to implement. For Indian mid-market businesses without a 10-person data team, the warehouse cost often exceeds the value. AI that reads source systems directly skips the warehouse and gets to answers in three weeks.

What a data warehouse actually costs

Snowflake's per-credit price looks affordable on paper. The warehouse fee itself is often the smallest line. The team you need to operate it is the largest.

  • ₹2L+ / month: Snowflake compute and storage (smallest standard warehouse, modest workload)
  • ₹50K to ₹2L / month: ETL tooling (Fivetran or similar)
  • ₹15L to ₹40L / year: per data engineer (two to four engineers needed in steady state)
  • ₹50L to ₹2 Cr: year-1 all-in (licence + ETL + people + modelling + BI tool on top)
  • ~270 days: time to first business answer (after schemas are mapped and pipelines are stable)

The hidden cost is time-to-value. The warehouse does not answer business questions on day one. It answers questions on day 270, after the schemas are mapped, the dimensions are modelled, the pipelines are stable, the tests pass, and the BI tool is connected. Most mid-market projects underestimate this by a factor of two. Year-two and onward usually settle at ₹40 lakh to ₹1.2 crore in steady state, mostly people.
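The composition of that year-one number can be sketched with the cost lines above. This is illustrative only: the low/high split per line, and the assumption of two engineers at the low end and three at the high end, are ours, not figures from any vendor.

```python
# Illustrative year-1 cost roll-up for a mid-market warehouse build,
# in lakh INR. Figures mirror the cost lines above; the per-line
# low/high splits are assumptions for illustration.
cost_lines = {
    "snowflake_compute_storage": (2 * 12, 3 * 12),    # ₹2L+/month; assume up to ₹3L/month
    "etl_tooling":               (0.5 * 12, 2 * 12),  # ₹50K to ₹2L per month
    "data_engineers":            (2 * 15, 3 * 40),    # assume 2 at ₹15L low, 3 at ₹40L high
}

low = sum(lo for lo, _ in cost_lines.values())
high = sum(hi for _, hi in cost_lines.values())

print(f"Year-1 range before modelling and BI: ₹{low:.0f}L to ₹{high:.0f}L")
```

Even before adding dimensional modelling and a BI licence, the engineer line is 120 of the 180 lakh at the high end, which is why the prose above calls people the largest line.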

Why warehouses made sense at enterprise scale

Three conditions made data warehouses the right answer at enterprise scale, all of them genuine. First, analytical workloads were heavy enough that running them against source systems would have crushed the source. A retail chain with twenty thousand stores cannot run "year-over-year same-store sales" against the live transactional database; the database is too busy serving the stores.

Second, dedicated data teams existed. A bank with a 50-person data engineering organisation could build and maintain the pipelines, the dimensional models, and the governance. The warehouse was an investment that paid back across hundreds of analysts.

Third, the data scale (hundreds of millions of rows queried frequently) genuinely needed columnar storage and MPP compute. The warehouse architecture was the right engineering answer to that scale problem.

What changes at mid-market scale

Indian mid-market businesses (50 to 500 employees, ₹50 Cr to ₹500 Cr revenue) usually meet none of those conditions. Each one flips at this scale, and the cost-benefit flips with it.

THE THREE CONDITIONS THAT FLIP AT MID-MARKET
  • Source systems can serve analytical queries. Transaction volume is modest. Tally Prime returns a six-month outstanding query in seconds. A CRM with 50,000 contacts queries instantly. There is no source-system-being-crushed problem to solve.
  • There is no dedicated data team. There is one accountant good with Excel, one founder who can read SQL if forced, and a Tally consultant on speed dial. Building a warehouse for that team means hiring two engineers whose full-time job becomes maintaining pipelines.
  • Data scale is not warehouse-shaped. A typical Indian mid-market business has 5 lakh to 50 lakh rows across all source systems. That fits comfortably in the source databases themselves. The warehouse benefits (analytical speed, decoupling from source) are small; the costs (people, time, complexity) are large.

Source-system AI as the alternative

The source-system pattern: an AI analytics layer reads Tally directly, reads your CRM directly, reads your inventory module directly, translates plain-English questions into the right query for each system, and returns answers on demand. No warehouse to build. No ETL pipelines to maintain. No data team to hire. The schema drift problem solves itself because there is no intermediate model to keep in sync.
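The translation step in that pattern can be sketched minimally. Everything here is a hypothetical simplification: the connector names, the query templates, and the keyword matcher all stand in for what a real layer would do with an LLM generating the native query per system.

```python
# Sketch: route a plain-English question to the right source-system
# connector and a native query shape. System names, keywords, and
# query templates are hypothetical placeholders.

QUERY_TEMPLATES = {
    # system -> (keywords that select it, native query template)
    "tally":     ({"outstanding", "ledger", "invoice"}, "<EXPORT LedgerVouchers .../>"),
    "crm":       ({"contact", "lead", "pipeline"},      "SELECT ... FROM contacts"),
    "inventory": ({"stock", "warehouse", "sku"},        "SELECT sku, qty FROM stock"),
}

def route(question: str) -> tuple[str, str]:
    """Pick the source system whose keywords best match the question."""
    words = set(question.lower().split())
    system, (_, template) = max(
        QUERY_TEMPLATES.items(),
        key=lambda item: len(words & item[1][0]),
    )
    return system, template

system, query = route("Show outstanding receivables from the ledger")
print(system)  # -> tally
```

The point of the sketch is the shape, not the matcher: each question resolves to one system and one native query, executed in place, with no intermediate model to maintain.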

Comparison assumes 5 to 50 lakh rows total across source systems, the mid-market range.
  • Year-1 cost (mid-market): warehouse ₹50 lakh to ₹2 crore all-in; source-system AI ₹12 lakh to ₹30 lakh all-in
  • Time-to-value: warehouse 6 to 18 months; source-system AI about 3 weeks
  • Team needed: warehouse 2 to 4 data engineers, 1 modeller, a BI specialist; source-system AI no data team, existing finance and operations users
  • Query latency (mid-market scale): warehouse sub-second on aggregates; source-system AI 1 to 5 seconds against source systems

The trade-off is honest. Source-system queries are slower than warehouse queries on huge datasets. A warehouse can aggregate 100 million rows in a second; a source-system query against Tally on the same volume might take 30 seconds. For mid-market data scales (5 to 50 lakh rows), source-system queries return in 1 to 5 seconds, which is fast enough that the user does not notice. The other trade-off is cross-system joins. A warehouse makes "customers from CRM joined to invoices from Tally" trivial because both live in one place. Source-system AI handles this by querying each side and joining at runtime, which works well for typical mid-market joins (thousands of rows on each side) and gets harder at very large scales.
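The runtime join described above can be sketched like this, with in-memory records standing in for the two connector responses. Field names and values are assumptions for illustration.

```python
# Sketch of a runtime cross-system join: pull each side from its own
# source system, then join in memory on the shared key. The records
# below stand in for live connector responses.

crm_customers = [  # would come from the CRM connector
    {"customer_id": "C001", "name": "Sharma Traders", "region": "West"},
    {"customer_id": "C002", "name": "Patel Exports",  "region": "North"},
]

tally_invoices = [  # would come from the Tally connector
    {"customer_id": "C001", "invoice_no": "INV-101", "amount": 250000},
    {"customer_id": "C001", "invoice_no": "INV-107", "amount": 80000},
    {"customer_id": "C002", "invoice_no": "INV-110", "amount": 120000},
]

def outstanding_by_customer(customers, invoices):
    """Join invoices to customers on customer_id, summing per customer."""
    by_id = {c["customer_id"]: c for c in customers}
    totals: dict[str, int] = {}
    for inv in invoices:
        totals[inv["customer_id"]] = totals.get(inv["customer_id"], 0) + inv["amount"]
    return [
        {"name": by_id[cid]["name"], "outstanding": amount}
        for cid, amount in totals.items()
    ]

print(outstanding_by_customer(crm_customers, tally_invoices))
```

At typical mid-market join sizes (thousands of rows a side) this in-memory approach is instant; it is at hundreds of millions of rows that pushing the join down into a warehouse starts to pay.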

When you do need a warehouse

Source-system AI is not the answer for everyone. Four legitimate cases push you back toward the warehouse.

  • Source systems are getting hammered. Analytical queries are hurting OLTP performance. Rare in mid-market, common in late-stage scale-ups where the transactional database is already at capacity.
  • Genuinely large data. Hundreds of millions of rows queried frequently, where source-system queries take minutes. Rare in mid-market, common in regulated industries with long retention.
  • Immutable historical snapshots. Compliance requirements (banking, insurance under IRDAI) where the source system does not retain history at the granularity you need. The warehouse becomes a governed audit store.
  • A data team of 10 or more. Full-time data engineering and analytics staff. At that team size the warehouse pays back the operational cost across enough analysts to make sense.

Three case patterns we see

Source-system wins. A 200-employee manufacturer running Tally Prime, a custom CRM, and an internal production module. Five finance users, ten sales users, no data team. KolossusAI reads all three directly, the team uses it daily, and the total cost is ₹2 lakh per month all-in. A warehouse for the same business would have been ₹60 lakh year one and would have answered fewer questions.

Warehouse wins. A 1,500-employee retail chain with 80 stores, transactional volume of 50 lakh rows per month, an internal data team of eight, and regulatory snapshot requirements. They run Snowflake, Power BI, and a small AI analytics layer for ad-hoc work. Source-system-only would not keep up with the analytical workload.

Hybrid is right. A 600-employee services firm with one Tally company, three regional offices, and a growing analytics team of three. They keep Tally as the system of record, run KolossusAI for everything ad-hoc and live, and added a small warehouse for the historical board-pack reporting that needs immutable monthly snapshots. Both layers do what they are good at.

KolossusAI's source-system approach

KolossusAI is built for the source-system pattern. We connect to Tally Prime, your CRM, your ERP, and custom databases through secure connectors that read in place and never stage your underlying ledger anywhere outside your boundary. Plain-English questions translate into the right query for each system. Cross-system joins happen at runtime.

The deployment shapes match Indian mid-market reality. Managed cloud in India for businesses without a strict residency requirement. Single-tenant private cloud in your AWS or Azure account. Fully on-premise for regulated sectors or for owners who simply prefer the data to stay in the building.

See how KolossusAI works for the source-system architecture and our pricing for what this lands at financially. The 14-day production POC against your real data is free, no credit card.

FREQUENTLY ASKED

Questions readers actually ask.

What about query performance on large data?

Mid-market data sizes (5 to 50 lakh rows total) return in 1 to 5 seconds against source systems, which is indistinguishable from a warehouse for the user. The performance gap opens at hundreds of millions of rows, which most Indian mid-market businesses simply do not have. If you do, source-system AI plus a small warehouse for the heavy aggregations is the right hybrid pattern.

What does Snowflake actually cost in India?

Snowflake itself starts around ₹2 lakh per month for a small standard warehouse running modest workloads, but that is just compute and storage. Add ETL tooling (Fivetran or similar at ₹50,000 to ₹2 lakh per month), two data engineers, and a BI tool on top, and the all-in cost lands at ₹50 lakh to ₹2 crore year one. The Snowflake bill is often less than 20% of the total.

Can I add a warehouse later if I scale up?

Yes, and that is the right path. Start with source-system AI because it is fast to deploy and matches mid-market scale. If your business grows past the warehouse threshold (analytical workload starts hurting OLTP, data scale crosses hundreds of millions of rows, you hire a real data team), add a warehouse for the workloads that need it and keep the AI layer for the ad-hoc work. The two coexist cleanly.

What if I already have a warehouse?

KolossusAI reads warehouses too. If you have already invested in Snowflake or Databricks, the AI layer points at the warehouse instead of the source systems and answers questions in plain English on top of your existing model. You keep the warehouse for the heavy workloads it was built for and add ad-hoc accessibility on top. No rebuild required.

How does this differ from a 'modern data stack'?

The modern data stack (Fivetran, dbt, Snowflake, Looker) is a warehouse-first architecture optimised for enterprise data teams. Source-system AI is a warehouse-skipping architecture optimised for mid-market businesses without a data team. They are different bets on what mid-market actually needs. The modern data stack is the right answer once you have a real data team; until then, the operational cost outruns the value.