What a data warehouse actually costs
Snowflake's per-credit price looks affordable on paper, and the warehouse fee itself is often the smallest line item. The team you need to operate it is the largest.
The hidden cost is time-to-value. The warehouse does not answer business questions on day one. It answers questions on day 270, after the schemas are mapped, the dimensions are modelled, the pipelines are stable, the tests pass, and the BI tool is connected. Most mid-market projects underestimate this by a factor of two. Year-two and onward usually settle at ₹40 lakh to ₹1.2 crore in steady state, mostly people.
Why warehouses made sense at enterprise scale
Three conditions made data warehouses the right answer at enterprise scale, all of them genuine. First, analytical workloads were heavy enough that running them against source systems would have crushed the source. A retail chain with twenty thousand stores cannot run "year-over-year same-store sales" against the live transactional database; the database is too busy serving the stores.
Second, dedicated data teams existed. A bank with a 50-person data engineering organisation could build and maintain the pipelines, the dimensional models, and the governance. The warehouse was an investment that paid back across hundreds of analysts.
Third, the data scale (hundreds of millions of rows queried frequently) genuinely needed columnar storage and MPP compute. The warehouse architecture was the right engineering answer to that scale problem.
What changes at mid-market scale
Indian mid-market businesses (50 to 500 employees, ₹50 Cr to ₹500 Cr revenue) usually meet none of those conditions. Each condition flips, and the cost-benefit calculus flips with it.
- Source systems can serve analytical queries. Transaction volume is modest. Tally Prime returns a six-month outstanding query in seconds. A CRM with 50,000 contacts queries instantly. There is no source-system-being-crushed problem to solve.
- There is no dedicated data team. There is one accountant good with Excel, one founder who can read SQL if forced, and a Tally consultant on speed dial. Building a warehouse for that team means hiring two engineers whose full-time job becomes maintaining pipelines.
- Data scale is not warehouse-shaped. A typical Indian mid-market business has 5 lakh to 50 lakh rows across all source systems. That fits comfortably in the source databases themselves. The warehouse benefits (analytical speed, decoupling from source) are small; the costs (people, time, complexity) are large.
Source-system AI as the alternative
The source-system pattern: an AI analytics layer reads Tally directly, reads your CRM directly, reads your inventory module directly, translates plain-English questions into the right query for each system, and returns answers on demand. No warehouse to build. No ETL pipelines to maintain. No data team to hire. The schema drift problem solves itself because there is no intermediate model to keep in sync.
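The routing half of this pattern can be sketched in a few lines. This is a hypothetical illustration, not KolossusAI's internals: SQLite stands in for Tally, and the topic-to-connector map and field names are assumptions. The point is structural: each system is queried in place through a read-only connector, with no staging layer in between.

```python
import sqlite3

def make_tally_stub():
    # In-memory SQLite standing in for a Tally Prime connector.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoices (party TEXT, amount REAL, due_date TEXT)")
    db.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                   [("Acme Traders", 120000, "2024-09-30"),
                    ("Zen Retail", 45000, "2024-11-15")])
    return db

# Map a question topic to the system that owns the data and the query
# to run there. A real layer would derive this from the plain-English
# question; here it is hard-coded for clarity.
ROUTES = {
    "outstanding": ("tally",
                    "SELECT party, amount FROM invoices WHERE due_date < ?"),
}

def answer(topic, *params, sources):
    system, sql = ROUTES[topic]
    return sources[system].execute(sql, params).fetchall()

sources = {"tally": make_tally_stub()}
print(answer("outstanding", "2024-10-01", sources=sources))
# one row: Acme Traders' overdue invoice
```

Because the connector reads the source in place, a schema change in the source shows up on the very next query; there is no intermediate model to fall out of sync.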
| | Data warehouse | Source-system AI |
|---|---|---|
| Year-1 cost (mid-market) | ₹50 lakh to ₹2 crore all-in | ₹12 lakh to ₹30 lakh all-in |
| Time-to-value | 6 to 18 months | About 3 weeks |
| Team needed | 2 to 4 data engineers, 1 modeller, BI specialist | No data team; existing finance and operations users |
| Query latency (mid-market scale) | Sub-second on aggregates | 1 to 5 seconds against source systems |
The trade-offs are honest. Source-system queries are slower than warehouse queries on huge datasets. A warehouse can aggregate 100 million rows in a second; a source-system query against Tally on the same volume might take 30 seconds. At mid-market data scales (5 to 50 lakh rows), source-system queries return in 1 to 5 seconds, which is fast enough that the user does not notice.

The other trade-off is cross-system joins. A warehouse makes "customers from CRM joined to invoices from Tally" trivial because both live in one place. Source-system AI handles this by querying each side and joining at runtime, which works well for typical mid-market joins (thousands of rows on each side) and gets harder at very large scales.
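To make the runtime-join trade-off concrete, here is a minimal hash-join sketch: customers fetched from a CRM, invoices fetched from Tally, joined in memory. The record shapes and field names are illustrative assumptions. At thousands of rows per side this is effectively instant, which is why the pattern holds at mid-market scale and strains at warehouse scale.

```python
# Stand-ins for the result sets two connectors would return at runtime.
crm_customers = [
    {"customer_id": "C001", "name": "Acme Traders", "segment": "Wholesale"},
    {"customer_id": "C002", "name": "Zen Retail", "segment": "Retail"},
]
tally_invoices = [
    {"customer_id": "C001", "invoice_no": "INV-101", "amount": 120000},
    {"customer_id": "C001", "invoice_no": "INV-117", "amount": 35000},
    {"customer_id": "C002", "invoice_no": "INV-120", "amount": 45000},
]

def runtime_join(customers, invoices):
    # Classic hash join: index the smaller side (CRM) by key,
    # then probe it once per invoice row from the larger side (Tally).
    by_id = {c["customer_id"]: c for c in customers}
    return [{**by_id[inv["customer_id"]], **inv}
            for inv in invoices if inv["customer_id"] in by_id]

joined = runtime_join(crm_customers, tally_invoices)
print(sum(r["amount"] for r in joined if r["segment"] == "Wholesale"))
# total wholesale billing: 155000
```

Both sides are queried live, so the join reflects the source systems as they are at that moment, with no refresh lag.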
When you do need a warehouse
Source-system AI is not the answer for everyone. Four legitimate cases push you back toward the warehouse.
- Source systems are getting hammered. Analytical queries are hurting OLTP performance. Rare in mid-market, common in late-stage scale-ups where the transactional database is already at capacity.
- Genuinely large data. Hundreds of millions of rows queried frequently, where source-system queries take minutes. Rare in mid-market, common in regulated industries with long retention.
- Immutable historical snapshots. Compliance requirements (banking, insurance under IRDAI) where the source system does not retain history at the granularity you need. The warehouse becomes a governed audit store.
- A data team of 10 or more. Full-time data engineering and analytics staff. At that team size the warehouse pays back the operational cost across enough analysts to make sense.
Three case patterns we see
Source-system wins. A 200-employee manufacturer running Tally Prime, a custom CRM, and an internal production module. Five finance users, ten sales users, no data team. KolossusAI reads all three systems directly, the team uses it daily, and the total cost is ₹2 lakh per month all-in. A warehouse for the same business would have been ₹60 lakh in year one and would have answered fewer questions.
Warehouse wins. A 1,500-employee retail chain with 80 stores, transactional volume of 50 lakh rows per month, an internal data team of eight, and regulatory snapshot requirements. They run Snowflake, Power BI, and a small AI analytics layer for ad-hoc work. Source-system-only would not keep up with the analytical workload.
Hybrid is right. A 600-employee services firm with one Tally company, three regional offices, and a growing analytics team of three. They keep Tally as the system of record, run KolossusAI for everything ad-hoc and live, and maintain a small warehouse for the historical board-pack reporting that needs immutable monthly snapshots. Both layers do what they are good at.
KolossusAI's source-system approach
KolossusAI is built for the source-system pattern. We connect to Tally Prime, your CRM, your ERP, and custom databases through secure connectors that read in place and never stage your underlying ledger anywhere outside your boundary. Plain-English questions translate into the right query for each system. Cross-system joins happen at runtime.
The deployment shapes match Indian mid-market reality. Managed cloud in India for businesses without a strict residency requirement. Single-tenant private cloud in your AWS or Azure account. Fully on-premise for regulated sectors or for owners who simply prefer the data to stay in the building.
See how KolossusAI works for the source-system architecture, and see our pricing for what this costs in practice. The 14-day production POC against your real data is free, no credit card required.