Per-query vs flat AI pricing - which is honest for Indian SMBs?

Pricing & CommercialCompareBy Maharshi SapariaReviewed
SHORT ANSWER

Flat pricing is the honest model. Per-query pricing punishes the team for using the product - the more value you get, the more you pay. It also makes budgeting impossible because the bill swings monthly. KolossusAI uses a flat custom quote shaped by users, systems, and scale. No per-query meters, ever.

What per-query pricing actually does to a team

The first thing that happens when finance sees a per-query bill is a memo. Use the tool sparingly. Confirm with the team before running large queries. Justify each ad-hoc report. The memo is reasonable - finance has to control a variable cost - but it directly negates the reason you bought an AI analytics product. The whole point was to make asking questions cheap so the team asks more of them.

The second thing is selection bias in who uses the product. The senior people, who can defend a query, keep using it. The juniors, who would benefit most from cheap exploration, stop. The decisions that should be informed by data start relying on the senior person's memory again. Adoption looks fine on the dashboard (active users) and the value collapses (active questions per user trends to zero).

Same product, two pricing models, very different organisational behaviour.
Per-queryFlat
PredictabilityBill swings 3x to 5x month over monthSame number every month for the term
Adoption incentiveEach question costs money - team flinchesAsking is free at the margin - team explores
Vendor incentiveDrive your usage up to grow revenueMake sure you renew by being useful
Forecast difficultyHard - depends on usage cycles you cannot pre-commitEasy - one line in the budget for twelve months
Chargeback complexityMost data-driven unit pays the most (punishes good behaviour)Allocate flat cost across business units cleanly

The two flavours of per-query pricing

Not every metered model uses the word "query". Vendors have gotten more creative, but the meter is the meter regardless of what runs on the dashboard.

  • True per-query. The vendor charges a clean unit fee per question your team asks. ₹50 per query, ₹500 per dashboard view, ₹2,000 per scheduled report. Easy to understand, easy to bill against, ruinous in practice because the unit cost makes the team flinch every time.
  • Per-query in disguise. The pricing page says 'compute units' or 'API calls' or 'tokens' or 'execution credits' or 'concurrent sessions'. Each is a meter that ticks when your team uses the product. Vendors prefer this language because it sounds like infrastructure, not like a usage tax. Functionally, it is identical to per-query.

A useful test: read the contract and ask, can my bill go up this month if my user count, system count, and data scale do not change? If yes, you are on a metered model regardless of what the marketing says.

Why vendors prefer per-query

Per-query has unbounded upside for the vendor. If your team adopts the product enthusiastically, the bill grows in proportion to that enthusiasm. The vendor's revenue scales with your usage. The investor pitch loves it because revenue per customer compounds without new sales effort.

Per-query also lets the vendor advertise a low entry price. ₹10,000 per month plus usage looks better on a comparison page than a flat ₹1.5 lakh, even though the flat number is probably cheaper at real adoption levels.

3x - 5x
Typical month-over-month swing
Quarter-end, audit prep, board reviews
Month 4
When the meter shows up
After you cannot easily switch
~0
Vendor's marginal cost
Of your tenth query this month

A small honest sub-case: usage-based pricing makes sense for true infrastructure (cloud storage, network egress, GPU time) where the vendor's cost actually scales with your consumption. It is dishonest when applied to a finished analytics product, because the vendor's marginal cost of your tenth query this month is approximately zero.

Why finance teams hate per-query

Forecast accuracy collapses. A finance head needs to commit to next quarter's tooling spend within a few percent. A line item that varies 3x with usage breaks the forecast and attracts CFO scrutiny every cycle. The internal cost of managing that variability often exceeds the savings of the lower entry price.

The chargeback problem is worse. If finance allocates the AI tool cost to business units, per-query pricing means the most data-driven unit pays the most. That punishes good behaviour. Either finance accepts the misallocation or builds an internal allocation engine, both of which are worse than just paying flat.

Flat pricing as the honest model

Flat pricing names a number, attaches that number to a shape (users, systems, scale, deployment type), and tells you exactly what changes that number (more users, more systems, jump in data scale). The bill is the same in March and April. The team uses the product without asking. The vendor's incentive shifts from "drive your usage up" to "make sure you renew", which means making the product actually useful.

Flat pricing also gives finance permission to think about adoption rather than control. The conversation in the quarterly review changes from "why did this line item spike" to "are we getting our money's worth, and are adoption metrics climbing". That is a healthier conversation for everyone.

The behavioural change inside the team is the part most buyers do not budget for. On per-query pricing, a finance manager learns to batch questions, draft them carefully, and send them to a single power user who runs everything. On flat pricing, the same finance manager opens the tool when she has two minutes between meetings, asks a half-formed question to test a hypothesis, and discovers something the formal MIS would never have surfaced. The latter pattern is where most of the actual analytical value at Indian mid-market sits, and it only happens when the meter is off.

What flat actually means

A real flat quote names one number per month, lists what that number includes (named users, named source systems, named data scale, deployment shape, support level), and names exactly the events that change the number. Adding a source system. Crossing a user count threshold. Jumping deployment shape (managed cloud to single-tenant). That is the entire price grammar.

A fake flat quote names one number and then has a paragraph of asterisks. "Subject to fair use", "subject to capacity tier", "premium connectors billed separately", "compute credits beyond the included pool charged at...". That is per-query in a flat dress. Read the fine print before you sign.

Contract gotchas to watch for

THE FLAT-LOOKING CLAUSES THAT ARE NOT FLAT
  • Concurrent user caps. Some flat quotes name a license seat count but cap concurrent active sessions. If your team uses the product at the same time (typical morning standup), you will hit the cap and the vendor's solution is to upgrade.
  • 'Fair use' clauses. Vague language that lets the vendor introduce metering retroactively if your usage is 'abnormally high'. Push back on the definition or remove the clause.
  • Capacity tiers. 'Flat at ₹1 lakh up to tier 1 capacity, then ₹50,000 per additional capacity unit.' That is metered. Negotiate a single all-inclusive number for your projected scale.
  • Multi-year lock-in. A discounted flat three-year contract sounds great until your needs change in year two. Prefer one-year terms with a discount for renewing.
  • Auto-renewal. Standard practice, but confirm the notice window (90 days is common, 30 is friendly) and that the renewal price is fixed, not 'current list'.
FREQUENTLY ASKED

Questions readers actually ask.

What about token or API-call fees from the underlying LLM?

That is the vendor's cost, not yours. A flat-pricing vendor absorbs the underlying model inference cost into its quote. If a vendor passes through tokens or API calls as a separate line, that is a metered model with one extra step. KolossusAI's flat quote includes all model inference; we are responsible for managing the inference economics.

Is flat pricing genuinely sustainable for the vendor?

Yes, when the unit economics are managed correctly. Inference cost per query has fallen 90% in two years and continues to fall. A vendor that prices for realistic adoption (not the worst case) and invests in efficient query patterns has a healthy margin at flat pricing. Vendors that move to per-query are usually either inefficient at inference or chasing investor metrics, not responding to a real cost problem.

What if my usage genuinely is heavy?

Heavy usage shows up during the 14-day POC and shapes the quote. If your team will run thousands of questions a day against very large datasets, the deployment scale is larger and the flat number is higher. The point is that the number is named upfront and does not change month to month. There is no version of the product where you get charged more for using it more, once the quote is set.

How does KolossusAI's flat quote actually get shaped?

Four inputs. Number of active users (people who actually log in, not licensed seats). List of source systems (Tally, CRM, ERP, custom databases). Approximate data scale (transactions per month). Deployment shape (managed cloud in India, single-tenant private cloud in your account, or fully on-premise). The 14-day POC validates all four against your real environment, then we issue the flat quote. See our pricing for the full approach.

What about lock-in with a flat-pricing vendor?

Flat pricing and lock-in are separate questions. Lock-in comes from contract length, data portability, and switch cost. KolossusAI offers one-year terms with a renewal discount rather than multi-year lock-ins, exports your full query history and configuration on request, and does not stage your underlying ledger anywhere outside your boundary. The flat number is honest pricing; the short term and clean exit are honest contracting.