AI cost governance is the set of policies, controls, and processes that an organisation puts in place to manage how money is spent on AI — specifically on large language models and other AI APIs.
It answers three questions:
- Where is the money going? Which teams, projects, and use cases are spending what, with which provider.
- Should it be going there? Enforcement of budget limits, model allowlists, and spend-per-project rules.
- Can you prove what happened? An immutable audit trail that survives a compliance review.
If you have answers to all three — enforced in software, not just in spreadsheets — you have AI cost governance.
Why it's different from cloud cost management
Cloud cost management tools (AWS Cost Explorer, Azure Cost Management, Google Cloud Billing) were built for infrastructure: servers, storage, databases. They show you what you spent, grouped by service or tag.
AI cost governance is different in two important ways.
First, the cost drivers are different. Cloud costs are mostly fixed or predictable: an EC2 instance runs, it costs a set amount per hour. LLM costs are deeply variable — they depend on how many tokens are in each prompt, which model is called, whether the response was cached, whether the request was routed to a cheaper model, and whether a PII obfuscation layer shortened the prompt. Understanding AI spend requires understanding the request, not just the billing line item.
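To make the cost drivers concrete, here is a minimal sketch of per-request cost arithmetic. The model names and per-million-token prices are hypothetical placeholders, not any provider's real rates, and the split into uncached, cached, and completion tokens is an assumption about how usage is reported.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens. Real prices differ by
# provider and change over time; these exist only to show the arithmetic.
PRICES = {
    "large-model": {"prompt": 10.00, "cached": 2.50, "completion": 30.00},
    "small-model": {"prompt": 0.50, "cached": 0.125, "completion": 1.50},
}


@dataclass
class Usage:
    model: str              # the model that actually served the request
    prompt_tokens: int      # prompt tokens billed at the full rate
    cached_tokens: int      # prompt tokens served from a cache
    completion_tokens: int  # tokens generated in the response


def request_cost(u: Usage) -> float:
    """Return the cost of one request in USD under the assumed price table."""
    p = PRICES[u.model]
    total = (
        u.prompt_tokens * p["prompt"]
        + u.cached_tokens * p["cached"]
        + u.completion_tokens * p["completion"]
    )
    return total / 1_000_000
```

Under these assumed prices, a request with 2,000 prompt tokens and 500 completion tokens costs $0.035 on the large model and $0.00175 on the small one. If 1,500 of those prompt tokens are cache hits, the large-model cost drops to $0.02375. The billing line item alone would show none of that.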
Second, the compliance surface is different. A cloud billing line item doesn't contain your users' personal data. LLM prompts often do. Every AI API call is a potential data exfiltration event: not intentional, just structural. A prompt that contains a customer's name, medical history, or financial data is sending that data to a third-party model provider. Cloud cost management tools don't see this. AI cost governance does.
The three layers
A complete AI cost governance framework has three layers:
1. Visibility
You can't govern what you can't see. Visibility means a unified view of AI spend across all providers — not a separate dashboard per provider, and not a monthly invoice that arrives three weeks after the fact.
Good visibility shows you:
- Cost per request, in real time
- Cost by project, team, and GL account
- Which model was actually used (not just requested — a routing layer may have substituted a cheaper option)
- Token breakdown: prompt tokens, completion tokens, cached reads
- Anomalies: cost spikes, unusual models, new endpoints that weren't in last month's report
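The views listed above can be sketched as small functions over per-request records. The record fields (`project`, `team`, `model_requested`, `model_used`, `cost_usd`) and the 3x spike threshold are illustrative assumptions, not a standard schema.

```python
from collections import defaultdict


def spend_by(records, key):
    """Total cost grouped by one record field, e.g. 'project' or 'team'."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r["cost_usd"]
    return dict(totals)


def substituted(records):
    """Requests where the model that served the call differs from the one requested."""
    return [r for r in records if r["model_requested"] != r["model_used"]]


def spikes(today, baseline, factor=3.0):
    """Keys whose spend exceeds factor x baseline, plus keys with no baseline at all.

    Keys missing from the baseline are flagged because a new endpoint or
    project is itself an anomaly worth a look.
    """
    return sorted(
        k for k, v in today.items()
        if k not in baseline or v > factor * baseline[k]
    )
```

A call such as `spend_by(records, "project")` gives the chargeback view, while `spikes(today, last_month)` surfaces both cost jumps and spend from sources that did not exist in the earlier period.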
2. Enforcement
Visibility tells you what happened. Enforcement determines what's allowed to happen.
Enforcement mechanisms include:
Budget limits. A hard cap on spend per project or per team. When the cap is reached, requests are blocked, not just flagged. The difference matters. An alert that fires at 80% of budget and then lets spending continue past the cap until the invoice arrives is not enforcement. A gateway that returns HTTP 402 (Payment Required) when the balance is exhausted is enforcement.
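A minimal sketch of a hard cap, assuming an in-memory ledger and a cost estimate available before the call is made. The class name `BudgetGate`, the project names, and the use of 403 for an unknown project are hypothetical choices; a real gateway would need a durable, concurrency-safe store.

```python
class BudgetGate:
    """Blocks requests that would push a project past its cap."""

    def __init__(self, limits_usd):
        self.limits = dict(limits_usd)             # project -> hard cap in USD
        self.spent = {p: 0.0 for p in self.limits}  # project -> spend so far

    def authorize(self, project, estimated_cost_usd):
        """Return (http_status, reason) before the upstream call is made."""
        if project not in self.limits:
            # Assumption: projects without a budget are denied outright.
            return 403, "unknown project"
        if self.spent[project] + estimated_cost_usd > self.limits[project]:
            return 402, "budget exhausted"
        return 200, "ok"

    def record(self, project, actual_cost_usd):
        """Add the actual cost once the upstream call has completed."""
        self.spent[project] += actual_cost_usd
```

The point of the sketch is the ordering: the check runs before the request is forwarded, so the cap is a gate rather than a report.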
Model allowlists. Which models are permitted for which projects. A developer working on a customer-facing feature should not be able to accidentally route a production request to a research-preview model at 10× the cost of the production model. Allowlists prevent this at the infrastructure layer, not the code review layer.
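As a rough sketch, an allowlist can be a mapping from project to permitted models with a default-deny lookup. The project and model names here are made up for illustration.

```python
# Hypothetical allowlist: project -> set of models it may call.
ALLOWLIST = {
    "checkout-assistant": {"prod-model-v1"},
    "research-sandbox": {"prod-model-v1", "preview-model-x"},
}


def is_model_allowed(project, model, allowlist=ALLOWLIST):
    """Default deny: a project that is not listed may call nothing."""
    return model in allowlist.get(project, set())
```

Because the lookup denies by default, a new project or a newly released model stays blocked until someone adds it deliberately.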
PII policy. Rules for what happens when personally identifiable information is detected in a prompt before it reaches the model. Options typically include: block the request, obfuscate the PII (replace with a deterministic token, restore the real value in the response), or log and allow. The default for regulated industries is block or obfuscate.
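The three options can be sketched as follows. This is a simplified illustration: it detects only email addresses with a basic regular expression, whereas real PII detection covers far more categories, and the token format `<PII_...>` and the function names are assumptions. The deterministic token is derived with an HMAC so the same value always maps to the same placeholder under a given key.

```python
import hashlib
import hmac
import re

# Assumption: email addresses stand in for PII in general.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def obfuscate(prompt, key):
    """Replace each detected value with a deterministic token; return (text, mapping)."""
    mapping = {}

    def swap(match):
        value = match.group(0)
        digest = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]
        token = f"<PII_{digest}>"
        mapping[token] = value
        return token

    return EMAIL.sub(swap, prompt), mapping


def restore(text, mapping):
    """Put the real values back into a model response."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


def apply_policy(prompt, policy, key):
    """policy is 'block', 'obfuscate', or 'log_and_allow'.

    Returns (prompt_to_send_or_None, mapping). None means the request is blocked.
    """
    safe, mapping = obfuscate(prompt, key)
    if not mapping:
        return prompt, {}       # nothing detected
    if policy == "block":
        return None, {}
    if policy == "obfuscate":
        return safe, mapping
    return prompt, {}           # log_and_allow: the caller logs the detection
```

The mapping never leaves the gateway, so the provider sees only tokens, and `restore` is applied to the response on the way back to the caller.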
3. Audit trail
An audit trail records every AI API call: who made it, which model, what the cost was, what project it belongs to, and what happened to any PII in the prompt.
The critical word is immutable. An audit trail that application code can modify is not an audit trail for compliance purposes — it's a log. Compliance-grade audit trails are enforced at the database layer: the application database role literally cannot UPDATE or DELETE audit rows. A deploy-time check can verify this invariant on every rollout.
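The sketch below illustrates the invariant and the deploy-time check using SQLite from Python's standard library. SQLite has no roles, so triggers that reject UPDATE and DELETE stand in for role privileges here; in a database with roles, such as PostgreSQL, the equivalent would be revoking UPDATE and DELETE on the audit table from the application role. The table layout and function names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    actor TEXT NOT NULL,
    model TEXT NOT NULL,
    project TEXT NOT NULL,
    cost_usd REAL NOT NULL,
    pii_action TEXT NOT NULL
);
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""


def open_audit_db(path=":memory:"):
    """Create the audit table with append-only triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def verify_append_only(conn):
    """Deploy-time check: True only if both UPDATE and DELETE are rejected.

    A probe row is inserted so the row-level triggers have something to
    fire on, and the whole probe is rolled back afterwards.
    """
    try:
        conn.execute(
            "INSERT INTO audit_log (actor, model, project, cost_usd, pii_action) "
            "VALUES ('deploy-check', 'none', 'none', 0, 'none')"
        )
        for stmt in ("UPDATE audit_log SET cost_usd = 0", "DELETE FROM audit_log"):
            try:
                conn.execute(stmt)
            except sqlite3.DatabaseError:
                continue    # rejected, as required
            return False    # the statement succeeded, so the log is mutable
        return True
    finally:
        conn.rollback()     # leave no probe row behind
```

Running `verify_append_only` on every rollout turns the immutability claim into something that fails loudly if a migration ever loosens it.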
Who owns AI cost governance?
In most organisations, AI cost governance sits at the intersection of three teams that don't naturally talk to each other:
- Finance owns the budget and the chargeback process
- Security / Compliance owns the data governance and audit requirements
- Engineering owns the infrastructure and makes the technical choices
The problem with having each team manage their slice independently is that the gaps between teams are where things go wrong. Engineering routes a request to a cheaper model without telling Finance. Compliance reviews a model contract without knowing which projects are actually using it. Finance gets an invoice that doesn't map to any internal cost code.
AI cost governance works best when there is a single control plane (a gateway) that all three teams can see. Finance sees the ledger. Security and Compliance see the audit trail and the policy enforcement log. Engineering sees the request logs and routing decisions. Same data, different views, no gaps.
What it is not
AI cost governance is not AI safety. AI safety is about the outputs of models — preventing harmful, biased, or hallucinated content. AI cost governance is about the infrastructure around models — who uses them, what they cost, and what data they see.
AI cost governance is not observability. Observability (LangSmith, Helicone, Braintrust) tells you what your prompts looked like and how your chains behaved. Governance tells you whether the spend was authorised and whether the audit trail is intact. Observability answers "what did it do?" Governance answers "was it allowed to do it, and can we prove it?"
AI cost governance is not the same as vendor management. You can have a fully negotiated enterprise agreement with OpenAI and still have zero visibility into which team spent what, or whether PII was included in production prompts.
When does it become necessary?
Organisations typically start investing in AI cost governance when they hit one of these triggers:
- The first bill surprise. An unexpectedly large invoice that nobody can explain by project or team.
- The first compliance question. An auditor asking for a log of which data was sent to which model provider.
- The first PII incident. A customer support agent that accidentally included a customer's medical data in a prompt.
- The first agent runaway. An autonomous agent that spent far more than intended because there was no circuit breaker.
- The first procurement review. A security or legal team that needs to know exactly which AI vendors the organisation has data-sharing relationships with.
Each of these is a forcing function. The organisations that handle them fastest are the ones that had governance infrastructure in place before the incident, not the ones that built it in response.
Getting started
The good news is that the infrastructure for AI cost governance has become substantially simpler to deploy. A gateway that proxies all LLM API calls can implement all three layers — visibility, enforcement, audit trail — in a 30-minute deployment.
The key implementation choice is where to put enforcement: in the application code, or at the infrastructure layer. Application-code enforcement (checking a budget before making an API call, stripping PII in a helper function) is fragile — it requires every developer on every team to remember to call the right function. Infrastructure-layer enforcement (a gateway that every request passes through, regardless of which team or library made the request) is durable. The gateway doesn't care which SDK you're using. It sees every call.
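The architectural difference can be sketched in a few lines: policies are attached to the gateway once, and every request passes through them regardless of which caller sent it. The `Gateway` and `Rejected` names, the request shape, and the in-memory audit list are hypothetical, and `upstream` stands in for whatever actually calls the model provider.

```python
class Rejected(Exception):
    """Raised by a policy to refuse a request with an HTTP-style status."""

    def __init__(self, status, reason):
        super().__init__(reason)
        self.status = status
        self.reason = reason


class Gateway:
    def __init__(self, upstream, policies):
        self.upstream = upstream   # callable(request) -> response dict
        self.policies = policies   # callables(request) that may raise Rejected
        self.audit = []            # stand-in for an append-only audit store

    def handle(self, request):
        """Run every policy, forward the request, and record the outcome."""
        try:
            for policy in self.policies:
                policy(request)
        except Rejected as r:
            # Refused requests are recorded too, then surfaced to the caller.
            self.audit.append({**request, "status": r.status, "reason": r.reason})
            raise
        response = self.upstream(request)
        self.audit.append({**request, "status": 200, "cost_usd": response["cost_usd"]})
        return response
```

No application team has to remember to call a helper: a policy added to the `policies` list applies to all traffic from the moment it is deployed.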
The organisations that are furthest ahead on AI cost governance made this architectural choice early: one gateway, all traffic, all governance enforced there. The organisations that are catching up are the ones that tried to solve it in application code and are now discovering how many places that code didn't exist.