When MCP went mainstream, the cost question went with it. Every tool a model can call is a line item. Every line item without a budget is a horror story.
Visionality runs MCP as a billed gateway in front of the operator's actual MCP servers — same JSON-RPC the client speaks, same upstream the tools expose, but with three things wedged in the middle:
- Authentication for two distinct lanes (enterprise ID-JAG + per-user PKCE).
- Per-tool cost catalog, with budget debits on every tools/call.
- An audit trail and an anomaly detector that surface what the operator actually needs to know.
This article walks through the shape.
The two auth lanes
The enterprise lane uses ID-JAG (the JWT-bearer pattern Okta, Microsoft, and Anthropic have all standardized on). The IdP signs an identity assertion; the gateway validates against an org-scoped idp_allowlist; the result is an opaque MCP bearer token bound to (org_id, subject, mcp_server). No per-user enrollment, no shared secrets, no copy-paste.
The per-user lane uses OAuth 2.1 with PKCE (the S256-only flow Claude Desktop and similar local clients drive). The gateway exposes /mcp/v1/authorize (params guard + redirect to the consent UI), /approve (mints a one-time code bound to the verifier), and /deny. Defense in depth: the same redirect_uri policy (RFC 8252 §7.3 loopback-only) is enforced at four layers — the planner, the approver, the deny path, and the client-registry CRUD form on the dashboard.
A leaked or coerced redirect URI can't get a code minted against it. We checked that one twice.
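A loopback-only check in the spirit of RFC 8252 §7.3 might look like the sketch below. It is illustrative only, not the gateway's actual code: `isLoopbackRedirect` and `assertRedirectAllowed` are hypothetical names, and the sketch assumes that only the IP literals `127.0.0.1` and `[::1]` over plain `http` are accepted (whether the gateway also accepts `localhost` is not stated here).

```typescript
// Hypothetical helper, not the gateway's implementation.
// Assumption: only loopback IP literals over http are allowed, any port.
const LOOPBACK_HOSTS = ["127.0.0.1", "[::1]"];

function isLoopbackRedirect(redirectUri: string): boolean {
  let url: URL;
  try {
    url = new URL(redirectUri);
  } catch {
    return false; // not an absolute, parseable URI
  }
  return (
    url.protocol === "http:" &&
    LOOPBACK_HOSTS.indexOf(url.hostname) !== -1 &&
    url.username === "" &&
    url.password === ""
  );
}

// One guard that each layer (planner, approver, deny path, registry form)
// could call, so the policy is stated once and enforced everywhere.
function assertRedirectAllowed(layer: string, redirectUri: string): void {
  if (!isLoopbackRedirect(redirectUri)) {
    throw new Error(`${layer}: redirect_uri rejected by loopback-only policy`);
  }
}
```

Parsing the URI before comparing hosts matters: a string-prefix check would accept lookalikes such as `http://127.0.0.1.evil.example/`, which the parsed `hostname` comparison rejects.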
The cost catalog
Every (org_id, server_uri, tool_name) tuple gets a cost estimate. The operator manages it from /mcp/servers/[uri] — full CRUD, inline edit, delete with confirmation. When a client calls tools/call, the gateway:
- Looks up the cost in the catalog (404-equivalent JSON-RPC error if not registered).
- Debits the caller's spend token by that amount (MCP_BUDGET_EXHAUSTED JSON-RPC error if the budget is empty).
- Dispatches to the upstream invoker.
- Logs the invocation to mcp_invocations (best-effort: writer failures don't fail the response, but the operator sees them).
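The four steps can be sketched as a single handler. This is a minimal sketch under stated assumptions, not the gateway's code: `Deps`, `handleToolsCall`, `GatewayError`, the `MCP_TOOL_NOT_REGISTERED` code, and the in-memory catalog are hypothetical. It also assumes the budget check compares the remaining balance against the tool's cost and that a failed upstream call is not refunded; the article specifies neither.

```typescript
type ToolKey = { orgId: string; serverUri: string; toolName: string };
type Invocation = ToolKey & { cost: number; ok: boolean };

class GatewayError extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

interface Deps {
  catalog: Map<string, number>;   // cost estimate per (org, server, tool)
  budget: { remaining: number };  // stands in for the caller's spend token
  invokeUpstream: (key: ToolKey, args: unknown) => Promise<unknown>;
  logInvocation: (row: Invocation) => Promise<void>;
  onLogFailure: (err: unknown) => void; // makes writer failures visible
}

const catalogKey = (k: ToolKey): string =>
  `${k.orgId}|${k.serverUri}|${k.toolName}`;

async function handleToolsCall(deps: Deps, key: ToolKey, args: unknown): Promise<unknown> {
  // 1. Catalog lookup: unregistered tools are rejected before any spend.
  const cost = deps.catalog.get(catalogKey(key));
  if (cost === undefined) {
    throw new GatewayError("MCP_TOOL_NOT_REGISTERED", "tool has no catalog entry");
  }
  // 2. Budget debit (assumption: reject when the balance cannot cover the cost).
  if (deps.budget.remaining < cost) {
    throw new GatewayError("MCP_BUDGET_EXHAUSTED", "spend token has no budget left");
  }
  deps.budget.remaining -= cost;

  // 3. Dispatch upstream, 4. log best-effort whether or not the call succeeded.
  let ok = true;
  try {
    return await deps.invokeUpstream(key, args);
  } catch (err) {
    ok = false;
    throw err;
  } finally {
    try {
      await deps.logInvocation({ ...key, cost, ok });
    } catch (logErr) {
      deps.onLogFailure(logErr); // never fails the response
    }
  }
}
```

Keeping the log write inside its own `try`/`catch` is what makes it best-effort: a writer failure is reported through `onLogFailure` while the upstream result still reaches the client.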
The catalog can be edited live. Cost estimates change frequently; the table is the source of truth.
What the operator sees
/mcp lists every MCP server the operator has seeded, with tool counts, sum-of-estimates per call, and active grant counts (mcp_grant rows that haven't expired). Click into a server and you get the catalog editor plus a 30-day per-tool spend chart, sourced from mcp_invocations.
/mcp/clients is the client-registry CRUD. Each row pins one client_id to one or more allowed redirect_uris (strict equality per RFC 6749 §3.1.2; no wildcards). Empty registry → the gateway falls back to the loopback-only policy. Populated registry → the policy is enforced strictly. Either posture is valid; the operator chooses.
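The two registry postures can be sketched as one lookup that returns the status recorded in the audit trail. This is an illustration, not the gateway's code: `checkRegistry` and the `Map`-based registry are hypothetical, and mapping an empty registry to `skipped` is an assumption (the article lists the status values but does not define when each is emitted; `not_checked` is left out for that reason).

```typescript
type RegistryStatus =
  | "matched"
  | "unregistered_client"
  | "redirect_mismatch"
  | "skipped";

// client_id -> allowed redirect_uris (hypothetical in-memory stand-in)
type ClientRegistry = Map<string, string[]>;

function checkRegistry(
  registry: ClientRegistry,
  clientId: string,
  redirectUri: string
): RegistryStatus {
  // Empty registry: nothing to enforce here; the caller falls back to
  // the loopback-only policy (assumed to be reported as "skipped").
  if (registry.size === 0) return "skipped";

  const allowed = registry.get(clientId);
  if (allowed === undefined) return "unregistered_client";

  // Strict string equality: no wildcards, no prefix or normalized matching.
  return allowed.indexOf(redirectUri) !== -1 ? "matched" : "redirect_mismatch";
}
```

Under this sketch, `http://127.0.0.1:7777/cb` and `http://127.0.0.1:7777/cb/` are different URIs, which is the point of strict equality.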
/mcp/audit is the SOC 2 evidence timeline. Every approve/deny event in the last 30 days, with subject, client, redirect, registry status (matched | unregistered_client | redirect_mismatch | skipped | not_checked), and reason. The summary cards show approve rate and registry-status distribution at a glance. Filterable by subject, client, outcome.
The anomaly detector
This is the one the operator asks about most. Per (server_uri, tool_name), the detector watches three axes against a 30-day rolling mean:
- tool_volume_spike: today's calls > 10× mean.
- tool_cost_spike: today's spend > 10× mean.
- tool_error_spike: today's error rate > 10× mean (with a min-daily-calls gate to avoid the 1-call-1-error noise floor).
Severity scales with ratio: critical at 25×, warning at 15×, info below. The /mcp page surfaces the worst ten breaches with severity badges and direct links back to the tool catalog row.
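The thresholds above can be sketched as a function over invocation rows. Everything concrete here is an assumption made for illustration: the row fields (`serverUri`, `toolName`, `day`, `costUsd`, `isError`), the `detectAnomalies` name, the default gate of 20 daily calls, computing the baseline as window totals divided by the window length, and treating "at 25×" and "at 15×" as inclusive. The caller is assumed to pass today's rows plus the preceding 30 days.

```typescript
// Hypothetical row shape; the real mcp_invocations columns are not shown in this article.
interface InvocationRow {
  serverUri: string;
  toolName: string;
  day: string; // calendar day label, compared only for equality with `today`
  costUsd: number;
  isError: boolean;
}

type Kind = "tool_volume_spike" | "tool_cost_spike" | "tool_error_spike";
type Severity = "critical" | "warning" | "info";
interface Breach {
  serverUri: string;
  toolName: string;
  kind: Kind;
  ratio: number;
  severity: Severity;
}
interface Agg { calls: number; cost: number; errors: number }
interface ToolStats { serverUri: string; toolName: string; today: Agg; base: Agg }

const severityFor = (ratio: number): Severity =>
  ratio >= 25 ? "critical" : ratio >= 15 ? "warning" : "info";

function detectAnomalies(
  rows: InvocationRow[],
  today: string,
  windowDays = 30,
  minDailyCalls = 20
): Breach[] {
  // Aggregate per (server_uri, tool_name), split into today vs. baseline.
  const stats = new Map<string, ToolStats>();
  for (const r of rows) {
    const key = r.serverUri + "\u0000" + r.toolName;
    let s = stats.get(key);
    if (s === undefined) {
      s = {
        serverUri: r.serverUri,
        toolName: r.toolName,
        today: { calls: 0, cost: 0, errors: 0 },
        base: { calls: 0, cost: 0, errors: 0 },
      };
      stats.set(key, s);
    }
    const agg = r.day === today ? s.today : s.base;
    agg.calls += 1;
    agg.cost += r.costUsd;
    if (r.isError) agg.errors += 1;
  }

  const breaches: Breach[] = [];
  stats.forEach((s) => {
    const check = (kind: Kind, todayValue: number, mean: number): void => {
      if (mean <= 0) return; // no baseline, no ratio
      const ratio = todayValue / mean;
      if (ratio > 10) {
        breaches.push({
          serverUri: s.serverUri,
          toolName: s.toolName,
          kind,
          ratio,
          severity: severityFor(ratio),
        });
      }
    };
    check("tool_volume_spike", s.today.calls, s.base.calls / windowDays);
    check("tool_cost_spike", s.today.cost, s.base.cost / windowDays);
    // Gate: skip error-rate checks on days with too few calls.
    if (s.today.calls >= minDailyCalls && s.base.calls > 0) {
      check(
        "tool_error_spike",
        s.today.errors / s.today.calls,
        s.base.errors / s.base.calls
      );
    }
  });
  return breaches.sort((a, b) => b.ratio - a.ratio); // worst first
}
```

A page that shows only the worst ten would take `detectAnomalies(rows, today).slice(0, 10)`. Because the function touches no database or clock, it can be tested with plain arrays.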
The detector is a pure function over McpInvocationRow[]. No DB calls in the detector itself. The dashboard pulls the rows from mcp_invocations and hands them in. Same shape as our request-logs anomaly module — cost_spike, volume_spike, model_swap, new_model — which has been in production since v1.5.
What about sweep?
PKCE codes have a 60-second TTL by default (5–600s configurable). The Postgres-backed PkceAuthCodeStore exposes a sweep(olderThanMs) method, and the gateway exposes POST /mcp/v1/cron/pkce-sweep guarded by a shared secret. Render cron POSTs nightly. The table stays bounded; the operator doesn't need to think about it.
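To make the sweep contract concrete, here is an in-memory stand-in for the Postgres-backed store. It is a sketch under assumptions, not the real `PkceAuthCodeStore`: the class name, the `put` and `size` methods, and the extra `nowMs` parameter (added so the example is deterministic) are hypothetical, and `olderThanMs` is assumed to mean "delete codes issued more than this many milliseconds ago".

```typescript
// In-memory illustration only; the real store is a Postgres table.
class InMemoryPkceCodeStore {
  private codes = new Map<string, number>(); // code -> issuedAtMs

  put(code: string, issuedAtMs: number): void {
    this.codes.set(code, issuedAtMs);
  }

  size(): number {
    return this.codes.size;
  }

  // Removes every code older than `olderThanMs` and returns how many were removed.
  sweep(olderThanMs: number, nowMs: number = Date.now()): number {
    let removed = 0;
    this.codes.forEach((issuedAtMs, code) => {
      if (nowMs - issuedAtMs > olderThanMs) {
        this.codes.delete(code); // deleting during Map.forEach is safe
        removed += 1;
      }
    });
    return removed;
  }
}

// Usage: with the default 60-second TTL, a code issued 61 s ago is swept.
const store = new InMemoryPkceCodeStore();
const now = 1_000_000;
store.put("stale", now - 61_000);
store.put("fresh", now - 10_000);
const sweptCount = store.sweep(60_000, now);
```

Returning the removed count gives the cron endpoint something to report, which helps when confirming that the nightly job is actually running.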
What we didn't build (yet)
- Client capability negotiation — the discovery doc lists what the gateway speaks; we don't yet expose per-server capabilities.
- Streaming JSON-RPC notifications — important for long-running tools, queued for Sprint Ω.
- Per-tool quota beyond the spend token's overall budget — operators have asked.
If you're piloting MCP at enterprise scale, the gateway exists for the reason every gateway exists: the protocol assumes you have a billing surface, an identity surface, and an audit surface. We built those for you.