When MCP went mainstream, the cost question went with it. Every tool a model can call is a line item. Every line item without a budget is a horror story.

Visionality runs MCP as a billed gateway in front of the operator's actual MCP servers — same JSON-RPC the client speaks, same upstream the tools expose, but with three things wedged in the middle:

Authentication for two distinct lanes (enterprise ID-JAG + per-user PKCE).
A per-tool cost catalog, with a budget debit on every tools/call.
An audit trail and an anomaly detector that surface what the operator actually needs to know.

This article walks through how each piece works.

The two auth lanes

The enterprise lane uses ID-JAG (the JWT-bearer pattern Okta, Microsoft, and Anthropic have all standardized on). The IdP signs an identity assertion; the gateway validates against an org-scoped idp_allowlist; the result is an opaque MCP bearer token bound to (org_id, subject, mcp_server). No per-user enrollment, no shared secrets, no copy-paste.

The per-user lane uses OAuth 2.1 with PKCE (the S256-only flow Claude Desktop and similar local clients drive). The gateway exposes /mcp/v1/authorize (params guard + redirect to the consent UI), /approve (mints a one-time code bound to the verifier), and /deny. Defense in depth: the same redirect_uri policy (RFC 8252 §7.3 loopback-only) is enforced at four layers — the planner, the approver, the deny path, and the client-registry CRUD form on the dashboard.

A leaked or coerced redirect URI can't get a code minted against it. We checked that one twice.
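
For a feel of what a loopback-only check looks like, here is a minimal sketch. It is not the gateway's code: the function name isLoopbackRedirect is made up for illustration, and the decision to reject the hostname localhost is an assumption (RFC 8252 §8.3 recommends IP literals over localhost, but the article doesn't say which way the gateway goes).

```typescript
// Sketch of an RFC 8252 §7.3 style loopback-only redirect check.
// Assumption: only http on the IP literals 127.0.0.1 and [::1] is accepted;
// any port and any path are allowed, as native apps pick an ephemeral port.
function isLoopbackRedirect(redirectUri: string): boolean {
  let url: URL;
  try {
    url = new URL(redirectUri);
  } catch {
    return false; // not an absolute URI
  }
  if (url.protocol !== "http:") return false;
  // Reject userinfo tricks such as http://127.0.0.1@evil.example/
  if (url.username !== "" || url.password !== "") return false;
  // WHATWG URL keeps the square brackets on IPv6 hostnames.
  return url.hostname === "127.0.0.1" || url.hostname === "[::1]";
}

// Usage: the same predicate could back every layer that sees a redirect_uri.
const candidate = "http://127.0.0.1:51234/callback";
console.log(candidate, isLoopbackRedirect(candidate) ? "accepted" : "rejected");
```

Keeping the rule in one pure predicate is what makes it cheap to enforce in four places: each layer calls the same function rather than re-deriving the policy.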

The cost catalog

Every (org_id, server_uri, tool_name) tuple gets a cost estimate. The operator manages it from /mcp/servers/[uri] — full CRUD, inline edit, delete with confirmation. When a client calls tools/call, the gateway:

Looks up the cost in the catalog (404-equivalent JSON-RPC error if not registered).
Debits the caller's spend token by that amount (MCP_BUDGET_EXHAUSTED JSON-RPC error if the budget is empty).
Dispatches to the upstream invoker.
Logs the invocation to mcp_invocations (best-effort — writer failures don't fail the response, but the operator sees them).

The catalog can be edited live. Cost estimates change frequently; the table is the source of truth.
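
The four tools/call steps above can be sketched as a single handler. Treat this as an illustration under stated assumptions, not the gateway's source: GatewayDeps, handleToolsCall, the numeric error codes, and the MCP_TOOL_NOT_REGISTERED and MCP_UPSTREAM_ERROR names are all hypothetical. Only MCP_BUDGET_EXHAUSTED comes from the article. The sketch also assumes "budget is empty" means the remaining balance can't cover the tool's cost.

```typescript
interface ToolCall { orgId: string; serverUri: string; toolName: string; args: unknown }
interface JsonRpcError { code: number; message: string }
type CallResult = { ok: true; result: unknown } | { ok: false; error: JsonRpcError };

interface GatewayDeps {
  catalog: Map<string, number>;   // cost per "org|server|tool" key (assumed layout)
  budget: { remaining: number };  // stand-in for the caller's spend token
  invoke: (call: ToolCall) => Promise<unknown>;
  log: (entry: { key: string; cost: number; isError: boolean }) => Promise<void>;
  onLogFailure: (err: unknown) => void;
}

function fail(code: number, message: string): CallResult {
  return { ok: false, error: { code, message } };
}

async function handleToolsCall(call: ToolCall, deps: GatewayDeps): Promise<CallResult> {
  const key = [call.orgId, call.serverUri, call.toolName].join("|");

  // 1. Catalog lookup: unregistered tools are rejected before any spend.
  const cost = deps.catalog.get(key);
  if (cost === undefined) return fail(-32001, "MCP_TOOL_NOT_REGISTERED");

  // 2. Budget debit (assumption: reject when the balance can't cover the cost).
  if (deps.budget.remaining < cost) return fail(-32002, "MCP_BUDGET_EXHAUSTED");
  deps.budget.remaining -= cost;

  // 3. Dispatch to the upstream invoker.
  let outcome: CallResult;
  try {
    outcome = { ok: true, result: await deps.invoke(call) };
  } catch {
    outcome = fail(-32003, "MCP_UPSTREAM_ERROR");
  }

  // 4. Best-effort log: a writer failure is reported but never fails the response.
  try {
    await deps.log({ key, cost, isError: !outcome.ok });
  } catch (err) {
    deps.onLogFailure(err);
  }
  return outcome;
}
```

The ordering is the point: the lookup and the debit both happen before dispatch, so an unregistered or unfunded call never reaches the upstream server. Whether a failed upstream call is refunded is not covered by the article, so the sketch leaves the debit in place.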

What the operator sees

/mcp lists every MCP server the operator has seeded, with tool counts, sum-of-estimates per call, and active grant counts (mcp_grant rows that haven't expired). Click into a server and you get the catalog editor plus a 30-day per-tool spend chart, sourced from mcp_invocations.

/mcp/clients is the client-registry CRUD. Each row pins one client_id to one or more allowed redirect_uris (strict equality per RFC 6749 §3.1.2; no wildcards). Empty registry → the gateway falls back to the loopback-only policy. Populated registry → the registry is enforced strictly: the redirect_uri must exactly match one registered for that client_id. Either posture is valid; the operator chooses.
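
The two postures can be sketched as one small lookup. The status names are borrowed from the audit timeline's registry-status values, but which case maps to which status is a guess here (in particular, reporting an empty registry as "skipped"), and checkRegistry is a hypothetical name.

```typescript
type RegistryStatus = "matched" | "unregistered_client" | "redirect_mismatch" | "skipped";

// registry: client_id -> allowed redirect_uris, compared by exact string equality.
function checkRegistry(
  registry: Map<string, string[]>,
  clientId: string,
  redirectUri: string,
): RegistryStatus {
  // Assumed mapping: an empty registry means the check is skipped and the
  // loopback-only policy (not shown here) is the only guard.
  if (registry.size === 0) return "skipped";
  const allowed = registry.get(clientId);
  if (allowed === undefined) return "unregistered_client";
  // No wildcards, no normalization: a trailing slash is a mismatch.
  return allowed.includes(redirectUri) ? "matched" : "redirect_mismatch";
}

// Usage with a made-up client entry.
const exampleRegistry = new Map([["example-client", ["http://127.0.0.1:33418/callback"]]]);
console.log(checkRegistry(exampleRegistry, "example-client", "http://127.0.0.1:33418/callback"));
```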

/mcp/audit is the SOC 2 evidence timeline. Every approve/deny event in the last 30 days, with subject, client, redirect, registry status (matched | unregistered_client | redirect_mismatch | skipped | not_checked), and reason. The summary cards show approve rate and registry-status distribution at a glance. Filterable by subject, client, outcome.

The anomaly detector

This is the one the operator asks about most. Per (server_uri, tool_name), the detector watches three axes against a 30-day rolling mean:

tool_volume_spike: today's call count > 10× the mean.
tool_cost_spike: today's spend > 10× the mean.
tool_error_spike: today's error rate > 10× the mean (with a min-daily-calls gate to avoid the 1-call-1-error noise floor).

Severity scales with the ratio: critical at 25×, warning at 15×, info for any breach below that. The /mcp page surfaces the worst ten breaches with severity badges and direct links back to the tool catalog row.

The detector is a pure function over McpInvocationRow[]. No DB calls in the detector itself. The dashboard pulls the rows from mcp_invocations and hands them in. Same shape as our request-logs anomaly module — cost_spike, volume_spike, model_swap, new_model — which has been in production since v1.5.
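
To make "pure function over McpInvocationRow[]" concrete, here is a rough sketch. The row fields (day, cost, is_error), the gate value of 20 calls, and the function name detectAnomalies are assumptions, since the article doesn't list them. It also assumes the caller passes only rows from today plus the prior 30 days, that the mean is the baseline total divided by 30, and that a tool with no baseline is skipped rather than flagged.

```typescript
interface McpInvocationRow {
  server_uri: string;
  tool_name: string;
  day: string;       // assumed column: UTC calendar day, "YYYY-MM-DD"
  cost: number;      // assumed column: amount debited for the call
  is_error: boolean; // assumed column: the upstream call failed
}

type BreachKind = "tool_volume_spike" | "tool_cost_spike" | "tool_error_spike";
type Severity = "critical" | "warning" | "info";
interface Breach { server_uri: string; tool_name: string; kind: BreachKind; ratio: number; severity: Severity }
interface Totals { calls: number; cost: number; errors: number }
interface ToolStats { server_uri: string; tool_name: string; today: Totals; baseline: Totals }

const SPIKE_RATIO = 10;     // breach threshold from the article
const WINDOW_DAYS = 30;     // rolling window from the article
const MIN_DAILY_CALLS = 20; // hypothetical value for the min-daily-calls gate

function severityFor(ratio: number): Severity {
  if (ratio >= 25) return "critical";
  if (ratio >= 15) return "warning";
  return "info";
}

// Pure: no I/O, no clock. The caller supplies the rows and which day is "today".
function detectAnomalies(rows: McpInvocationRow[], today: string): Breach[] {
  const stats = new Map<string, ToolStats>();
  for (const row of rows) {
    const key = row.server_uri + "\n" + row.tool_name;
    let s = stats.get(key);
    if (s === undefined) {
      s = {
        server_uri: row.server_uri,
        tool_name: row.tool_name,
        today: { calls: 0, cost: 0, errors: 0 },
        baseline: { calls: 0, cost: 0, errors: 0 },
      };
      stats.set(key, s);
    }
    const bucket = row.day === today ? s.today : s.baseline;
    bucket.calls += 1;
    bucket.cost += row.cost;
    if (row.is_error) bucket.errors += 1;
  }

  const breaches: Breach[] = [];
  stats.forEach((s) => {
    const check = (kind: BreachKind, todayValue: number, mean: number): void => {
      // Without a positive baseline there is no ratio to compute.
      if (mean <= 0 || todayValue <= SPIKE_RATIO * mean) return;
      const ratio = todayValue / mean;
      breaches.push({ server_uri: s.server_uri, tool_name: s.tool_name, kind, ratio, severity: severityFor(ratio) });
    };
    check("tool_volume_spike", s.today.calls, s.baseline.calls / WINDOW_DAYS);
    check("tool_cost_spike", s.today.cost, s.baseline.cost / WINDOW_DAYS);
    // Gate the error axis so one failing call out of one doesn't count as a spike.
    if (s.today.calls >= MIN_DAILY_CALLS && s.baseline.calls > 0) {
      check("tool_error_spike", s.today.errors / s.today.calls, s.baseline.errors / s.baseline.calls);
    }
  });
  // Worst first; a dashboard would take the first ten.
  return breaches.sort((a, b) => b.ratio - a.ratio);
}
```

Because nothing in the function touches a database or the clock, it can be exercised with hand-built rows, which is the practical payoff of keeping the detector pure.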

What about sweep?

PKCE codes have a 60-second TTL by default (configurable from 5 to 600 seconds). The Postgres-backed PkceAuthCodeStore exposes a sweep(olderThanMs) method, and the gateway exposes POST /mcp/v1/cron/pkce-sweep, guarded by a shared secret. A Render cron job POSTs to it nightly, so the table stays bounded and the operator doesn't need to think about it.
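
Here is a small in-memory stand-in that shows the TTL and sweep mechanics. It is not the Postgres-backed store: the class name InMemoryPkceCodeStore, the put and take methods, and clampTtlSeconds are hypothetical. The sweep(olderThanMs) name comes from the article, but its exact semantics are assumed to be "delete codes created more than olderThanMs ago".

```typescript
const DEFAULT_TTL_S = 60;
const MIN_TTL_S = 5;
const MAX_TTL_S = 600;

// Clamp a configured TTL into the 5 to 600 second range, defaulting to 60.
function clampTtlSeconds(requested?: number): number {
  if (requested === undefined || Number.isNaN(requested)) return DEFAULT_TTL_S;
  return Math.min(MAX_TTL_S, Math.max(MIN_TTL_S, requested));
}

interface StoredCode { challenge: string; createdAtMs: number }

class InMemoryPkceCodeStore {
  private codes = new Map<string, StoredCode>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlSeconds?: number, now: () => number = () => Date.now()) {
    this.ttlMs = clampTtlSeconds(ttlSeconds) * 1000;
    this.now = now;
  }

  put(code: string, challenge: string): void {
    this.codes.set(code, { challenge, createdAtMs: this.now() });
  }

  // One-time redemption: the code is deleted on first use, expired or not.
  take(code: string): StoredCode | undefined {
    const entry = this.codes.get(code);
    this.codes.delete(code);
    if (entry === undefined || this.now() - entry.createdAtMs > this.ttlMs) return undefined;
    return entry;
  }

  // Remove codes created more than olderThanMs ago; returns how many were removed.
  sweep(olderThanMs: number): number {
    const cutoff = this.now() - olderThanMs;
    let removed = 0;
    this.codes.forEach((entry, code) => {
      if (entry.createdAtMs < cutoff) {
        this.codes.delete(code);
        removed += 1;
      }
    });
    return removed;
  }
}
```

Expiry is already enforced at redemption time, so the sweep is housekeeping rather than a security control: it only removes rows that could no longer be redeemed anyway.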

What we didn't build (yet)

Client capability negotiation — the discovery doc lists what the gateway speaks; we don't yet expose per-server capabilities.
Streaming JSON-RPC notifications — important for long-running tools, queued for Sprint Ω.
Per-tool quota beyond the spend token's overall budget — operators have asked.

If you're piloting MCP at enterprise scale, the gateway exists for the reason every gateway exists: the protocol assumes you have a billing surface, an identity surface, and an audit surface. We built those for you.

MCP, Billed Per Tool: A Gateway Built for the New Agent Surface