How it works
One base URL change. Eight checkpoints. Total governance.
Visionality sits between your applications and every LLM provider. Every request passes through. Every checkpoint enforces. Every response gets logged. Here is the full architecture, end to end.
Architecture
From client SDK to model provider.
CLIENTS
VISIONALITY GATEWAY
PROVIDERS
↓ FINANCE
Live ledger · Chargeback CSV
↓ COMPLIANCE
Immutable audit log · SOC 2 evidence
↓ SECURITY
PII events · Policy enforcement log
Request flow
The eight checkpoints of every LLM call.
Total added latency: under 5ms for a typical request.
Your app makes the call
Your code uses the OpenAI / Anthropic / Bedrock SDK exactly as before. Only your base URL points at the gateway. No SDK migration. No prompt rewrites.
Gateway authenticates the Spend Token
A Spend Token is a budget envelope, scoped to a project, a model allowlist, and a PII policy. The gateway resolves it before any work starts.
Pre-flight: budget check
Is there balance remaining on this Spend Token? If no — HTTP 402, request never leaves your infrastructure. No model is called. No cost is incurred.
Pre-flight: model allowlist
Is the requested model on this project's allowlist? Production projects can't accidentally route to research-preview models at 10× cost.
Pre-flight: PII detection
Twelve detectors scan the prompt. Per project policy: block, obfuscate (reversible tokens), or log. Default for regulated industries: block or obfuscate.
Forwarded to provider
Gateway speaks each provider's wire format natively. Streaming responses flow through SSE-passthrough. Added latency: <5ms typical.
Response logged immutably
Five append-only audit tables: request, response, tokens, cost, policy result. The application database role has UPDATE and DELETE revoked at the SQL layer.
Three dashboards, same data
Finance sees the ledger. Compliance sees the audit log. Security sees the policy enforcement events. Same source of truth, three lenses.
Components
What's actually deployed.
Gateway API
Receives requests, runs pre-flight checks, forwards to providers
Spend Token registry
Project budget envelopes with hard dollar limits
PII engine
12 detectors, reversible obfuscation, per-project policy
Allocation rules
Maps every request to project, GL code, cost centre
Append-only audit DB
Five tables, SQL-layer immutability, deploy-time invariant check
Dashboard
Live ledger, anomaly inbox, chargeback CSV export
Deploy in 30 minutes
Three things to do. Then it's running.
STEP 01
Point your base URL
One environment variable. Your SDK calls, prompts, and application code stay the same.
OPENAI_BASE_URL= https://gw.visionality.ai/v1
STEP 02
Mint Spend Tokens
Per project, per team, per task class. Set the dollar cap, model allowlist, PII policy.
vsn tokens create \ --project=cust-bot \ --limit=500
STEP 03
Invite Finance & Compliance
Share the dashboard URL. They get the views that matter to them. You stop being the human API for spend questions.
vsn invite \ [email protected] \ --role=ledger
Want the why behind the architecture?
See it on your own traffic.
A live demo on real LLM calls — not a slideshow.