How it works

One base URL change. Eight checkpoints. Total governance.

Visionality sits between your applications and every LLM provider. Every request passes through. Every checkpoint enforces. Every response gets logged. Here is the full architecture, end to end.

Architecture

From client SDK to model provider.

CLIENTS

customer-bot

agent-runner

analytics-py

internal-tools

VISIONALITY GATEWAY

↳ Authenticate Spend Token

↳ Budget check

↳ Model allowlist

↳ PII detection

↳ Forward to provider

↳ Log to audit DB

PROVIDERS

OpenAI

Anthropic

Bedrock

Azure OpenAI

Gemini

↓ FINANCE

Live ledger · Chargeback CSV

↓ COMPLIANCE

Immutable audit log · SOC 2 evidence

↓ SECURITY

PII events · Policy enforcement log

Request flow

The eight checkpoints of every LLM call.

Total added latency: under 5ms for a typical request.

Your app makes the call

Your code uses the OpenAI / Anthropic / Bedrock SDK exactly as before. Only your base URL points at the gateway. No SDK migration. No prompt rewrites.

Gateway authenticates the Spend Token

A Spend Token is a budget envelope, scoped to a project, a model allowlist, and a PII policy. The gateway resolves it before any work starts.

Pre-flight: budget check

Is there balance remaining on this Spend Token? If no — HTTP 402, request never leaves your infrastructure. No model is called. No cost is incurred.

Pre-flight: model allowlist

Is the requested model on this project's allowlist? Production projects can't accidentally route to research-preview models at 10× cost.

Pre-flight: PII detection

Twelve detectors scan the prompt. Per project policy: block, obfuscate (reversible tokens), or log. Default for regulated industries: block or obfuscate.

Forwarded to provider

Gateway speaks each provider's wire format natively. Streaming responses flow through SSE-passthrough. Added latency: <5ms typical.

Response logged immutably

Five append-only audit tables: request, response, tokens, cost, policy result. The application database role has UPDATE and DELETE revoked at the SQL layer.

Three dashboards, same data

Finance sees the ledger. Compliance sees the audit log. Security sees the policy enforcement events. Same source of truth, three lenses.

Components

What's actually deployed.

Gateway API

Receives requests, runs pre-flight checks, forwards to providers

Spend Token registry

Project budget envelopes with hard dollar limits

PII engine

12 detectors, reversible obfuscation, per-project policy

Allocation rules

Maps every request to project, GL code, cost centre

Append-only audit DB

Five tables, SQL-layer immutability, deploy-time invariant check

Dashboard

Live ledger, anomaly inbox, chargeback CSV export

Deploy in 30 minutes

Three things to do. Then it's running.

STEP 01

Point your base URL

One environment variable. Your SDK calls, prompts, and application code stay the same.

OPENAI_BASE_URL=
  https://gw.visionality.ai/v1

STEP 02

Mint Spend Tokens

Per project, per team, per task class. Set the dollar cap, model allowlist, PII policy.

vsn tokens create \
  --project=cust-bot \
  --limit=500

STEP 03

Invite Finance & Compliance

Share the dashboard URL. They get the views that matter to them. You stop being the human API for spend questions.

vsn invite \
  [email protected] \
  --role=ledger

Want the why behind the architecture?

Why it matters →What is an LLM gateway? →

See it on your own traffic.

A live demo on real LLM calls — not a slideshow.

Read the deploy guide Request a Demo