Architecture
A lightweight, transparent proxy sits between your application and every AI provider. Sub-millisecond overhead. Total financial visibility.
Architecture overview
TOLVYN intercepts every AI request before it reaches the provider — recording, attributing, and enforcing limits in real time.
Step by step
Change a single environment variable in your application, TOLVYN_BASE_URL, to route your AI SDK calls through TOLVYN's proxy. No SDK changes, no code refactors, no downtime. Works with any HTTP-based model API.
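The routing step above can be sketched in a few lines. The proxy hostname and the provider fallback URL here are illustrative placeholders, not real TOLVYN endpoints; only the TOLVYN_BASE_URL variable name comes from the docs:

```python
import os

# Illustrative fallback; in practice this is your provider's normal endpoint.
DEFAULT_PROVIDER_URL = "https://api.openai.com/v1"

def resolve_base_url() -> str:
    """Route SDK calls through TOLVYN when TOLVYN_BASE_URL is set,
    otherwise hit the provider directly."""
    return os.environ.get("TOLVYN_BASE_URL", DEFAULT_PROVIDER_URL)

# With the variable set, every SDK call targets the proxy instead:
os.environ["TOLVYN_BASE_URL"] = "https://proxy.tolvyn.example/v1"
print(resolve_base_url())  # https://proxy.tolvyn.example/v1
```

Because the switch happens at the base-URL level, any SDK that accepts a configurable endpoint picks it up without code changes.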
Use your existing API keys. TOLVYN passes them through securely and never stores credentials in plaintext. TLS everywhere, end to end.
Every request is received by TOLVYN's proxy layer, which operates with sub-millisecond overhead. The request metadata — model, token counts, latency, status — is captured and appended to the immutable ledger.
Each ledger entry is hash-chained to the previous one, making the ledger tamper-evident: any retroactive modification of a record breaks the chain and is immediately detectable.
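A minimal sketch of hash-chaining, assuming SHA-256 over canonical JSON; the field names and genesis value are illustrative, not TOLVYN's actual ledger format:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry's predecessor

def entry_hash(entry: dict, prev_hash: str) -> str:
    # Hash the canonical JSON of the entry together with the previous hash,
    # so each record commits to the entire history before it.
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(ledger: list, entry: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"entry": entry, "hash": entry_hash(entry, prev)})

def verify(ledger: list) -> bool:
    prev = GENESIS
    for rec in ledger:
        if rec["hash"] != entry_hash(rec["entry"], prev):
            return False  # chain broken at this record
        prev = rec["hash"]
    return True

ledger = []
append(ledger, {"model": "gpt-4o", "tokens": 1200, "latency_ms": 340})
append(ledger, {"model": "gpt-4o", "tokens": 800, "latency_ms": 210})
print(verify(ledger))                 # True
ledger[0]["entry"]["tokens"] = 1      # retroactive edit...
print(verify(ledger))                 # False: the chain no longer validates
```

Editing any past record changes its hash, which no longer matches what the next record committed to, so verification fails from that point onward.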
Tag requests with arbitrary metadata (team, service, environment, feature) via HTTP headers or the TOLVYN SDK helper.
TOLVYN uses these tags to break down costs at any granularity you care about.
See which team spent $2,400 on GPT-4o last week, which microservice is driving token growth, and which model is delivering the best cost-per-output ratio — all in one dashboard.
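The tag-based breakdown amounts to grouping ledger records by a tag key and summing cost. A minimal sketch, assuming per-request cost is recorded in integer cents; the record shape and tag names are illustrative, not TOLVYN's actual schema:

```python
from collections import defaultdict

# Hypothetical ledger records with the tags attached at request time.
records = [
    {"tags": {"team": "search", "service": "ranker"}, "cost_cents": 140},
    {"tags": {"team": "search", "service": "rewrite"}, "cost_cents": 55},
    {"tags": {"team": "support", "service": "chatbot"}, "cost_cents": 210},
]

def cost_by(tag_key: str, records: list) -> dict:
    """Sum cost per distinct value of one tag key."""
    totals = defaultdict(int)
    for r in records:
        totals[r["tags"].get(tag_key, "untagged")] += r["cost_cents"]
    return dict(totals)

print(cost_by("team", records))     # {'search': 195, 'support': 210}
print(cost_by("service", records))  # {'ranker': 140, 'rewrite': 55, 'chatbot': 210}
```

The same function serves any granularity: pass "team", "service", "environment", or "feature" and the totals regroup accordingly.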
Set hard spending limits per team, per service, or per model. Define alert thresholds at 50%, 80%, and 95% of budget. TOLVYN can automatically block requests once a budget is exhausted — no surprise invoices at month end.
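The enforcement logic above reduces to a simple check per request: compare current spend against the budget, report which alert thresholds have been crossed, and block once the budget is exhausted. The 50%, 80%, and 95% thresholds come from the text; the function shape is an illustrative sketch, not TOLVYN's API:

```python
ALERT_THRESHOLDS = (0.50, 0.80, 0.95)  # alert at 50%, 80%, and 95% of budget

def check_budget(spent_cents: int, budget_cents: int):
    """Return (allowed, alerts_crossed) for the current spend level.

    `allowed` is False once the budget is exhausted, so the proxy
    can block the request before it reaches the provider.
    """
    ratio = spent_cents / budget_cents
    alerts = [t for t in ALERT_THRESHOLDS if ratio >= t]
    allowed = spent_cents < budget_cents  # hard block at 100%
    return allowed, alerts

print(check_budget(8_500, 10_000))   # (True, [0.5, 0.8])
print(check_budget(10_000, 10_000))  # (False, [0.5, 0.8, 0.95])
```

Running the same check keyed by team, service, or model gives per-scope limits; the alert list tells the notifier which threshold messages are due.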
Finance teams get a complete audit trail. Engineering teams get guardrails. Everyone stays aligned on AI spend without sacrificing development velocity.