Product

What TOLVYN does

The complete financial control layer for AI infrastructure. Eleven systems that work together so you always know what was spent, by whom, and why.

The Immutable Ledger

Every AI request that passes through TOLVYN is recorded in an append-only ledger. Each entry is hashed with SHA-256, chained to the previous entry, and signed with HMAC-SHA256. The result: a cryptographically verifiable record of every API call your organization made to every AI provider, ever.

How the chain works

Each ledger record holds a prev_hash field containing the SHA-256 hash of the previous record's serialized payload. Any retroactive modification of a record breaks the chain at the record that follows it. A single link at any sequence number can be checked in O(1) by recomputing one hash; verifying a range takes one pass over that range.
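
A minimal sketch of the chaining idea, not TOLVYN's actual implementation. The record layout, the canonical JSON serialization, the all-zero genesis value, and the helper names are all assumptions made for illustration; here the serialized form is assumed to cover seq and prev_hash as well as the payload.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash value for the first record


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON serialization (assumed format)."""
    data = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Append a record whose prev_hash commits to the previous record."""
    prev = record_hash(ledger[-1]) if ledger else GENESIS
    ledger.append({"seq": len(ledger) + 1, "prev_hash": prev, "payload": payload})


def first_broken_link(ledger: list):
    """Return the seq of the first record whose prev_hash mismatches, else None."""
    prev = GENESIS
    for record in ledger:
        if record["prev_hash"] != prev:
            return record["seq"]
        prev = record_hash(record)
    return None


ledger = []
for cost in ("0.0021", "0.0140", "0.0007"):
    append(ledger, {"model": "gpt-4o-mini", "cost_usd": cost})

assert first_broken_link(ledger) is None
ledger[0]["payload"]["cost_usd"] = "0.0001"  # retroactive edit of record 1
print(first_broken_link(ledger))  # 2: the link after the edited record breaks
```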

HMAC-SHA256 signature

On top of the hash chain, each record is signed with HMAC-SHA256 using a per-tenant key. Signature verification proves both integrity (the record wasn't modified) and authenticity (the record was written by TOLVYN, not forged by someone with read access to the database).
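
As a rough illustration of the integrity and authenticity properties, the sketch below signs and verifies a serialized record with Python's standard hmac module. The key value, the record bytes, and the function names are hypothetical; how TOLVYN serializes records and manages per-tenant keys is not specified here.

```python
import hashlib
import hmac


def sign_record(tenant_key: bytes, serialized: bytes) -> str:
    """HMAC-SHA256 over the serialized record, hex-encoded."""
    return hmac.new(tenant_key, serialized, hashlib.sha256).hexdigest()


def verify_record(tenant_key: bytes, serialized: bytes, signature: str) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    expected = sign_record(tenant_key, serialized)
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-per-tenant-key"
record = b'{"seq":1,"model":"gpt-4o","cost_usd":"0.0140"}'
sig = sign_record(key, record)

print(verify_record(key, record, sig))                                 # True
print(verify_record(key, record.replace(b"0.0140", b"0.0001"), sig))   # False: modified
print(verify_record(b"key-of-someone-with-db-access", record, sig))    # False: wrong key
```

Someone with read access to the database but not the key can copy records, but cannot produce a valid signature for an altered one.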

Advisory lock for sequence integrity

Sequence numbers are allocated under a Postgres advisory lock per tenant. There are no gaps and no duplicates — even under concurrent writes from multiple proxy workers.
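
The sketch below illustrates the "one lock per tenant" idea in-process. It swaps the Postgres advisory lock for a threading.Lock so the example is self-contained; in a real deployment the critical section would presumably sit inside a transaction holding something like pg_advisory_xact_lock keyed on the tenant. The class and method names are hypothetical.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class SequenceAllocator:
    """Stand-in for per-tenant advisory locking: one mutex per tenant."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # one lock per tenant
        self._last = defaultdict(int)               # last allocated seq per tenant

    def next_seq(self, tenant_id: str) -> int:
        with self._guard:
            lock = self._locks[tenant_id]
        with lock:  # only one writer per tenant allocates at a time
            self._last[tenant_id] += 1
            return self._last[tenant_id]


alloc = SequenceAllocator()
with ThreadPoolExecutor(max_workers=8) as pool:
    seqs = list(pool.map(lambda _: alloc.next_seq("tenant-a"), range(500)))

# No gaps and no duplicates, even with eight concurrent workers.
assert sorted(seqs) == list(range(1, 501))
```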

Verification endpoint

# Verify the entire ledger for your tenant
$ tolvyn ledger verify --from 1 --to latest
verifying 4,821 records...
✓ hash chain intact
✓ all HMAC signatures valid
✓ no sequence gaps
ledger integrity: PASS

What it proves — and what it doesn't

The ledger proves what was billable: which request, at which timestamp, to which model, with which token counts, at which cost. It does not store prompt or response content. There is nothing in the ledger your finance team can't show an auditor.

Six dimensions of cost attribution

Every request is tagged across six dimensions. Slice cost reports by any combination.

  • Team (X-Tolvyn-Team) — engineering, marketing, support
  • Service (X-Tolvyn-Service) — chatbot-api, search-svc, content-gen
  • Feature (X-Tolvyn-Feature) — autocomplete, summarize, classify
  • Agent (X-Tolvyn-Agent) — sdr-agent, support-bot, code-reviewer
  • User (X-Tolvyn-User) — your end user's ID, hashed before storage
  • End-customer (X-Tolvyn-Customer) — your customer ID for COGS-per-customer
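
To make "slice by any combination" concrete, here is a small sketch that groups tagged request costs by an arbitrary set of dimensions. The row layout, the sample costs, and the slice_cost helper are hypothetical and do not reflect TOLVYN's reporting API.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical tagged requests; only three of the six dimensions are shown.
requests = [
    {"team": "engineering", "service": "chatbot-api", "feature": "summarize", "cost": Decimal("0.012")},
    {"team": "engineering", "service": "chatbot-api", "feature": "classify", "cost": Decimal("0.003")},
    {"team": "engineering", "service": "search-svc", "feature": "autocomplete", "cost": Decimal("0.001")},
    {"team": "support", "service": "chatbot-api", "feature": "summarize", "cost": Decimal("0.020")},
]


def slice_cost(rows, *dims):
    """Total cost grouped by any combination of dimension names."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["cost"]
    return dict(totals)


print(slice_cost(requests, "team"))
print(slice_cost(requests, "team", "service"))
```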

Hierarchy

Dimensions are independent — but in the dashboard they nest naturally: end-customer → team → service → feature → agent → user. Roll up to any level for board reports; drill down to any level for incident response.

Budgets & Enforcement

Set spending limits at any granularity. Choose how strictly they're enforced. Get pre-request cost estimation so the proxy can refuse a request before it hits the provider.

Hard vs soft mode

Hard mode: the proxy returns HTTP 429 with an x-tolvyn-budget header explaining which budget was hit. Your application can fall back, queue, or surface an error to the user. Soft mode: the request goes through and an alert fires.
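
A sketch of how an application might branch on a hard-mode rejection. The function name, the returned strings, and the example header value are hypothetical; the format of the x-tolvyn-budget header value is not documented here.

```python
def classify_response(status: int, headers: dict) -> str:
    """Decide what the caller should do, based on status and headers."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    budget = lowered.get("x-tolvyn-budget")
    if status == 429 and budget is not None:
        return f"budget-blocked: {budget}"  # fall back, queue, or show an error
    if status == 429:
        return "rate-limited"  # a 429 without the budget header
    return "ok"


print(classify_response(429, {"X-Tolvyn-Budget": "agent:sdr-agent"}))  # budget-blocked: agent:sdr-agent
print(classify_response(429, {}))                                      # rate-limited
print(classify_response(200, {}))                                      # ok
```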

Scope hierarchy

Budgets evaluate in order: agent → service → team → org. The first exhausted budget wins. So a $500/mo agent budget will hard-block even when the parent team budget has plenty of headroom — useful for runaway agent protection.
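
The evaluation order can be sketched as a walk from the narrowest scope to the widest, stopping at the first exhausted budget. The Budget dataclass and the first_exhausted helper are illustrative assumptions, not TOLVYN's data model.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

SCOPE_ORDER = ("agent", "service", "team", "org")  # order given in the text


@dataclass
class Budget:
    scope: str
    name: str
    limit: Decimal
    spent: Decimal


def first_exhausted(budgets) -> Optional[Budget]:
    """Return the first exhausted budget in scope order, or None."""
    by_scope = {b.scope: b for b in budgets}
    for scope in SCOPE_ORDER:
        budget = by_scope.get(scope)
        if budget is not None and budget.spent >= budget.limit:
            return budget
    return None


applicable = [
    Budget("team", "engineering", Decimal("10000"), Decimal("2000")),  # plenty of headroom
    Budget("agent", "sdr-agent", Decimal("500"), Decimal("500")),      # exhausted
]
hit = first_exhausted(applicable)
print(hit.scope, hit.name)  # agent sdr-agent
```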

Pre-request cost estimation

Token counts and per-model pricing are known before the request goes out. The proxy rejects a request that would push you over budget rather than incurring the charge and refusing the response.
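
A rough sketch of the pre-request check. The model name and prices are placeholders, and using the request's maximum output tokens as an upper bound on output cost is an assumption; the text does not say how output tokens are estimated.

```python
from decimal import Decimal

# Placeholder prices in USD per million tokens (not real provider pricing).
PRICE_PER_M = {"example-model": {"input": Decimal("0.50"), "output": Decimal("1.50")}}


def estimate_cost(model: str, input_tokens: int, max_output_tokens: int) -> Decimal:
    """Upper-bound cost: counted input tokens plus the output token cap."""
    price = PRICE_PER_M[model]
    total = input_tokens * price["input"] + max_output_tokens * price["output"]
    return total / Decimal(1_000_000)


def admit(spent: Decimal, limit: Decimal, estimate: Decimal) -> bool:
    """Admit the request only if it cannot push spend over the limit."""
    return spent + estimate <= limit


estimate = estimate_cost("example-model", input_tokens=2000, max_output_tokens=1000)
print(estimate)                                               # 0.0025
print(admit(Decimal("499.999"), Decimal("500"), estimate))    # False: rejected before sending
print(admit(Decimal("120.00"), Decimal("500"), estimate))     # True
```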

Four alert types

  • Budget threshold — fires at 75% (warning) and 90%/100% (critical) of any budget scope.
  • Cost anomaly — fires when cost-per-request deviates >3σ from rolling baseline (e.g. someone shipped a code change that switched gpt-4o-mini → gpt-4o).
  • Model change — fires when a service starts using a model it has never used before.
  • Pricing change — fires when a provider you depend on changes a model's price (see Pricing Change Governance).
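
The cost-anomaly rule can be sketched as a plain 3σ test against a window of recent per-request costs. The window contents and the is_anomalous helper are illustrative; how TOLVYN builds its rolling baseline is not specified here.

```python
from statistics import mean, stdev


def is_anomalous(baseline, cost, sigmas=3.0):
    """True if cost deviates from the baseline mean by more than sigmas * stdev."""
    mu = mean(baseline)
    sd = stdev(baseline)  # sample standard deviation; needs at least two points
    if sd == 0:
        return cost != mu
    return abs(cost - mu) > sigmas * sd


# Hypothetical rolling window of cost-per-request in USD.
window = [0.0010, 0.0012, 0.0011, 0.0009, 0.0010, 0.0011]

print(is_anomalous(window, 0.0012))  # False: within normal variation
print(is_anomalous(window, 0.0165))  # True: e.g. after a switch to a pricier model
```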

Severity

Two levels: warning (informational) and critical (action required). Critical alerts always page; warnings are configurable.

Delivery

In-app · email · webhook. Webhooks are HMAC-signed (see below).

Webhooks

Subscribe to four event types, or use the alert.all wildcard.

  • alert.budget_threshold — any budget threshold crossed
  • alert.cost_anomaly — cost anomaly detected
  • alert.model_change — service switched models
  • alert.pricing_change — provider changed a model price
  • alert.all — receive all of the above

HMAC-SHA256 signature verification

Every webhook payload is signed with HMAC-SHA256 using your endpoint secret. Verify the X-Tolvyn-Signature header before processing. Replay protection via X-Tolvyn-Timestamp with a 5-minute tolerance.
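
A minimal receiver-side sketch using Python's standard library. The signed string layout (timestamp, a dot, then the raw body) and the hex encoding are assumptions made for illustration; check the TOLVYN documentation for the actual signing scheme before relying on this.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # the 5-minute replay window


def verify_webhook(secret: bytes, body: bytes, signature: str, timestamp: str, now=None) -> bool:
    """Check the timestamp window, then the HMAC-SHA256 signature."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(now - sent_at) > TOLERANCE_SECONDS:
        return False  # too old or too far in the future: possible replay
    signed = timestamp.encode() + b"." + body  # assumed layout
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


# Simulate a delivery with a hypothetical secret and payload.
secret = b"hypothetical-endpoint-secret"
body = b'{"type":"alert.cost_anomaly"}'
ts = "1700000000"
sig = hmac.new(secret, ts.encode() + b"." + body, hashlib.sha256).hexdigest()

print(verify_webhook(secret, body, sig, ts, now=1700000010))  # True
print(verify_webhook(secret, body, sig, ts, now=1700000301))  # False: outside tolerance
```

Verify against the raw request bytes, before any JSON parsing, since re-serialization can change the byte sequence.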

Retry policy with jitter

Failed deliveries (non-2xx response or timeout) are retried with increasing backoff (1s, 4s, 16s, 1m, 5m, 30m) plus uniform jitter to prevent a thundering herd. Six attempts total over ~36 minutes before the event is moved to a dead-letter queue visible in the dashboard.
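
The schedule can be sketched as a fixed list of base delays with a random offset added to each. The jitter range (up to 10% of the base delay) is an assumption; the text says only that the jitter is uniform.

```python
import random

BASE_DELAYS = [1, 4, 16, 60, 300, 1800]  # seconds: 1s, 4s, 16s, 1m, 5m, 30m


def retry_delays(rng=random, jitter_fraction=0.1):
    """Base delays plus uniform jitter in [0, jitter_fraction * delay]."""
    return [d + rng.uniform(0, d * jitter_fraction) for d in BASE_DELAYS]


print(sum(BASE_DELAYS) / 60)  # 36.35 minutes before jitter
print(retry_delays(random.Random(42)))
```

Because each receiver's retries land at slightly different offsets, a burst of failures does not turn into a synchronized burst of retries.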

Invoice Reconciliation

Every month, you receive an invoice from OpenAI, Anthropic, and Google. TOLVYN tells you whether what they billed matches what your applications actually requested.

Upload your invoice

Upload the CSV export from your provider dashboard. TOLVYN parses it, matches each line item against the ledger by model and date, and computes the gap.

Gap = Invoice − TOLVYN

A positive gap means you were billed for usage that didn't pass through the TOLVYN proxy. That's shadow AI: applications using provider keys without your observability. The reconciliation report names the model, the gap amount, and the date range — so you can hunt down the source.
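
The gap computation is simple subtraction per model. The sketch below uses made-up totals and a hypothetical reconcile helper; real reconciliation also matches by date, which is omitted here.

```python
from decimal import Decimal


def reconcile(invoice: dict, ledger: dict) -> dict:
    """Gap per model = invoiced amount minus ledger amount."""
    zero = Decimal("0")
    models = sorted(set(invoice) | set(ledger))
    return {m: invoice.get(m, zero) - ledger.get(m, zero) for m in models}


# Made-up monthly totals in USD.
invoice = {"gpt-4o": Decimal("1830.00"), "gpt-4o-mini": Decimal("212.40")}
ledger = {"gpt-4o": Decimal("1510.00"), "gpt-4o-mini": Decimal("212.40")}

gaps = reconcile(invoice, ledger)
for model, gap in gaps.items():
    if gap > 0:
        print(f"{model}: ${gap} billed outside the proxy")  # gpt-4o: $320.00 ...
```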

Three-way match

Available on Growth and above: the proxy ledger, the provider invoice, and your customer billing system can be reconciled simultaneously — so you can prove unit economics per customer line by line.

Savings Analyzer

Nightly at 02:00 UTC, TOLVYN runs a savings analysis over the previous 30 days of traffic and recommends concrete, dollar-quantified migrations.

Four rules

  • Small-token requests on premium models — if >80% of your gpt-4o requests use fewer than 500 tokens, gpt-4o-mini is likely the right model.
  • Duplicate prompts — identical prompt signatures should hit cache, not the provider.
  • Underutilized prompt cache — long system prompts that don't trigger Anthropic's cache reuse.
  • Idle models — models you provisioned but barely use; consolidation simplifies governance.
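
As an illustration of the first rule, the sketch below computes the share of requests under 500 tokens and flags the model when that share exceeds 80%. The helper names and sample token counts are hypothetical.

```python
def small_token_share(token_counts, threshold=500):
    """Fraction of requests that used fewer than threshold tokens."""
    if not token_counts:
        return 0.0
    small = sum(1 for tokens in token_counts if tokens < threshold)
    return small / len(token_counts)


def recommend_smaller_model(token_counts, min_share=0.80):
    """Rule from the text: more than 80% of requests are under 500 tokens."""
    return small_token_share(token_counts) > min_share


mostly_small = [120] * 9 + [4000]   # 90% small requests
mixed = [120] * 5 + [4000] * 5      # 50% small requests

print(recommend_smaller_model(mostly_small))  # True
print(recommend_smaller_model(mixed))         # False
```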

Output

Each recommendation includes: estimated monthly savings, affected service, suggested migration steps, and confidence interval. You can dismiss, defer, or schedule the recommendation directly from the dashboard.

Kill Switch

A panic button for AI traffic. The proxy returns HTTP 451 ("Unavailable For Legal Reasons") for any request matching a kill-switch rule. Checked before budget enforcement, so killing a runaway agent always works even if all budgets are exhausted.

Five scope types

  • Org-wide — stop all AI traffic immediately
  • Provider — stop all traffic to a specific provider
  • Model — stop all traffic to a specific model
  • Service — stop all traffic from a specific service
  • Agent — stop a specific agent (most common)
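
The ordering described above, kill switch first and budgets second, can be sketched as follows. The rule and request dictionaries and the gate function are illustrative assumptions; the 429 branch corresponds to a hard-mode budget.

```python
def kill_rule_matches(rule: dict, request: dict) -> bool:
    """An org-wide rule matches everything; other scopes match on value."""
    if rule["scope"] == "org":
        return True
    return request.get(rule["scope"]) == rule["value"]


def gate(request: dict, kill_rules: list, budget_exhausted: bool) -> int:
    """Return the HTTP status the proxy would answer with."""
    if any(kill_rule_matches(rule, request) for rule in kill_rules):
        return 451  # kill switch is checked before budgets
    if budget_exhausted:
        return 429
    return 200


rules = [{"scope": "agent", "value": "sdr-agent"}]
runaway = {"provider": "openai", "model": "gpt-4o", "service": "outreach", "agent": "sdr-agent"}
other = {"provider": "openai", "model": "gpt-4o", "service": "chatbot-api", "agent": "support-bot"}

print(gate(runaway, rules, budget_exhausted=True))   # 451
print(gate(other, rules, budget_exhausted=True))     # 429
print(gate(other, rules, budget_exhausted=False))    # 200
```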

Activation

Activate from the dashboard, the CLI (tolvyn kill agent sdr-agent), or the API. Activation is logged to the ledger with operator identity. Deactivation requires the same role that activated the rule.

Pricing Change Governance

Provider pricing changes silently. Sometimes a new model is cheaper. Sometimes a model gets repriced upward. Sometimes a model is deprecated and routed to a more expensive replacement. TOLVYN monitors all of this and tells you before you're surprised.

Daily scrape

Every day TOLVYN automatically checks the public pricing pages of every supported provider. Diffs are captured, persisted, and surfaced in the operator console.

Operator approval workflow

A detected change does not auto-update billing. A TOLVYN operator reviews the diff, confirms it's a real change (not a transient page error), and approves. Only then is it propagated to customer accounts.

Customer impact notifications

Any customer whose recent usage included the affected model receives an alert with a dollar-quantified impact estimate: "Your projected monthly cost on gpt-4o-mini goes from $1,240 to $1,615 (+30%) at the new $0.30/M output rate."

AI Cost Index

Benchmarks aggregated from real production AI traffic across participating TOLVYN customers. Published monthly, free, no signup required. See cost-index.html for the live data.

k-anonymity ≥ 3

A data point is published only if at least three independent tenants contribute to it. Below that threshold, the cell is suppressed.
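
A sketch of the suppression rule: a cell is published only when at least three distinct tenants contribute to it. The cell keys, the per-tenant values, and the use of a simple mean as the published aggregate are assumptions for illustration.

```python
from statistics import mean

K = 3  # minimum number of contributing tenants


def publishable(cells: dict, k: int = K) -> dict:
    """Keep cells with at least k tenants; suppress the rest entirely."""
    return {
        cell: mean(list(by_tenant.values()))
        for cell, by_tenant in cells.items()
        if len(by_tenant) >= k
    }


# Hypothetical cost-per-1k-requests values keyed by (model, use case), then tenant.
cells = {
    ("gpt-4o-mini", "classify"): {"t1": 2.0, "t2": 4.0, "t3": 6.0},
    ("gpt-4o", "summarize"): {"t1": 30.0, "t4": 34.0},  # only two tenants
}

print(publishable(cells))  # {('gpt-4o-mini', 'classify'): 4.0}
```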

Opt-out

Participation is on by default at signup. Toggle it off anytime in account settings. Once you opt out, your data is excluded from the next nightly collection and is never persisted.

Apache 2.0 data

The published aggregates are released under Apache 2.0. Use them in your own analyses, research, blog posts, or vendor evaluations. Attribution appreciated but not required.

SDKs & Integration

  • Python (tolvyn 0.1.5) — drop-in replacement for openai, anthropic, and Google's GenAI client.
  • Node.js (tolvyn 1.0.6) — drop-in replacement for openai and @anthropic-ai/sdk.
  • Go (tolvyn-go v0.1.3) — idiomatic Go bindings.
  • CLI — 58 commands across ledger, budgets, alerts, agents, kill-switch, reconciliation, and savings.
  • Direct HTTP — change the base URL, keep your existing request code.

Documentation

Full reference at docs.tolvyn.io including the API reference and the CLI reference.

Start free — 10,000 requests