LLM Gateway

A self-hostable proxy for LLM providers. Route requests to Anthropic, OpenAI, Azure OpenAI, or local models through a single endpoint with built-in cost tracking, prompt sanitization, and usage analytics.

OpenAPI 3.1 | Multi-Provider | Self-Hosted

Multi-Provider Proxy

Forward requests to Anthropic, OpenAI, Azure OpenAI, or any OpenAI-compatible local model server (Ollama, vLLM, LM Studio).

OpenAI-Compatible

Drop-in replacement for OpenAI's API. Point any tool that speaks OpenAI at this gateway — LangChain, LlamaIndex, Continue, etc.

Prompt Sanitization

Automatically scan for leaked credentials (API keys, tokens, private keys) and optionally redact PII before forwarding to providers.

Cost Tracking

Per-request cost estimation with aggregate statistics. See estimated spend broken down by model, source, and team.

Streaming Support

Full SSE passthrough for streaming responses. No buffering — chunks arrive at your client as soon as the provider emits them.

Virtual API Keys

Map virtual bearer tokens to provider-specific keys. Give each team their own key without sharing provider credentials.

Model Catalog

All model cards known to this gateway with capabilities, pricing, latency profiles, and quality signals.

Routing Profiles

Pre-configured profiles that optimize model selection for different task types. Pass a profile ID in the routing request to apply its weights and preferences.

Analytics

Summary metrics: Requests, Total Cost, Tokens, Avg Latency, Error Rate.

Usage Over Time

By Provider

By Model

Top Projects

Model Quality

Quality metrics from adaptive routing signals. Report signals via POST /v1/signals.

By Requester

Per-requester success rates and usage breakdown. Requesters are identified by the x-source header.

Ask a Question

Ask questions about your usage data in natural language. The gateway generates and executes a SQL query.

Error Dashboard

Summary metrics: Total Errors, Customer-Impacting, Mitigated by Failover.

Errors by Requester

Error Log

All errors and mitigated failovers. Each entry is marked as either Customer-Impacting or Mitigated by Failover.

Routing Simulator

Visualize how the 5-layer routing engine processes a request. Configure inputs and watch models get filtered, scored, and ranked layer by layer.

Simulation Parameters

Routing Profile

Data Classification

Quality Weight

Cost Weight

Latency Tolerance

Reasoning Mode

Max Cost per Request ($)

Configure parameters and click Run Simulation to visualize the routing pipeline.
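
As a rough illustration of the filter, score, and rank steps described above, the sketch below applies a per-request cost ceiling and then orders the remaining models by a weighted score. The `ModelCard` fields, the scoring formula, and the sample numbers are all assumptions made for illustration; the gateway's actual 5-layer engine is not specified in this document and may behave differently.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    # Hypothetical fields; the real catalog schema may differ.
    model_id: str
    quality: float       # 0.0 (worst) to 1.0 (best)
    est_cost_usd: float  # estimated cost of one request


def rank_models(cards, quality_weight, cost_weight, max_cost_usd):
    """Filter by cost ceiling, then rank by a weighted score (illustrative only)."""
    # Filter: drop models whose estimated cost exceeds the per-request ceiling.
    eligible = [c for c in cards if c.est_cost_usd <= max_cost_usd]
    if not eligible:
        return []

    # Score: reward quality, penalize cost relative to the priciest eligible model.
    highest = max(c.est_cost_usd for c in eligible) or 1.0

    def score(card):
        return quality_weight * card.quality - cost_weight * (card.est_cost_usd / highest)

    # Rank: best score first.
    return sorted(eligible, key=score, reverse=True)


cards = [
    ModelCard("model-a", quality=0.9, est_cost_usd=0.020),
    ModelCard("model-b", quality=0.7, est_cost_usd=0.002),
    ModelCard("model-c", quality=0.95, est_cost_usd=0.100),
]
ranked = rank_models(cards, quality_weight=0.5, cost_weight=0.5, max_cost_usd=0.05)
print([c.model_id for c in ranked])
```

With these made-up inputs, `model-c` is filtered out by the cost ceiling and the cheaper `model-b` ranks ahead of `model-a` because cost and quality are weighted equally.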

Projects

Live view of all projects routing requests through this gateway. Clients identify their project via the x-project header.

Send requests with the x-project header to see them here.

curl http://localhost:8930/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-project: my-project" \
  -H "x-source: my-tool" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
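
The same tagged request can be assembled from Python's standard library. This is a minimal sketch: the helper name `build_chat_request` is hypothetical, and the request is only built here, not sent, so it can be inspected without a running gateway.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8930/v1/chat/completions"


def build_chat_request(project, source, prompt, model="gpt-4o"):
    """Build (but do not send) a chat request tagged with project and source."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-project": project,  # groups the request under a project
            "x-source": source,    # identifies the calling tool
        },
    )


req = build_chat_request("my-project", "my-tool", "Hello")
# To send it against a running gateway:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```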

Quick Start

Get running in under 5 minutes.

1. Build from Source

git clone <repo-url>
cd services/llm-gateway
cargo build --release

2. Create Configuration

cp config.example.toml config.toml
# Edit config.toml — add your API keys

3. Start the Gateway

# Using default config.toml in current directory
./target/release/llm-gateway

# Or specify a config file and port
./target/release/llm-gateway --config /path/to/config.toml --port 8080

4. Verify

# Health check
curl http://localhost:8930/health

# Send a test request via OpenAI endpoint
curl http://localhost:8930/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

# Send a test request via Anthropic endpoint
curl http://localhost:8930/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 256, "messages": [{"role": "user", "content": "Hello!"}]}'

5. View This Portal

Open http://localhost:8930/ in your browser. The portal is embedded in the binary — no external files needed.

Configuration

Configuration is loaded from a TOML file. All sections are optional — the gateway starts with sensible defaults.

Server

[server]
port = 8930           # Listen port (default: 8930)
host = "127.0.0.1"    # Bind address (default: 127.0.0.1, use 0.0.0.0 for all interfaces)

Providers

# Each [[providers]] block registers an LLM provider.
# provider_type: "anthropic" | "openai" | "azure_openai" | "local"

[[providers]]
id = "anthropic"
provider_type = "anthropic"
api_key = "sk-ant-..."
enabled = true

[[providers]]
id = "openai"
provider_type = "openai"
api_key = "sk-..."
enabled = true

[[providers]]
id = "azure"
provider_type = "azure_openai"
api_key = "..."
endpoint = "https://myresource.openai.azure.com"
deployment = "gpt-4o"
api_version = "2024-10-21"
enabled = true

[[providers]]
id = "ollama"
provider_type = "local"
endpoint = "http://localhost:11434"
enabled = true

Sanitization

[sanitization]
enabled = true          # Enable prompt scanning (default: false)
scan_credentials = true # Scan for API keys, tokens, passwords (default: true)
redact_pii = false      # Redact emails, phone numbers, SSNs (default: false)
compliance_gate = false # Return 403 on findings instead of logging (default: false)
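
To make the interaction between `scan_credentials` and `compliance_gate` concrete, here is a small sketch of a scanner and gate. The regular expressions, the function names, and the returned action strings are illustrative assumptions; the gateway's real detection rules are not shown in this document.

```python
import re

# Illustrative patterns only; not the gateway's actual rule set.
CREDENTIAL_PATTERNS = {
    "anthropic_api_key": re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}"),
    "openai_style_api_key": re.compile(r"sk-(?!ant-)[A-Za-z0-9_-]{16,}"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_for_credentials(text):
    """Return the sorted names of every pattern that matches the prompt text."""
    return sorted(
        name for name, pattern in CREDENTIAL_PATTERNS.items() if pattern.search(text)
    )


def gate(text, compliance_gate):
    """Reject with 403 on findings when gated; otherwise report findings and forward."""
    findings = scan_for_credentials(text)
    if findings and compliance_gate:
        return "reject_403", findings
    return "forward", findings


print(gate("my key is sk-ant-abcdefgh12345678", compliance_gate=True))
```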

Virtual API Key Mapping

# Map client bearer tokens to real provider keys.
# Client sends: Authorization: Bearer my-team-key
# Gateway substitutes the mapped provider key.

[key_map.my-team-key]
label = "Team Alpha"
anthropic_key = "sk-ant-..."
openai_key = "sk-..."

CLI Options

Flag	Description	Default
`-c, --config`	Path to TOML config file	`config.toml`
`-p, --port`	Override listen port	From config

Environment Variables

Variable	Description
`RUST_LOG`	Log level filter (e.g., `info`, `debug`, `llm_gateway=debug`)

API Reference

Browse the interactive API documentation or download the OpenAPI 3.1 specification.

Proxy

POST /v1/messages Anthropic-compatible proxy

Forwards requests to the Anthropic Messages API. The gateway substitutes the configured API key and forwards headers. Supports streaming via "stream": true.

Headers:

Header	Required	Description
`Content-Type`	Yes	`application/json`
`x-source`	No	Source identifier for tracking (e.g., "my-app")
`x-api-key`	No	Virtual API key for key-map routing

Request body: Standard Anthropic Messages API format.

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Explain quantum computing in one paragraph."}
  ]
}

Response: Proxied Anthropic response (unchanged).

POST /v1/chat/completions OpenAI-compatible proxy

Forwards requests to the OpenAI Chat Completions API (or Azure OpenAI if configured). Compatible with any OpenAI SDK client.

Headers:

Header	Required	Description
`Content-Type`	Yes	`application/json`
`Authorization`	No	`Bearer <virtual-key>` for key-map routing
`x-source`	No	Source identifier for tracking

Request body: Standard OpenAI Chat Completions format.

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ]
}

Models

GET /v1/models List available models

Returns all models available across configured providers. For local providers, queries the provider's /v1/models endpoint. For cloud providers, returns well-known model IDs.

curl http://localhost:8930/v1/models

Response:

{
  "object": "list",
  "data": [
    {"id": "claude-sonnet-4-20250514", "provider": "anthropic", "owned_by": "anthropic"},
    {"id": "gpt-4o", "provider": "openai", "owned_by": "openai"},
    {"id": "llama3:8b", "provider": "local", "owned_by": "local"}
  ]
}
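
A client might group this response by provider to see what each backend offers. The sketch below works on the sample payload above; the helper name `models_by_provider` is hypothetical.

```python
import json
from collections import defaultdict


def models_by_provider(payload):
    """Group model IDs from a /v1/models response by their "provider" field."""
    grouped = defaultdict(list)
    for model in payload.get("data", []):
        grouped[model["provider"]].append(model["id"])
    return dict(grouped)


sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "claude-sonnet-4-20250514", "provider": "anthropic", "owned_by": "anthropic"},
    {"id": "gpt-4o", "provider": "openai", "owned_by": "openai"},
    {"id": "llama3:8b", "provider": "local", "owned_by": "local"}
  ]
}
""")
print(models_by_provider(sample))
```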

Providers

GET /v1/providers List configured providers

Returns all configured providers. API keys are never included in the response.

curl http://localhost:8930/v1/providers

POST /v1/providers Add a provider at runtime

Registers a new provider. The provider is stored in memory only and is lost on restart; to persist it, add it to config.toml.

curl -X POST http://localhost:8930/v1/providers \
  -H "Content-Type: application/json" \
  -d '{
    "id": "ollama",
    "provider_type": "local",
    "endpoint": "http://localhost:11434"
  }'

DELETE /v1/providers/{id} Remove a provider

Removes a provider by ID.

curl -X DELETE http://localhost:8930/v1/providers/ollama

Projects

GET /v1/projects List projects with usage summaries

Returns all projects that have sent requests through the gateway, with per-project usage stats.

Query parameters:

Parameter	Type	Description
`active_since_mins`	integer	Only return projects active within the last N minutes

curl http://localhost:8930/v1/projects
curl "http://localhost:8930/v1/projects?active_since_mins=60"

GET /v1/projects/{project}/records Get records for a specific project

Returns all proxy records for the specified project, most recent first.

curl http://localhost:8930/v1/projects/my-project/records

Operations

GET /health Health check

Returns service health, uptime, and configured providers.

curl http://localhost:8930/health

Response:

{
  "status": "ok",
  "version": "0.3.0",
  "uptime_secs": 3600,
  "port": 8930,
  "providers": ["anthropic", "openai"]
}

GET /v1/stats Aggregate usage statistics

Returns cumulative statistics across all proxied requests including token counts, costs, and per-source/model breakdowns.

curl http://localhost:8930/v1/stats

GET /v1/records Request history

Returns all recorded proxy interactions. Each record includes provider, model, token counts, cost estimate, latency, and status.

curl http://localhost:8930/v1/records
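
Records can be rolled up client-side, for example per model. The exact JSON field names are not given in this document, so `model`, `input_tokens`, `output_tokens`, and `cost_usd` below are assumptions, and the sample records are made up.

```python
from collections import defaultdict


def summarize_by_model(records):
    """Sum request count, tokens, and estimated cost per model."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost_usd": 0.0})
    for rec in records:
        bucket = totals[rec["model"]]
        bucket["requests"] += 1
        bucket["tokens"] += rec["input_tokens"] + rec["output_tokens"]
        bucket["cost_usd"] += rec["cost_usd"]
    return dict(totals)


# Made-up sample records; field names are assumptions, not the gateway's schema.
records = [
    {"model": "gpt-4o", "input_tokens": 100, "output_tokens": 50, "cost_usd": 0.25},
    {"model": "gpt-4o", "input_tokens": 200, "output_tokens": 150, "cost_usd": 0.5},
    {"model": "llama3:8b", "input_tokens": 80, "output_tokens": 20, "cost_usd": 0.0},
]
summary = summarize_by_model(records)
print(summary["gpt-4o"])
```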

GET /openapi.json OpenAPI 3.1 specification

Machine-readable OpenAPI specification for this service. Import into Postman, Insomnia, or any OpenAPI client.

curl http://localhost:8930/openapi.json | python3 -m json.tool

Code Examples

curl — OpenAI Format

curl http://localhost:8930/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-source: my-app" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the Rust programming language in 3 sentences."}
    ]
  }'

curl — Anthropic Format

curl http://localhost:8930/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "x-source: my-app" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Write a haiku about programming."}
    ]
  }'

Python — OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8930/v1",
    api_key="not-needed",  # or your virtual key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Hello from Python!"}
    ],
    extra_headers={"x-source": "python-app"},
)

print(response.choices[0].message.content)

Python — Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8930",
    api_key="not-needed",  # or your virtual key
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello from Python!"}
    ],
    extra_headers={"x-source": "python-app"},
)

print(message.content[0].text)

JavaScript/TypeScript — fetch

const response = await fetch("http://localhost:8930/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-source": "js-app",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [
      { role: "user", content: "Hello from JavaScript!" }
    ],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);

JavaScript/TypeScript — OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8930/v1",
  apiKey: "not-needed",
});

const completion = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello from TypeScript!" }],
});

console.log(completion.choices[0].message.content);

Streaming (curl)

curl http://localhost:8930/v1/chat/completions \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 10 slowly."}]
  }'
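
On the client side, a streamed Chat Completions response arrives as server-sent events whose `data:` lines carry JSON chunks, ending with `data: [DONE]`. The sketch below parses such lines into text fragments; `iter_stream_text` is a hypothetical helper, and the sample lines are hand-written stand-ins for what a provider would emit.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "1, "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "2"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))
```

Because the function accepts any iterable of strings, it could be fed decoded lines from an HTTP response body as they arrive.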

Provider Setup Guides

Anthropic

Get an API key from console.anthropic.com

Add to config.toml:

[[providers]]
id = "anthropic"
provider_type = "anthropic"
api_key = "sk-ant-..."
enabled = true

Send requests to POST /v1/messages

Supported models: claude-opus-4, claude-sonnet-4, claude-haiku-3-5, and all Messages API models.

OpenAI

Get an API key from platform.openai.com

Add to config.toml:

[[providers]]
id = "openai"
provider_type = "openai"
api_key = "sk-..."
enabled = true

Send requests to POST /v1/chat/completions

Supported models: gpt-4o, gpt-4.1, gpt-4.1-mini, o3, o4-mini, and all Chat Completions models.

Azure OpenAI

Create a resource in Azure Portal
Deploy a model and note the deployment name

Add to config.toml:

[[providers]]
id = "azure"
provider_type = "azure_openai"
api_key = "..."
endpoint = "https://myresource.openai.azure.com"
deployment = "gpt-4o"
api_version = "2024-10-21"
enabled = true

Send requests to POST /v1/chat/completions — the gateway routes to Azure automatically

Local Models (Ollama, LM Studio, vLLM)

Start your local model server (e.g., ollama serve)

Add to config.toml:

[[providers]]
id = "local"
provider_type = "local"
endpoint = "http://localhost:11434"
enabled = true

The gateway discovers available models via GET /v1/models on the local server
Send requests to POST /v1/chat/completions

Compatible servers: Ollama, LM Studio, vLLM, llama.cpp server, LocalAI, and any server exposing an OpenAI-compatible API.

Multi-Tenant Setup

Use virtual API keys to give each team isolated credentials without sharing provider keys:

# config.toml
[key_map.team-alpha-key]
label = "Team Alpha"
anthropic_key = "sk-ant-alpha..."
openai_key = "sk-alpha..."

[key_map.team-beta-key]
label = "Team Beta"
anthropic_key = "sk-ant-beta..."
openai_key = "sk-beta..."

Clients send their virtual key as a bearer token:

curl http://localhost:8930/v1/chat/completions \
  -H "Authorization: Bearer team-alpha-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'