A self-hostable proxy for LLM providers. Route requests to Anthropic, OpenAI, Azure OpenAI, or local models through a single endpoint with built-in cost tracking, prompt sanitization, and usage analytics.
Forward requests to Anthropic, OpenAI, Azure OpenAI, or any OpenAI-compatible local model server (Ollama, vLLM, LM Studio).
Drop-in replacement for OpenAI's API. Point any tool that speaks OpenAI at this gateway — LangChain, LlamaIndex, Continue, etc.
Automatically scan for leaked credentials (API keys, tokens, private keys) and optionally redact PII before forwarding to providers.
Per-request cost estimation with aggregate statistics. Know exactly how much each model, source, and team is spending.
Full SSE passthrough for streaming responses. No buffering — chunks arrive at your client as soon as the provider emits them.
Map virtual bearer tokens to provider-specific keys. Give each team their own key without sharing provider credentials.
All model cards known to this gateway with capabilities, pricing, latency profiles, and quality signals.
Pre-configured profiles that optimize model selection for different task types. Pass a profile ID in the routing request to apply its weights and preferences.
Quality metrics from adaptive routing signals. Report signals via POST /v1/signals.
Per-requester success rates and usage breakdown. Requesters identified by x-source header.
Ask questions about your usage data in natural language. The gateway generates and executes a SQL query.
All errors and mitigated failovers, each marked as either customer-impacting or mitigated by failover.
Visualize how the 5-layer routing engine processes a request. Configure inputs and watch models get filtered, scored, and ranked layer by layer.
Live view of all projects routing requests through this gateway. Clients identify their project via the x-project header.
Send requests with the x-project header to see them here.
curl http://localhost:8930/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-project: my-project" \
-H "x-source: my-tool" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
Get running in under 5 minutes.
git clone <repo-url>
cd services/llm-gateway
cargo build --release
cp config.example.toml config.toml
# Edit config.toml — add your API keys
# Using default config.toml in current directory
./target/release/llm-gateway
# Or specify a config file and port
./target/release/llm-gateway --config /path/to/config.toml --port 8080
# Health check
curl http://localhost:8930/health
# Send a test request via OpenAI endpoint
curl http://localhost:8930/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
# Send a test request via Anthropic endpoint
curl http://localhost:8930/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{"model": "claude-sonnet-4-20250514", "max_tokens": 256, "messages": [{"role": "user", "content": "Hello!"}]}'
Open http://localhost:8930/ in your browser. The portal is embedded in the binary — no external files needed.
Configuration is loaded from a TOML file. All sections are optional — the gateway starts with sensible defaults.
[server]
port = 8930 # Listen port (default: 8930)
host = "127.0.0.1" # Bind address (default: 127.0.0.1, use 0.0.0.0 for all interfaces)
# Each [[providers]] block registers an LLM provider.
# provider_type: "anthropic" | "openai" | "azure_openai" | "local"
[[providers]]
id = "anthropic"
provider_type = "anthropic"
api_key = "sk-ant-..."
enabled = true
[[providers]]
id = "openai"
provider_type = "openai"
api_key = "sk-..."
enabled = true
[[providers]]
id = "azure"
provider_type = "azure_openai"
api_key = "..."
endpoint = "https://myresource.openai.azure.com"
deployment = "gpt-4o"
api_version = "2024-10-21"
enabled = true
[[providers]]
id = "ollama"
provider_type = "local"
endpoint = "http://localhost:11434"
enabled = true
[sanitization]
enabled = true # Enable prompt scanning (default: false)
scan_credentials = true # Scan for API keys, tokens, passwords (default: true)
redact_pii = false # Redact emails, phone numbers, SSNs (default: false)
compliance_gate = false # Return 403 on findings instead of logging (default: false)
# Map client bearer tokens to real provider keys.
# Client sends: Authorization: Bearer my-team-key
# Gateway substitutes the mapped provider key.
[key_map.my-team-key]
label = "Team Alpha"
anthropic_key = "sk-ant-..."
openai_key = "sk-..."
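The "all sections are optional" behavior can be pictured as a merge of the parsed file over a table of defaults. The following is a minimal sketch, not the gateway's actual loader: `DEFAULTS` and `with_defaults` are hypothetical names, the default values are copied from the comments in the example config above, and the input dict stands in for whatever a TOML parser would return.

```python
from copy import deepcopy

# Defaults taken from the comments in the example config above.
DEFAULTS = {
    "server": {"port": 8930, "host": "127.0.0.1"},
    "sanitization": {
        "enabled": False,
        "scan_credentials": True,
        "redact_pii": False,
        "compliance_gate": False,
    },
}


def with_defaults(config: dict) -> dict:
    """Return a copy of `config` with documented defaults filled in per section.

    Hypothetical helper for illustration; `config` is a parsed TOML document.
    """
    merged = deepcopy(DEFAULTS)
    for section, values in config.items():
        if isinstance(values, dict) and isinstance(merged.get(section), dict):
            # Known table section: user values override individual keys.
            merged[section].update(values)
        else:
            # Sections without documented defaults (e.g. providers) pass through.
            merged[section] = deepcopy(values)
    return merged


if __name__ == "__main__":
    effective = with_defaults({"server": {"port": 8080}})
    print(effective["server"])
    print(effective["sanitization"])
```

An empty config therefore yields a gateway listening on 127.0.0.1:8930 with sanitization disabled, which matches the documented defaults.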
| Flag | Description | Default |
|---|---|---|
| -c, --config | Path to TOML config file | config.toml |
| -p, --port | Override listen port | From config |
| Variable | Description |
|---|---|
| RUST_LOG | Log level filter (e.g., info, debug, llm_gateway=debug) |
Browse the interactive API documentation or download the OpenAPI 3.1 specification.
Forwards requests to the Anthropic Messages API. The gateway substitutes the configured API key and forwards headers. Supports streaming via "stream": true.
Headers:
| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | application/json |
| x-source | No | Source identifier for tracking (e.g., "my-app") |
| x-api-key | No | Virtual API key for key-map routing |
Request body: Standard Anthropic Messages API format.
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Explain quantum computing in one paragraph."}
]
}
Response: Proxied Anthropic response (unchanged).
Forwards requests to the OpenAI Chat Completions API (or Azure OpenAI if configured). Compatible with any OpenAI SDK client.
Headers:
| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | application/json |
| Authorization | No | Bearer <virtual-key> for key-map routing |
| x-source | No | Source identifier for tracking |
Request body: Standard OpenAI Chat Completions format.
{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
]
}
Returns all models available across configured providers. For local providers, queries the provider's /v1/models endpoint. For cloud providers, returns well-known model IDs.
curl http://localhost:8930/v1/models
Response:
{
"object": "list",
"data": [
{"id": "claude-sonnet-4-20250514", "provider": "anthropic", "owned_by": "anthropic"},
{"id": "gpt-4o", "provider": "openai", "owned_by": "openai"},
{"id": "llama3:8b", "provider": "local", "owned_by": "local"}
]
}
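A client will often want this list grouped by provider, for example to populate a model picker. The sketch below works on the response shape shown above; `models_by_provider` is a hypothetical helper name, and the sample payload is the documented example rather than live output.

```python
import json
from collections import defaultdict


def models_by_provider(payload: dict) -> dict[str, list[str]]:
    """Group model IDs from a /v1/models response by their `provider` field."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in payload.get("data", []):
        grouped[entry["provider"]].append(entry["id"])
    return dict(grouped)


# The documented example response, used here as stand-in data.
SAMPLE = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "claude-sonnet-4-20250514", "provider": "anthropic", "owned_by": "anthropic"},
    {"id": "gpt-4o", "provider": "openai", "owned_by": "openai"},
    {"id": "llama3:8b", "provider": "local", "owned_by": "local"}
  ]
}
""")

if __name__ == "__main__":
    for provider, models in models_by_provider(SAMPLE).items():
        print(provider, models)
```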
Returns all configured providers. API keys are never included in the response.
curl http://localhost:8930/v1/providers
Registers a new provider at runtime. The registration is held in memory only and is lost on restart; to make it permanent, add the provider to config.toml.
curl -X POST http://localhost:8930/v1/providers \
-H "Content-Type: application/json" \
-d '{
"id": "ollama",
"provider_type": "local",
"endpoint": "http://localhost:11434"
}'
Removes a provider by ID.
curl -X DELETE http://localhost:8930/v1/providers/ollama
Returns all projects that have sent requests through the gateway, with per-project usage stats.
Query parameters:
| Parameter | Type | Description |
|---|---|---|
active_since_mins | integer | Only return projects active within the last N minutes |
curl http://localhost:8930/v1/projects
curl "http://localhost:8930/v1/projects?active_since_mins=60"
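When calling this endpoint from code, it is safer to build the query string with a URL library than by string concatenation. A minimal sketch, assuming the gateway is reachable at the base URL you pass in; `projects_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode


def projects_url(base_url: str, active_since_mins: Optional[int] = None) -> str:
    """Build the /v1/projects URL, adding active_since_mins only when given."""
    url = base_url.rstrip("/") + "/v1/projects"
    if active_since_mins is not None:
        url += "?" + urlencode({"active_since_mins": active_since_mins})
    return url


if __name__ == "__main__":
    print(projects_url("http://localhost:8930"))
    print(projects_url("http://localhost:8930", active_since_mins=60))
```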
Returns all proxy records for the specified project, most recent first.
curl http://localhost:8930/v1/projects/my-project/records
Returns service health, uptime, and configured providers.
curl http://localhost:8930/health
Response:
{
"status": "ok",
"version": "0.3.0",
"uptime_secs": 3600,
"port": 8930,
"providers": ["anthropic", "openai"]
}
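A deployment script might use this response to confirm that the providers it depends on are configured before sending traffic. The sketch below operates on an already-parsed health response; `missing_providers` is a hypothetical helper, and treating any status other than "ok" as a failure is an assumption, since only the "ok" value is documented here.

```python
def missing_providers(health: dict, expected: list[str]) -> list[str]:
    """Return the expected provider IDs that the health response does not list.

    Raises RuntimeError when the reported status is not "ok" (assumed policy).
    """
    status = health.get("status")
    if status != "ok":
        raise RuntimeError(f"gateway reported status {status!r}")
    configured = set(health.get("providers", []))
    return [provider for provider in expected if provider not in configured]


# The documented example response, used here as stand-in data.
SAMPLE_HEALTH = {
    "status": "ok",
    "version": "0.3.0",
    "uptime_secs": 3600,
    "port": 8930,
    "providers": ["anthropic", "openai"],
}

if __name__ == "__main__":
    print(missing_providers(SAMPLE_HEALTH, ["anthropic", "ollama"]))
```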
Returns cumulative statistics across all proxied requests including token counts, costs, and per-source/model breakdowns.
curl http://localhost:8930/v1/stats
Returns all recorded proxy interactions. Each record includes provider, model, token counts, cost estimate, latency, and status.
curl http://localhost:8930/v1/records
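Records can be aggregated client-side when the built-in statistics are not granular enough. The sketch below sums estimated cost per model. The field names `model` and `cost_usd` are assumptions made for illustration, as is the flat list-of-objects shape; check the OpenAPI specification for the actual record schema before relying on them.

```python
from collections import defaultdict


def cost_by_model(records: list[dict]) -> dict[str, float]:
    """Sum estimated cost per model.

    Assumes each record has `model` and `cost_usd` keys (hypothetical names).
    Records without a cost estimate count as zero.
    """
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["model"]] += float(record.get("cost_usd") or 0.0)
    return dict(totals)


# Made-up records in the assumed shape, for demonstration only.
EXAMPLE_RECORDS = [
    {"model": "gpt-4o", "cost_usd": 0.25},
    {"model": "gpt-4o", "cost_usd": 0.5},
    {"model": "claude-sonnet-4-20250514", "cost_usd": 1.0},
    {"model": "llama3:8b", "cost_usd": None},
]

if __name__ == "__main__":
    for model, total in cost_by_model(EXAMPLE_RECORDS).items():
        print(f"{model}: {total:.2f}")
```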
Machine-readable OpenAPI specification for this service. Import into Postman, Insomnia, or any OpenAPI client.
curl -s http://localhost:8930/openapi.json | python3 -m json.tool
curl http://localhost:8930/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-source: my-app" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Summarize the Rust programming language in 3 sentences."}
]
}'
curl http://localhost:8930/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-H "x-source: my-app" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Write a haiku about programming."}
]
}'
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8930/v1",
api_key="not-needed", # or your virtual key
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "user", "content": "Hello from Python!"}
],
extra_headers={"x-source": "python-app"},
)
print(response.choices[0].message.content)
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8930",
api_key="not-needed", # or your virtual key
)
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello from Python!"}
],
extra_headers={"x-source": "python-app"},
)
print(message.content[0].text)
const response = await fetch("http://localhost:8930/v1/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-source": "js-app",
},
body: JSON.stringify({
model: "gpt-4o",
messages: [
{ role: "user", content: "Hello from JavaScript!" }
],
}),
});
const data = await response.json();
console.log(data.choices[0].message.content);
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:8930/v1",
apiKey: "not-needed",
});
const completion = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello from TypeScript!" }],
});
console.log(completion.choices[0].message.content);
curl http://localhost:8930/v1/chat/completions \
-H "Content-Type: application/json" \
--no-buffer \
-d '{
"model": "gpt-4o",
"stream": true,
"messages": [{"role": "user", "content": "Count to 10 slowly."}]
}'
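Because the gateway passes the provider's SSE stream through unchanged, a client reads it the same way it would read OpenAI's stream: each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. The sketch below extracts the text deltas from such lines. `iter_stream_text` is a hypothetical helper, and the sample lines are hand-written in the OpenAI chunk format rather than captured from the gateway.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample events in the Chat Completions streaming format.
SAMPLE_LINES = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "1, "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "2"}}]}',
    "",
    "data: [DONE]",
]

if __name__ == "__main__":
    print("".join(iter_stream_text(SAMPLE_LINES)))
```

In a real client the `lines` argument would come from an HTTP library's line iterator over the streaming response body.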
[[providers]]
id = "anthropic"
provider_type = "anthropic"
api_key = "sk-ant-..."
enabled = true
POST /v1/messages
Supported models: claude-opus-4, claude-sonnet-4, claude-haiku-3-5, and all Messages API models.
[[providers]]
id = "openai"
provider_type = "openai"
api_key = "sk-..."
enabled = true
POST /v1/chat/completions
Supported models: gpt-4o, gpt-4.1, gpt-4.1-mini, o3, o4-mini, and all Chat Completions models.
[[providers]]
id = "azure"
provider_type = "azure_openai"
api_key = "..."
endpoint = "https://myresource.openai.azure.com"
deployment = "gpt-4o"
api_version = "2024-10-21"
enabled = true
POST /v1/chat/completions (the gateway routes to Azure automatically)
Start the local model server (e.g., ollama serve)
[[providers]]
id = "local"
provider_type = "local"
endpoint = "http://localhost:11434"
enabled = true
Models are discovered via GET /v1/models on the local server
POST /v1/chat/completions
Compatible servers: Ollama, LM Studio, vLLM, llama.cpp server, LocalAI, and any server exposing an OpenAI-compatible API.
Use virtual API keys to give each team isolated credentials without sharing provider keys:
# config.toml
[key_map.team-alpha-key]
label = "Team Alpha"
anthropic_key = "sk-ant-alpha..."
openai_key = "sk-alpha..."
[key_map.team-beta-key]
label = "Team Beta"
anthropic_key = "sk-ant-beta..."
openai_key = "sk-beta..."
Clients send their virtual key as a bearer token:
curl http://localhost:8930/v1/chat/completions \
-H "Authorization: Bearer team-alpha-key" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
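Conceptually, the gateway looks up the bearer token in the key map and substitutes the matching provider key before forwarding. The following sketch illustrates that lookup only; it is not the gateway's implementation. `KEY_MAP` mirrors the config above as a plain dict, `resolve_provider_key` is a hypothetical name, and returning `None` for an unknown token is an assumption, since the behavior for unmapped tokens is not documented here.

```python
from typing import Optional

# Mirrors the [key_map.*] tables from the config above.
KEY_MAP = {
    "team-alpha-key": {
        "label": "Team Alpha",
        "anthropic_key": "sk-ant-alpha...",
        "openai_key": "sk-alpha...",
    },
    "team-beta-key": {
        "label": "Team Beta",
        "anthropic_key": "sk-ant-beta...",
        "openai_key": "sk-beta...",
    },
}


def resolve_provider_key(
    authorization: str, provider: str, key_map: dict = KEY_MAP
) -> Optional[str]:
    """Map an Authorization header value to the provider key for `provider`.

    `provider` is "anthropic" or "openai". Returns None when the header is not
    a bearer token or the token is not in the map (assumed behavior).
    """
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    entry = key_map.get(token.strip())
    if entry is None:
        return None
    return entry.get(f"{provider}_key")


if __name__ == "__main__":
    print(resolve_provider_key("Bearer team-alpha-key", "openai"))
    print(resolve_provider_key("Bearer unknown-key", "openai"))
```

The point of the indirection is that rotating a provider key only requires a config change on the gateway; the virtual keys held by each team stay the same.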