Pay Per Token
No subscription. Pay only for what you use
Claude Models
Access Opus 4.5 and Sonnet 4.5 through Venice
Prompt Caching
Venice caching works alongside Claude Code
Why You Need a Router
Claude Code connects directly to Anthropic’s API by default. To use it with Venice, you need claude-code-router, an open-source local proxy that:Intercepts
Catches Claude Code’s outgoing requests before they reach Anthropic
Transforms
Converts request format and maps model IDs (e.g.,
claude-opus-45)Redirects
Forwards requests to Venice at
api.venice.ai/api/v1/chat/completionsPrerequisites
Setup
1
Install Claude Code
If you haven’t already, install Anthropic’s Claude Code CLI:
2
Install the Router
3
Get Your API Key
Generate a key from venice.ai/settings/api. You’ll paste it directly in the config file in the next step.
4
Create Configuration
Create the config directory:Then create Paste the following configuration:
~/.claude-code-router/config.json with your preferred editor:If you modify
config.json while the router is running, restart it with ccr restart to apply changes.5
Launch
Start the router, then Claude Code:Or use the activation method:
Supported Models
| Model | Venice ID | Best For |
|---|---|---|
| Claude Opus 4.5 | claude-opus-45 | Complex reasoning, large refactors |
| Claude Sonnet 4.5 | claude-sonnet-45 | Fast iteration, everyday coding |
Router Features
The router provides several useful features beyond basic routing:Switch models on the fly
Switch models on the fly
Use the Useful when you want Opus for complex tasks and Sonnet for quick iterations.
/model command inside Claude Code to switch models without restarting:Visual configuration with UI mode
Visual configuration with UI mode
Prefer a GUI? Launch the web-based config editor:This opens a browser interface for editing your
config.json without touching the file directly.Router scenarios explained
Router scenarios explained
The
You can route different scenarios to different models. For example, use Sonnet for background tasks to save costs.
Router config section controls which model handles different task types:| Scenario | When it’s used |
|---|---|
default | General requests |
think | Reasoning-heavy tasks (Plan Mode) |
background | Background operations |
longContext | When context exceeds longContextThreshold tokens |
Debugging with logs
Debugging with logs
If something isn’t working, check the logs:Set
"LOG_LEVEL": "debug" in your config for more verbose output.Caching Behavior
Venice prompt caching works alongside Claude Code’s native cache markers. Venice automatically detects when Claude Code sendscache_control fields and adjusts its caching strategy accordingly.
| Scenario | Cache TTL | Who Controls |
|---|---|---|
| Default (recommended) | 5 minutes | Claude Code + Venice |
With cleancache transformer | 1 hour | Venice only |
When NOT to use cleancache (most users)
When NOT to use cleancache (most users)
The default configuration lets both systems cooperate:
- Claude Code sends its native
cache_controlmarkers - Venice adds caching around them with a 5-minute TTL
- Both systems share the 4-block cache limit
When to use cleancache
When to use cleancache
Add This strips Claude Code’s cache markers, giving Venice full control with a longer TTL.
cleancache to the transformer if you:- Are hitting the 4-block cache limit errors
- Experience strange caching behavior
- Prefer Venice’s 1-hour TTL for longer sessions