---
name: Veniceai
description: Use when building AI applications with privacy-first inference, integrating uncensored models, generating images/audio/video, or migrating from OpenAI. Reach for this skill when working with chat completions, structured responses, web search, vision models, function calling, or deploying agents with frameworks like LangChain, CrewAI, or Eliza.
metadata:
    mintlify-proj: veniceai
    version: "1.0"
---

# Venice AI Skill

## Product Summary

Venice AI is a privacy-first API for text generation, image creation, audio synthesis, video generation, and embeddings. It's OpenAI-compatible, meaning you can use the OpenAI SDK with just a base URL change (`https://api.venice.ai/api/v1`). Key features: uncensored models, zero data retention, web search, vision support, function calling, structured responses, and 50+ TTS voices. Generate API keys at `https://venice.ai/settings/api`. Primary docs: https://docs.venice.ai

## When to Use

Use Venice AI when:
- Building chat applications with uncensored models (`venice-uncensored`, `qwen3-4b`, `zai-org-glm-4.7`)
- Generating images, upscaling, or editing with diffusion models
- Converting text to speech or transcribing audio
- Generating videos from text or images (async queue-based)
- Creating embeddings for semantic search or RAG
- Analyzing images with vision models (`qwen3-vl-235b-a22b`)
- Calling external APIs via function calling
- Enforcing response structure with JSON schemas
- Enabling web search with citations in responses
- Migrating from OpenAI (drop-in replacement with base URL change)
- Building multi-agent systems with LangChain, CrewAI, or Eliza
- Needing privacy-first inference with zero data retention

## Quick Reference

### API Endpoints

| Endpoint | Purpose |
|----------|---------|
| `POST /chat/completions` | Text generation, vision, function calling, streaming |
| `POST /image/generate` | Text-to-image generation |
| `POST /image/upscale` | Enhance images to higher resolution |
| `POST /image/edit` | AI-powered inpainting/editing |
| `POST /audio/speech` | Text-to-speech with 50+ voices |
| `POST /audio/transcriptions` | Speech-to-text transcription |
| `POST /embeddings` | Vector embeddings for semantic search |
| `POST /video/queue` | Start async video generation |
| `POST /video/retrieve` | Fetch completed video |
| `GET /models/list` | List available models |

### Authentication

```bash
export VENICE_API_KEY='your-api-key-here'
# Use in requests:
curl -H "Authorization: Bearer $VENICE_API_KEY" https://api.venice.ai/api/v1/...
```

### Popular Models

| Model ID | Type | Best For | Context |
|----------|------|----------|---------|
| `zai-org-glm-4.7` | Text | Complex reasoning, agents, code | 128k |
| `venice-uncensored` | Text | Uncensored creative, red-team | 32k |
| `qwen3-4b` | Text | Fast, cheap, classification | 40k |
| `qwen3-vl-235b-a22b` | Vision | Image analysis, multimodal | 32k |
| `mistral-31-24b` | Text | Vision + tools, balanced | 131k |
| `qwen3-coder-480b-a35b-instruct` | Text | Code generation | - |
| `venice-sd35` | Image | Text-to-image generation | - |
| `tts-kokoro` | Audio | Text-to-speech (50+ voices) | - |

### Venice Parameters (in `extra_body` or via model suffix)

```python
# In request body:
extra_body={
    "venice_parameters": {
        "enable_web_search": "auto",  # off, on, auto
        "enable_web_citations": True,
        "character_slug": "venice-ai",  # AI persona
        "strip_thinking_response": False,  # Show reasoning
        "include_venice_system_prompt": True
    }
}

# Or append to model ID:
model="zai-org-glm-4.7:enable_web_search=auto&enable_web_citations=true"
```

### Response Headers to Monitor

| Header | Purpose |
|--------|---------|
| `x-ratelimit-remaining-requests` | Requests left in window |
| `x-ratelimit-remaining-tokens` | Tokens left in window |
| `x-venice-balance-usd` | USD credit balance |
| `x-venice-balance-diem` | DIEM token balance |
| `CF-RAY` | Request ID for support |
| `x-venice-model-deprecation-warning` | Model sunset notice |

## Decision Guidance

### When to Use Model Suffix vs. `venice_parameters`

| Scenario | Use Model Suffix | Use `venice_parameters` |
|----------|------------------|------------------------|
| Using OpenAI SDK directly | ✓ (no extra_body support) | ✗ |
| Using Python/JS OpenAI client | ✗ | ✓ (via extra_body) |
| Enabling web search | Either | Either |
| Disabling Venice system prompt | Either | Either |
| Setting character slug | Either | Either |

### When to Use Structured Responses vs. Function Calling

| Use Case | Structured Responses | Function Calling |
|----------|---------------------|------------------|
| Guaranteed JSON schema | ✓ | ✗ |
| Calling external APIs | ✗ | ✓ |
| Data extraction | ✓ | ✗ |
| Tool use / agent loops | ✗ | ✓ |
| Nested objects | ✓ | ✓ |

### Model Selection by Task

| Task | Recommended | Why |
|------|-------------|-----|
| Complex reasoning | `zai-org-glm-4.7` | Flagship, best for agents |
| Uncensored output | `venice-uncensored` | No content filtering |
| Speed + cost | `qwen3-4b` | $0.05/1M tokens |
| Vision + tools | `mistral-31-24b` | 131k context, multimodal |
| Code generation | `qwen3-coder-480b-a35b-instruct` | Optimized for code |
| Image generation | `venice-sd35` | Default, works with all features |

## Workflow

### 1. Set Up Authentication
- Generate API key at `https://venice.ai/settings/api`
- Store in environment: `export VENICE_API_KEY='...'`
- Verify access: `curl -H "Authorization: Bearer $VENICE_API_KEY" https://api.venice.ai/api/v1/models/list`

### 2. Choose Your Model
- Check `/models/list` endpoint or docs for capabilities
- Match model to task (reasoning, vision, speed, uncensored)
- Note context window and pricing tier

### 3. Build Your Request
- Set `base_url="https://api.venice.ai/api/v1"` in OpenAI client
- Add `messages` array with roles: `system`, `user`, `assistant`, `tool`
- Add `venice_parameters` for web search, characters, reasoning, etc.
- For structured output, include `response_format` with JSON schema

### 4. Handle Streaming (Optional)
- Set `stream=True` for real-time responses
- Iterate over chunks and extract `delta.content`
- Useful for long-running tasks and user-facing apps

### 5. Monitor Rate Limits
- Check `x-ratelimit-remaining-requests` and `x-ratelimit-remaining-tokens` headers
- Implement exponential backoff on 429 errors
- Use `x-ratelimit-reset-requests` header for exact retry time

### 6. Log for Debugging
- Save `CF-RAY` header from responses for support tickets
- Track `x-venice-balance-usd` and `x-venice-balance-diem` to avoid service interruption
- Check `x-venice-model-deprecation-warning` for model sunset notices

### 7. Verify Output
- For structured responses: validate JSON matches schema
- For function calls: check `tool_calls` in response
- For web search: verify citations in content
- For images: decode base64 and save to file

## Common Gotchas

- **API key shown once only**: Copy immediately after generation. If lost, delete and create new key.
- **OpenAI SDK compatibility**: Use `extra_body` parameter for Venice-specific features in Python/JS clients.
- **Model suffix syntax**: Use `:` separator and `&` for multiple params (e.g., `model:param1=val1&param2=val2`).
- **Structured responses slow on first call**: Initial requests with `response_format` take longer; subsequent calls are faster.
- **Structured responses require strict schema**: Set `strict: true` and `additionalProperties: false` in JSON schema.
- **All fields must be required or nullable**: Make optional fields with `"type": ["string", "null"]` syntax.
- **Structured responses incompatible with parallel function calls**: Use one or the other, not both.
- **Web search adds latency and cost**: Enable only when needed; check pricing for usage-based charges.
- **Image endpoints return base64**: Decode base64 strings to save or display images.
- **Video generation is async**: Queue request, then poll `/video/retrieve` for results.
- **Venice appends system prompts by default**: Set `include_venice_system_prompt: false` to use only your system prompt.
- **Rate limits vary by model tier**: Check `/api_keys/rate_limits` endpoint for your exact limits.
- **Abuse protection blocks after 20 failed requests in 30s**: Wait 30 seconds before retrying.
- **Transcription supports WAV, FLAC, MP3, M4A, AAC, MP4**: Other formats will fail.
- **TTS voices are model-specific**: `tts-kokoro` supports 50+ voices like `af_sky`, `am_liam`, `zf_xiaobei`.

## Verification Checklist

Before submitting work with Venice AI:

- [ ] API key is stored securely (environment variable, not hardcoded)
- [ ] Base URL is set to `https://api.venice.ai/api/v1`
- [ ] Model ID is valid and supports required features (check `/models/list`)
- [ ] Messages array has correct roles (`system`, `user`, `assistant`, `tool`)
- [ ] For structured responses: `strict: true` and `additionalProperties: false` set
- [ ] For structured responses: all fields have `required` tag or nullable type
- [ ] For web search: `enable_web_search` set to `auto`, `on`, or `off`
- [ ] For function calling: tools array is properly formatted with required parameters
- [ ] For images: base64 output is decoded before saving
- [ ] For video: async queue request followed by retrieve polling
- [ ] Rate limit headers are monitored in production
- [ ] `CF-RAY` request ID is logged for debugging
- [ ] Account balance (`x-venice-balance-usd`, `x-venice-balance-diem`) is tracked
- [ ] Error handling includes exponential backoff for 429 and 5xx errors
- [ ] Deprecated models are not used (check `x-venice-model-deprecation-warning`)

## Resources

**Comprehensive navigation**: https://docs.venice.ai/llms.txt

**Critical docs**:
1. [Getting Started](https://docs.venice.ai/overview/getting-started) — Quickstart with code examples
2. [API Reference](https://docs.venice.ai/api-reference/api-spec) — Authentication, Venice parameters, response headers
3. [Models](https://docs.venice.ai/overview/models) — Complete model catalog with capabilities and pricing

**Framework guides**:
- [LangChain Integration](https://docs.venice.ai/overview/guides/langchain)
- [CrewAI Integration](https://docs.venice.ai/overview/guides/crewai)
- [Vercel AI SDK](https://docs.venice.ai/overview/guides/vercel-ai-sdk)
- [OpenAI Migration](https://docs.venice.ai/overview/guides/openai-migration)

**Advanced features**:
- [Structured Responses](https://docs.venice.ai/overview/guides/structured-responses)
- [Prompt Caching](https://docs.venice.ai/overview/guides/prompt-caching)
- [Reasoning Models](https://docs.venice.ai/overview/guides/reasoning-models)
- [Web Search](https://docs.venice.ai/overview/guides/openai-migration#1-built-in-web-search)

---

> For additional documentation and navigation, see: https://docs.venice.ai/llms.txt