API Reference | Venice API Docs

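The Venice API provides HTTP-based REST and streaming interfaces for building AI applications with uncensored models and private inference. You can build text generation, image generation, embeddings, and more, all without restrictive content policies. Integration examples and SDKs are available in the documentation. The API reference is also available as an OpenAPI YAML spec.

Authentication

The Venice API uses API keys for authentication. Create and manage your API keys in your API settings. All API requests require HTTP Bearer authentication: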

Authorization: Bearer VENICE_API_KEY

Your API key is a secret. Do not share it or expose it in client-side code.
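One way to keep the key out of source code is to read it from the environment on the server side. The following is a minimal sketch, not an official helper: the function name `build_auth_headers` is hypothetical, and the only facts taken from this page are the `Authorization: Bearer` scheme and the `VENICE_API_KEY` variable name used in the examples.

```python
import os


def build_auth_headers(env_var: str = "VENICE_API_KEY") -> dict[str, str]:
    """Return HTTP headers carrying the Bearer token read from the environment."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers of any HTTP client request to the API.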

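OpenAI Compatibility

Venice's API implements the OpenAI API specification, ensuring compatibility with existing OpenAI clients and tools. This lets you integrate with Venice through the familiar OpenAI interface while accessing Venice's unique features and uncensored models.

Setup

Configure your client to use Venice's base URL (https://api.venice.ai/api/v1) and send your first request: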

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: "https://api.venice.ai/api/v1",
});

const response = await client.chat.completions.create({
  model: "venice-uncensored",
  messages: [{ role: "user", content: "Hello!" }]
});

console.log(response.choices[0].message.content);

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)

response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

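Venice-Specific Features

System Prompts

Venice provides a default system prompt designed to ensure uncensored, natural model responses. There are two options for system prompt handling:

Default behavior: Your system prompt is appended to Venice's default system prompt
Custom behavior: Venice's system prompt is disabled entirely

Disabling the Venice System Prompt

To remove Venice's default system prompt, use the venice_parameters option: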

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [
      {"role": "system", "content": "Your custom system prompt"},
      {"role": "user", "content": "Why is the sky blue?"}
    ],
    "venice_parameters": {
      "include_venice_system_prompt": false
    }
  }'

const completion = await client.chat.completions.create({
  model: "venice-uncensored",
  messages: [
    {
      role: "system",
      content: "Your custom system prompt",
    },
    {
      role: "user",
      content: "Why is the sky blue?",
    },
  ],
  venice_parameters: {
    include_venice_system_prompt: false,
  },
});

response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[
        {"role": "system", "content": "Your custom system prompt"},
        {"role": "user", "content": "Why is the sky blue?"}
    ],
    extra_body={
        "venice_parameters": {
            "include_venice_system_prompt": False
        }
    }
)

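Venice Parameters

The venice_parameters object gives access to Venice-specific features that are not available in the standard OpenAI API:

Parameter	Type	Description	Default
`character_slug`	string	The character slug of a public Venice character (shown as the "Public ID" on the published character page)	-
`strip_thinking_response`	boolean	Strip `<think></think>` blocks from the response (for models that use the legacy `<think>` tag format). See Reasoning Models.	`false`
`disable_thinking`	boolean	On supported reasoning models, disable thinking and strip `<think></think>` blocks from the response	`false`
`enable_web_search`	string	Enable web search for this request (`off`, `on`, `auto`; `auto` enables it at the model's discretion). Additional usage-based pricing applies; see Pricing.	`off`
`enable_web_scraping`	boolean	Enable web scraping of up to 5 URLs detected in the user message. Scraped content augments the response and bypasses web search. Only successfully scraped URLs are billed. Additional usage-based pricing applies; see Pricing.	`false`
`enable_x_search`	boolean	Enable xAI's native search (web + X/Twitter) for supported Grok models (e.g. `grok-4-20-beta`). Uses xAI's search infrastructure for higher-quality search results. When enabled, Venice's standard web search is bypassed. Additional usage-based pricing applies; see Pricing.	`false`
`enable_web_citations`	boolean	When web search is enabled, request that the LLM cite its sources using the `[REF]0[/REF]` format	`false`
`include_search_results_in_stream`	boolean	Experimental: include search results in the stream as the first emitted chunk	`false`
`return_search_results_as_documents`	boolean	Surface search results in an OpenAI-compatible tool call named `venice_web_search_documents`, for LangChain integration	`false`
`include_venice_system_prompt`	boolean	Whether to include Venice's default system prompt alongside the specified system prompt	`true`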
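To illustrate how these options fit into a request body, here is a small sketch that assembles a chat completions payload with a `venice_parameters` object. The helper name `build_chat_payload` is hypothetical, and the only validation it performs is against the three `enable_web_search` values listed in the table; it does not send anything over the network.

```python
from typing import Any

# Values accepted by enable_web_search according to the table above.
WEB_SEARCH_MODES = {"off", "on", "auto"}


def build_chat_payload(model: str, prompt: str, **venice_parameters: Any) -> dict[str, Any]:
    """Build a chat completions request body with an optional venice_parameters object."""
    mode = venice_parameters.get("enable_web_search")
    if mode is not None and mode not in WEB_SEARCH_MODES:
        raise ValueError(f"enable_web_search must be one of {sorted(WEB_SEARCH_MODES)}")
    payload: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if venice_parameters:
        # Venice-only options live in their own object, separate from OpenAI fields.
        payload["venice_parameters"] = venice_parameters
    return payload


payload = build_chat_payload(
    "venice-uncensored",
    "What changed in the latest Python release?",
    enable_web_search="auto",
    enable_web_citations=True,
)
```

With the OpenAI Python client, the same object would go through `extra_body`, as in the system prompt example earlier on this page.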

These parameters can also be specified as model suffixes appended to the model name (e.g. zai-org-glm-5:enable_web_search=auto). See Model Feature Suffixes for details.
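The suffix form can be produced with simple string formatting. This sketch covers only the single-parameter pattern shown in the example above. The helper name `with_model_suffix` is hypothetical, and rendering booleans as lowercase `true`/`false` is an assumption; the Model Feature Suffixes page is the authority on the exact syntax, including how to combine several parameters.

```python
def with_model_suffix(model: str, parameter: str, value: object) -> str:
    """Append a single parameter to a model ID in the form model:parameter=value."""
    if isinstance(value, bool):
        # Assumption: booleans are written in lowercase, as in JSON.
        rendered = "true" if value else "false"
    else:
        rendered = str(value)
    return f"{model}:{parameter}={rendered}"


print(with_model_suffix("zai-org-glm-5", "enable_web_search", "auto"))
# prints: zai-org-glm-5:enable_web_search=auto
```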

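Prompt Caching

Venice supports prompt caching on select models to reduce latency and cost for repeated content. On supported models, Venice caches system prompts automatically, so no code changes are required. You can also manually mark content for caching with the cache_control property on message content.

Parameter	Type	Description
`prompt_cache_key`	string	Optional routing hint to improve cache hit rates. When provided, Venice routes requests to the same backend infrastructure, increasing the likelihood of cache hits in multi-turn conversations.

For details on how caching works, billing, and best practices, see Prompt Caching.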
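As a rough illustration, a request that marks a long system prompt for caching and supplies a routing hint might be assembled as below. Two details are assumptions that this page does not confirm: the marker shape `{"type": "ephemeral"}` for `cache_control`, and `prompt_cache_key` being a top-level request field. The helper name `build_cached_payload` is also hypothetical. Check the Prompt Caching page before relying on either detail.

```python
from typing import Any, Optional


def build_cached_payload(
    model: str,
    system_text: str,
    user_text: str,
    cache_key: Optional[str] = None,
) -> dict[str, Any]:
    """Build a request body whose system prompt is marked for caching."""
    payload: dict[str, Any] = {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": system_text,
                        # Assumed marker shape; not confirmed by this page.
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": user_text},
        ],
    }
    if cache_key is not None:
        # Assumed to be a top-level field; reuse one key per conversation.
        payload["prompt_cache_key"] = cache_key
    return payload
```

Reusing the same `cache_key` value across the turns of one conversation matches the routing behavior described in the table above.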

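Response Headers Reference

All Venice API responses include HTTP headers that provide metadata about the request, rate limits, model information, and account balance. In addition to the error codes returned in API responses, you can inspect these headers to get the unique ID of a particular API request, monitor rate limiting, and track your account balance. Venice recommends logging request IDs (the CF-RAY header) in production deployments so that troubleshooting with the support team is more efficient if it becomes necessary. The table below is a comprehensive reference of every header you may encounter: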

Header	Type	Purpose	When Returned
Standard HTTP Headers
`Content-Type`	string	MIME type of the response body (`application/json`, `text/csv`, `image/png`, etc.)	Always
`Content-Encoding`	string	Encoding used to compress the response body (`gzip`, `br`)	When client sends `Accept-Encoding` header
`Content-Disposition`	string	How the content should be displayed (e.g. `attachment; filename=export.csv`)	When downloading files or exports
`Date`	string	RFC 7231 formatted timestamp of when the response was generated	Always
Request Identification
`CF-RAY`	string	Unique identifier for this API request, used for troubleshooting and support requests	Always
`x-venice-version`	string	Current version/revision of the Venice API service (e.g. `20250828.222653`)	Always
`x-venice-timestamp`	string	Server timestamp of when the request was processed (ISO 8601 format)	When timestamp tracking is enabled
`x-venice-host-name`	string	Hostname of the server that processed the request	Error responses and debugging scenarios
Model Information
`x-venice-model-id`	string	Unique identifier of the AI model used for the request (e.g. `venice-01-lite`)	Inference endpoints using AI models
`x-venice-model-name`	string	Friendly/display name of the AI model used (e.g. `Venice Lite`)	Inference endpoints using AI models
`x-venice-model-router`	string	Router/backend service that handled the model inference	Inference endpoints when routing info available
`x-venice-model-deprecation-warning`	string	Warning message for models scheduled for deprecation	When using a deprecated model
`x-venice-model-deprecation-date`	string	Date when the model will be deprecated (ISO 8601 date)	When using a deprecated model
Rate Limiting Information
`x-ratelimit-limit-requests`	number	Maximum number of requests allowed in the current time window	All authenticated requests
`x-ratelimit-remaining-requests`	number	Number of requests remaining in the current time window	All authenticated requests
`x-ratelimit-reset-requests`	number	Unix timestamp at which the request rate limit resets	All authenticated requests
`x-ratelimit-limit-tokens`	number	Maximum number of tokens (prompt + completion) allowed in the time window	All authenticated requests
`x-ratelimit-remaining-tokens`	number	Number of tokens remaining in the current time window	All authenticated requests
`x-ratelimit-reset-tokens`	number	Seconds until the token rate limit resets	All authenticated requests
`x-ratelimit-type`	string	Type of rate limit applied (`user`, `api_key`, `global`)	When rate limiting is enforced
Pagination Headers
`x-pagination-limit`	number	Number of items per page	Paginated endpoints
`x-pagination-page`	number	Current page number (1-based)	Paginated endpoints
`x-pagination-total`	number	Total number of items across all pages	Paginated endpoints
`x-pagination-total-pages`	number	Total number of pages	Paginated endpoints
Account Balance Information
`x-venice-balance-diem`	string	DIEM token balance before the request was processed	All authenticated requests
`x-venice-balance-usd`	string	USD credit balance before the request was processed	All authenticated requests
Content Safety Headers
`x-venice-is-blurred`	string	Whether the generated image was blurred due to content policies (`true`/`false`)	Image generation with Safe Venice enabled
`x-venice-is-content-violation`	string	Whether the content violates Venice's content policies (`true`/`false`)	Content generation endpoints
`x-venice-is-adult-model-content-violation`	string	Whether the content violates the adult model content policies (`true`/`false`)	Image generation endpoints
`x-venice-contains-minor`	string	Whether the image contains minors (`true`/`false`)	Image analysis endpoints with age detection
Client Information
`x-venice-middleface-version`	string	Version of the Venice middleface client	Requests from Venice middleface clients
`x-venice-mobile-version`	string	Version of the Venice mobile app client	Requests from mobile applications
`x-venice-request-timestamp-ms`	number	Client-provided request timestamp in milliseconds	When client provides timestamp in request
`x-venice-control-instance`	string	Control instance identifier for debugging	Image generation endpoints for debugging
Authentication Headers
`x-auth-refreshed`	string	Whether the authentication token was refreshed during the request (`true`/`false`)	When authentication tokens are auto-refreshed
`x-retry-count`	number	Number of retry attempts for the request	When request retries occur
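The pagination headers are enough to tell a client whether another page exists. Below is a minimal sketch with a hypothetical helper name, `next_page`. It only reads `x-pagination-page` and `x-pagination-total-pages`; how the next page is actually requested depends on the endpoint and is not covered on this page.

```python
from typing import Mapping, Optional


def next_page(headers: Mapping[str, str]) -> Optional[int]:
    """Return the next 1-based page number, or None when on the last page."""
    # Normalize names because HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    page = int(lowered["x-pagination-page"])
    total_pages = int(lowered["x-pagination-total-pages"])
    return page + 1 if page < total_pages else None
```

A caller would loop until `next_page` returns `None`, passing in the headers of each paginated response.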

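Important Notes

Header name casing: HTTP headers are case-insensitive, but Venice uses lowercase with hyphens for consistency
String values: Boolean values in headers are returned as strings ("true" or "false")
Numeric values: Large numbers and balance values may be returned as strings to prevent precision loss
Optional headers: Not all headers are returned in every response; their presence depends on the endpoint and request context
Compression: Send Accept-Encoding: gzip, br in requests to receive compressed responses where supported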
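These notes suggest a few defensive habits when reading headers: look names up case-insensitively, treat every header as optional, convert "true"/"false" strings explicitly, and parse balances with a decimal type rather than a float. The sketch below shows one way to do that. The helper names are hypothetical, and it works on any plain mapping of header names to string values.

```python
from decimal import Decimal
from typing import Mapping, Optional


def _lookup(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup that returns None for absent headers."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get(name.lower())


def header_bool(headers: Mapping[str, str], name: str) -> Optional[bool]:
    """Convert a "true"/"false" header value to a bool; None when absent."""
    raw = _lookup(headers, name)
    if raw is None:
        return None
    return raw.strip().lower() == "true"


def header_decimal(headers: Mapping[str, str], name: str) -> Optional[Decimal]:
    """Parse a numeric header without float rounding; None when absent."""
    raw = _lookup(headers, name)
    return Decimal(raw.strip()) if raw is not None else None
```

Many HTTP libraries already expose case-insensitive header objects, in which case the `_lookup` step is redundant but harmless.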

Example: Accessing Response Headers

// After making an API request, access headers from the response object.
// This assumes `response` is a fetch-style Response whose `headers` supports `.get()`.
const requestId = response.headers.get('CF-RAY');
const remainingRequests = response.headers.get('x-ratelimit-remaining-requests');
const remainingTokens = response.headers.get('x-ratelimit-remaining-tokens');
const usdBalance = response.headers.get('x-venice-balance-usd');

// Check for model deprecation warnings
const deprecationWarning = response.headers.get('x-venice-model-deprecation-warning');
if (deprecationWarning) {
  console.warn(`Model Deprecation: ${deprecationWarning}`);
}
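A Python counterpart for the deprecation check could look like the sketch below. It takes a plain mapping of header names to values, so it does not depend on a particular HTTP client. With the OpenAI Python SDK, such headers are typically obtained through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers`, but verify that against the SDK version you have installed. The helper name `days_until_deprecation` is hypothetical.

```python
from datetime import date
from typing import Mapping, Optional


def days_until_deprecation(headers: Mapping[str, str], today: date) -> Optional[int]:
    """Return days left before the model's deprecation date, or None if no date is sent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-venice-model-deprecation-date")
    if raw is None:
        return None
    # The header is documented as an ISO 8601 date. Keeping only the first ten
    # characters (YYYY-MM-DD) is an assumption that tolerates a full timestamp.
    deprecation_day = date.fromisoformat(raw[:10])
    return (deprecation_day - today).days
```

A negative result means the deprecation date has already passed.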

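Best Practices

Rate limiting: Monitor the x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens headers and implement exponential backoff
Balance monitoring: Track the x-venice-balance-usd and x-venice-balance-diem headers to avoid service interruptions
System prompts: Test with and without Venice's system prompt to find what works best for your use case
API keys: Keep your API keys secure and rotate them regularly
Request logging: Log the CF-RAY header value for troubleshooting with support
Model deprecation: Check for the x-venice-model-deprecation-warning header when using models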
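For the rate limiting item, a retry delay can combine exponential backoff with the reset hints from the headers table, where `x-ratelimit-reset-requests` is described as a Unix timestamp and `x-ratelimit-reset-tokens` as a number of seconds. The following is a sketch under those descriptions. The function names, the one-second base, and the 60-second cap are illustrative choices, and production code would usually add random jitter.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Estimate how long to wait for both rate-limit windows to reset."""
    lowered = {name.lower(): value for name, value in headers.items()}
    now = time.time() if now is None else now
    waits = [0.0]
    if "x-ratelimit-reset-requests" in lowered:
        # Documented as a Unix timestamp, so convert it to a duration.
        waits.append(float(lowered["x-ratelimit-reset-requests"]) - now)
    if "x-ratelimit-reset-tokens" in lowered:
        # Documented as a number of seconds until the reset.
        waits.append(float(lowered["x-ratelimit-reset-tokens"]))
    return max(waits)


def backoff_delay(
    attempt: int,
    headers: Mapping[str, str],
    base: float = 1.0,
    cap: float = 60.0,
    now: Optional[float] = None,
) -> float:
    """Exponential backoff (base * 2**attempt), at least the reset hint, at most cap."""
    exponential = base * (2 ** attempt)
    return min(cap, max(exponential, seconds_until_reset(headers, now)))
```

A retry loop would call `time.sleep(backoff_delay(attempt, headers))` after each rate-limited response, incrementing `attempt` each time.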

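Differences from OpenAI's API

Venice maintains high compatibility with the OpenAI API specification, but there are a few key differences:

venice_parameters: Additional configuration such as enable_web_search, character_slug, and strip_thinking_response for extended functionality
System prompts: Venice appends your system prompt to defaults optimized for uncensored responses (disable with include_venice_system_prompt: false)
Model ecosystem: Venice offers its own lineup of models, including uncensored and reasoning models. Use Venice model IDs rather than OpenAI mappings
Response headers: Unique headers for balance tracking (x-venice-balance-usd, x-venice-balance-diem), model deprecation warnings, and content safety flags
Content policies: More permissive policies, with dedicated uncensored models and optional content filtering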

API Stability

Venice maintains backward compatibility for v1 endpoints and parameters. For model lifecycle policies, deprecation notices, and migration guides, see Deprecations.

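OpenAPI Specification & Raw Data

For programmatic access to the Venice API documentation and data, including use with RAG (Retrieval-Augmented Generation), the following resources are available:

OpenAPI Spec (YAML): The complete API specification in YAML format
API Docs Source: All documentation pages (.mdx format) as a downloadable archive

Request fields not listed in this document may be passed through, but they are not validated or guaranteed to work.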
