對話完成
主要端點。與 OpenAI Chat Completions API 相容。
/api/v1/chat/completions請求內文
請求結構 (TypeScript)
// Request body for POST /api/v1/chat/completions (OpenAI-compatible).
// Unsupported parameters are silently dropped before forwarding upstream.
type Request = {
  // Required
  model: string; // "provider/model-name"
  messages: Message[];
  // Common
  stream?: boolean; // Default: true
  temperature?: number; // Range: [0, 2], default: 0.7
  max_tokens?: number; // Range: [1, context_length)
  n?: number; // Default: 1
  seed?: number; // Integer seed for deterministic sampling
  stop?: string | string[];
  // Sampling
  top_p?: number; // Range: (0, 1]
  top_k?: number; // Integer. Default: 0 (disabled)
  frequency_penalty?: number; // Range: [-2, 2]
  presence_penalty?: number; // Range: [-2, 2]
  repetition_penalty?: number; // Range: (0, 2], default: 1
  min_p?: number; // Range: [0, 1]
  top_a?: number; // Range: [0, 1]
  // Logprobs
  logit_bias?: Record<number, number>; // Token ID → bias [-100, 100]
  logprobs?: boolean;
  top_logprobs?: number; // Range: [0, 20], requires logprobs: true
  // Tools & output
  tools?: Tool[];
  tool_choice?: ToolChoice;
  parallel_tool_calls?: boolean; // Default: true
  response_format?: ResponseFormat;
  // BazaarLink-only
  transforms?: string[]; // e.g. ["middle-out"]
  models?: string[]; // Fallback model list
  route?: "fallback";
  provider?: ProviderPreferences;
  debug?: {
    echo_upstream_body?: boolean; // Streaming only
  };
};
// A single conversation turn. A "tool" message carries a function result
// and must reference the tool call it answers via tool_call_id.
type Message =
| { role: "system" | "user" | "assistant"; content: string | ContentPart[] }
| { role: "tool"; content: string; tool_call_id: string };
// Multimodal content item for system/user/assistant messages.
type ContentPart =
| { type: "text"; text: string }
| { type: "image_url"; image_url: { url: string; detail?: string } };
// A callable function exposed to the model via the "tools" request field.
type Tool = {
type: "function";
function: {
name: string;
description?: string;
parameters: object; // JSON Schema
};
};
// Controls whether and which tool the model may call:
// "none" forbids calls, "auto" lets the model decide, "required" forces
// a call, and the object form forces one specific function.
type ToolChoice =
| "none" | "auto" | "required"
| { type: "function"; function: { name: string } };
// Output-format constraint: free-form JSON or a strict JSON Schema.
type ResponseFormat =
| { type: "json_object" }
| { type: "json_schema"; json_schema: { name: string; strict?: boolean; schema: object } };
// Upstream routing preferences, passed as the "provider" request field.
type ProviderPreferences = {
order?: string[]; // Provider slugs to try in order
only?: string[]; // Only allow these provider slugs
ignore?: string[]; // Skip these provider slugs
allow_fallbacks?: boolean; // Allow providers outside order/only (default: true)
sort?: "price" | "latency" | "throughput";
};範例請求
curl https://bazaarlink.ai/api/v1/chat/completions \
-H "Authorization: Bearer $BAZAARLINK_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4.1",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in one paragraph."}
],
"temperature": 0.7,
"max_tokens": 512
}'回應
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1740000000,
"model": "openai/gpt-4.1",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing leverages quantum mechanics..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 74,
"total_tokens": 102,
"cost": 0.0006480,
"prompt_tokens_details": {
"cached_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0
}
}
}回應結構 (TypeScript)
// Top-level response object. "chat.completion" for non-streaming responses,
// "chat.completion.chunk" for each SSE event when stream: true.
type Response = {
id: string;
object: "chat.completion" | "chat.completion.chunk";
created: number; // Unix timestamp
model: string;
choices: (NonStreamingChoice | StreamingChoice)[];
usage?: ResponseUsage;
cost?: number; // Total cost in USD (also reported as usage.cost)
};
// One completion candidate in a non-streaming response.
type NonStreamingChoice = {
index: number;
finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | null;
native_finish_reason: string | null; // Provider's original finish reason
message: {
role: "assistant";
content: string | null;
tool_calls?: ToolCall[]; // Present when the model invokes tools
};
};
// One incremental choice in an SSE stream; partial fields arrive in "delta".
type StreamingChoice = {
index: number;
finish_reason: string | null; // null until the final chunk
native_finish_reason: string | null; // Provider's original finish reason
delta: {
role?: string;
content?: string | null;
tool_calls?: ToolCall[];
};
};
// Token accounting for a request. The optional detail blocks are present
// when the upstream provider reports them.
type ResponseUsage = {
prompt_tokens: number;
completion_tokens: number;
total_tokens: number;
cost: number; // Total cost for this request in USD
prompt_tokens_details?: {
cached_tokens: number; // Tokens served from prompt cache (reduced cost)
cache_write_tokens?: number; // Tokens written to cache in this request
audio_tokens?: number;
};
completion_tokens_details?: {
reasoning_tokens?: number; // Thinking/reasoning tokens (e.g. o3, Qwen3, DeepSeek R1)
image_tokens?: number;
};
};
// A tool invocation emitted by the model. "arguments" is a JSON-encoded
// string — callers must decode it (e.g. json.loads) before use.
type ToolCall = {
id: string;
type: "function";
function: { name: string; arguments: string };
};Responses API
相容 OpenAI Responses API 格式的端點,支援無狀態多輪對話、工具呼叫與多模態輸入。適用於使用 OpenAI Python SDK ≥ 1.x 的 client.responses.create() 的 Agent 框架。
/api/v1/responses請求內文
請求結構 (TypeScript)
// Request body for POST /api/v1/responses (OpenAI Responses API compatible).
type ResponsesRequest = {
model: string; // "provider/model-name"
input: string | InputItem[]; // string or multi-turn array
// Optional
instructions?: string; // System-level message
stream?: boolean; // Default: false
max_output_tokens?: number;
temperature?: number; // Range: [0, 2], default: 0.7
top_p?: number;
tools?: Tool[];
tool_choice?: "auto" | "none" | "required" | object;
parallel_tool_calls?: boolean; // Default: true
previous_response_id?: string; // Not supported — use full input array
provider?: ProviderPreferences; // Same as Chat Completions
};
// One item in a multi-turn "input" array: a message, a tool result, or a
// prior tool call replayed for stateless multi-turn conversations.
type InputItem =
| { type?: "message"; role: "user" | "assistant" | "system" | "developer"; content: string | ContentBlock[] }
| { type: "function_call_output"; call_id: string; output: string } // tool result
| { type: "function_call"; call_id: string; name: string; arguments: string };
// Multimodal content for Responses API input messages.
type ContentBlock =
| { type: "input_text"; text: string }
| { type: "input_image"; image_url: string; detail?: "auto" | "low" | "high" };範例請求
curl https://bazaarlink.ai/api/v1/responses \
-H "Authorization: Bearer $BAZAARLINK_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"instructions": "You are a helpful assistant.",
"input": "What is the capital of Taiwan?"
}'回應格式
// Non-streaming response object for POST /api/v1/responses.
type ResponsesResponse = {
id: string; // "resp_..."
object: "response";
created_at: number;
completed_at: number;
status: "completed" | "failed" | "incomplete";
model: string;
output: OutputItem[];
usage: {
input_tokens: number; // equivalent to prompt_tokens
output_tokens: number; // equivalent to completion_tokens
total_tokens: number;
cost?: number; // actual cost in credits
} | null;
error: null | { code: string; message: string };
};
// One output entry: an assistant message or a function call to execute.
type OutputItem =
| {
type: "message";
id: string;
role: "assistant";
status: "completed";
// NOTE(review): "annotations: []" types an always-empty tuple — confirm
// annotations can never be non-empty, otherwise widen the type.
content: Array<{ type: "output_text"; text: string; annotations: [] }>;
}
| { type: "function_call"; id: string; call_id: string; name: string; arguments: string; status: "completed" };從 Chat Completions 遷移
將 messages 改為 input(字串或陣列),以 instructions 取代 system 角色訊息,並從 output[0].content[0].text 讀取回應內容(原為 choices[0].message.content)。
# Chat Completions (before)
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "Hello"},
]
)
text = response.choices[0].message.content
# Responses API (after)
response = client.responses.create(
model="openai/gpt-4o-mini",
instructions="You are helpful.",
input="Hello"
)
text = response.output[0].content[0].text限制事項
- previous_response_id 會被接受但忽略 — 請使用無狀態模式(傳入完整 input 陣列)。
- 不支援內建工具(web_search_preview、file_search、computer_use_preview)。
- 不支援 background: true(非同步執行)。
模型
列出所有可用模型及其定價和能力資訊。 此端點不需要身份驗證。
/api/v1/modelscurl https://bazaarlink.ai/api/v1/models
回應
{
"data": [
{
"id": "openai/gpt-4.1",
"name": "GPT 4.1",
"context_length": 1047576,
"modality": "text+image+file->text",
"pricing": {
"prompt": "2.00",
"completion": "8.00"
}
}
]
}// /v1/models — Response Schema
// Response schema for GET /api/v1/models (no authentication required).
type ModelsResponse = {
data: Model[];
};
// Metadata, pricing and capabilities for one routable model.
type Model = {
id: string; // Model ID (e.g. "openai/gpt-4.1")
name: string; // Human-readable name
context_length: number | null; // Max context window in tokens
modality: string | null; // e.g. "text->text", "text+image->text"
pricing: {
prompt: string; // Input price per 1M tokens (USD)
completion: string; // Output price per 1M tokens (USD)
};
description?: string | null; // Model description
top_provider?: {
max_completion_tokens?: number;
};
supported_parameters?: string[]; // e.g. ["tools", "response_format", "reasoning"]
};串流
設定 stream: true 以接收 Server-Sent Events (SSE) 串流。每個事件包含一個回應片段。
from openai import OpenAI
client = OpenAI(
base_url="https://bazaarlink.ai/api/v1",
api_key="sk-bl-YOUR_API_KEY",
)
stream = client.chat.completions.create(
model="anthropic/claude-sonnet-4.6",
messages=[{"role": "user", "content": "Count to 10 slowly."}],
stream=True,
)
for chunk in stream:
content = chunk.choices[0].delta.content
if content:
print(content, end="", flush=True)SSE 格式
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":10,"completion_tokens":4,"total_tokens":14}}
data: [DONE]嵌入向量
生成與 OpenAI Embeddings API 相容的文字嵌入向量。
/api/v1/embeddingsfrom openai import OpenAI
client = OpenAI(
base_url="https://bazaarlink.ai/api/v1",
api_key="sk-bl-YOUR_API_KEY",
)
response = client.embeddings.create(
model="openai/text-embedding-3-small",
input="The quick brown fox jumps over the lazy dog",
)
print(response.data[0].embedding) # 1536-dimensional vector專用參數
取樣參數影響 token 產生過程。BazaarLink 會將支援的參數傳遞給上游 provider;不支援的參數會被靜默忽略。
取樣參數
BazaarLink 專屬參數
工具呼叫
工具呼叫(也稱為函式呼叫)讓模型可以呼叫您定義的外部函式。模型會決定何時呼叫工具並產生結構化參數 — 您的程式碼負責執行函式並將結果回傳以繼續對話。
支援的模型
大多數前沿模型都支援工具呼叫。以下是一些熱門選擇:
定義工具
每個工具是一個描述模型可呼叫函式的 JSON 物件。parameters 欄位使用 JSON Schema。
tool_choice 選項
完整流程
工具呼叫是一個多輪流程:(1) 帶工具發送請求 → (2) 模型回傳 tool_calls → (3) 執行函式 → (4) 回傳結果 → (5) 模型生成最終回應。
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

# Step 1: Define tools and send request.
# Each tool describes one callable function; "parameters" is JSON Schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
}]
response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Taipei?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call a tool
)

# Step 2: Check for tool calls
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    # "arguments" is a JSON-encoded string — decode it before use.
    args = json.loads(tool_call.function.arguments)

    # Step 3: Execute your function
    result = {"temperature": 28, "unit": "celsius", "condition": "Partly cloudy"}

    # Step 4: Send the result back. Echo the assistant message that carried
    # tool_calls, then append one "tool" message referencing tool_call.id.
    final = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=[
            {"role": "user", "content": "What's the weather in Taipei?"},
            message,
            {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
        ],
        tools=tools,
    )

    # Step 5: Get final response
    print(final.choices[0].message.content)
    # "The weather in Taipei is 28°C and partly cloudy."平行工具呼叫
某些模型可以在單一回應中呼叫多個工具。處理每個工具呼叫並回傳所有結果:
# Model may return multiple tool_calls
if message.tool_calls:
messages = [
{"role": "user", "content": "Weather and time in Tokyo?"},
message,
]
for tool_call in message.tool_calls:
# Execute each function
if tool_call.function.name == "get_weather":
result = {"temperature": 22, "condition": "Clear"}
elif tool_call.function.name == "get_time":
result = {"time": "2026-02-23T15:30:00+09:00"}
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result),
})
# Send all results back at once
final = client.chat.completions.create(
model="openai/gpt-4.1",
messages=messages,
tools=tools,
)
print(final.choices[0].message.content)結構化輸出
強制模型返回符合 Schema 的有效 JSON。這對於建立需要程式化解析模型輸出的可靠應用程式至關重要。
方法 1:response_format(JSON Schema)
以強制嚴格的 JSON Schema 合規性:
# Enforce a strict JSON Schema on the model output via response_format.
import json

response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Review the movie Inception"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "movie_review",
            "strict": True,  # guarantee schema compliance
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "rating": {"type": "integer", "description": "Rating 1-10"},
                    "summary": {"type": "string"},
                    "pros": {"type": "array", "items": {"type": "string"}},
                    "cons": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title", "rating", "summary", "pros", "cons"],
                "additionalProperties": False,
            },
        },
    },
)
# With strict mode the content is schema-valid JSON — parse it directly.
review = json.loads(response.choices[0].message.content)
print(review["title"]) # "Inception"
print(review["rating"]) # 9提示
- 使用清晰、描述性的屬性名稱 — 模型會將其作為上下文。
- 為 Schema 屬性添加描述來引導模型。
- 設定 strict: true 以保證 Schema 合規(可能略微增加延遲)。
- 保持 Schema 簡單 — 深度巢狀的 Schema 可能降低輸出品質。
- 使用不同模型測試 — 某些模型處理複雜 Schema 的能力更強。
助手預填
透過在訊息陣列末尾加入部分 assistant 訊息,引導模型以特定方式回應。模型會從您中斷的地方繼續。
# Prefill: end the messages array with a partial assistant turn;
# the model continues from where the text left off.
response = client.chat.completions.create(
model="anthropic/claude-sonnet-4.6",
messages=[
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is"},
],
)
# Model continues: " Paris, known for the Eiffel Tower..."
print(response.choices[0].message.content)模型路由
BazaarLink 使用 provider/model-name 格式將請求路由到正確的上游供應商。這讓您可以透過單一 API 端點存取 200+ 個模型。
模型 ID 格式
{provider}/{model-name}
# Examples:
openai/gpt-4.1
anthropic/claude-sonnet-4.6
google/gemini-2.5-flash
deepseek/deepseek-chat
meta-llama/llama-4-maverick路由優先順序
當您發送請求時,BazaarLink 依以下順序解析上游供應商:
- 精確匹配 — 尋找與完整模型 ID 匹配的模型路由
- 供應商萬用字元 — 回退至 provider/* 路由(例如 openai/*)
- 全域萬用字元 — 回退至 * 萬用字元路由
- 預設供應商 — 使用標記為預設的供應商金鑰
- 環境回退 — 使用配置的 API 金鑰作為最後手段
在 模型頁面瀏覽所有可用模型。
故障轉移
當供應商發生故障或返回錯誤時,BazaarLink 可以自動使用替代模型重試您的請求。這確保了高可用性,無需對您的程式碼做任何更改。
運作方式
- 您的請求會發送到主要模型。
- 如果主要模型失敗(5xx 錯誤、逾時或速率限制),BazaarLink 會自動使用列表中的下一個模型重試。
- 這個過程會持續到某個模型成功或所有模型都已嘗試。
- 回應會包含一個標頭,指示實際服務請求的模型。
最佳實踐
- 按偏好順序排列模型 — 第一個模型始終最先被嘗試。
- 混合不同供應商以獲得最大彈性(例如 OpenAI → Anthropic → Google)。
- 使用能力相似的模型以確保一致的結果。
- 設定合理的逾時時間以避免在備用觸發前等待過久。
- 監控 X-Fallback-Used 標頭以追蹤供應商可靠性。
供應商選擇
控制路由請求時使用的供應商。在請求主體中加入 provider 物件來自訂路由行為。BazaarLink 會根據上游供應商的支援情況,在本地或原生應用您的偏好設定。
支援欄位
| Field | Type | Description |
|---|---|---|
| order | string[] | Provider slugs to try in order |
| allow_fallbacks | boolean | Allow providers outside order/only as fallbacks (default: true) |
| only | string[] | Only allow these provider slugs |
| ignore | string[] | Skip these provider slugs |
| sort | string | object | "price" | "throughput" | "latency" (or { by, partition }) |
| quantizations | string[] | 按量化等級過濾 |
| data_collection | string | "allow" | "deny" |
| require_parameters | boolean | 僅使用支援所有參數的供應商 |
| max_price | object | { prompt, completion } 每百萬 token 最高價格 |
| zdr | boolean | 僅路由至零資料保留端點 |
| enforce_distillable_text | boolean | 僅路由至允許文字蒸餾的模型 |
| preferred_min_throughput | number | object | 首選最小吞吐量(tokens/秒) |
| preferred_max_latency | number | object | 首選最大延遲(秒) |
排序供應商
使用 order 欄位指定優先嘗試的供應商。不在列表中的供應商將作為回退選項(除非 allow_fallbacks 為 false)。
{
"model": "meta-llama/llama-4-maverick",
"messages": [{"role": "user", "content": "Hello"}],
"provider": {
"order": ["together", "fireworks"]
}
}過濾供應商
使用 only 來白名單指定供應商,或使用 ignore 來黑名單排除。這些過濾器會在排序之前應用。
// Only use specific providers
{
"model": "openai/gpt-4o",
"provider": { "only": ["openai"] }
}
// Skip a provider
{
"model": "openai/gpt-4o",
"provider": { "ignore": ["deepinfra"] }
}按價格排序
設定 sort 為 'price' 以自動路由到最便宜的可用供應商。throughput 和 latency 排序也會傳遞至支援的上游。
{
"model": "meta-llama/llama-4-maverick",
"messages": [{"role": "user", "content": "Hello"}],
"provider": {
"sort": "price"
}
}停用回退
將 allow_fallbacks 設為 false 以嚴格限制路由到您指定的或白名單中的供應商。若無可用供應商,請求將失敗而非回退。
{
"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "Hello"}],
"provider": {
"order": ["openai", "azure"],
"allow_fallbacks": false
}
}進階參數透傳
進階供應商參數(包括 quantizations、data_collection、require_parameters、max_price 等)會傳遞至支援的上游供應商原生處理。
// Advanced provider selection example
{
"model": "deepseek/deepseek-v3.2",
"messages": [{"role": "user", "content": "Hello"}],
"provider": {
"order": ["deepinfra", "together"],
"sort": "throughput",
"quantizations": ["fp8"],
"data_collection": "deny",
"require_parameters": true,
"allow_fallbacks": true
}
}模型變體
在任何模型 ID 後加上後綴來改變路由行為。BazaarLink 支援 7 種變體類型。
變體類型
有兩類變體:獨立模型 ID(帶後綴的模型是獨立的端點)和路由捷徑(後綴改變 BazaarLink 選擇供應商的方式,但不改變模型本身)。
獨立模型 ID
這些變體作為獨立模型存在,各自擁有定價和功能。BazaarLink 優先嘗試完整模型 ID(含後綴),若無匹配則回退至基礎模型。
| Suffix | Description | Example |
|---|---|---|
| :free | Free-tier version (rate-limited) | deepseek/deepseek-r1:free |
| :extended | Extended context window | anthropic/claude-sonnet-4.5:extended |
| :thinking | Extended reasoning / chain-of-thought | deepseek/deepseek-r1:thinking |
| :exacto | Curated providers for tool-calling accuracy | moonshotai/kimi-k2-0905:exacto |
路由捷徑
這些後綴修改供應商選擇方式,不改變模型身份。路由匹配前會先去除後綴。
| Suffix | Equivalent | Behaviour |
|---|---|---|
| :nitro | provider.sort="throughput" | Prioritise highest throughput providers |
| :floor | provider.sort="price" | Sort candidates by price ASC (cheapest first) |
| :online | plugins: { web: {} } | 啟用即時網路搜尋 |
多供應商行為
對於支援變體的上游供應商,後綴會原樣傳遞。對於直連供應商(如直連 OpenAI、Fireworks),後綴會被去除,由 BazaarLink 在本地處理路由。
範例
// Independent variant — use free tier
{
"model": "deepseek/deepseek-r1:free",
"messages": [{"role": "user", "content": "Hello"}]
}
// Routing shortcut — cheapest provider first
{
"model": "meta-llama/llama-4-maverick:floor",
"messages": [{"role": "user", "content": "Hello"}]
}
// Routing shortcut — highest throughput
{
"model": "openai/gpt-4o:nitro",
"messages": [{"role": "user", "content": "Hello"}]
}
// Web search
{
"model": "openai/gpt-4o:online",
"messages": [{"role": "user", "content": "What happened today?"}]
}訊息轉換
自動轉換訊息以符合模型上下文限制。當您的訊息超過模型的上下文窗口時,轉換會從對話中間移除訊息,以智慧地壓縮對話。
用法
// Enable middle-out on any model
{
"model": "openai/gpt-4.1",
"transforms": ["middle-out"],
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
... // long conversation — middle will be trimmed to fit context
]
}
// Disable auto-trimming for small-context models
{ "transforms": [] }轉換類型
預設行為
上下文 ≤ 8k 的模型預設啟用 middle-out。較大上下文的模型需明確傳入 `transforms: ["middle-out"]` 才會啟用。Anthropic Claude 模型無論 transforms 設定為何,均自動強制執行 1,000 則訊息上限。
零完成保險 BETA
針對請求完全失敗(上游無法建立連線)的情況提供計費保護。串流從未開始即失敗時不會收費。
已保障情境
- 上游拒絕連線 / 回傳空 body — 全額退款
- 串流從未開始即失敗 — 全額退款
未涵蓋情境
串流已開始後中途中斷:收取預留金的 10% 最低費用。模型回傳 0 output tokens(空內容):仍依 input token 計費。
安全護欄 BETA
為您的 API 請求新增安全護欄,過濾有害內容、執行合規政策並保護您的應用程式。
規劃中功能
目前行為
所有上游供應商都有自己的內容安全系統。觸發內容過濾的模型回應將返回 finish_reason: "content_filter"。自訂護欄配置將在未來更新中提供。
零資料保留
BazaarLink 預設不儲存您的訊息內容。本頁說明您的資料處理方式,適用於處理敏感資料的應用程式。
目前的資料處理方式
- 訊息內容:預設不儲存,在記憶體中處理後立即丟棄
- 計費元資料:token 數量、時間戳記、模型 ID(保留 90 天)
- 使用日誌:請求統計,不含訊息內容
- 上游轉發:訊息轉發至上游供應商,受其隱私政策約束
提示快取
提示快取可以重用之前計算過的 prompt tokens,顯著降低成本並減少延遲,特別適合有大量重複系統提示的應用程式。
運作方式
快取由各模型供應商在後端自動處理,無需額外設定。BazaarLink 透明代理快取相關參數,並在使用量回應中回報結果。支援快取的模型在相同前綴重複出現時,讀取 tokens 的費用通常為正常費率的 10–50%。
# Repeated long prefixes (e.g. a stable system prompt) are cached by the
# upstream provider; cache hits are reported in prompt_tokens_details.
response = client.chat.completions.create(
model="anthropic/claude-3-7-sonnet",
messages=[
{"role": "system", "content": "You are an expert..."}, # Long system prompt cached
{"role": "user", "content": "Question here"},
],
)
# Check cache savings in the response usage
usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Cached tokens: {usage.prompt_tokens_details.cached_tokens}")
print(f"Cache savings: {usage.prompt_tokens_details.cached_tokens / usage.prompt_tokens * 100:.1f}%")推理 Tokens
推理模型(如 DeepSeek R1、o1 系列)在生成最終答案之前,會先在內部進行思考。這些思考過程消耗的 tokens 稱為推理 tokens,會分開計費。
在回應中讀取推理 Tokens
response = client.chat.completions.create(
model="deepseek/deepseek-r1",
messages=[{"role": "user", "content": "Solve: if f(x) = x^2 + 3x, what is f(5)?"}],
)
# Read reasoning tokens from usage
usage = response.usage
print(f"Completion tokens: {usage.completion_tokens}")
if hasattr(usage, "completion_tokens_details"):
details = usage.completion_tokens_details
print(f"Reasoning tokens: {details.reasoning_tokens}")
print(f"Output tokens: {details.accepted_prediction_tokens}")const response = await client.chat.completions.create({
model: "openai/o3-mini",
messages: [{ role: "user", content: "Prove that sqrt(2) is irrational." }],
// @ts-ignore - BazaarLink extension
reasoning_effort: "high", // low | medium | high
});
const usage = response.usage;
console.log("Reasoning tokens:", usage?.completion_tokens_details?.reasoning_tokens);Thinking Mode Control
Some models support toggling their "thinking" mode. Thinking mode generates internal reasoning tokens before producing the final answer, improving quality at the cost of more tokens.
| Model Family | Parameter | Default |
|---|---|---|
| qwen3-* | enable_thinking: boolean | false (platform default) |
| openai/o1, o3, o4-mini | reasoning_effort: "low" | "medium" | "high" | medium |
| deepseek/deepseek-r1 | — | Always enabled (cannot disable) |
# Qwen3: explicitly enable thinking mode
response = client.chat.completions.create(
model="qwen/qwen3-32b",
messages=[{"role": "user", "content": "Prove the Pythagorean theorem"}],
extra_body={"enable_thinking": True}, # opt-in to thinking
)
# usage.completion_tokens_details.reasoning_tokens shows thinking token countUnified reasoning Object (New Format)
BazaarLink also supports the unified reasoning object, which works across all model families with a single consistent API:
| Field | Values | Applies to |
|---|---|---|
| reasoning.effort | "xhigh" | "high" | "medium" | "low" | "none" | OpenAI o-series, Grok |
| reasoning.max_tokens | integer | Anthropic Claude, Gemini |
| reasoning.exclude | boolean | Hide thinking from response (model still reasons) |
// Claude extended thinking — specify thinking budget in tokens
const response = await client.chat.completions.create({
model: "anthropic/claude-sonnet-4-5",
messages: [{ role: "user", content: "Prove the Pythagorean theorem" }],
// @ts-ignore - BazaarLink extension
reasoning: { max_tokens: 5000 },
});
// OpenAI o3 — specify effort level
const response2 = await client.chat.completions.create({
model: "openai/o3",
messages: [{ role: "user", content: "Solve this math problem..." }],
// @ts-ignore - BazaarLink extension
reasoning: { effort: "high" },
});
// Hide thinking content from response (model still thinks)
const response3 = await client.chat.completions.create({
model: "anthropic/claude-sonnet-4-5",
messages: [{ role: "user", content: "What is 2+2?" }],
// @ts-ignore - BazaarLink extension
reasoning: { max_tokens: 2000, exclude: true },
});延遲與效能
優化 AI API 的回應延遲對用戶體驗至關重要。以下是 BazaarLink 架構中影響延遲的關鍵因素及最佳化建議。
影響延遲的因素
- 模型大小:較大的模型(70B+)通常生成速度較慢
- 提供商負載:不同時段不同供應商的負載有所差異
- Token 數量:max_tokens 越大,完成時間越長
- 串流 vs 非串流:串流(stream: true)可更快取得第一個 token
- 上下文長度:超長 context 會增加前置處理時間
最佳化建議
- 優先使用串流(stream: true)以改善感知延遲
- 使用 :nitro 變體選擇高吞吐量供應商
- 對延遲敏感的場景選擇較小的模型(flash/mini/haiku)
- 使用 provider.sort: "latency" 自動選擇最低延遲供應商
- 啟用提示快取以降低重複請求的延遲
import time
# Measure time to first token with streaming
start = time.time()
first_token_time = None
stream = client.chat.completions.create(
model="google/gemini-2.0-flash-001", # Fast model
messages=[{"role": "user", "content": "Hello!"}],
stream=True,
)
for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content and not first_token_time:
first_token_time = time.time() - start
print(f"Time to first token: {first_token_time:.3f}s")
# Check latency in usage logs via /api/v1/usage
# Each log entry includes: latency_ms, throughput (tokens/sec)# Use provider.sort for automatic latency optimization
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello!"}],
extra_body={
"provider": {
"sort": "latency", # Always pick lowest-latency provider
}
},
)可用性優化
BazaarLink 透過多層機制最大化 API 可用性,包括自動故障轉移、熔斷器和供應商健康監控。
可用性機制
- 熔斷器:自動偵測並隔離故障供應商
- 自動故障轉移:無縫切換至備用供應商,無需修改程式碼
- 供應商健康監控:持續追蹤各供應商的錯誤率和延遲
- 重試邏輯:暫時性錯誤(5xx)自動重試
熔斷器配置
# BazaarLink handles failover automatically — no code changes needed.
# Configure fallback models for maximum resilience:
response = client.chat.completions.create(
model="openai/gpt-4o", # Primary model
messages=[{"role": "user", "content": "Hello!"}],
extra_body={
"models": [ # Fallback chain
"openai/gpt-4o",
"anthropic/claude-3.5-sonnet",
"google/gemini-2.0-flash-001",
],
"route": "fallback", # Enable fallback routing
},
)
# Check if failover was used (in usage logs)
# "is_failover": true indicates the primary provider was bypassed# Check provider health (admin only)
GET https://bazaarlink.ai/api/admin/provider-health
Authorization: Bearer sk-bl-ADMIN_KEY
# Response
{
"providers": [
{
"id": "provider-1",
"name": "Anthropic",
"status": "healthy",
"error_rate": 0.002,
"avg_latency_ms": 145,
"circuit_open": false
}
]
}錯誤代碼
BazaarLink 使用標準 HTTP 狀態碼。錯誤回應遵循 OpenAI 格式:
{
"error": {
"message": "Invalid or disabled API key.",
"type": "invalid_request_error",
"code": 401
}
}錯誤處理
from openai import OpenAI, APIError, RateLimitError
client = OpenAI(
base_url="https://bazaarlink.ai/api/v1",
api_key="sk-bl-YOUR_API_KEY",
)
try:
response = client.chat.completions.create(
model="openai/gpt-4.1",
messages=[{"role": "user", "content": "Hello!"}],
)
except RateLimitError:
print("Rate limited — waiting before retry...")
except APIError as e:
print(f"API error {e.status_code}: {e.message}")串流錯誤格式
在任何 token 串流之前發生的錯誤,會以標準 HTTP 錯誤回應(JSON body)回傳。
串流過程中發生的錯誤,會以 SSE 事件形式傳送,finish_reason 為 "error"。請解析 delta 中的 error 欄位。
// Error chunk sent mid-stream (finish_reason: "error"). Shape of the SSE
// payload emitted when an error occurs after streaming has already begun;
// errors before the first token use a plain HTTP error response instead.
type MidStreamError = {
choices: [
{
index: 0;
finish_reason: "error";
delta: { content: "" };
native_finish_reason: null;
error: {
code: number;
message: string;
metadata?: {
provider_name?: string; // Upstream provider that failed
raw?: unknown; // Raw upstream error payload, shape not specified
};
};
}
];
};除錯
設定 debug.echo_upstream_body: true 可檢視實際傳送給上游 provider 的請求 body。轉換後的請求會作為第一個 SSE chunk 回傳。僅供開發與除錯使用,請勿在正式環境使用。
// Request with debug enabled (streaming only)
{
"model": "openai/gpt-4.1",
"messages": [{ "role": "user", "content": "Hello" }],
"stream": true,
"debug": { "echo_upstream_body": true }
}