Routing and Models Reference

This page explains how cliproxyapi++ selects credentials/providers and resolves model names.

Audience Guidance

Platform operators tuning reliability and quota usage.
Developers debugging model resolution and fallback behavior.

Request Flow

1. Client sends an OpenAI-compatible request to /v1/*.
2. API key auth is checked (Authorization: Bearer <client-key>).
3. Model name is resolved against configured providers, prefixes, and aliases.
4. Credential/provider is chosen by routing strategy.
5. Upstream request is translated and executed.
6. Response is normalized back to OpenAI-compatible JSON/SSE.

Endpoint behavior note:

For Copilot Codex-family models (*codex*, including gpt-5.1-codex-mini), route through /v1/responses.
For non-Codex Copilot and most other providers, /v1/chat/completions remains the default path.
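
As a rough client-side illustration of the endpoint note above, the sketch below picks the path from the model name and builds (but does not send) the request. It assumes the proxy listens on localhost:8317, as in the curl examples on this page, and that the /v1/responses body follows the OpenAI Responses shape with an input field. The names endpoint_for and build_request are hypothetical helpers, not part of cliproxyapi++.

```python
import json
import urllib.request

# Assumption: same host/port as the curl examples on this page.
BASE_URL = "http://localhost:8317"


def endpoint_for(model: str) -> str:
    """Codex-family names (*codex*) go to /v1/responses, everything else to chat."""
    if "codex" in model.lower():
        return "/v1/responses"
    return "/v1/chat/completions"


def build_request(model: str, prompt: str, client_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible POST request without sending it."""
    path = endpoint_for(model)
    if path == "/v1/responses":
        # Assumption: Responses-style body uses "input" instead of "messages".
        body = {"model": model, "input": prompt}
    else:
        body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {client_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. A name-based check like this only approximates the rule, since the client cannot see which provider the proxy resolves the model to.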

Routing Controls in `config.yaml`

yaml

routing:
  strategy: "round-robin" # round-robin | fill-first

force-model-prefix: false
request-retry: 3
max-retry-interval: 30
quota-exceeded:
  switch-project: true
  switch-preview-model: true

Notes:

quota-exceeded.switch-project and quota-exceeded.switch-preview-model are the current built-in automatic quota fallback controls.
There is no generic per-provider auto-disable/auto-enable scheduler yet; for Gemini keys, use model exclusions/aliases plus these fallback toggles.
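
To make the two routing.strategy values more concrete, here is a minimal sketch of how round-robin and fill-first selection typically differ. It is an interpretation of the strategy names, not the proxy's actual implementation: the classes, the pick method, and the exhausted set (credentials currently out of quota) are all hypothetical.

```python
class RoundRobin:
    """Rotate through credentials, skipping any that are currently exhausted."""

    def __init__(self, credentials):
        self._creds = list(credentials)
        self._next = 0  # index of the credential to try first on the next pick

    def pick(self, exhausted=frozenset()):
        n = len(self._creds)
        for offset in range(n):
            cred = self._creds[(self._next + offset) % n]
            if cred not in exhausted:
                # Advance past the credential we just handed out.
                self._next = (self._next + offset + 1) % n
                return cred
        return None  # every credential is exhausted


class FillFirst:
    """Always use the first non-exhausted credential, in configured order."""

    def __init__(self, credentials):
        self._creds = list(credentials)

    def pick(self, exhausted=frozenset()):
        for cred in self._creds:
            if cred not in exhausted:
                return cred
        return None  # every credential is exhausted
```

Under this reading, round-robin spreads load evenly across the pool, while fill-first concentrates traffic on one credential and only moves on once it runs out of quota.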

Model Prefix and Alias Behavior

A credential/provider prefix (for example team-a) can require requests like team-a/model-name.
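With force-model-prefix: true, unprefixed model requests are served only by credentials that have no prefix; prefixed credentials must be addressed explicitly as prefix/model-name.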
Per-provider alias mappings can translate client-stable names to upstream names.

Example alias configuration:

yaml

codex-api-key:
  - api-key: "sk-xxxx"
    models:
      - name: "gpt-5-codex"
        alias: "codex-latest"

Client request:

json

{ "model": "codex-latest", "messages": [{"role":"user","content":"hi"}] }
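
The prefix and alias rules above can be sketched as a small lookup. This is a simplified model under stated assumptions, not the proxy's resolver: Credential and resolve are hypothetical names, a credential is assumed to accept both its alias and the upstream name, and with force-model-prefix off an unprefixed request is assumed to match prefixed credentials too.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Credential:
    """Hypothetical stand-in for one configured credential entry."""
    name: str
    prefix: Optional[str] = None
    # alias (client-facing name) -> upstream model name
    aliases: dict = field(default_factory=dict)

    def upstream_for(self, model: str) -> Optional[str]:
        if model in self.aliases:
            return self.aliases[model]
        if model in self.aliases.values():
            return model  # assumption: the upstream name is also accepted
        return None


def resolve(requested, credentials, force_model_prefix=False):
    """Return (credential name, upstream model) pairs that can serve the request."""
    known_prefixes = {c.prefix for c in credentials if c.prefix}
    head, sep, tail = requested.partition("/")
    if sep and head in known_prefixes:
        req_prefix, model = head, tail
    else:
        # No recognized prefix: treat the whole string as the model name.
        req_prefix, model = None, requested

    matches = []
    for cred in credentials:
        if req_prefix is not None:
            if cred.prefix != req_prefix:
                continue
        elif force_model_prefix and cred.prefix:
            continue  # unprefixed request may not use a prefixed credential
        upstream = cred.upstream_for(model)
        if upstream is not None:
            matches.append((cred.name, upstream))
    return matches
```

An empty result corresponds to the model_not_found case: no configured credential exposes the requested alias or prefix.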

Metrics and Routing Diagnosis

bash

# Per-provider rolling stats
curl -sS http://localhost:8317/v1/metrics/providers | jq

# Runtime health
curl -sS http://localhost:8317/health

Use these signals together with the logs to confirm whether retries, throttling, or auth issues are driving fallback.

Common Routing Failure Modes

model_not_found: model alias/prefix not exposed by configured credentials.
Wrong provider selected: prefix overlap or non-explicit model name.
High latency spikes: provider degraded; add retries or alternate providers.
Repeated 429: insufficient credential pool for traffic profile.
400 on Codex model via chat endpoint: retry with /v1/responses and verify resolved model is Codex-family.
