Technical Specification: Library-First Architecture (pkg/llmproxy)
Overview
cliproxyapi++ implements a "Library-First" architectural pattern by extracting all core proxy logic from the traditional internal/ package into a public, reusable pkg/llmproxy module. This transformation enables external Go applications to import and embed the entire translation, authentication, and communication engine without depending on the CLI binary.
Architecture Migration
Before: Mainline Structure
CLIProxyAPI/
├── internal/
│ ├── translator/ # Core translation logic (NOT IMPORTABLE)
│ ├── provider/ # Provider executors (NOT IMPORTABLE)
│ └── auth/ # Auth management (NOT IMPORTABLE)
└── cmd/server/
After: cliproxyapi++ Structure
cliproxyapi++/
├── pkg/llmproxy/ # PUBLIC LIBRARY (IMPORTABLE)
│ ├── translator/ # Translation engine
│ ├── provider/ # Provider implementations
│ ├── config/ # Configuration synthesis
│ ├── watcher/ # Dynamic reload orchestration
│ └── auth/ # Auth lifecycle management
├── cmd/server/ # CLI entry point (uses pkg/llmproxy)
└── sdk/cliproxy/ # High-level embedding SDK
Core Components
1. Translation Engine (pkg/llmproxy/translator)
Purpose: Handles bidirectional protocol conversion between OpenAI-compatible requests and proprietary LLM APIs.
Key Interfaces:
type Translator interface {
    // Convert OpenAI format to provider format
    TranslateRequest(ctx context.Context, req *openai.ChatRequest) (*ProviderRequest, error)
    // Convert provider response back to OpenAI format
    TranslateResponse(ctx context.Context, resp *ProviderResponse) (*openai.ChatResponse, error)
    // Stream translation for SSE
    TranslateStream(ctx context.Context, stream io.Reader) (<-chan *openai.ChatChunk, error)
    // Provider-specific capabilities
    SupportsStreaming() bool
    SupportsFunctions() bool
    MaxTokens() int
}
Implemented Translators:
- claude.go - Anthropic Claude API
- gemini.go - Google Gemini API
- openai.go - OpenAI GPT API
- kiro.go - AWS CodeWhisperer (custom protocol)
- copilot.go - GitHub Copilot (custom protocol)
- aggregators.go - OpenRouter, Together, Fireworks
Translation Strategy:
Request Normalization: Parse the OpenAI-format request and extract:
- Messages (system, user, assistant)
- Tools/functions
- Generation parameters (temp, top_p, max_tokens)
- Streaming flag
Provider Mapping: Map OpenAI models to provider endpoints:
claude-3-5-sonnet -> claude-3-5-sonnet-20241022 (Anthropic)
gpt-4             -> gpt-4-turbo-preview (OpenAI)
gemini-1.5-pro    -> gemini-1.5-pro-preview-0514 (Gemini)
Response Normalization: Convert provider responses to OpenAI format (a sketch follows this list):
- Standardize usage statistics (prompt_tokens, completion_tokens)
- Normalize finish reasons (stop, length, content_filter)
- Map provider-specific error codes to OpenAI error types
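To make the normalization concrete, here is a minimal sketch of the finish-reason and usage mapping. The providerResponse and chatResponse types, and the specific stop-reason strings, are simplified stand-ins, not the actual pkg/llmproxy types.

package main

import "fmt"

// Simplified stand-ins for the real translator types.
type providerResponse struct {
    Content      string
    StopReason   string // provider vocabulary, e.g. Anthropic's "end_turn"
    InputTokens  int
    OutputTokens int
}

type chatResponse struct {
    Content          string
    FinishReason     string // OpenAI vocabulary: "stop", "length", "content_filter"
    PromptTokens     int
    CompletionTokens int
}

// normalizeFinishReason maps provider stop reasons onto the OpenAI
// vocabulary; unknown values fall back to "stop".
func normalizeFinishReason(reason string) string {
    switch reason {
    case "end_turn", "stop_sequence":
        return "stop"
    case "max_tokens":
        return "length"
    case "content_filtered":
        return "content_filter"
    default:
        return "stop"
    }
}

func normalizeResponse(resp *providerResponse) *chatResponse {
    return &chatResponse{
        Content:          resp.Content,
        FinishReason:     normalizeFinishReason(resp.StopReason),
        PromptTokens:     resp.InputTokens,
        CompletionTokens: resp.OutputTokens,
    }
}

func main() {
    out := normalizeResponse(&providerResponse{
        Content: "hello", StopReason: "max_tokens",
        InputTokens: 12, OutputTokens: 128,
    })
    fmt.Printf("%+v\n", out) // FinishReason normalized to "length"
}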
2. Provider Execution (pkg/llmproxy/provider)
Purpose: Orchestrates HTTP communication with LLM providers, handling authentication, retry logic, and error recovery.
Key Interfaces:
type ProviderExecutor interface {
    // Execute a single request (non-streaming)
    Execute(ctx context.Context, auth coreauth.Auth, req *ProviderRequest) (*ProviderResponse, error)
    // Execute streaming request
    ExecuteStream(ctx context.Context, auth coreauth.Auth, req *ProviderRequest) (<-chan *ProviderChunk, error)
    // Health check provider
    HealthCheck(ctx context.Context, auth coreauth.Auth) error
    // Provider metadata
    Name() string
    SupportsModel(model string) bool
}
Executor Lifecycle:
Request -> RateLimitCheck -> AuthValidate -> ProviderExecute ->
-> Success -> Response
-> RetryableError -> Backoff -> Retry
  -> NonRetryableError -> Error
Rate Limiting (a token-bucket sketch follows this list):
- Per-provider token bucket
- Per-credential quota tracking
- Intelligent cooldown on 429 responses
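A per-provider limiter combining these behaviors can be sketched as below, assuming golang.org/x/time/rate for the token bucket; the providerLimiter type and its On429 hook are illustrative assumptions, not the library's actual implementation.

package main

import (
    "context"
    "fmt"
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// providerLimiter combines a token bucket with a cooldown window that
// engages on HTTP 429 responses. Illustrative only.
type providerLimiter struct {
    mu        sync.Mutex
    bucket    *rate.Limiter
    coolUntil time.Time
}

func newProviderLimiter(rps float64, burst int) *providerLimiter {
    return &providerLimiter{bucket: rate.NewLimiter(rate.Limit(rps), burst)}
}

// Acquire blocks until a token is available, honoring any active cooldown.
func (l *providerLimiter) Acquire(ctx context.Context) error {
    l.mu.Lock()
    wait := time.Until(l.coolUntil)
    l.mu.Unlock()
    if wait > 0 {
        select {
        case <-time.After(wait):
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return l.bucket.Wait(ctx)
}

// On429 opens a cooldown window, e.g. derived from a Retry-After header.
func (l *providerLimiter) On429(retryAfter time.Duration) {
    l.mu.Lock()
    l.coolUntil = time.Now().Add(retryAfter)
    l.mu.Unlock()
}

func main() {
    lim := newProviderLimiter(10, 5) // 10 req/s, burst of 5
    lim.On429(100 * time.Millisecond)
    if err := lim.Acquire(context.Background()); err == nil {
        fmt.Println("token acquired after cooldown")
    }
}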
3. Configuration Management (pkg/llmproxy/config)
Purpose: Loads, validates, and synthesizes configuration from multiple sources.
Configuration Hierarchy:
1. Base config (config.yaml)
2. Environment overrides (CLI_PROXY_*)
3. Runtime synthesis (watcher merges changes)
4. Per-request overrides (query params)
Key Structures:
type Config struct {
    Server     ServerConfig
    Providers  map[string]ProviderConfig
    Auth       AuthConfig
    Management ManagementConfig
    Logging    LoggingConfig
}
type ProviderConfig struct {
    Type     string // "claude", "gemini", "openai", etc.
    Enabled  bool
    Models   []ModelConfig
    AuthType string // "api_key", "oauth", "device_flow"
    Priority int    // Routing priority
    Cooldown time.Duration
}
Hot-Reload Mechanism:
- File watcher on config.yaml and the auths/ directory
- Debounced reload (500ms delay)
- Atomic config swapping (no request interruption; see the sketch after this list)
- Validation before activation (reject invalid configs)
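What makes the swap request-safe is that readers load a config snapshot once and never take a lock. A minimal sketch using sync/atomic.Value follows; the trimmed Config struct and the Current/Swap helper names are hypothetical, not the library's actual API.

package main

import (
    "fmt"
    "sync/atomic"
)

// Config is a trimmed stand-in for the library's Config struct.
type Config struct {
    Version int
}

// store holds the live configuration; readers never block on reloads.
var store atomic.Value // holds *Config

// Current returns the active snapshot; a request pins it once and uses
// it for its whole lifetime, so a mid-request swap cannot affect it.
func Current() *Config { return store.Load().(*Config) }

// Swap validates the candidate first, then publishes it atomically.
func Swap(next *Config) error {
    if next == nil {
        return fmt.Errorf("reject invalid config")
    }
    store.Store(next)
    return nil
}

func main() {
    _ = Swap(&Config{Version: 1})
    cfg := Current() // request begins, pins version 1
    _ = Swap(&Config{Version: 2})
    fmt.Println(cfg.Version, Current().Version) // prints: 1 2
}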
4. Watcher & Synthesis (pkg/llmproxy/watcher)
Purpose: Orchestrates dynamic configuration updates and background lifecycle management.
Watcher Architecture:
type Watcher struct {
    configPath    string
    authDir       string
    reloadChan    chan struct{}
    currentConfig atomic.Value // *Config
    currentAuths  atomic.Value // []coreauth.Auth
}

// Run starts the watcher goroutine
func (w *Watcher) Run(ctx context.Context) error {
    // 1. Initial load
    w.loadAll()
    // 2. Watch files
    go w.watchConfig(ctx)
    go w.watchAuths(ctx)
    // 3. Handle reloads
    for {
        select {
        case <-w.reloadChan:
            w.loadAll()
        case <-ctx.Done():
            return ctx.Err()
        }
    }
}
Synthesis Pipeline:
Config File Changed -> Parse YAML -> Validate Schema ->
Merge with Existing -> Check Conflicts -> Atomic Swap
Background Workers:
- Token Refresh Worker: Checks every 5 minutes, refreshes tokens expiring within 10 minutes (the loop is sketched after this list)
- Health Check Worker: Pings providers every 30 seconds, marks unhealthy providers
- Metrics Collector: Aggregates request latency, error rates, token usage
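The token-refresh worker described above might be structured like this sketch. authRecord stands in for coreauth.Auth, and the auths/refresh callbacks stand in for the real storage and OAuth logic.

package main

import (
    "context"
    "fmt"
    "time"
)

// authRecord is a minimal stand-in for coreauth.Auth.
type authRecord struct {
    Provider  string
    ExpiresAt time.Time
}

// refreshWorker refreshes any credential expiring within margin,
// checking every checkEvery. The real worker persists results to
// auths/{provider}.json and notifies the watcher for an atomic swap.
func refreshWorker(ctx context.Context,
    auths func() []authRecord,
    refresh func(context.Context, authRecord) error,
    checkEvery, margin time.Duration) {
    ticker := time.NewTicker(checkEvery)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            deadline := time.Now().Add(margin)
            for _, a := range auths() {
                if a.ExpiresAt.Before(deadline) {
                    if err := refresh(ctx, a); err != nil {
                        fmt.Printf("refresh %s: %v\n", a.Provider, err)
                    }
                }
            }
        case <-ctx.Done():
            return
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel()
    refreshWorker(ctx,
        func() []authRecord { return nil },
        func(context.Context, authRecord) error { return nil },
        10*time.Millisecond, 10*time.Minute)
}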
Data Flow
Request Processing Flow
HTTP Request (OpenAI format)
↓
Middleware (CORS, auth, logging)
↓
Handler (Parse request, select provider)
↓
Provider Executor (Rate limit check)
↓
Translator (Convert to provider format)
↓
HTTP Client (Execute provider API)
↓
Translator (Convert response)
↓
Handler (Send response)
↓
Middleware (Log metrics)
↓
HTTP Response (OpenAI format)
Configuration Reload Flow
File System Event (config.yaml changed)
↓
Watcher (Detect change)
↓
Debounce (500ms)
↓
Config Loader (Parse and validate)
↓
Synthesizer (Merge with existing)
↓
Atomic Swap (Update runtime config)
↓
Notification (Trigger background workers)
Token Refresh Flow
Background Worker (Every 5 min)
↓
Scan All Auths
↓
Check Expiry (token.ExpiresAt < now + 10min)
↓
Execute Refresh Flow
↓
Update Storage (auths/{provider}.json)
↓
Notify Watcher
↓
Atomic Swap (Update runtime auths)
Reusability Patterns
Embedding as Library
import "github.com/KooshaPari/cliproxyapi-plusplus/pkg/llmproxy"
// Create translator
translator := llmproxy.NewClaudeTranslator()
// Translate request
providerReq, err := translator.TranslateRequest(ctx, openaiReq)
// Create executor
executor := llmproxy.NewClaudeExecutor()
// Execute
resp, err := executor.Execute(ctx, auth, providerReq)
// Translate response
openaiResp, err := translator.TranslateResponse(ctx, resp)
Custom Provider Integration
// Implement Translator interface
type MyCustomTranslator struct{}
func (t *MyCustomTranslator) TranslateRequest(ctx context.Context, req *openai.ChatRequest) (*llmproxy.ProviderRequest, error) {
    // Custom translation logic
    return &llmproxy.ProviderRequest{}, nil
}
// Register with executor
executor := llmproxy.NewExecutor(
    llmproxy.WithTranslator(&MyCustomTranslator{}),
)
Extending Configuration
// Custom config synthesizer
type MySynthesizer struct{}
func (s *MySynthesizer) Synthesize(base *llmproxy.Config, overrides map[string]interface{}) (*llmproxy.Config, error) {
    // Custom merge logic
    return base, nil
}
// Use in watcher
watcher := llmproxy.NewWatcher(
    llmproxy.WithSynthesizer(&MySynthesizer{}),
)
Performance Characteristics
Memory Footprint
- Base package: ~15MB (includes all translators)
- Per-request allocation: <1MB
- Config reload overhead: <10ms
Concurrency Model
- Request handling: Goroutine-per-request, bounded by a worker pool (see the semaphore sketch after this list)
- Config reloading: Single goroutine (serialized)
- Token refresh: Single goroutine (serialized per provider)
- Health checks: Per-provider goroutines
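The bounded goroutine-per-request model is conventionally implemented as a semaphore channel wrapped around the handler, as in this sketch; the limit of 256 and the 503 overflow response are illustrative choices, not the library's defaults.

package main

import (
    "fmt"
    "net/http"
)

// boundedHandler caps in-flight requests with a semaphore channel.
// Requests beyond the cap are rejected rather than queued.
func boundedHandler(limit int, next http.Handler) http.Handler {
    sem := make(chan struct{}, limit)
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case sem <- struct{}{}: // acquire a slot
            defer func() { <-sem }() // release when the request completes
            next.ServeHTTP(w, r)
        default:
            http.Error(w, "server busy", http.StatusServiceUnavailable)
        }
    })
}

func main() {
    h := boundedHandler(256, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "ok")
    }))
    _ = http.ListenAndServe(":8080", h)
}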
Throughput
- Single instance: ~1000 requests/second (varies by provider)
- Hot reload impact: <5ms latency blip during swap
- Background workers: <1% CPU utilization
Security Considerations
Public API Stability
- All exported APIs follow semantic versioning
- Breaking changes require major version bump (v7, v8, etc.)
- Deprecated APIs remain for 2 major versions
Input Validation
- All translator inputs validated before provider execution
- Config validation on load (reject malformed configs)
- Auth credential validation before storage
Error Propagation
- Internal errors sanitized before API response
- Provider errors mapped to OpenAI error types (see the mapping sketch after this list)
- Detailed logging for debugging (configurable verbosity)
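A sketch of the sanitize-and-map step: known provider failures map to specific OpenAI error types, and everything else collapses to a generic api_error so internal detail is logged server-side but never echoed to clients. The apiError shape mirrors the OpenAI error envelope; errRateLimited and the mapping entries are illustrative.

package main

import (
    "errors"
    "fmt"
)

// apiError mirrors the OpenAI error envelope shape.
type apiError struct {
    Type    string `json:"type"`
    Message string `json:"message"`
}

var errRateLimited = errors.New("provider rate limited")

// sanitize maps known provider failures onto OpenAI error types and
// collapses everything else to a generic api_error.
func sanitize(err error) apiError {
    switch {
    case errors.Is(err, errRateLimited):
        return apiError{Type: "rate_limit_error", Message: "rate limit exceeded"}
    default:
        return apiError{Type: "api_error", Message: "internal error"}
    }
}

func main() {
    fmt.Printf("%+v\n", sanitize(fmt.Errorf("upstream: %w", errRateLimited)))
}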
Migration Guide
From Mainline internal/
// Before (mainline)
import "github.com/router-for-me/CLIProxyAPI/v6/internal/translator"
// After (cliproxyapi++)
import "github.com/KooshaPari/cliproxyapi-plusplus/pkg/llmproxy/translator"Function Compatibility
Most internal functions have public equivalents:
- internal/translator.NewClaude() → llmproxy/translator.NewClaude()
- internal/provider.NewExecutor() → llmproxy/provider.NewExecutor()
- internal/config.Load() → llmproxy/config.LoadConfig()
Testing Strategy
Unit Tests
- Each translator: Mock provider responses
- Each executor: Mock HTTP transport (an httptest sketch follows this list)
- Config validation: Test schema violations
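An executor unit test against a mock transport typically uses net/http/httptest, as sketched here. The executor construction is elided since its exact options vary; the plain http.Get stands in for executor.Execute against the fake provider.

package provider_test

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

// TestExecutorAgainstMockTransport spins up an httptest server that
// plays the provider, so retry and parsing logic can be exercised
// without real credentials.
func TestExecutorAgainstMockTransport(t *testing.T) {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.Write([]byte(`{"content":"hi","stop_reason":"end_turn"}`))
    }))
    defer srv.Close()

    // The real test would point the executor's base URL at srv.URL and
    // call executor.Execute; a plain GET stands in here.
    resp, err := http.Get(srv.URL)
    if err != nil {
        t.Fatalf("request failed: %v", err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        t.Fatalf("got status %d, want 200", resp.StatusCode)
    }
}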
Integration Tests
- End-to-end proxy: Real provider APIs (test keys)
- Hot reload: File system changes
- Token refresh: Expiring credentials
Contract Tests
- OpenAI API compatibility: Verify response format (sketched below)
- Provider contract: Verify translator mapping
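A compatibility contract test can decode translated output into the OpenAI envelope and assert the required fields are present, as in this sketch with a canned fixture.

package contract_test

import (
    "encoding/json"
    "testing"
)

// TestOpenAIResponseShape asserts a translated payload decodes into
// the OpenAI envelope with the required fields populated.
func TestOpenAIResponseShape(t *testing.T) {
    payload := []byte(`{"choices":[{"finish_reason":"stop"}],"usage":{"prompt_tokens":1,"completion_tokens":2}}`)
    var got struct {
        Choices []struct {
            FinishReason string `json:"finish_reason"`
        } `json:"choices"`
        Usage struct {
            PromptTokens     int `json:"prompt_tokens"`
            CompletionTokens int `json:"completion_tokens"`
        } `json:"usage"`
    }
    if err := json.Unmarshal(payload, &got); err != nil {
        t.Fatalf("decode: %v", err)
    }
    if len(got.Choices) == 0 || got.Choices[0].FinishReason == "" {
        t.Fatal("missing finish_reason in choices")
    }
}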