Technical Specification: Library-First Architecture (pkg/llmproxy)
Overview
cliproxyapi++ implements a "Library-First" architectural pattern by extracting all core proxy logic from the traditional internal/ package into a public, reusable pkg/llmproxy module. This transformation enables external Go applications to import and embed the entire translation, authentication, and communication engine without depending on the CLI binary.
Architecture Migration
Before: Mainline Structure
CLIProxyAPI/
├── internal/
│ ├── translator/ # Core translation logic (NOT IMPORTABLE)
│ ├── provider/ # Provider executors (NOT IMPORTABLE)
│ └── auth/ # Auth management (NOT IMPORTABLE)
└── cmd/server/
After: cliproxyapi++ Structure
cliproxyapi++/
├── pkg/llmproxy/ # PUBLIC LIBRARY (IMPORTABLE)
│ ├── translator/ # Translation engine
│ ├── provider/ # Provider implementations
│ ├── config/ # Configuration synthesis
│ ├── watcher/ # Dynamic reload orchestration
│ └── auth/ # Auth lifecycle management
├── cmd/server/ # CLI entry point (uses pkg/llmproxy)
└── sdk/cliproxy/ # High-level embedding SDK
Core Components
1. Translation Engine (pkg/llmproxy/translator)
Purpose: Handles bidirectional protocol conversion between OpenAI-compatible requests and proprietary LLM APIs.
Key Interfaces:
type Translator interface {
    // Convert OpenAI format to provider format
    TranslateRequest(ctx context.Context, req *openai.ChatRequest) (*ProviderRequest, error)
    // Convert provider response back to OpenAI format
    TranslateResponse(ctx context.Context, resp *ProviderResponse) (*openai.ChatResponse, error)
    // Stream translation for SSE
    TranslateStream(ctx context.Context, stream io.Reader) (<-chan *openai.ChatChunk, error)
    // Provider-specific capabilities
    SupportsStreaming() bool
    SupportsFunctions() bool
    MaxTokens() int
}
Implemented Translators:
- claude.go - Anthropic Claude API
- gemini.go - Google Gemini API
- openai.go - OpenAI GPT API
- kiro.go - AWS CodeWhisperer (custom protocol)
- copilot.go - GitHub Copilot (custom protocol)
- aggregators.go - OpenRouter, Together, Fireworks
Translation Strategy:
Request Normalization: Parse the OpenAI-format request and extract:
- Messages (system, user, assistant)
- Tools/functions
- Generation parameters (temp, top_p, max_tokens)
- Streaming flag
Provider Mapping: Map OpenAI models to provider endpoints:
claude-3-5-sonnet -> claude-3-5-sonnet-20241022 (Anthropic)
gpt-4             -> gpt-4-turbo-preview (OpenAI)
gemini-1.5-pro    -> gemini-1.5-pro-preview-0514 (Gemini)
Response Normalization: Convert provider responses to OpenAI format (a sketch follows this list):
- Standardize usage statistics (prompt_tokens, completion_tokens)
- Normalize finish reasons (stop, length, content_filter)
- Map provider-specific error codes to OpenAI error types
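To make the normalization concrete, here is a minimal sketch of the finish-reason and usage mapping. The providerResponse and chatResponse types, and the specific stop-reason strings, are simplified stand-ins, not the actual pkg/llmproxy types.

package main

import "fmt"

// Simplified stand-ins for the real translator types.
type providerResponse struct {
    Content      string
    StopReason   string // provider vocabulary, e.g. Anthropic's "end_turn"
    InputTokens  int
    OutputTokens int
}

type chatResponse struct {
    Content          string
    FinishReason     string // OpenAI vocabulary: "stop", "length", "content_filter"
    PromptTokens     int
    CompletionTokens int
}

// normalizeFinishReason maps provider stop reasons onto the OpenAI
// vocabulary; unknown values fall back to "stop".
func normalizeFinishReason(reason string) string {
    switch reason {
    case "end_turn", "stop_sequence":
        return "stop"
    case "max_tokens":
        return "length"
    case "content_filtered":
        return "content_filter"
    default:
        return "stop"
    }
}

func normalizeResponse(resp *providerResponse) *chatResponse {
    return &chatResponse{
        Content:          resp.Content,
        FinishReason:     normalizeFinishReason(resp.StopReason),
        PromptTokens:     resp.InputTokens,
        CompletionTokens: resp.OutputTokens,
    }
}

func main() {
    out := normalizeResponse(&providerResponse{
        Content: "hello", StopReason: "max_tokens",
        InputTokens: 12, OutputTokens: 128,
    })
    fmt.Printf("%+v\n", out) // FinishReason normalized to "length"
}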
2. Provider Execution (pkg/llmproxy/provider)
Purpose: Orchestrates HTTP communication with LLM providers, handling authentication, retry logic, and error recovery.
Key Interfaces:
type ProviderExecutor interface {
    // Execute a single request (non-streaming)
    Execute(ctx context.Context, auth coreauth.Auth, req *ProviderRequest) (*ProviderResponse, error)
    // Execute streaming request
    ExecuteStream(ctx context.Context, auth coreauth.Auth, req *ProviderRequest) (<-chan *ProviderChunk, error)
    // Health check provider
    HealthCheck(ctx context.Context, auth coreauth.Auth) error
    // Provider metadata
    Name() string
    SupportsModel(model string) bool
}
Executor Lifecycle:
Request -> RateLimitCheck -> AuthValidate -> ProviderExecute ->
-> Success -> Response
-> RetryableError -> Backoff -> Retry
  -> NonRetryableError -> Error
Rate Limiting (a token-bucket sketch follows this list):
- Per-provider token bucket
- Per-credential quota tracking
- Intelligent cooldown on 429 responses
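A per-provider limiter combining these behaviors can be sketched as below, assuming golang.org/x/time/rate for the token bucket; the providerLimiter type and its On429 hook are illustrative assumptions, not the library's actual implementation.

package main

import (
    "context"
    "fmt"
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// providerLimiter combines a token bucket with a cooldown window that
// engages on HTTP 429 responses. Illustrative only.
type providerLimiter struct {
    mu        sync.Mutex
    bucket    *rate.Limiter
    coolUntil time.Time
}

func newProviderLimiter(rps float64, burst int) *providerLimiter {
    return &providerLimiter{bucket: rate.NewLimiter(rate.Limit(rps), burst)}
}

// Acquire blocks until a token is available, honoring any active cooldown.
func (l *providerLimiter) Acquire(ctx context.Context) error {
    l.mu.Lock()
    wait := time.Until(l.coolUntil)
    l.mu.Unlock()
    if wait > 0 {
        select {
        case <-time.After(wait):
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return l.bucket.Wait(ctx)
}

// On429 opens a cooldown window, e.g. derived from a Retry-After header.
func (l *providerLimiter) On429(retryAfter time.Duration) {
    l.mu.Lock()
    l.coolUntil = time.Now().Add(retryAfter)
    l.mu.Unlock()
}

func main() {
    lim := newProviderLimiter(10, 5) // 10 req/s, burst of 5
    lim.On429(100 * time.Millisecond)
    if err := lim.Acquire(context.Background()); err == nil {
        fmt.Println("token acquired after cooldown")
    }
}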
3. Configuration Management (pkg/llmproxy/config)
Purpose: Loads, validates, and synthesizes configuration from multiple sources.
Configuration Hierarchy:
1. Base config (config.yaml)
2. Environment overrides (CLI_PROXY_*)
3. Runtime synthesis (watcher merges changes)
4. Per-request overrides (query params)
Key Structures:
type Config struct {
    Server     ServerConfig
    Providers  map[string]ProviderConfig
    Auth       AuthConfig
    Management ManagementConfig
    Logging    LoggingConfig
}
type ProviderConfig struct {
    Type     string // "claude", "gemini", "openai", etc.
    Enabled  bool
    Models   []ModelConfig
    AuthType string // "api_key", "oauth", "device_flow"
    Priority int    // Routing priority
    Cooldown time.Duration
}
Hot-Reload Mechanism:
- File watcher on config.yaml and the auths/ directory
- Debounced reload (500ms delay)
- Atomic config swapping (no request interruption; see the sketch after this list)
- Validation before activation (reject invalid configs)
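What makes the swap request-safe is that readers load a config snapshot once and never take a lock. A minimal sketch using sync/atomic.Value follows; the trimmed Config struct and the Current/Swap helper names are hypothetical, not the library's actual API.

package main

import (
    "fmt"
    "sync/atomic"
)

// Config is a trimmed stand-in for the library's Config struct.
type Config struct {
    Version int
}

// store holds the live configuration; readers never block on reloads.
var store atomic.Value // holds *Config

// Current returns the active snapshot; a request pins it once and uses
// it for its whole lifetime, so a mid-request swap cannot affect it.
func Current() *Config { return store.Load().(*Config) }

// Swap validates the candidate first, then publishes it atomically.
func Swap(next *Config) error {
    if next == nil {
        return fmt.Errorf("reject invalid config")
    }
    store.Store(next)
    return nil
}

func main() {
    _ = Swap(&Config{Version: 1})
    cfg := Current() // request begins, pins version 1
    _ = Swap(&Config{Version: 2})
    fmt.Println(cfg.Version, Current().Version) // prints: 1 2
}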
4. Watcher & Synthesis (pkg/llmproxy/watcher)
Purpose: Orchestrates dynamic configuration updates and background lifecycle management.
Watcher Architecture:
type Watcher struct {
    configPath    string
    authDir       string
    reloadChan    chan struct{}
    currentConfig atomic.Value // *Config
    currentAuths  atomic.Value // []coreauth.Auth
}

// Run starts the watcher goroutine
func (w *Watcher) Run(ctx context.Context) error {
    // 1. Initial load
    w.loadAll()
    // 2. Watch files
    go w.watchConfig(ctx)
    go w.watchAuths(ctx)
    // 3. Handle reloads
    for {
        select {
        case <-w.reloadChan:
            w.loadAll()
        case <-ctx.Done():
            return ctx.Err()
        }
    }
}
Synthesis Pipeline:
Config File Changed -> Parse YAML -> Validate Schema ->
Merge with Existing -> Check Conflicts -> Atomic Swap
Background Workers:
- Token Refresh Worker: Checks every 5 minutes, refreshes tokens expiring within 10 minutes (the loop is sketched after this list)
- Health Check Worker: Pings providers every 30 seconds, marks unhealthy providers
- Metrics Collector: Aggregates request latency, error rates, token usage
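The token-refresh worker described above might be structured like this sketch. authRecord stands in for coreauth.Auth, and the auths/refresh callbacks stand in for the real storage and OAuth logic.

package main

import (
    "context"
    "fmt"
    "time"
)

// authRecord is a minimal stand-in for coreauth.Auth.
type authRecord struct {
    Provider  string
    ExpiresAt time.Time
}

// refreshWorker refreshes any credential expiring within margin,
// checking every checkEvery. The real worker persists results to
// auths/{provider}.json and notifies the watcher for an atomic swap.
func refreshWorker(ctx context.Context,
    auths func() []authRecord,
    refresh func(context.Context, authRecord) error,
    checkEvery, margin time.Duration) {
    ticker := time.NewTicker(checkEvery)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            deadline := time.Now().Add(margin)
            for _, a := range auths() {
                if a.ExpiresAt.Before(deadline) {
                    if err := refresh(ctx, a); err != nil {
                        fmt.Printf("refresh %s: %v\n", a.Provider, err)
                    }
                }
            }
        case <-ctx.Done():
            return
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel()
    refreshWorker(ctx,
        func() []authRecord { return nil },
        func(context.Context, authRecord) error { return nil },
        10*time.Millisecond, 10*time.Minute)
}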
Data Flow
Request Processing Flow
HTTP Request (OpenAI format)
↓
Middleware (CORS, auth, logging)
↓
Handler (Parse request, select provider)
↓
Provider Executor (Rate limit check)
↓
Translator (Convert to provider format)
↓
HTTP Client (Execute provider API)
↓
Translator (Convert response)
↓
Handler (Send response)
↓
Middleware (Log metrics)
↓
HTTP Response (OpenAI format)
Configuration Reload Flow
File System Event (config.yaml changed)
↓
Watcher (Detect change)
↓
Debounce (500ms)
↓
Config Loader (Parse and validate)
↓
Synthesizer (Merge with existing)
↓
Atomic Swap (Update runtime config)
↓
Notification (Trigger background workers)
Token Refresh Flow
Background Worker (Every 5 min)
↓
Scan All Auths
↓
Check Expiry (token.ExpiresAt < now + 10min)
↓
Execute Refresh Flow
↓
Update Storage (auths/{provider}.json)
↓
Notify Watcher
↓
Atomic Swap (Update runtime auths)
Reusability Patterns
Embedding as Library
import "github.com/KooshaPari/cliproxyapi-plusplus/pkg/llmproxy"
// Create translator
translator := llmproxy.NewClaudeTranslator()
// Translate request
providerReq, err := translator.TranslateRequest(ctx, openaiReq)
// Create executor
executor := llmproxy.NewClaudeExecutor()
// Execute
resp, err := executor.Execute(ctx, auth, providerReq)
// Translate response
openaiResp, err := translator.TranslateResponse(ctx, resp)
Custom Provider Integration
// Implement Translator interface
type MyCustomTranslator struct{}
func (t *MyCustomTranslator) TranslateRequest(ctx context.Context, req *openai.ChatRequest) (*llmproxy.ProviderRequest, error) {
    // Custom translation logic
    return &llmproxy.ProviderRequest{}, nil
}
// Register with executor
executor := llmproxy.NewExecutor(
    llmproxy.WithTranslator(&MyCustomTranslator{}),
)
Extending Configuration
// Custom config synthesizer
type MySynthesizer struct{}
func (s *MySynthesizer) Synthesize(base *llmproxy.Config, overrides map[string]interface{}) (*llmproxy.Config, error) {
    // Custom merge logic
    return base, nil
}
// Use in watcher
watcher := llmproxy.NewWatcher(
    llmproxy.WithSynthesizer(&MySynthesizer{}),
)
Performance Characteristics
Memory Footprint
- Base package: ~15MB (includes all translators)
- Per-request allocation: <1MB
- Config reload overhead: <10ms
Concurrency Model
- Request handling: Goroutine-per-request, bounded by a worker pool (see the semaphore sketch after this list)
- Config reloading: Single goroutine (serialized)
- Token refresh: Single goroutine (serialized per provider)
- Health checks: Per-provider goroutines
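The bounded goroutine-per-request model is conventionally implemented as a semaphore channel wrapped around the handler, as in this sketch; the limit of 256 and the 503 overflow response are illustrative choices, not the library's defaults.

package main

import (
    "fmt"
    "net/http"
)

// boundedHandler caps in-flight requests with a semaphore channel.
// Requests beyond the cap are rejected rather than queued.
func boundedHandler(limit int, next http.Handler) http.Handler {
    sem := make(chan struct{}, limit)
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case sem <- struct{}{}: // acquire a slot
            defer func() { <-sem }() // release when the request completes
            next.ServeHTTP(w, r)
        default:
            http.Error(w, "server busy", http.StatusServiceUnavailable)
        }
    })
}

func main() {
    h := boundedHandler(256, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "ok")
    }))
    _ = http.ListenAndServe(":8080", h)
}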
Throughput
- Single instance: ~1000 requests/second (varies by provider)
- Hot reload impact: <5ms latency blip during swap
- Background workers: <1% CPU utilization
Security Considerations
Public API Stability
- All exported APIs follow semantic versioning
- Breaking changes require major version bump (v7, v8, etc.)
- Deprecated APIs remain for 2 major versions
Input Validation
- All translator inputs validated before provider execution
- Config validation on load (reject malformed configs)
- Auth credential validation before storage
Error Propagation
- Internal errors sanitized before API response
- Provider errors mapped to OpenAI error types (see the mapping sketch after this list)
- Detailed logging for debugging (configurable verbosity)
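A sketch of the sanitize-and-map step: known provider failures map to specific OpenAI error types, and everything else collapses to a generic api_error so internal detail is logged server-side but never echoed to clients. The apiError shape mirrors the OpenAI error envelope; errRateLimited and the mapping entries are illustrative.

package main

import (
    "errors"
    "fmt"
)

// apiError mirrors the OpenAI error envelope shape.
type apiError struct {
    Type    string `json:"type"`
    Message string `json:"message"`
}

var errRateLimited = errors.New("provider rate limited")

// sanitize maps known provider failures onto OpenAI error types and
// collapses everything else to a generic api_error.
func sanitize(err error) apiError {
    switch {
    case errors.Is(err, errRateLimited):
        return apiError{Type: "rate_limit_error", Message: "rate limit exceeded"}
    default:
        return apiError{Type: "api_error", Message: "internal error"}
    }
}

func main() {
    fmt.Printf("%+v\n", sanitize(fmt.Errorf("upstream: %w", errRateLimited)))
}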
Migration Guide
From Mainline internal/
// Before (mainline)
import "github.com/router-for-me/CLIProxyAPI/v6/internal/translator"
// After (cliproxyapi++)
import "github.com/KooshaPari/cliproxyapi-plusplus/pkg/llmproxy/translator"Function Compatibility
Most internal functions have public equivalents:
- internal/translator.NewClaude() → llmproxy/translator.NewClaude()
- internal/provider.NewExecutor() → llmproxy/provider.NewExecutor()
- internal/config.Load() → llmproxy/config.LoadConfig()
Testing Strategy
Unit Tests
- Each translator: Mock provider responses
- Each executor: Mock HTTP transport (an httptest sketch follows this list)
- Config validation: Test schema violations
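An executor unit test against a mock transport typically uses net/http/httptest, as sketched here. The executor construction is elided since its exact options vary; the plain http.Get stands in for executor.Execute against the fake provider.

package provider_test

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

// TestExecutorAgainstMockTransport spins up an httptest server that
// plays the provider, so retry and parsing logic can be exercised
// without real credentials.
func TestExecutorAgainstMockTransport(t *testing.T) {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.Write([]byte(`{"content":"hi","stop_reason":"end_turn"}`))
    }))
    defer srv.Close()

    // The real test would point the executor's base URL at srv.URL and
    // call executor.Execute; a plain GET stands in here.
    resp, err := http.Get(srv.URL)
    if err != nil {
        t.Fatalf("request failed: %v", err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        t.Fatalf("got status %d, want 200", resp.StatusCode)
    }
}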
Integration Tests
- End-to-end proxy: Real provider APIs (test keys)
- Hot reload: File system changes
- Token refresh: Expiring credentials
Contract Tests
- OpenAI API compatibility: Verify response format (sketched below)
- Provider contract: Verify translator mapping
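A compatibility contract test can decode translated output into the OpenAI envelope and assert the required fields are present, as in this sketch with a canned fixture.

package contract_test

import (
    "encoding/json"
    "testing"
)

// TestOpenAIResponseShape asserts a translated payload decodes into
// the OpenAI envelope with the required fields populated.
func TestOpenAIResponseShape(t *testing.T) {
    payload := []byte(`{"choices":[{"finish_reason":"stop"}],"usage":{"prompt_tokens":1,"completion_tokens":2}}`)
    var got struct {
        Choices []struct {
            FinishReason string `json:"finish_reason"`
        } `json:"choices"`
        Usage struct {
            PromptTokens     int `json:"prompt_tokens"`
            CompletionTokens int `json:"completion_tokens"`
        } `json:"usage"`
    }
    if err := json.Unmarshal(payload, &got); err != nil {
        t.Fatalf("decode: %v", err)
    }
    if len(got.Choices) == 0 || got.Choices[0].FinishReason == "" {
        t.Fatal("missing finish_reason in choices")
    }
}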