Research
Archived research briefs from the Haiku research swarm that informed hwLedger's architecture and implementation decisions. All 12 briefs are indexed below with direct links to their full analyses.
All Research Briefs
Inference Engines & Backends
- oMlx Analysis — MLX Fork Strategy — Architecture review, fork viability, sidecar integration design.
- Inference Engine Matrix — April 2026 — Comprehensive comparison of MLX, mistral.rs, llama.cpp, vLLM, TGI across platforms.
Subprocess Communication & Integration
- MLX IPC Patterns — JSON-RPC over stdio vs protobuf; venv management; signal discipline.
Memory & Architecture Formulas
- KV Cache Formulas — Per-Architecture Derivations — Complete math breakdown for MHA, GQA, MQA, MLA, SSM, hybrid, attention-sink.
Model Configuration
- Config Ingestion — Model Metadata Loaders — Pure-Rust loaders for HF Hub, GGUF, safetensors; subprocess fallback for MLX.
Hardware Telemetry
- GPU Telemetry Backends — NVIDIA nvml-wrapper, AMD rocm-smi, Apple Silicon macmon, Intel Arc (deferred).
Language Bindings & FFI
- FFI Survey — Rust ↔ Native Language Bindings — UniFFI vs cbindgen vs csbindgen vs cxx-qt vs Slint for SwiftUI, WinUI, Qt.
Fleet Architecture
- Fleet Wire Design — Axum + JSON/HTTPS + mTLS; russh + deadpool SSH; Tailscale integration; phenotype-event-sourcing audit log.
Competitive Analysis
- Competitors Survey — Gap Analysis — HF Accelerate, can-it-run-llm, LM Studio, vLLM internals. hwLedger differentiators.
Auditing & Cost Tracking
- Event Sourcing — Audit Log & Cost Tracking — phenotype-event-sourcing reuse; SHA-256 hash chains; LedgerError::Integrity tamper detection.
Additional Research
- Competing VRAM Planners — Comparative Analysis — Deep-dive into existing capacity planning tools and their limitations.
- UI Journey Harness — VitePress 2 + Vue 3 — Component showcase and interaction patterns for hwLedger desktop UIs.
Key Findings Summary
- oMlx fork is the right choice: an HTTP sidecar avoids the build complexity of embedding Python directly.
- JSON-RPC is proven: mistral.rs and MCP both use JSON-RPC over stdio; fall back to protobuf if throughput saturates.
- Math accuracy is paramount: Each attention mechanism has different KV scaling; per-architecture dispatch is non-negotiable.
- GPU telemetry is fragmented: only NVIDIA has a mature in-process API; AMD and Apple require shell-outs.
- FFI converges on standards: UniFFI for Apple, csbindgen for Windows, cxx-qt for Linux.
- Axum + mTLS > gRPC: Simpler protocol stack; mTLS is sufficient at fleet-of-tens scale.
- Event sourcing is critical: Audit trail for cost reconciliation and fleet diagnostics.
- KV-cache + MoE awareness is the differentiator: No competitor handles MLA, hybrid attention, and active-expert math simultaneously.
Research Organization
All research briefs include:
- Frontmatter: Title, description, brief ID, date, status, sources.
- Executive Summary: One-paragraph overview.
- Deep Dives: Technical analysis, code examples, trade-off tables.
- Recommendations: Clear guidance for implementation.
- Sources: URLs and citations for validation.
- See Also: Links to related ADRs and source files.
Contributing Research
To add a new research brief:
- Create `docs/research/NN-slug.md` (where NN is the next brief number)
- Include YAML frontmatter with `title`, `description`, `brief_id`, `date`, `status`, and `sources`
- Write 300–600 words with markdown headings, tables, and code blocks
- Reference related ADRs in a "See Also" section
- Run `bun run sync:research` to publish to the docsite
- Submit a PR
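The frontmatter fields listed above might look like this (all values here are hypothetical placeholders, not a real brief):

```yaml
---
title: Example Brief Title
description: One-line summary of the brief.
brief_id: 13
date: 2026-05-01
status: draft
sources:
  - https://example.com/reference
---
```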
See CONTRIBUTING.md for full guidelines.
External References
Key papers and resources cited across all briefs:
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Mistral 7B
- Mixtral of Experts
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Efficient Streaming Language Models with Attention Sinks
- DeepSeek-V2: Multi-Head Latent Attention
- oMlx GitHub
- mistral.rs GitHub
- vLLM GitHub
- UniFFI Documentation
- cxx-qt (KDAB)
- Axum Web Framework