Research
Archived research briefs from the Haiku research swarm that informed hwLedger's architecture and implementation decisions. All 12 briefs are indexed below with direct links to their full analyses.
All Research Briefs
Inference Engines & Backends
- oMlx Analysis — MLX Fork Strategy — Architecture review, fork viability, sidecar integration design.
- Inference Engine Matrix — April 2026 — Comprehensive comparison of MLX, mistral.rs, llama.cpp, vLLM, TGI across platforms.
Subprocess Communication & Integration
- MLX IPC Patterns — JSON-RPC over stdio vs protobuf; venv management; signal discipline.
Memory & Architecture Formulas
- KV Cache Formulas — Per-Architecture Derivations — Complete math breakdown for MHA, GQA, MQA, MLA, SSM, hybrid, attention-sink.
Model Configuration
- Config Ingestion — Model Metadata Loaders — Pure-Rust loaders for HF Hub, GGUF, safetensors; subprocess fallback for MLX.
Hardware Telemetry
- GPU Telemetry Backends — NVIDIA nvml-wrapper, AMD rocm-smi, Apple Silicon macmon, Intel Arc (deferred).
Language Bindings & FFI
- FFI Survey — Rust ↔ Native Language Bindings — UniFFI vs cbindgen vs csbindgen vs cxx-qt vs Slint for SwiftUI, WinUI, Qt.
Fleet Architecture
- Fleet Wire Design — Axum + JSON/HTTPS + mTLS; russh + deadpool SSH; Tailscale integration; phenotype-event-sourcing audit log.
Competitive Analysis
- Competitors Survey — Gap Analysis — HF Accelerate, can-it-run-llm, LM Studio, vLLM internals. hwLedger differentiators.
Auditing & Cost Tracking
- Event Sourcing — Audit Log & Cost Tracking — phenotype-event-sourcing reuse; SHA-256 hash chains; LedgerError::Integrity tamper detection.
Additional Research
- Competing VRAM Planners — Comparative Analysis — Deep-dive into existing capacity planning tools and their limitations.
- UI Journey Harness — VitePress 2 + Vue 3 — Component showcase and interaction patterns for hwLedger desktop UIs.
Key Findings Summary
- oMlx fork is the right choice: an HTTP sidecar avoids the build complexity of embedding Python directly.
- JSON-RPC is proven: mistral.rs and MCP both use JSON-RPC over stdio; fall back to protobuf if throughput saturates.
- Math accuracy is paramount: Each attention mechanism has different KV scaling; per-architecture dispatch is non-negotiable.
- GPU telemetry is fragmented: only NVIDIA has a mature in-process API; AMD and Apple require shell-outs.
- FFI converges on standards: UniFFI for Apple, csbindgen for Windows, cxx-qt for Linux.
- Axum + mTLS > gRPC: Simpler protocol stack; mTLS is sufficient at fleet-of-tens scale.
- Event sourcing is critical: Audit trail for cost reconciliation and fleet diagnostics.
- KV-cache + MoE awareness is the differentiator: No competitor handles MLA, hybrid attention, and active-expert math simultaneously.
Research Organization
All research briefs include:
- Frontmatter: Title, description, brief ID, date, status, sources.
- Executive Summary: One-paragraph overview.
- Deep Dives: Technical analysis, code examples, trade-off tables.
- Recommendations: Clear guidance for implementation.
- Sources: URLs and citations for validation.
- See Also: Links to related ADRs and source files.
Contributing Research
To add a new research brief:
- Create `docs/research/NN-slug.md` (where NN is the next brief number)
- Include YAML frontmatter with `title`, `description`, `brief_id`, `date`, `status`, and `sources`
- Write 300–600 words with markdown headings, tables, and code blocks
- Reference related ADRs in a "See Also" section
- Run `bun run sync:research` to publish to the docsite
- Submit a PR
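The frontmatter fields listed above might look like this (all values here are hypothetical placeholders, not a real brief):

```yaml
---
title: Example Brief Title
description: One-line summary of the brief.
brief_id: 13
date: 2026-05-01
status: draft
sources:
  - https://example.com/reference
---
```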
See CONTRIBUTING.md for full guidelines.
External References
Key papers and resources cited across all briefs:
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Mistral 7B
- Mixtral of Experts
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Efficient Streaming Language Models with Attention Sinks
- DeepSeek-V2: Multi-Head Latent Attention
- oMlx GitHub
- mistral.rs GitHub
- vLLM GitHub
- UniFFI Documentation
- cxx-qt (KDAB)
- Axum Web Framework