Capacity Planning
Predict VRAM and throughput for any model, correctly handling MoE, MLA, GQA, and hybrid architectures, with a live per-layer breakdown.
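To make the VRAM-prediction claim concrete, here is a back-of-envelope sketch of the two dominant terms, quantized weights plus KV cache. The function names and the simple two-term model are illustrative assumptions for this README, not hwLedger's actual per-layer API.

```rust
/// Hypothetical back-of-envelope planner; hwLedger's real model is
/// per-layer and architecture-aware, this only shows the two big terms.
fn weight_bytes(params: u64, bytes_per_param: f64) -> u64 {
    (params as f64 * bytes_per_param) as u64
}

/// K and V tensors: 2 * kv_heads * head_dim elements per token per layer.
fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64, seq_len: u64, bytes_per_elem: u64) -> u64 {
    2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
}

fn main() {
    // Llama-2-70B: 4-bit weights (~0.5 bytes/param), fp16 KV cache,
    // 80 layers, GQA with 8 KV heads of dim 128, 4096-token context.
    let weights = weight_bytes(70_000_000_000, 0.5);
    let kv = kv_cache_bytes(80, 8, 128, 4096, 2);
    let gib = (weights + kv) as f64 / (1u64 << 30) as f64;
    println!("~{:.1} GiB", gib);
}
```

Note how small the GQA KV cache is relative to weights here; a naive calculator that assumes full MHA would inflate it 8x.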
Desktop inference runtime with enterprise-grade fleet management
Every public VRAM calculator we tested (HF Accelerate, can-it-run-llm, LM Studio) gets MoE and MLA wrong: they undercount the KV cache and overcount MoE throughput. hwLedger's math core is architecture-keyed: it dispatches per `AttentionKind` and accounts for resident and active parameters separately for MoE.
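The architecture-keyed idea can be sketched as an enum dispatch. `AttentionKind` is named in the text above, but the variants, fields, and formulas below are hypothetical illustrations, not hwLedger's actual types.

```rust
/// Hypothetical sketch; hwLedger's real AttentionKind may differ.
enum AttentionKind {
    Mha { heads: u64 },
    Gqa { kv_heads: u64 },
    // MLA caches a compressed per-token latent instead of full K/V
    // (any decoupled RoPE key dims are omitted here for brevity).
    Mla { kv_latent_dim: u64 },
}

/// Per-token, per-layer KV-cache elements for each attention kind.
fn kv_elems_per_token(kind: &AttentionKind, head_dim: u64) -> u64 {
    match kind {
        AttentionKind::Mha { heads } => 2 * heads * head_dim,
        AttentionKind::Gqa { kv_heads } => 2 * kv_heads * head_dim,
        AttentionKind::Mla { kv_latent_dim } => *kv_latent_dim,
    }
}

/// MoE: all experts are resident in VRAM, but each token only runs
/// through the active experts, so throughput must be estimated from
/// active parameters, not the resident total.
struct MoeParams { shared: u64, per_expert: u64, n_experts: u64, active_experts: u64 }

impl MoeParams {
    fn resident(&self) -> u64 { self.shared + self.per_expert * self.n_experts }
    fn active(&self) -> u64 { self.shared + self.per_expert * self.active_experts }
}

fn main() {
    let gqa = AttentionKind::Gqa { kv_heads: 8 };
    println!("GQA KV elems/token/layer: {}", kv_elems_per_token(&gqa, 128));
}
```

Conflating resident and active parameters is exactly the failure mode described above: sizing the KV cache from `resident()`-style totals undercounts nothing, but scaling throughput by them overcounts badly.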
The result: a hobbyist-sized fleet with enterprise bones.
```sh
git clone https://github.com/KooshaPari/hwLedger.git
cd hwLedger
cargo build --release
cargo run --bin hwledger-cli -- plan --model llama-2-70b
```

| Phase | Status |
|---|---|
| P0 Foundation | in progress |
| P1 Math core | planned |
| P2 Ingestion + probe | planned |
| P3 macOS GUI MVP | planned |
| P4 Inference | planned (macOS only in MVP) |
| P5 Fleet | planned |
| P6 Windows GUI | deferred |
| P7 Linux GUI | deferred |
Tracked in AgilePlus: feature `hwledger-v1-macos-mvp` (run `agileplus status`).
Apache-2.0. See LICENSE.