FAQ
Does hwLedger work on CPU?
Not efficiently. hwLedger is optimized for GPU inference (NVIDIA CUDA, AMD ROCm, Apple Metal). CPU inference is supported as a fallback but is 10-100x slower; use it for testing only.
Which GPUs are supported?
NVIDIA: CUDA compute capability 3.0+ (Kepler or newer). Verify with nvidia-smi.
AMD: RDNA 1 or newer (RX 5700 XT, RX 6800 XT) and CDNA (MI series). Verify with rocm-smi.
Apple: M1/M2/M3/M4 with Metal framework. Intel Macs not supported.
Intel: Arc A-series (limited support, experimental).
Can I run multiple models simultaneously?
Yes. Use --batch 2 or higher. Batch size is independent of model count. hwLedger supports, for example:
- 1 model, batch 8 tokens
- 2 models, batch 4 tokens each (if VRAM permits)
Run hwledger plan --batch 8 --model mistral-7b to check whether VRAM is sufficient.
How do I update hwLedger?
macOS (Sparkle auto-update):
# Auto-update available; install at app launch
# Manual update:
curl -O https://github.com/KooshaPari/hwLedger/releases/download/v0.1.0/hwledger-macos.dmg
open hwledger-macos.dmg
# Drag hwledger to Applications
Linux:
cargo install --path . # Build from source
# Or use your system package manager if available
How much VRAM do I need?
7B model, FP16: 14 GB + KV cache
- 4K context: ~16 GB
- 32K context: ~32 GB
70B model, INT4: 35 GB + KV cache
- 4K context: ~37 GB
- 32K context: ~50+ GB
Use hwledger plan --model <MODEL> --context <CONTEXT> to get an exact estimate.
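The figures above are weights plus KV cache. A rough sketch of the arithmetic, assuming a typical Llama-style 7B shape (32 layers, 32 KV heads, head dimension 128) — these dimensions are illustrative, not read from hwLedger:

```python
def vram_estimate_gib(params_b, bytes_per_weight, layers, kv_heads, head_dim, context):
    weights = params_b * 1e9 * bytes_per_weight
    # One K and one V tensor per layer, FP16 (2 bytes) per element
    kv_cache = 2 * layers * context * kv_heads * head_dim * 2
    return (weights + kv_cache) / 2**30

# 7B FP16 at 4K context: ~13 GiB weights + ~2 GiB KV cache
print(round(vram_estimate_gib(7, 2, 32, 32, 128, 4096), 1))  # -> 15.0
```

The KV cache scales linearly with context, so going from 4K to 32K grows it 8x, which is what pushes the estimate from ~16 GB toward ~32 GB.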
Can I use multiple GPUs?
Yes, via tensor parallelism (TP). hwledger plan --tp 2 splits the model across 2 GPUs. Requires high inter-GPU bandwidth (NVLink or PCIe 4.0+).
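As a back-of-the-envelope check of whether a TP split fits, assuming weights and KV cache shard evenly across ranks with a small replicated per-GPU overhead (the overhead figure is a placeholder, not a measured value):

```python
def per_gpu_gb(model_gb, kv_gb, tp, overhead_gb=1.0):
    # TP shards weights and KV cache across ranks; activation buffers
    # and other small allocations are replicated on every GPU.
    return (model_gb + kv_gb) / tp + overhead_gb

# 70B FP16 (140 GB weights) + 16 GB KV cache across 2 GPUs
print(per_gpu_gb(140.0, 16.0, 2))  # -> 79.0 GB per GPU
```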
What quantization should I use?
Start with FP16 (no quantization). If VRAM is insufficient:
- Try INT8 (8-bit quantization, ~1% quality loss)
- Try INT4 (4-bit quantization, ~5% quality loss)
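The VRAM saving is roughly linear in bits per weight. For a 70B model (weights only, decimal GB; KV cache comes on top):

```python
def weight_gb(params_b, bits):
    # Decimal GB, weights only; KV cache is additional.
    return params_b * 1e9 * bits / 8 / 1e9

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(name, weight_gb(70, bits))  # fp16 140.0, int8 70.0, int4 35.0
```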
hwledger plan --model llama-70b --quant int4 --context 32000
How do I deploy the fleet server?
- Install on a box with a fixed IP
- Configure ~/.config/hwledger/server.toml
- Enable systemd: copy the unit file, then run systemctl enable hwledger-server
- Register agents: hwledger fleet register-ssh --host user@agent-box
See Fleet Server Guide for detailed setup.
Can hwLedger work air-gapped (no internet)?
Mostly yes. Internet access is needed for:
- Initial model download (HuggingFace)
- Cloud rental API calls (Vast, RunPod)
Once models are cached locally, air-gapped inference works fine. Run hwledger ingest beforehand.
How do I enable TLS for the fleet server?
By default, the server generates a self-signed cert at ~/.config/hwledger/server.cert.pem. To use a custom cert:
[server]
cert_path = "/path/to/cert.pem"
key_path = "/path/to/key.pem"
Agents trust the cert via mTLS (public key pinned at registration).
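The pinning idea can be pictured as follows — an illustrative sketch of public-key pinning in general, not hwLedger's actual handshake code:

```python
import hashlib

def pin(pubkey_der: bytes) -> str:
    # At registration, store a SHA-256 fingerprint of the server's public key.
    return hashlib.sha256(pubkey_der).hexdigest()

def verify(pubkey_der: bytes, pinned: str) -> bool:
    # On every later connection, the presented key must match the stored pin.
    return hashlib.sha256(pubkey_der).hexdigest() == pinned

stored = pin(b"server-public-key-der")
print(verify(b"server-public-key-der", stored))  # True
print(verify(b"some-other-key", stored))         # False
```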
What happens if an agent goes offline?
The fleet server marks an agent offline after 3 missed heartbeats (15 seconds by default). Jobs queued for that agent are reassigned to available agents.
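The timeout rule can be sketched as below. Note the 5-second interval is inferred from "3 misses = 15 seconds", not documented separately:

```python
HEARTBEAT_INTERVAL = 5.0  # seconds; inferred: 3 misses * 5 s = 15 s default
MAX_MISSES = 3

def is_offline(last_heartbeat: float, now: float) -> bool:
    # Offline once more than MAX_MISSES heartbeat intervals have elapsed.
    return now - last_heartbeat > MAX_MISSES * HEARTBEAT_INTERVAL

print(is_offline(last_heartbeat=100.0, now=110.0))  # False (10 s elapsed)
print(is_offline(last_heartbeat=100.0, now=116.0))  # True  (16 s elapsed)
```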
Can I run inference while planning?
Yes. Planning is a separate process and doesn't block inference. Both can run concurrently.
How do I monitor fleet health?
hwledger fleet agents # List agents + status
hwledger fleet jobs --status running # Show active jobs
hwledger audit --since "2026-04-18T00:00:00Z" # Recent events