
CLI: plan — DeepSeek-V3

A real-world planning scenario: DeepSeek-V3, a 671B-parameter mixture-of-experts model, at 2K context with 2 concurrent users. Watch how hwLedger automatically detects the MLA (Multi-Head Latent Attention) architecture and breaks the VRAM requirement down across model weights, KV cache, and inference activations.

What you'll see

Planning for DeepSeek-V3 with:

  • Model: DeepSeek-V3 (671B MoE)
  • Context: 2,048 tokens
  • Batch: 2 concurrent users

Output includes:

  • Architecture detection: "MLA (latent_dim=256)" — automatically identified from model config
  • Model weights: 306 GB (FP16, active params only, MoE sparsity applied)
  • KV cache: 12 GB (at 2K context, latent-projected)
  • Activation memory: 45 GB (prefill phase)
  • Total: ~363 GB, which exceeds any single GPU; requires tensor parallelism across multiple A100 80GB GPUs

Notice the breakdown shows each layer, not just total VRAM.
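The total is just the sum of the three components above. A quick sketch (values copied from the example report, not computed from the model config) shows the arithmetic and the per-rank share at the recommended TP=4:

```python
# Memory components reported by `hwledger plan` for DeepSeek-V3 (GB);
# the numbers are taken verbatim from the example output above.
weights_gb = 306     # FP16 weights, MoE sparsity applied
kv_cache_gb = 12     # latent-projected KV cache at 2K context, batch 2
activations_gb = 45  # prefill-phase activations

total_gb = weights_gb + kv_cache_gb + activations_gb
print(total_gb)  # 363

# At tensor parallelism TP=4 each rank holds roughly a quarter of the
# total (before replication overhead such as embeddings and norms).
per_gpu_gb = total_gb / 4
print(per_gpu_gb)  # 90.75
```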

Journey not yet recorded.

Run the journey recorder to capture interactions:

./apps/macos/HwLedgerUITests/scripts/run-journeys.sh

What to watch for

  • MoE accounting: DeepSeek-V3 activates only 8 of its 256 routed experts per token (~37B of the 671B parameters), not the full parameter set
  • Latent KV cache: Much smaller than full-rank attention would need (16x compression)
  • Tensor parallelism recommendation: TP=4 (split across 4 GPUs) for 80GB A100s
  • Prefill vs decode: Prefill needs most activation memory; decode mostly just KV cache
  • Mixture-of-experts breakdown: Shows which experts are active per layer
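To see where the latent KV-cache savings come from, here is a back-of-the-envelope sketch. Standard attention stores 2 × n_kv_heads × head_dim values per token per layer (K and V), while MLA stores only a latent vector of latent_dim values (256, per the detection output above). The head counts below are illustrative assumptions, not DeepSeek-V3's actual config; they are chosen to show how a ~16x ratio can arise:

```python
# Hypothetical attention shapes (assumptions for illustration only).
n_kv_heads = 16    # assumed KV head count
head_dim = 128     # assumed per-head dimension
latent_dim = 256   # latent dimension reported by hwLedger's MLA detection

# Values stored per token per layer:
full_rank = 2 * n_kv_heads * head_dim  # separate K and V tensors
mla = latent_dim                       # single shared latent vector

print(full_rank // mla)  # 16
```

The actual compression ratio depends on the model's true head count and latent dimension; the point is that the latent projection replaces two full-rank tensors with one small vector.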

Next steps

Reproduce

bash
# Plan DeepSeek-V3 from local fixture
hwledger plan tests/golden/deepseek-v3.json --context 2048 --batch 2

# Export as JSON for downstream tools
hwledger plan tests/golden/deepseek-v3.json --context 2048 --batch 2 --json | \
  jq '.vram_required_gb, .recommended_tp'
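If you consume the `--json` output from Python rather than jq, the two fields filtered above can be read directly. A minimal sketch, assuming only the JSON shape implied by the jq filter (top-level `vram_required_gb` and `recommended_tp` keys; the sample payload here is made up, standing in for real `hwledger plan ... --json` output):

```python
import json

# Hypothetical payload; in practice this would be the stdout of
# `hwledger plan ... --json` captured via subprocess.
payload = '{"vram_required_gb": 363, "recommended_tp": 4}'

plan = json.loads(payload)
print(plan["vram_required_gb"], plan["recommended_tp"])  # 363 4
```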

Source

Recorded journey tape on GitHub

See Journey Recording README for re-recording instructions.

Released under the Apache 2.0 License.