# CLI Reference

Complete reference for all hwledger subcommands.

## plan

Memory planner: estimates VRAM and selects optimal tensor parallelism, quantization, and attention variant.

```bash
hwledger plan [OPTIONS] --model <MODEL>
```

| Option | Type | Default | Description |
|---|---|---|---|
| `--model` | string | (required) | Model ID (e.g. `mistral-7b-instruct`) |
| `--context` | integer | 4096 | Target context length in tokens |
| `--batch` | integer | 1 | Batch size |
| `--quant` | string | none | Quantization: `none`, `int8`, `int4` |
| `--attention` | string | auto | Attention variant: `mha`, `gqa`, `mqa`, `mla`, `ssm`, `auto` |
| `--tp` | integer | 0 | Tensor parallelism (0 = auto-detect) |
| `--device` | string | auto | GPU backend: `cuda`, `rocm`, `metal`, `cpu` |
| `--json` | flag | false | Output JSON instead of human-readable text |

Exit codes:

- 0: Success
- 1: Model not found
- 2: Insufficient VRAM
- 3: Unsupported architecture (e.g. a CPU-only query against a CUDA-only model)
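These exit codes make `plan` easy to script. A minimal sketch of one pattern (the fallback policy here is illustrative, not built into the CLI): if a full-precision plan reports insufficient VRAM, retry with `int4` quantization before giving up.

```shell
#!/usr/bin/env bash
# Illustrative wrapper: on exit code 2 (insufficient VRAM), retry the
# plan with int4 quantization; propagate any other failure unchanged.
plan_or_quantize() {
  local model="$1" rc
  hwledger plan --model "$model" --context 4096 && return 0
  rc=$?
  if [ "$rc" -eq 2 ]; then
    echo "insufficient VRAM; retrying with --quant int4" >&2
    hwledger plan --model "$model" --context 4096 --quant int4
  else
    return "$rc"
  fi
}
```

Invoke as `plan_or_quantize mistral-7b-instruct`; the caller sees the exit code of whichever plan ran last.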

Examples:

```bash
# Plan Mistral 7B for 32K context
hwledger plan --model mistral-7b-instruct --context 32000 --device cuda

# Export as JSON for programmatic use
hwledger plan --model llama-70b --context 8000 --json | jq .vram_required
```

## probe

GPU discovery and telemetry: list available GPUs, memory, compute capability.

```bash
hwledger probe [OPTIONS]
```

| Option | Type | Description |
|---|---|---|
| `--json` | flag | Output JSON |
| `--watch` | flag | Update every 2 seconds (Ctrl+C to exit) |
| `--filter` | string | Filter by device identifier (e.g. `cuda:0`, `metal:0`) |

Examples:

```bash
# List all GPUs
hwledger probe

# Watch NVIDIA GPU 0 continuously
hwledger probe --watch --filter cuda:0

# Export JSON for parsing
hwledger probe --json | jq '.gpus[].vram_free_gb'
```
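Since `probe --json` reports per-GPU free memory (the `.gpus[].vram_free_gb` field queried above), a script can wait until enough VRAM is free before launching a job. A sketch, assuming that JSON shape and a local `jq`:

```shell
#!/usr/bin/env bash
# Block until some GPU reports at least $1 GB of free VRAM,
# re-probing every 2 seconds. Prints the free amount found.
wait_for_vram() {
  local need_gb="$1" free
  while :; do
    # Maximum free VRAM across all reported GPUs.
    free=$(hwledger probe --json | jq '[.gpus[].vram_free_gb] | max')
    # awk handles the (possibly fractional) comparison.
    if awk -v f="$free" -v n="$need_gb" 'BEGIN { exit !(f >= n) }'; then
      echo "$free"
      return 0
    fi
    sleep 2
  done
}
```

Typical use: `wait_for_vram 16 && hwledger run --model mistral-7b input.json`.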

## ingest

Download and cache models from HuggingFace or Ollama.

```bash
hwledger ingest [OPTIONS] --model <MODEL>
```

| Option | Type | Description |
|---|---|---|
| `--model` | string | Model ID (e.g. `mistralai/Mistral-7B-Instruct-v0.2`) |
| `--source` | string | `hf` (HuggingFace) or `ollama` |
| `--cache-dir` | path | Cache location (default: `~/.cache/hwledger/models`) |
| `--format` | string | `gguf`, `safetensors`, or auto-detect |

Exit codes:

- 0: Success
- 1: Model not found on source
- 2: Network error
- 3: Insufficient disk space
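Exit code 2 separates transient network failures from permanent ones, so a retry loop only makes sense for that code. A sketch (the attempt count and backoff are illustrative):

```shell
#!/usr/bin/env bash
# Retry ingest on network errors (exit code 2) with linear backoff;
# abort immediately on any other result (model missing, disk full).
ingest_with_retry() {
  local model="$1" attempt rc
  for attempt in 1 2 3; do
    hwledger ingest --model "$model"
    rc=$?
    [ "$rc" -ne 2 ] && return "$rc"
    echo "network error (attempt $attempt); backing off" >&2
    sleep $((attempt * 5))
  done
  return 2
}
```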

Examples:

```bash
# Download Mistral 7B from HuggingFace
hwledger ingest --model mistralai/Mistral-7B-Instruct-v0.2

# Use Ollama as source
hwledger ingest --model llama2:70b --source ollama
```

## run

Execute inference on local or remote GPU.

```bash
hwledger run [OPTIONS] --model <MODEL> <INPUT_FILE>
```

| Option | Type | Description |
|---|---|---|
| `--model` | string | Model to run |
| `--context` | integer | Max context (default: auto) |
| `--batch` | integer | Batch size |
| `--timeout` | integer | Timeout in seconds (default: 300) |
| `--output` | path | Save result to file (default: stdout) |
| `--remote` | string | Fleet server URL (use remote inference) |

Examples:

```bash
# Run locally
echo '{"prompt": "Hello world"}' | hwledger run --model mistral-7b

# Use fleet server
hwledger run --model llama-70b --remote tcp://fleet.example.com:5443 input.json
```
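Because `plan` signals feasibility through its exit code, a common pattern is to plan before running, so an oversized job fails fast instead of timing out. A sketch using only the flags documented above:

```shell
#!/usr/bin/env bash
# Run inference only if the memory planner says the model fits.
safe_run() {
  local model="$1" input="$2"
  if ! hwledger plan --model "$model" --json > /dev/null; then
    echo "plan failed for $model; not running" >&2
    return 1
  fi
  hwledger run --model "$model" --timeout 300 "$input"
}
```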

## fleet

Fleet orchestration: register agents, query status, submit jobs.

```bash
hwledger fleet <SUBCOMMAND>
```

### fleet register-ssh

Register remote GPU via SSH.

```bash
hwledger fleet register-ssh --host user@remote.box --key ~/.ssh/id_ed25519 [OPTIONS]
```

### fleet agents

List all registered agents.

```bash
hwledger fleet agents [--json]
```

### fleet jobs

List all jobs.

```bash
hwledger fleet jobs [--agent <AGENT_ID>] [--status <STATUS>] [--json]
```
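`fleet jobs --status` combined with `--json` supports polling scripts. A sketch that waits for a queue to drain, assuming the JSON output is an array of job objects and that `running` is a valid status value (neither is specified above):

```shell
#!/usr/bin/env bash
# Poll until no jobs remain in the given status (default: running).
wait_for_drain() {
  local status="${1:-running}" n
  while :; do
    n=$(hwledger fleet jobs --status "$status" --json | jq 'length')
    [ "$n" -eq 0 ] && return 0
    echo "$n job(s) still $status" >&2
    sleep 5
  done
}
```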

## audit

Verify ledger integrity and export audit trail.

```bash
hwledger audit [OPTIONS]
```

| Option | Type | Description |
|---|---|---|
| `--verify` | flag | Verify hash chain integrity |
| `--export` | path | Export JSON to file |
| `--since` | RFC 3339 timestamp | Start time (e.g. `2026-04-17T00:00:00Z`) |

Examples:

```bash
# Verify chain
hwledger audit --verify

# Export last 7 days
hwledger audit --export audit.json --since "2026-04-11T00:00:00Z"
```
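The `--since` boundary can be computed instead of hard-coded. A sketch of a weekly export, assuming GNU `date` (BSD/macOS `date` uses different relative-time flags):

```shell
#!/usr/bin/env bash
# Verify the hash chain, then export the last 7 days of the ledger.
weekly_audit() {
  local since
  since=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)
  hwledger audit --verify || return 1
  hwledger audit --export "audit-$(date -u +%Y%m%d).json" --since "$since"
}
```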

Released under the Apache 2.0 License.