Placement
When you ask the fleet for a host, hwLedger ranks every candidate with a single blended score and returns the top N.
Formula
- fit_score ∈ [0, 1] — how comfortably the plan fits in the host's free VRAM.
free_bytes / plan.total_bytes, clamped to 1.0. - cost_score ∈ [0, 1] —
1 - (hourly_usd / max_hourly_in_pool). Local hosts withhourly_usd = 0getcost_score = 1.0.
The 0.7/0.3 split biases toward fit: a host that barely fits is penalised more than one that costs a bit more.
Worked example
Plan needs 62 GB for DeepSeek-V3 at seq 32 768, users 2.
| Host | Free VRAM | $/hr | fit | cost | rank |
|---|---|---|---|---|---|
| Local M2 Max | 80 GB | 0.00 | 1.00 | 1.00 | 1.000 |
| RunPod A100-80 | 78 GB | 1.89 | 1.00 | 0.37 | 0.811 |
| Vast.ai 2×A6000 | 92 GB | 1.20 | 1.00 | 0.60 | 0.880 |
| Lambda H100 | 75 GB | 2.99 | 1.00 | 0.00 | 0.700 |
Local wins outright. Drop the local host and Vast.ai edges out RunPod because of cheaper hourly cost at equal fit.
Tie-breaks
When two hosts score within 0.01:
- Prefer local over rented (
kind == Local). - Prefer lower latency (
last_seen_rtt_ms). - Prefer higher free VRAM headroom (absolute bytes, not ratio).
- Stable sort by
host_idas a deterministic final tiebreaker.
Configuration
The weights live in hwledger-server/config.toml:
toml
[placement]
fit_weight = 0.7
cost_weight = 0.3
min_free_headroom_bytes = 1073741824 # 1 GiB slackSet min_free_headroom_bytes to force a safety margin — any host with less free VRAM than plan.total_bytes + headroom is filtered before ranking.