# Cloud GPU Rentals

hwLedger integrates with major GPU rental platforms to query pricing and availability, and to spin up instances automatically.
## Supported providers
| Provider | Pricing API | Launch API | Shutdown API | Spot | Interruptibility |
|---|---|---|---|---|---|
| Vast.ai | Yes | Yes | Yes | Yes | 5-30% undercut, can interrupt |
| RunPod | Yes | Yes | Yes | Yes | Stable (24h min) |
| Lambda Labs | Yes (manual) | Yes | Yes | No | Dedicated, 10h monthly |
| Modal Labs | Yes | Deployment | Auto | No | Via timeout |
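Each integration needs the same three operations from the table: query prices, launch, and shut down. As a rough illustration only (the method names below are our assumptions, not hwLedger's actual internals), a provider adapter could be modeled like this:

```python
from typing import Protocol

class GPUProvider(Protocol):
    """Hypothetical adapter interface; names are illustrative, not hwLedger's API."""

    def list_offers(self) -> list[dict]:
        """Return current offers with pricing and availability (Pricing API)."""
        ...

    def launch(self, offer_id: str, image: str) -> str:
        """Create an instance from an offer and return its instance ID (Launch API)."""
        ...

    def shutdown(self, instance_id: str) -> None:
        """Terminate a running instance (Shutdown API)."""
        ...
```

Spot behavior differs per provider (see the table above), so interruption handling would live outside this common interface.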
## Pricing cache
hwLedger caches prices locally to avoid repeated API calls:
File: `~/.cache/hwledger/pricing-cache.json`

```json
{
  "providers": {
    "vast": {
      "last_updated": "2026-04-18T21:45:00Z",
      "instances": [
        {
          "id": "123456",
          "gpu": "RTX4090",
          "vram": 24,
          "$/hour": 0.18,
          "availability": "available",
          "location": "US-West"
        }
      ]
    },
    "runpod": {
      "last_updated": "2026-04-18T21:43:00Z",
      "instances": [...]
    }
  }
}
```

Refresh: the cache is invalidated after 1 hour, or manually with `hwledger cloud refresh-prices`.
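The 1-hour TTL can be checked client-side before hitting a provider API. A minimal sketch, assuming the cache layout shown above (the path and TTL come from this page; the function name is ours):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

CACHE_PATH = Path.home() / ".cache/hwledger/pricing-cache.json"
TTL_SECONDS = 3600  # cache is invalidated after 1 hour

def cached_instances(provider: str) -> list | None:
    """Return cached offers for a provider, or None if missing or stale."""
    try:
        cache = json.loads(CACHE_PATH.read_text())
        entry = cache["providers"][provider]
    except (FileNotFoundError, KeyError, json.JSONDecodeError):
        return None
    updated = datetime.fromisoformat(entry["last_updated"].replace("Z", "+00:00"))
    age_s = (datetime.now(timezone.utc) - updated).total_seconds()
    return None if age_s > TTL_SECONDS else entry["instances"]
```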
## Configuration
File: `~/.config/hwledger/cloud.toml`

```toml
[providers.vast]
api_key = "vast-api-key-here"
min_vram_gb = 16
"max_$/hour" = 0.50
filter = { location = "US", min_bandwidth_gbps = 1.0 }

[providers.runpod]
api_key = "runpod-api-key"
min_vram_gb = 20
"max_$/hour" = 0.40

[providers.lambda]
api_key = "lambda-api-key"
min_vram_gb = 24

[budget]
"total_$/month" = 100
"alert_at_$/hour" = 0.60
```

## Launch workflow
User command:
```bash
hwledger fleet launch --model llama-70b-instruct \
  --context 32000 \
  --provider vast \
  --timeout 2h
```

Server process:
- Query prices: `GET /api/pricing` from Vast → cache
- Filter by model: find GPUs with ≥ 48 GB VRAM (70B model requirement)
- Sort by cost: pick the cheapest available instance
- Create instance: `POST /api/instances` with an image ID (Ubuntu + hwledger-agent preinstalled)
- Wait for boot: poll instance status until "running" (see the polling sketch after this list)
- SSH health check: `ssh ubuntu@instance.ip nvidia-smi`
- Register with fleet server: agent heartbeats in
- Schedule job: planner sends the model + inference request
- Teardown: on timeout or user command, `DELETE /api/instances/{id}`
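The "wait for boot" step reduces to a bounded poll loop. A minimal sketch, assuming a hypothetical `get_instance_status` callable that wraps the provider's instance API (not an actual hwLedger function):

```python
import time
from typing import Callable

def wait_for_boot(get_instance_status: Callable[[str], str], instance_id: str,
                  timeout_s: int = 600, interval_s: int = 10) -> None:
    """Poll instance status until "running", or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_instance_status(instance_id)  # hypothetical provider wrapper
        if status == "running":
            return
        if status in ("failed", "terminated"):
            raise RuntimeError(f"instance {instance_id} entered state {status!r}")
        time.sleep(interval_s)
    raise TimeoutError(f"instance {instance_id} did not boot within {timeout_s}s")
```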
## Cost estimation
Before launching, the planner estimates cost:

```
Model: llama-70b-instruct (70B params, FP16)
Context: 32K tokens
Time per token ≈ (model_size + kv_cache) / memory_bandwidth; cost = time × $/sec

Vast.ai RTX4090: $0.18/hr = $0.00005/sec
Prefill (32K tokens): ~10 sec × $0.00005 = $0.0005
Decode (1K tokens): ~3 sec × $0.00005 = $0.00015
Total per request: $0.00065 (+ overhead)
Monthly (1000 requests): $0.65 (+ idle cost)
```

The user confirms before launch.
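The same arithmetic in code, using the numbers from the estimate above (the function is illustrative, not part of hwLedger):

```python
def per_request_cost(hourly_rate_usd: float, prefill_s: float, decode_s: float) -> float:
    """Convert an hourly instance rate into a per-request cost estimate."""
    per_second = hourly_rate_usd / 3600          # $0.18/hr → $0.00005/sec
    return (prefill_s + decode_s) * per_second

# Vast.ai RTX4090: ~10 s prefill (32K tokens) + ~3 s decode (1K tokens)
cost = per_request_cost(0.18, prefill_s=10, decode_s=3)
print(f"${cost:.5f} per request")                     # $0.00065
print(f"${cost * 1000:.2f} per 1000 requests/month")  # $0.65 (+ idle cost)
```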
## Spot instance risks
Vast.ai spot instances save up to ~30% but can be interrupted:

```toml
[budget]
prefer_spot = true        # Use spot if available
timeout_on_interrupt = 60 # Seconds to migrate the job before forced kill
```

On interruption, hwLedger automatically:
- Saves job state to ledger (persist-in-flight)
- Migrates to new instance
- Resumes from checkpoint
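Roughly, that recovery path could be sketched like this (every callable below is a hypothetical stand-in; the real persist/migrate machinery lives in the ledger and fleet server):

```python
from typing import Any, Callable

def handle_spot_interrupt(
    job: Any,
    persist_to_ledger: Callable[[Any, int], Any],
    launch_replacement: Callable[[], Any],
    resume_from_checkpoint: Callable[[Any, Any], None],
    timeout_on_interrupt: int = 60,
) -> None:
    """Sketch of the interrupt path: persist, migrate, resume.

    All callables are hypothetical stand-ins, not hwLedger functions.
    """
    # 1. Persist in-flight job state to the ledger before the forced kill
    checkpoint = persist_to_ledger(job, timeout_on_interrupt)
    # 2. Migrate: bring up a replacement instance
    replacement = launch_replacement()
    # 3. Resume the job from its checkpoint on the new instance
    resume_from_checkpoint(replacement, checkpoint)
```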
## Regional selection
Placement scores consider region:

```
Score = (instance_cost / availability) × (latency_ms / 100)
```

Lower score is better: cheap, highly available, low-latency regions win. The server picks the top 3 instances, and the user selects a region interactively.
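A sketch of the ranking (the field names and the 0-1 availability scale are assumptions, not hwLedger-specified):

```python
def placement_score(cost_usd_hr: float, availability: float, latency_ms: float) -> float:
    """Lower is better: penalizes cost and latency, rewards availability.

    availability is assumed to be a 0-1 fraction (an assumption of this sketch).
    """
    return (cost_usd_hr / availability) * (latency_ms / 100)

# Rank candidates and keep the top 3 for interactive region selection
candidates = [
    {"id": "123456", "cost": 0.18, "availability": 0.90, "latency_ms": 40},
    {"id": "789012", "cost": 0.22, "availability": 0.99, "latency_ms": 120},
    {"id": "345678", "cost": 0.15, "availability": 0.70, "latency_ms": 95},
]
top3 = sorted(candidates, key=lambda c: placement_score(
    c["cost"], c["availability"], c["latency_ms"]))[:3]
```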