New `obliteratus gpu-calc` subcommand estimates the minimum GPU count from
model parameter count, dtype, and per-GPU VRAM. It auto-detects parameter
counts from HF configs, including MoE expert structure.
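The arithmetic behind an estimate like this is straightforward. A minimal sketch, with illustrative names (`min_gpu_count`, the `overhead` fudge factor) that are assumptions, not the actual `gpu-calc` implementation:

```python
# Hedged sketch of a gpu-calc style estimate: weights-only memory footprint
# plus a fudge factor, divided by per-GPU VRAM. Names are illustrative.
import math

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1, "int4": 0.5}

def min_gpu_count(n_params: float, dtype: str, vram_gb: float,
                  overhead: float = 1.2) -> int:
    """Estimate the minimum number of GPUs needed to hold the model.

    n_params:  total parameter count (e.g. 120e9 for a 120B model)
    overhead:  assumed headroom for activations/KV cache and framework state
    """
    weight_bytes = n_params * BYTES_PER_PARAM[dtype]
    needed_gb = weight_bytes * overhead / 1024**3
    return max(1, math.ceil(needed_gb / vram_gb))

# e.g. a 120B-parameter model in bfloat16 on 80 GB GPUs
print(min_gpu_count(120e9, "bfloat16", 80))  # → 4
```

For quantized dtypes the same formula applies with fewer bytes per parameter, which is why `--dtype` and `--quantization` feed directly into the estimate.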
README now covers the --dtype and --quantization flags and the gpu-calc
command, and references all of them in the "Choosing the right setup" table.
Adds a comprehensive section covering:
- How model sharding (pipeline parallelism) works and its limitations
- GPU selection via --gpus flag
- Pipeline parallel benchmarks on GPT-OSS-120B across 3-8 A100-80GB GPUs
- Stage-by-stage timing breakdown
- When data parallelism helps (and when it doesn't)
- Remote SSH execution with CLI and YAML examples
- Decision table for choosing the right setup
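To make the sharding bullet concrete: a toy sketch of naive pipeline parallelism, assuming the common layers-split-into-contiguous-stages approach (helper names like `shard_layers` are hypothetical, not the project's API). It also shows the main limitation: for a single batch, only one stage's device is busy at a time.

```python
# Naive pipeline parallelism sketch: contiguous chunks of layers become
# stages, each stage lives on one device, and activations hop between
# devices. Stages run sequentially for a given batch.
import torch
import torch.nn as nn

def shard_layers(layers, devices):
    """Assign contiguous chunks of layers to devices, one chunk per device."""
    per = (len(layers) + len(devices) - 1) // len(devices)
    stages = [(nn.Sequential(*layers[i * per:(i + 1) * per]).to(dev), dev)
              for i, dev in enumerate(devices)]
    return [s for s in stages if len(s[0]) > 0]

def pipeline_forward(stages, x):
    for stage, dev in stages:
        x = stage(x.to(dev))  # activation transfer is the inter-stage cost
    return x

layers = [nn.Linear(8, 8) for _ in range(6)]
devices = ["cpu", "cpu"]  # stand-ins; real runs use "cuda:0", "cuda:1", ...
stages = shard_layers(layers, devices)
out = pipeline_forward(stages, torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 8])
```

This sequential-stage behavior is why adding GPUs to a pipeline mainly buys capacity rather than throughput, which the benchmarks and decision table in the section are meant to illustrate.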
When --data-parallel is passed and the model fits on a single GPU, the
model is wrapped in nn.DataParallel, which splits prompt batches across
all available GPUs during activation collection; effective batch size
scales with GPU count. Hooks already move activations to CPU, so they
work correctly across replicas.
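A minimal sketch of this path, assuming the standard torch hook API; the function and variable names (`collect_activations`, `acts`) are illustrative, not the project's actual code:

```python
# Sketch of data-parallel activation collection: forward hooks detach and
# copy activations to CPU, so the collected tensors end up in host memory
# regardless of which GPU replica produced them. If --data-parallel is set
# and multiple GPUs exist, nn.DataParallel splits each batch across them.
import torch
import torch.nn as nn

def collect_activations(model, batches, use_data_parallel=False):
    acts = []

    def hook(_module, _inputs, output):
        acts.append(output.detach().cpu())  # off-device: safe across replicas

    handle = model.register_forward_hook(hook)
    if use_data_parallel and torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # replicates the module per GPU

    with torch.no_grad():
        for x in batches:
            model(x)
    handle.remove()
    return acts

model = nn.Linear(16, 4)
batches = [torch.randn(8, 16) for _ in range(2)]
acts = collect_activations(model, batches)
print(len(acts), acts[0].shape)  # 2 torch.Size([8, 4])
```

Note that with nn.DataParallel the hook fires once per replica per forward pass, so the CPU move also avoids holding per-GPU tensors alive after each batch.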