28 Commits

Author SHA1 Message Date
pliny d6af36f8d3 Merge pull request #38 from GangGreenTemperTatum/chore/dependency-release-age-guardrails
chore: add dependency release age guardrails
2026-04-01 13:47:50 -07:00
GangGreenTemperTatum d40142b6af Add dependency release age guardrails 2026-04-01 07:44:54 -04:00
pliny 25c3fea436 Merge pull request #36 from StellaAthena/main
Add multi-GPU and remote support
2026-03-26 15:50:29 -07:00
Stella Biderman 501ff0c963 Add gpu-calc command and document precision/quantization options
New `obliteratus gpu-calc` subcommand estimates minimum GPU count from
model params, dtype, and GPU VRAM. Auto-detects param counts from HF
configs including MoE expert structure.

README now covers --dtype, --quantization flags, the gpu-calc command,
and references both in the "Choosing the right setup" table.
2026-03-17 14:01:18 -04:00
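
The estimate described in commit 501ff0c963 (minimum GPU count from parameter count, dtype, and per-GPU VRAM) could be sketched roughly as follows. This is not the `obliteratus gpu-calc` implementation: the function name `estimate_min_gpus`, the `headroom` parameter, and the bytes-per-parameter table are assumptions made for illustration, and the sketch does not attempt the HF-config or MoE auto-detection mentioned in the commit.

```python
import math

# Bytes per parameter for a few common storage formats.
# Illustrative values only; not taken from the tool's own table.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def estimate_min_gpus(num_params: float, dtype: str, vram_gb: float,
                      headroom: float = 0.2) -> int:
    """Return the smallest GPU count whose usable VRAM holds the weights.

    `headroom` (hypothetical) is the fraction of each GPU reserved for
    activations, buffers, and allocator overhead.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    weight_gb = num_params * BYTES_PER_PARAM[dtype] / 1e9
    usable_gb = vram_gb * (1.0 - headroom)
    return max(1, math.ceil(weight_gb / usable_gb))


if __name__ == "__main__":
    # A 70B-parameter dense model in bf16 on 80 GB GPUs.
    print(estimate_min_gpus(70e9, "bfloat16", 80))
```

The default `headroom` of 0.2 is an arbitrary placeholder. Setting it to 0 gives the bare weights-only lower bound, which is usually too tight to run in practice.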
Stella Biderman 79b469d3dc Add DeepSeek-R1-Distill-Llama-70B pipeline parallel benchmarks
Benchmarked 70B dense model (149 GB bf16) on 2/3/4/8 A100-80GB GPUs.
3 GPUs was fastest (536s), confirming minimum-viable-GPU-count guidance.
Combined stage breakdown table for both models.
2026-03-16 15:29:57 -04:00
Stella Biderman c723da02c8 Document multi-GPU parallelism, benchmarks, and remote SSH execution
Add a comprehensive section covering:
- How model sharding (pipeline parallelism) works and its limitations
- GPU selection via --gpus flag
- Pipeline parallel benchmarks on GPT-OSS-120B across 3-8 A100-80GB GPUs
- Stage-by-stage timing breakdown
- When data parallelism helps (and when it doesn't)
- Remote SSH execution with CLI and YAML examples
- Decision table for choosing the right setup
2026-03-16 14:39:22 -04:00
Stella Biderman 51f621d0a2 Save model snapshot to CPU to avoid OOM on multi-GPU setups
The snapshot() deepcopy was cloning tensors on their original GPU
devices, doubling VRAM usage. For a 234GB model sharded across 6
A100-80GB GPUs (~39GB each), this left no room for the copy.

Now snapshot() stores tensors on CPU and restore() moves them back
to each parameter's current device.
2026-03-13 17:13:50 -04:00
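
The fix in commit 51f621d0a2 can be illustrated with a minimal PyTorch sketch. The commit confirms only that the snapshot is kept on CPU and that restore moves tensors back to each parameter's current device. The function bodies, the dict-of-tensors layout, and the restriction to `named_parameters()` are assumptions here, not the project's code.

```python
from typing import Dict

import torch
from torch import nn


def snapshot(model: nn.Module) -> Dict[str, torch.Tensor]:
    """Copy every parameter to CPU so the copy costs host RAM, not VRAM."""
    return {
        name: param.detach().to("cpu", copy=True)
        for name, param in model.named_parameters()
    }


def restore(model: nn.Module, saved: Dict[str, torch.Tensor]) -> None:
    """Write saved values back onto whatever device each parameter is on now."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            param.copy_(saved[name].to(param.device))


if __name__ == "__main__":
    model = nn.Linear(4, 2)
    saved = snapshot(model)
    with torch.no_grad():
        model.weight.add_(1.0)  # simulate an in-place edit to the weights
    restore(model, saved)
    print(torch.equal(model.weight.detach().cpu(), saved["weight"]))
```

Because each saved tensor is sent to `param.device` at restore time, the same snapshot works whether the model sits on one GPU or is sharded across several.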
Stella Biderman a2bb748f1b Revert "Add data parallel support for PROBE stage"
This reverts commit 1a6e2577bb.
2026-03-13 16:54:31 -04:00
Stella Biderman 1a6e2577bb Add data parallel support for PROBE stage
When --data-parallel is passed and the model fits on a single GPU,
wraps it with nn.DataParallel to split prompt batches across all
available GPUs during activation collection. Batch size scales by
GPU count. Hooks already move activations to CPU so they work
correctly across replicas.
2026-03-13 01:24:31 -04:00
Stella Biderman a634950abd Update remote install URL to StellaAthena fork 2026-03-13 01:02:18 -04:00
Stella Biderman b23d989824 Add __main__.py and fix remote pip install command 2026-03-12 12:56:03 -04:00
Stella Biderman cbdb772eb9 Add multi-GPU support with --gpus flag
Adds --gpus flag to obliterate, run, and tourney commands for controlling
which GPUs to use (sets CUDA_VISIBLE_DEVICES). Works both locally and with
--remote. Models are automatically split across selected GPUs via
accelerate's device_map="auto". Also adds gpus field to remote YAML config.
2026-03-12 12:35:29 -04:00
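
As a rough illustration of commit cbdb772eb9, a GPU selection can be turned into `CUDA_VISIBLE_DEVICES` as below. The commit does not state the accepted syntax for `--gpus`, so the comma-separated index format, the helper name `select_gpus`, and its validation rules are all assumptions.

```python
import os
from typing import MutableMapping, Optional


def select_gpus(spec: str,
                env: Optional[MutableMapping[str, str]] = None) -> str:
    """Validate a GPU list such as "0,2,3" and export it for CUDA.

    The comma-separated format is an assumption, not the documented CLI syntax.
    """
    env = os.environ if env is None else env
    ids = [tok.strip() for tok in spec.split(",") if tok.strip()]
    if not ids or not all(tok.isdigit() for tok in ids):
        raise ValueError(f"expected comma-separated GPU indices, got {spec!r}")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU index in {spec!r}")
    value = ",".join(ids)
    # Must be set before CUDA is initialized in this process to take effect.
    env["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    fake_env = {}  # use a plain dict to avoid touching the real environment
    print(select_gpus("0, 2,3", fake_env), fake_env)
```

Once the variable is set, the selected devices are renumbered from 0 inside the process, and a loader using `device_map="auto"` sees only those devices when it splits the model.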
Stella Biderman 34032b7821 Add remote SSH execution support for GPU nodes
Adds --remote [user@]host flag to obliterate, run, and tourney commands,
enabling execution on remote GPU nodes via SSH. Also supports a remote:
section in YAML configs. The remote runner handles SSH connectivity checks,
GPU detection, auto-installation of obliteratus, log streaming, and result
syncing back to the local machine via scp.
2026-03-11 15:51:16 -04:00
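
The sequence of steps listed in commit 34032b7821 (connectivity check, GPU detection, install, run, sync back via scp) might be planned along these lines. This sketch only builds the command lines and does not reproduce the project's remote runner. The function name `plan_remote_run` and all of its parameters are hypothetical, and the install target and remote command are supplied by the caller rather than guessed.

```python
import shlex
from typing import List


def plan_remote_run(host: str, install_spec: str, remote_argv: List[str],
                    remote_out: str, local_out: str) -> List[List[str]]:
    """Return the argv lists for each step, in execution order."""
    # BatchMode fails fast instead of prompting for a password.
    ssh = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host]
    return [
        # 1. Connectivity check: run a no-op on the remote host.
        ssh + ["true"],
        # 2. GPU detection: list device names and memory.
        ssh + ["nvidia-smi", "--query-gpu=name,memory.total",
               "--format=csv,noheader"],
        # 3. Install the package (install_spec is caller-provided).
        ssh + [shlex.join(["python3", "-m", "pip", "install", install_spec])],
        # 4. Run the job; quoting keeps arguments intact in the remote shell.
        ssh + [shlex.join(remote_argv)],
        # 5. Copy results back to the local machine.
        ["scp", "-r", f"{host}:{remote_out}", local_out],
    ]


if __name__ == "__main__":
    for step in plan_remote_run("user@gpu-node", "example-package",
                                ["echo", "hello world"],
                                "results", "./results"):
        print(shlex.join(step))
```

Each list can be passed to `subprocess.run(step, check=True)`. Since the child process inherits the local terminal by default, remote output appears as it is produced, which gives a simple form of log streaming.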
pliny 84bdf5d978 Create .gitignore 2026-03-08 13:03:36 -07:00
pliny 526cfb8943 Add files via upload 2026-03-08 13:02:53 -07:00
pliny 346db3d59d Add files via upload 2026-03-08 12:23:58 -07:00
pliny 26e1c5b13b Add files via upload 2026-03-08 12:09:27 -07:00
pliny 69fa63ac43 Add files via upload 2026-03-08 12:07:56 -07:00
pliny 1065809658 Add files via upload 2026-03-07 17:54:38 -08:00
pliny ece134f870 Add files via upload 2026-03-07 17:53:42 -08:00
pliny 984ce14059 Add files via upload 2026-03-05 10:03:46 -08:00
pliny 6120061553 Add files via upload 2026-03-05 00:52:16 -08:00
pliny 66ea4a6f86 Add files via upload 2026-03-05 00:50:44 -08:00
pliny 4cddc6399a Add files via upload 2026-03-05 00:44:59 -08:00
pliny f67f13ca57 Update README.md 2026-03-04 13:35:24 -08:00
pliny 8312bbfa8d Add files via upload 2026-03-04 13:09:57 -08:00
pliny 904092fcdb Update README.md 2026-03-04 12:44:17 -08:00
pliny 0f6114fe87 Add files via upload 2026-03-04 12:38:18 -08:00