mirror of
https://github.com/elder-plinius/OBLITERATUS.git
synced 2026-04-29 14:46:15 +02:00
51f621d0a2
The snapshot() deepcopy was cloning tensors on their original GPU devices, doubling VRAM usage. For a 234GB model sharded across 6 A100-80GB GPUs (~39GB each), this left no room for the copy. snapshot() now stores tensors on CPU, and restore() moves them back to each parameter's current device.
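
A minimal sketch of the CPU-snapshot pattern described above, assuming a PyTorch nn.Module. The snapshot() and restore() names come from this commit message, but the signatures and bodies below are hypothetical illustrations, not the project's actual code.

```python
import torch
from torch import nn


def snapshot(model: nn.Module) -> dict[str, torch.Tensor]:
    """Hypothetical helper: copy every parameter to CPU.

    The copies live in host RAM, so taking a snapshot does not
    allocate any additional GPU memory.
    """
    return {
        name: param.detach().to("cpu", copy=True)
        for name, param in model.named_parameters()
    }


def restore(model: nn.Module, saved: dict[str, torch.Tensor]) -> None:
    """Hypothetical helper: write saved values back in place.

    Each saved tensor is moved to the device its parameter currently
    lives on, so this still works if shards moved between devices.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            param.copy_(saved[name].to(param.device))
```

Usage would look like saved = snapshot(model), followed by whatever modifies the weights, then restore(model, saved). Restoring one parameter at a time keeps the transient GPU allocation to a single tensor rather than a full second copy of the model.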