Students Train SOTA Code Models on Single GPUs

Researchers demonstrate how students can train state-of-the-art code generation models on single consumer GPUs by offloading optimizer state to CPU RAM with DeepSpeed.

Students replicate SOTA coding models using DeepSpeed optimizations on single GPUs.

Clone the Training Repository:
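The article does not name the repository, so the URL below is a hypothetical placeholder; substitute the actual training repo:

```shell
# Hypothetical URL -- replace with the real training repository
git clone https://github.com/your-org/code-model-training.git
cd code-model-training
pip install -r requirements.txt   # should include deepspeed
```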

Enable DeepSpeed ZeRO-3 Offloading: Add to deepspeed_config.json:

"zero_optimization": {
 "stage": 3,
 "offload_optimizer": {"device": "cpu"},
 "offload_param": {"device": "cpu"}
}
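For context, a minimal complete deepspeed_config.json with this section in place might look like the following; the batch-size and precision settings are illustrative assumptions, not values from the article:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16,
  "bf16": {"enabled": true},
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {"device": "cpu"},
    "offload_param": {"device": "cpu"}
  }
}
```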

Launch Training:
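A single-GPU launch with the DeepSpeed launcher might look like this; the script name, model, and output path are assumptions for illustration:

```shell
# Hypothetical script and paths; --deepspeed points at the ZeRO-3 config
deepspeed --num_gpus=1 train.py \
  --deepspeed deepspeed_config.json \
  --model_name_or_path <base-model> \
  --output_dir ./checkpoints
```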

Test on LiveCodeBench: Visit https://livecodebench.github.io/ for evaluation scripts.

Monitor Training:

  • Press Ctrl+C to pause safely
  • Use nvidia-smi to track VRAM usage
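For example, one way to track VRAM continuously with standard nvidia-smi options:

```shell
# Refresh the full GPU status display every 2 seconds
watch -n 2 nvidia-smi

# Or log only the memory figures as CSV, sampling every 5 seconds
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 5
```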

This setup cut fine-tuning of a 14B-parameter model from an estimated 1.6 months to 2 weeks on a single A6000 GPU, reaching 41.7% Pass@1, by offloading optimizer states to CPU RAM.
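A back-of-envelope sketch shows why offloading is what makes this fit on one GPU. Assuming the standard mixed-precision Adam accounting from the ZeRO paper (2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state per parameter):

```shell
# 2 B fp16 weights + 2 B fp16 grads + 12 B fp32 optimizer state = 16 B/param
PARAMS=14000000000          # 14B-parameter model
BYTES_PER_PARAM=16
TOTAL_GB=$(( PARAMS * BYTES_PER_PARAM / 1000000000 ))
echo "Total training state: ${TOTAL_GB} GB"   # far beyond an A6000's 48 GB

# ZeRO-3 CPU offload moves the 12 B/param optimizer state into system RAM,
# leaving the GPU room for weights, gradients, and activations.
OFFLOADED_GB=$(( PARAMS * 12 / 1000000000 ))
echo "Offloaded to CPU RAM: ${OFFLOADED_GB} GB"
```

The total state (~224 GB) dwarfs the A6000's 48 GB of VRAM, which is why the run is impossible without moving the ~168 GB of optimizer state off the GPU.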