Radeon PRO W7900 Runs 70B Models at Full Precision

The Radeon PRO W7900 workstation GPU demonstrates the capability to run 70-billion-parameter AI models at full precision, offering professionals a powerful option for local inference.

Someone got a Radeon PRO W7900 running local AI models, and they report that the unified memory setup makes a huge difference for LLMs.

Real-world performance:

  • DeepSeek 70B at 12 tokens/sec (full precision, no quantization)
  • ComfyUI with LTX2 averaging 12 seconds per iteration
  • Can allocate 96GB+ directly to the GPU without hitting PCIe bandwidth limits
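The memory claims above invite a quick back-of-envelope check. This is a minimal sketch (the byte sizes per precision are standard conventions, not figures from the post; note that in LLM discussions "full precision" often means unquantized fp16/bf16 rather than fp32):

```python
def model_memory_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate weight memory in decimal GB: parameters x bytes per parameter."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# Weight footprint of a 70B model at common precisions
for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{label}: {model_memory_gb(70, nbytes):.0f} GB")
# fp32: 280 GB, fp16/bf16: 140 GB, int8: 70 GB
```

Activations and the KV cache add on top of this, so the actual allocation during inference runs higher than the raw weight footprint.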

The main advantage is skipping quantization entirely: no quality loss from compressed weights. The full weights simply fit in memory and run.

For setup instructions and ComfyUI nodes optimized for AMD cards, there’s a walkthrough repo at: https://github.com/bkpaine1

Pretty solid option for running bigger models locally without dealing with 4-bit or 8-bit quants.