
Running 120B AI Models on Networked Mini PCs

Researchers demonstrate running 120-billion-parameter AI models across networked mini PCs using distributed computing techniques, making large language models feasible on consumer hardware.

Someone figured out how to run massive AI models by networking two Bosgame M5 PCs (Strix Halo chips) via Thunderbolt cables.

The setup uses llama.cpp’s RPC feature to split model inference across both machines. With 512GB total RAM and dual iGPUs, they’re running models like:

  • GPT-OSS-120B at 50+ tokens/s (single PC)
  • Minimax-M2.1 Q6 at 18 tokens/s (networked)
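The networked setup works by running llama.cpp's `rpc-server` on the second machine and pointing the first machine at it. A minimal sketch of what that looks like (hostnames, port, and model path are placeholders, not from the original post; the exact flags may vary by llama.cpp version):

```shell
# On the worker machine (reachable over the Thunderbolt/USB4 network link):
# build llama.cpp with RPC support, then expose its GPU as an RPC backend.
cmake -B build -DGGML_RPC=ON && cmake --build build --config Release
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the main machine: run inference, offloading layers across both GPUs.
# 10.0.0.2 is the worker's address on the point-to-point link (placeholder).
./build/bin/llama-cli \
  -m models/model-q6.gguf \
  --rpc 10.0.0.2:50052 \
  -ngl 99 \
  -p "Hello"
```

The `--rpc` flag accepts a comma-separated list of endpoints, so the same approach scales to more than two boxes if you have the cables and patience.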

Total cost was around €3,200 for both systems plus USB4 cables.

Getting started: Check out the Strix Halo wiki for setup guides and join their Discord for troubleshooting.

The catch? Prompt processing (the prefill stage) is painfully slow right now, though generation speed is solid once it gets going. They're planning to test vLLM with 345B models next, which could be interesting for anyone tired of cloud API costs.

Pretty wild that consumer hardware can handle models this size without melting into a puddle.