Framework for Choosing LLMs by Hardware Constraints

A practical framework that helps developers and organizations select the most appropriate large language model based on available hardware resources, primarily GPU memory (VRAM).

Someone put together a nice framework for picking open-source LLMs based on actual hardware constraints instead of just going by parameter count.

The breakdown:

  • Unlimited tier - >128GB VRAM (think server setups or multi-GPU rigs)
  • Medium tier - 8-128GB VRAM (solid desktop GPUs, some laptops)
  • Small tier - <8GB VRAM (most consumer hardware)
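The tier logic above is simple enough to sketch in a few lines. Here's a minimal Python version (the tier names and the decision to put exactly 8GB into the medium tier are assumptions based on the ranges listed, since the framework itself doesn't spell out boundary handling):

```python
def vram_tier(vram_gb: float) -> str:
    """Map available VRAM (in GB) to one of the three tiers."""
    if vram_gb > 128:
        return "unlimited"  # server setups, multi-GPU rigs
    if vram_gb >= 8:
        return "medium"     # solid desktop GPUs, some laptops
    return "small"          # most consumer hardware


print(vram_tier(24))  # a 24GB desktop card lands in "medium"
```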

The thinking here is pretty practical - you probably need different models for different tasks anyway, so why not organize recommendations by what hardware people actually have? Someone running a 4060 with 8GB isn’t getting much value from “this 70B model is amazing” advice.
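A quick back-of-envelope calculation shows why. Weight memory is roughly parameter count times bits per parameter, and even aggressively quantized, a 70B model's weights alone dwarf an 8GB card (this sketch ignores KV cache and runtime overhead, which only make things worse):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Rough VRAM needed just for the model weights.

    Ignores KV cache, activations, and framework overhead.
    """
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal gigabytes


# 70B at 4-bit quantization: ~35 GB of weights alone -- far beyond 8GB
print(round(weight_memory_gb(70, 4), 1))  # 35.0
```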

This framing cuts through a lot of the noise in model discussions. Instead of endless debates about benchmark scores, people can just look at their GPU specs and find what actually runs on their setup. Way more useful than the usual "just rent cloud compute" suggestions that pop up everywhere.