Framework for Choosing LLMs by Hardware Constraints
A practical framework that helps developers and organizations select the most appropriate large language model based on available hardware resources and memory capacity.
Someone put together a nice framework for picking open-source LLMs based on actual hardware constraints instead of just going by parameter count.
The breakdown:
- Unlimited tier - >128GB VRAM (think server setups or multi-GPU rigs)
- Medium tier - 8-128GB VRAM (solid desktop GPUs, some laptops)
- Small tier - <8GB VRAM (most consumer hardware)
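The tier lookup above is simple enough to sketch in a few lines. This is a minimal illustration of the thresholds listed in the breakdown; the function name and the exact boundary handling (8GB counted as medium) are assumptions, not part of the original framework.

```python
# Minimal sketch of the VRAM-tier lookup described above.
# Boundary handling (8GB -> medium) is an assumption.

def vram_tier(vram_gb: float) -> str:
    """Map available VRAM (in GB) to a model-selection tier."""
    if vram_gb > 128:
        return "unlimited"   # server setups, multi-GPU rigs
    if vram_gb >= 8:
        return "medium"      # solid desktop GPUs, some laptops
    return "small"           # most consumer hardware

print(vram_tier(192))  # unlimited
print(vram_tier(8))    # medium (e.g. a 4060-class card)
print(vram_tier(6))    # small
```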
The thinking here is pretty practical - you probably need different models for different tasks anyway, so why not organize recommendations by what hardware people actually have? Someone running a 4060 with 8GB isn’t getting much value from “this 70B model is amazing” advice.
Turns out this cuts through a lot of the noise in model discussions. Instead of endless debates about benchmark scores, folks can just look at their GPU specs and find what actually runs on their setup. Way more useful than the usual “just rent cloud compute” suggestions that pop up everywhere.
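A quick back-of-envelope calculation shows why matching models to VRAM matters more than benchmark scores. A common heuristic (not something from the framework itself) is that the weights alone need roughly parameters × bytes-per-weight, before accounting for KV cache, activations, or runtime overhead:

```python
# Rough heuristic for the VRAM needed just to hold model weights.
# Ignores KV cache, activations, and runtime overhead, so real
# requirements are higher; figures here are illustrative only.

def weights_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: 1B params ~ 1 GB at 8-bit."""
    return params_billions * (bits_per_weight / 8)

# A 70B model at 4-bit quantization still needs ~35 GB for weights,
# far beyond an 8 GB card; a 7B model at 4-bit fits in ~3.5 GB.
print(weights_gb(70, 4))  # 35.0
print(weights_gb(7, 4))   # 3.5
```

This is why "this 70B model is amazing" advice is useless to someone on an 8GB card: even aggressive quantization leaves the weights several times larger than the available VRAM.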