LM Arena: Blind Model Comparisons with Elo Rankings
LM Arena at lmarena.ai runs blind head-to-head model comparisons with Elo ratings, helping developers pick models based on actual performance rather than marketing.
Instead of relying on cherry-picked benchmarks, LM Arena runs blind head-to-head comparisons: you submit a prompt, two anonymous models respond, you vote on which response is better, and the rankings update based on thousands of such evaluations.
How to use it:
- Go to https://lmarena.ai/leaderboard
- Filter by category: coding, creative writing, reasoning, etc.
- Check the confidence intervals: when two models' intervals overlap, the difference in their rankings may not be meaningful
For testing locally:
huggingface-cli download Qwen/Qwen2.5-72B-Instruct
Other useful resources:
- Speed/cost benchmarks: https://artificialanalysis.ai
- Trending models: https://huggingface.co/models?sort=trending
The Elo system means rankings reflect actual user preferences rather than synthetic benchmarks. Useful for cutting through marketing claims when picking between similar models.
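The Elo mechanics can be sketched in a few lines: each vote nudges the winner's rating up and the loser's down, weighted by how surprising the result was. This is an illustrative sketch only; the starting rating and K-factor below are common Elo defaults, not LM Arena's actual parameters.

```python
# Minimal Elo update sketch: one blind head-to-head vote between two models.
# K-factor of 32 and starting rating of 1000 are illustrative assumptions.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one vote; gains and losses are zero-sum."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Two evenly rated models; A wins the matchup, so each rating moves by k/2.
ra, rb = update(1000.0, 1000.0, a_won=True)
print(round(ra), round(rb))  # -> 1016 984
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win, which is why rankings stabilize as vote counts grow.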