LMArena: Crowdsourced AI Model Battle Platform

LMArena, hosted at lmarena.ai, lets people compare large language models head to head and vote on which response they prefer. The platform is powered by FastChat, an open project maintained by LMSYS and described in its repository at https://github.com/lm-sys/FastChat as “an open platform for training, serving, and evaluating large language model based chatbots.”

How the battles work

The core idea is a side-by-side battle. A person enters a prompt, two models answer the same question, and the person picks the better response. These votes are collected and aggregated into an Elo leaderboard, the same style of rating system used to rank players in games like chess. According to the FastChat repository, Chatbot Arena has “served over 10 million chat requests for 70+ LLMs,” which gives a sense of how much human feedback the rankings draw on.

Collecting many human votes is the point of the approach. Rather than scoring models against a fixed set of test questions, the arena gathers preferences from real prompts that people actually want answered. The FastChat project has also released a dataset described as “33k conversations with human preferences,” which can be used to study how people judge model responses.

Running it locally

FastChat is the open-source codebase behind the arena, and it can be run on a local machine. The repository documents two main paths.

To launch the side-by-side battle interface against hosted model endpoints, the project uses a configuration file listing the models and a single server command:

python3 -m fastchat.serve.gradio_web_server_multi --register-api-endpoint-file api_endpoint.json

The endpoint configuration supports several providers, including OpenAI, Anthropic, Gemini, and Mistral models.

To serve a model locally instead, FastChat splits the work across three components: a controller, one or more model workers, and a Gradio web server. The repository gives this example:

python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
python3 -m fastchat.serve.gradio_web_server

Installation is handled through pip with pip3 install "fschat[model_worker,webui]". The project is released under the Apache-2.0 license, so the code can be inspected, modified, and reused.

Why the approach matters

Fixed benchmarks have a known weakness: models can be tuned to do well on a specific test set without necessarily being better in everyday use. Human preference voting on open-ended prompts sidesteps some of that problem by measuring what people actually prefer across a wide range of questions.

FastChat also serves a second role beyond ranking. The same repository is the release home for Vicuna and includes code for training and serving chatbots, along with OpenAI-compatible APIs for the models it serves. That makes it both an evaluation tool and a practical way to run and compare models, whether hosted commercial systems or open-source releases.

For anyone trying to decide between models, the leaderboard offers a community-driven signal rather than a vendor’s own marketing numbers, and the open codebase means the methodology can be examined directly.

LMArena: Crowdsourced AI Model Battle Platform

LMArena: Crowdsourced AI Model Battle Platform

How the battles work

Running it locally

Why the approach matters

Related Tips

Inkling: Mira Murati's Conversational AI Model

Loading Kimi K3: China's Coding-Focused LLM

Amazon Connect to Teams: AI-First Support Integration