Evolutionary Model Merge Skips Backprop

Most ways of producing a capable language model rely on gradient-based training, where backpropagation adjusts billions of weights based on data. Sakana AI describes a different route in its work on Evolutionary Model Merge, an automated technique that uses evolutionary algorithms to discover good ways of combining existing open-source models into new ones. The approach is documented at https://sakana.ai/evolutionary-model-merge/.

Combining Models Instead of Training Them

Rather than training a model from scratch, the method starts from open models already available on platforms such as Hugging Face, which the authors note hosts over 500,000 models. Evolutionary search, inspired by natural selection, looks for the best way to merge several of these into a single new model with capabilities chosen by the user.

A point Sakana AI emphasizes is that the process needs no gradient-based training. The authors write that they were surprised the method could automatically produce new foundation models “without the need for any gradient-based training,” using relatively little compute. They add that backpropagation could be layered on afterward to improve results, but their goal was to show strong performance even without it.

This makes model merging distinct from fine-tuning. Fine-tuning adapts a model using a specialized dataset and gradient updates, while the merging approach searches over how to recombine models that already exist.

Two Spaces for Evolution

The technique operates in two complementary spaces. The first is the data flow space, where the search evolves combinations of layers drawn from different source models. The second is the parameter space, where it evolves ways of mixing model weights, allowing different mixing ratios at different layers. The two can also be combined in a single search.
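The two spaces can be illustrated with a toy sketch. Everything here is an assumption for illustration: the "models" are hypothetical nested lists of numbers, the helper names are invented, and plain weighted averaging stands in for whatever merging rule the actual method uses in parameter space.

```python
from typing import List, Tuple

# Hypothetical toy "model": a list of layers, each layer a flat list of weights.
Model = List[List[float]]


def merge_parameter_space(models: List[Model], ratios: List[List[float]]) -> Model:
    """Mix weights layer by layer; ratios[l][m] is model m's share at layer l."""
    merged: Model = []
    for layer_idx, layer_ratios in enumerate(ratios):
        total = sum(layer_ratios)
        shares = [r / total for r in layer_ratios]  # normalize shares to sum to 1
        layers = [model[layer_idx] for model in models]
        merged.append([
            sum(share * layer[i] for share, layer in zip(shares, layers))
            for i in range(len(layers[0]))
        ])
    return merged


def merge_data_flow_space(models: List[Model], path: List[Tuple[int, int]]) -> Model:
    """Stack layers along a path of (model_index, layer_index) pairs."""
    return [list(models[m][layer]) for m, layer in path]


model_a: Model = [[1.0, 1.0], [2.0, 2.0]]
model_b: Model = [[3.0, 3.0], [6.0, 6.0]]

# Parameter space: a different mixing ratio at each layer.
mixed = merge_parameter_space([model_a, model_b], ratios=[[0.5, 0.5], [0.75, 0.25]])

# Data flow space: a new layer sequence drawn from both source models.
stacked = merge_data_flow_space([model_a, model_b], path=[(0, 0), (1, 0), (1, 1)])

print(mixed)
print(stacked)
```

In this framing, the evolutionary search would be choosing the `ratios` matrix, the `path` list, or both, rather than updating any weights by gradient descent.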

For the math-focused language model EvoLLM-JP, the team merged three 7B models: Shisa-Gamma for Japanese, plus WizardMath and Abel for mathematics. The evolutionary search ran for roughly 100 to 150 generations, with the best candidate on the training set evaluated once on a held-out test set.

Benchmark Outcomes

EvoLLM-JP was evaluated on MGSM-JA, a Japanese translation of a subset of the GSM8K math benchmark, and on the Japanese lm-evaluation-harness covering nine averaged tasks. Sakana AI reports that the merged model exceeds the scores of all Japanese language models with fewer than 70B parameters, and even surpasses the previous 70B state-of-the-art Japanese model on these measures.

The same idea extends to other modalities. EvoVLM-JP merged LLaVa-1.6-Mistral-7B with Shisa-Gamma 7B, which the authors describe as a first attempt at merging a vision-language model with a language model. It was tested on JA-VG-VQA-500 and JA-VLM-Bench-In-the-Wild, outperforming both its English base model and an existing Japanese vision-language model. A preliminary image-generation model, EvoSDXL-JP, was evolved as a Japanese-capable model that generates images in only four diffusion steps.

Why the Direction Matters

The broader argument is economic and structural. Sakana AI frames evolutionary merging as a cost-effective alternative to training large models from scratch, and envisions a future built from a large collection of smaller systems that are recombined to gain new abilities. The research was supported by a NEDO grant from the Japanese government, and the underlying paper was accepted to Nature Machine Intelligence in January 2025. The methods are implemented in open-source tooling, including the mergekit and Optuna Hub projects.
