Evolution Beats Backprop for LLM Fine-Tuning
What It Is
Evolutionary strategies (ES) for language model fine-tuning replace traditional backpropagation with a surprisingly simple approach: add random noise to model parameters, evaluate which perturbations improve performance, and update the model accordingly. Instead of computing gradients through the entire network, this method generates roughly 30 random Gaussian perturbations of the model weights, tests each variant, and moves the parameters in the direction that produced better results.
The technique works by treating model optimization as a black-box problem. Each perturbation creates a slightly different version of the model, and the algorithm measures which versions perform better on the target task. The parameter updates come from averaging the successful perturbations, weighted by their performance scores. No backward pass, no gradient tape, no chain rule calculations.
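The loop above can be sketched in a few lines of NumPy. This is a minimal illustration of the perturb-score-average update under stated assumptions (a flat parameter vector and a scalar reward function); the helper name `es_step` and its signature are hypothetical, not the propagate API:

```python
import numpy as np

def es_step(theta, reward_fn, num_perturbations=30, noise_std=0.02, lr=0.05):
    """One ES update: sample Gaussian perturbations, score each variant with
    a forward pass only, then move theta along the reward-weighted average
    of the noise. Hypothetical helper for illustration."""
    noises = np.random.randn(num_perturbations, theta.size)
    rewards = np.array([reward_fn(theta + noise_std * eps) for eps in noises])
    centered = rewards - rewards.mean()  # baseline subtraction cuts variance
    # Reward-weighted average of the perturbations estimates the gradient
    grad_estimate = noises.T @ centered / (num_perturbations * noise_std)
    return theta + lr * grad_estimate
```

Note that nothing here requires the reward to be differentiable: `reward_fn` is a black box, which is exactly why the method suits verifiable-reward RL tasks.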
Research documented at https://arxiv.org/abs/2509.24372 demonstrates that this approach can outperform Group Relative Policy Optimization (GRPO) on reinforcement learning from verifiable rewards (RLVR) tasks. The implementation at https://github.com/Green0-0/propagate includes LoRA support and pass@k training capabilities, making it practical for real-world applications.
Why It Matters
This development challenges fundamental assumptions about how neural networks must be trained. Backpropagation has dominated deep learning for decades because it efficiently computes exact gradients. The fact that random perturbations can match or exceed its performance on certain tasks suggests that gradient precision may be overrated for some applications.
Teams working with reinforcement learning tasks benefit most immediately. RLVR problems often involve sparse rewards and complex optimization landscapes where traditional gradient descent struggles. Evolutionary strategies naturally handle these scenarios because they explore the parameter space more broadly rather than following local gradient information.
Memory-constrained environments gain significant advantages. Backpropagation requires storing activations for the backward pass, which can consume substantial GPU memory. Evolutionary approaches only need to evaluate forward passes, cutting memory requirements roughly in half. This makes fine-tuning larger models feasible on consumer hardware.
The reported resistance to overfitting matters for practitioners dealing with small datasets. Traditional gradient-based methods can memorize training examples when data is limited. The stochastic nature of evolutionary strategies provides implicit regularization, helping models generalize better.
Getting Started
The propagate repository provides a straightforward implementation. Developers can clone it and start experimenting:
trainer = EvolutionaryTrainer(
    model=base_model,
    num_perturbations=30,   # size of the perturbation population
    noise_std=0.01,         # standard deviation of the Gaussian noise
    use_lora=True,          # perturb only the LoRA adapter weights
)
trainer.train(
    train_dataset=dataset,
    eval_metric=reward_function,
    num_iterations=1000,
)
The key parameters include num_perturbations (typically 30-50), noise_std (controlling perturbation magnitude), and the evaluation metric. For LoRA-based training, the method only perturbs the low-rank adapter weights rather than the full model, dramatically reducing computational costs.
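The LoRA restriction can be illustrated with a small sketch: only the adapter factors receive noise, while the frozen base weights pass through untouched. The parameter names and the `perturb_lora` helper below are hypothetical, chosen to mirror common LoRA naming conventions rather than propagate's internals:

```python
import numpy as np

# Hypothetical parameter dict: a frozen base weight plus rank-8 LoRA factors.
params = {
    "attn.weight": np.random.randn(512, 512),  # frozen base weight
    "attn.lora_A": np.zeros((8, 512)),         # low-rank adapter factor A
    "attn.lora_B": np.zeros((512, 8)),         # low-rank adapter factor B
}

def perturb_lora(params, noise_std=0.01, rng=None):
    """Return a perturbed copy in which only LoRA adapter weights get noise."""
    if rng is None:
        rng = np.random.default_rng()
    perturbed = {}
    for name, w in params.items():
        if "lora_" in name:  # adapters only
            perturbed[name] = w + noise_std * rng.standard_normal(w.shape)
        else:
            perturbed[name] = w  # base weights stay frozen
    return perturbed
```

For the shapes above, the search space shrinks from 512 x 512 = 262,144 base parameters to 2 x 8 x 512 = 8,192 adapter parameters, which is why LoRA-only perturbation cuts costs so sharply.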
Training speed depends on how quickly the model can be evaluated. Since evolutionary strategies require multiple forward passes per iteration, tasks with fast inference benefit most. The lack of backward passes often compensates for the multiple evaluations, especially on models where backpropagation is the bottleneck.
Context
Evolutionary strategies aren't new; they've been used in robotics and game playing for years. What's novel is their competitive performance on language model fine-tuning, a domain where gradient-based methods seemed unbeatable.
Compared to standard supervised fine-tuning, evolutionary approaches work best when the loss landscape is noisy or when rewards are sparse. For straightforward supervised learning with abundant labeled data, backpropagation remains more sample-efficient. The technique shines in reinforcement learning scenarios where traditional methods struggle.
Limitations include higher variance in training dynamics and potentially slower convergence on smooth optimization surfaces. The method also requires careful tuning of the perturbation magnitude: too small and progress stalls, too large and the search degenerates into a random walk.
Alternative gradient-free methods like zeroth-order optimization exist, but evolutionary strategies often prove more robust in practice. The population-based approach naturally parallelizes across multiple GPUs, making it attractive for distributed training setups.
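One reason the population parallelizes so cheaply is the shared-seed trick popularized by earlier ES work: workers exchange only integer seeds and scalar rewards, regenerating the full perturbation locally. The sketch below assumes this scheme; `worker_eval` and `aggregate` are illustrative names, not propagate's distributed API:

```python
import numpy as np

def worker_eval(theta, seed, noise_std, reward_fn):
    """Score one perturbation. Only the seed and the scalar reward need to
    cross the network; the noise itself is regenerated from the seed."""
    eps = np.random.default_rng(seed).standard_normal(theta.shape)
    return reward_fn(theta + noise_std * eps)

def aggregate(theta, seeds, rewards, noise_std, lr):
    """Rebuild each perturbation from its seed and apply the ES update."""
    centered = np.asarray(rewards) - np.mean(rewards)
    grad = np.zeros_like(theta)
    for seed, r in zip(seeds, centered):
        eps = np.random.default_rng(seed).standard_normal(theta.shape)
        grad += r * eps
    return theta + lr * grad / (len(seeds) * noise_std)
```

Because every worker can deterministically reconstruct any other worker's perturbation, per-iteration communication is a handful of integers and floats regardless of model size.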
For teams hitting memory walls with standard fine-tuning or struggling with RL tasks, evolutionary strategies offer a genuinely different approach worth testing. The tradeoff between computational precision and practical performance may favor evolution more often than conventional wisdom suggests.