Estimating AI Training Compute in FLOP

Comparing AI models is hard when they use different hardware, run for different durations, and rely on different architectures. One way to put them on a common footing is to estimate the compute used to train each model, measured in floating point operations (FLOP). Epoch AI describes this measure as the FLOP required to train the final version of a system, and outlines two practical methods for estimating it at https://epoch.ai/blog/estimating-training-compute.

Counting Operations

The first method estimates compute directly from the work done during training. The basic relationship is the operations per forward pass plus the operations per backward pass, multiplied by the number of passes, where the number of passes equals the number of epochs times the number of training examples.

Epoch AI estimates that the backward pass costs roughly twice as much as the forward pass, so one forward pass plus one backward pass costs about three times the forward pass alone. This produces the heuristic that training compute is approximately equal to three times the operations per forward pass, times the number of epochs, times the number of examples.

To estimate the forward pass itself, the article notes a shortcut: the FLOP for a forward pass on one example is approximately equal to twice the number of parameters, or more precisely twice the number of connections in the model. Epoch AI is explicit that this is a heuristic and that the result can be off depending on the architecture involved.
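The counting heuristic can be sketched in a few lines of Python. This is a minimal illustration rather than Epoch AI's own code: the function names and the example model (one billion parameters, twenty billion training examples, one epoch) are hypothetical, and the 2:1 backward-to-forward ratio and the two-FLOP-per-parameter shortcut are the approximations described above.

```python
def forward_flop_per_example(num_parameters: float) -> float:
    """Shortcut from the text: about 2 FLOP per parameter per forward pass."""
    return 2.0 * num_parameters


def training_flop_by_counting(
    num_parameters: float,
    num_examples: float,
    num_epochs: float = 1.0,
    backward_to_forward_ratio: float = 2.0,
) -> float:
    """Estimate training compute as (forward + backward) FLOP times passes."""
    forward = forward_flop_per_example(num_parameters)
    # Backward pass is assumed to cost `backward_to_forward_ratio` times the
    # forward pass, so one full pass costs (1 + ratio) forward passes.
    flop_per_pass = forward * (1.0 + backward_to_forward_ratio)
    num_passes = num_epochs * num_examples
    return flop_per_pass * num_passes


# Hypothetical model: 1e9 parameters, 2e10 training examples, one epoch.
counting_estimate = training_flop_by_counting(1e9, 2e10)
print(f"{counting_estimate:.2e} FLOP")  # prints 1.20e+20 FLOP
```

With the default ratio the result reduces to six times the parameter count times the number of examples seen, which makes the estimate easy to check by hand.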

Hardware Time and Utilization

The second method works backward from the hardware used. The estimate is the training time multiplied by the number of cores, the peak FLOP per second of those cores, and a utilization rate.

The utilization rate matters because hardware datasheets list theoretical peak performance that real training runs rarely reach. Epoch AI recommends a utilization rate of about 0.3 for large language models and about 0.4 for other networks. The peak FLOP per second term matters for a similar reason: a figure based only on hardware time, such as GPU-days, does not account for which computing hardware was used, since newer accelerators deliver more FLOP per unit of time than older ones.
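The hardware-based estimate can be sketched the same way. The function name and the example cluster below (64 accelerators at 1e14 peak FLOP per second each, running for 10 days) are hypothetical values chosen for illustration, not figures from Epoch AI; only the formula and the 0.3 utilization rate come from the text.

```python
SECONDS_PER_DAY = 86_400


def training_flop_by_hardware(
    training_days: float,
    num_cores: float,
    peak_flop_per_second: float,
    utilization: float,
) -> float:
    """Estimate training compute as time x cores x peak FLOP/s x utilization.

    `peak_flop_per_second` must be the peak for one unit of whatever
    `num_cores` counts (one core, one chip, one accelerator).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the interval (0, 1]")
    training_seconds = training_days * SECONDS_PER_DAY
    return training_seconds * num_cores * peak_flop_per_second * utilization


# Hypothetical run: 64 accelerators, 1e14 peak FLOP/s each, 10 days,
# with the 0.3 utilization rate suggested for large language models.
hardware_estimate = training_flop_by_hardware(10, 64, 1e14, 0.3)
print(f"{hardware_estimate:.2e} FLOP")  # prints 1.66e+21 FLOP
```

Because the estimate scales linearly with the utilization rate, switching from 0.3 to 0.4 raises the result by a third, which is why the choice of rate deserves an explicit note in any reported figure.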

Choosing a Method

Epoch AI recommends defaulting to the operations-counting method because it is more exact, given that GPU utilization is difficult to estimate. Where possible, running both methods serves as a sanity check. In the comparison reported by Epoch AI, estimates from the two methods differed by no more than a factor of 1.7.
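The sanity check amounts to comparing the two estimates as a ratio. The sketch below uses a hypothetical helper and made-up estimates for a single model; it only shows how a "factor of" disagreement is computed and does not reproduce Epoch AI's comparison.

```python
def disagreement_factor(estimate_a: float, estimate_b: float) -> float:
    """Return how many times larger the bigger estimate is than the smaller."""
    if estimate_a <= 0 or estimate_b <= 0:
        raise ValueError("estimates must be positive")
    return max(estimate_a, estimate_b) / min(estimate_a, estimate_b)


# Hypothetical estimates for the same model from the two methods.
counting_flop = 1.2e20
hardware_flop = 1.0e20

factor = disagreement_factor(counting_flop, hardware_flop)
print(f"The two estimates differ by a factor of {factor:.2f}")
```

Taking the larger value over the smaller keeps the factor at or above one regardless of argument order, so it can be compared directly with a figure such as the 1.7 reported above.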

Neither method claims perfect precision. The value of estimating training compute is that it converts very different training setups into a single number that can be reasoned about and compared, rather than relying on parameter count alone, which ignores how much training a model actually received.
