“Take a Deep Breath” Came From an AI Optimizer

The phrase “Take a deep breath and work on this problem step-by-step” circulated widely as a prompt-engineering trick for getting better answers from language models. The phrase did not come from a person experimenting with wording. It was produced by another language model that was searching for effective instructions, as described in a paper from researchers at Google DeepMind.

How the Phrase Was Found

The paper, titled “Large Language Models as Optimizers,” was written by Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen, and was published at ICLR 2024. It is available at https://arxiv.org/abs/2309.03409.

The work introduces a method the authors call OPRO, short for Optimization by PROmpting. Instead of relying on a purpose-built optimization algorithm, OPRO describes the task in natural language and uses a language model to propose candidate solutions. At each step the model receives a prompt containing previously tried solutions along with their scores, and it generates new candidates intended to do better. Those candidates are evaluated, the results are added to the prompt, and the loop repeats.
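The propose-score-feed-back loop can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: `toy_proposer` is a hypothetical stand-in for the optimizer language model, `toy_score` is an invented objective, and the prompt wording and ascending-score ordering are assumptions of the sketch.

```python
import random
from typing import Callable, List, Tuple

Scored = Tuple[int, float]  # (candidate solution, score)


def build_meta_prompt(history: List[Scored], keep: int = 5) -> str:
    """Render the top-scoring past solutions as text, best one listed last."""
    top = sorted(history, key=lambda pair: pair[1])[-keep:]
    lines = [f"solution: {sol}, score: {score}" for sol, score in top]
    return (
        "Previous solutions and scores:\n"
        + "\n".join(lines)
        + "\nPropose a better solution."
    )


def toy_proposer(meta_prompt: str, rng: random.Random, n: int = 4) -> List[int]:
    """Hypothetical stand-in for the optimizer model.

    It reads the best solution out of the prompt text and perturbs it.
    A real setup would send meta_prompt to a language model instead.
    """
    best_line = meta_prompt.splitlines()[-2]  # last listed solution = highest score
    best = int(best_line.split(",")[0].split(":")[1])
    return [best + rng.randint(-3, 3) for _ in range(n)]


def opro_loop(score_fn: Callable[[int], float], steps: int = 20, seed: int = 0) -> Scored:
    """Repeat: build prompt from history, propose candidates, score them, record."""
    rng = random.Random(seed)
    history: List[Scored] = [(0, score_fn(0))]  # start from one initial solution
    for _ in range(steps):
        prompt = build_meta_prompt(history)
        for candidate in toy_proposer(prompt, rng):
            history.append((candidate, score_fn(candidate)))
    return max(history, key=lambda pair: pair[1])


def toy_score(x: int) -> float:
    """Invented objective for illustration: highest (0.0) at x == 7."""
    return -float((x - 7) ** 2)


best_solution, best_score = opro_loop(toy_score)
print(best_solution, best_score)
```

Because the history always retains the best solution seen so far, the returned score can never be worse than the starting point. How quickly it improves depends entirely on the quality of the proposer.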

The authors first demonstrated the idea on classic optimization problems such as linear regression and the traveling salesman problem. They then applied the same loop to a different target: finding instructions that make a language model answer reasoning questions more accurately.

The Numbers on GSM8K

In the prompt-optimization experiments, one model acted as the optimizer that generated candidate instructions, and a separate model acted as the scorer that was tested with those instructions. In the configuration reported in the paper, PaLM 2-L-IT served as the optimizer and a pre-trained PaLM 2-L served as the scorer.
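The scorer's role can be pictured as a function that turns an instruction into an accuracy number. The sketch below is a minimal illustration under stated assumptions: `stub_scorer` is a hypothetical stand-in rigged only to exercise the code (it is not a model call), the two example questions are invented, and the prompt layout in `format_prompt` is one plausible choice rather than the paper's exact template.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, expected final answer)


def format_prompt(question: str, instruction: str) -> str:
    """Assumed layout: the instruction opens the answer. Other placements are possible."""
    return f"Q: {question}\nA: {instruction}".rstrip()


def instruction_accuracy(
    instruction: str,
    examples: List[Example],
    scorer: Callable[[str], str],
) -> float:
    """Percentage of examples the scorer answers exactly right with this instruction."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(
        scorer(format_prompt(question, instruction)).strip() == expected
        for question, expected in examples
    )
    return 100.0 * correct / len(examples)


def stub_scorer(prompt: str) -> str:
    """Hypothetical stand-in for the scorer model, rigged for demonstration only."""
    return "4" if "step" in prompt else "5"


examples = [("What is 2 + 2?", "4"), ("What is 1 + 3?", "4")]
with_instruction = instruction_accuracy("Let's think step by step.", examples, stub_scorer)
without_instruction = instruction_accuracy("", examples, stub_scorer)
print(with_instruction, without_instruction)
```

In an optimization loop of this kind, the number returned by `instruction_accuracy` is the score fed back to the optimizer alongside the instruction text.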

The benchmark was GSM8K, a set of grade-school math word problems. According to the paper’s results, the best instruction the optimizer discovered for that setup was “Take a deep breath and work on this problem step-by-step,” which reached 80.2 percent accuracy.

For comparison, the paper reports that the well-known instruction “Let’s think step by step” reached 71.8 percent, and using no instruction at all reached 34.0 percent. These figures reflect one specific optimizer-scorer pairing and one benchmark; they are not a universal ranking of phrases.
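To put the three figures side by side, the gaps can be computed in percentage points. The snippet only restates the accuracies quoted above; the dictionary keys and variable names are illustrative.

```python
# Accuracy figures (percent) as quoted in the text above.
accuracy = {
    "Take a deep breath and work on this problem step-by-step": 80.2,
    "Let's think step by step": 71.8,
    "(no instruction)": 34.0,
}

# Find the top instruction, then its lead over each of the others.
best = max(accuracy, key=accuracy.get)
gaps = {
    name: round(accuracy[best] - value, 1)
    for name, value in accuracy.items()
    if name != best
}

for name, gap in gaps.items():
    print(f"{best!r} leads {name!r} by {gap} percentage points")
```

The lead over “Let’s think step by step” is 8.4 points, while the lead over no instruction is 46.2 points, so most of the improvement in this setup comes from having any step-by-step instruction at all.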

What the Result Suggests

The paper frames the finding as evidence that a language model can search the space of natural-language instructions and surface prompts that outperform familiar human-written ones. The discovered phrase was a byproduct of that search, not the central claim.

Across benchmarks, the authors report that the best OPRO-optimized prompts outperformed human-designed prompts by up to 8 percent on GSM8K and by up to 50 percent on Big-Bench Hard tasks, depending on the model and task. Those gains come from applying the optimization procedure across many tasks, not from any single phrase applied everywhere.

The practical takeaway is narrower than the viral version of the story. The exact wording that worked best emerged for a particular model and benchmark, so a phrase that helped one system on math problems will not automatically transfer to other models or other kinds of tasks. The more durable idea in the paper is the method itself: using a model to iteratively propose and evaluate instructions, with measured scores guiding each round.
