Self-Consistency Improves Chain of Thought Reasoning in Language Models

TLDR

Chain-of-thought prompting with large language models has shown promise on complex reasoning tasks. The paper proposes self-consistency, a decoding strategy that replaces greedy decoding in chain-of-thought prompting. It rests on the intuition that a complex reasoning problem typically admits multiple reasoning paths leading to its unique correct answer: self-consistency samples a diverse set of reasoning paths and selects the most consistent answer by marginalizing over them. Empirically, self-consistency improves chain-of-thought prompting by 17.9% on GSM8K, 11.0% on SVAMP, 12.2% on AQuA, 6.4% on StrategyQA, and 3.9% on ARC-challenge.

Abstract

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
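
The sample-then-aggregate procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "selecting the most consistent answer" is approximated by an unweighted majority vote over the final answers, that each sampled completion ends with the phrase "The answer is ...", and that the caller supplies the sampler. The names `extract_answer`, `self_consistency`, `sample_path`, and `make_canned_sampler` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional

ANSWER_MARKER = "The answer is"  # assumed answer format, not specified in the abstract


def extract_answer(completion: str) -> Optional[str]:
    """Hypothetical parser: return the text after the last answer marker."""
    idx = completion.rfind(ANSWER_MARKER)
    if idx == -1:
        return None  # this reasoning path produced no parsable answer
    tail = completion[idx + len(ANSWER_MARKER):]
    return tail.strip().rstrip(".").strip()


def self_consistency(
    sample_path: Callable[[str], str],
    prompt: str,
    n_samples: int = 10,
) -> Optional[str]:
    """Sample several reasoning paths and return the most frequent final answer.

    `sample_path` stands in for a stochastic decoding call (for example,
    temperature sampling) on a chain-of-thought prompt.
    """
    answers: List[str] = []
    for _ in range(n_samples):
        answer = extract_answer(sample_path(prompt))
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    # Majority vote: the reasoning paths are discarded, only answers are counted.
    return Counter(answers).most_common(1)[0][0]


def make_canned_sampler(completions: List[str]) -> Callable[[str], str]:
    """Test helper that replays fixed completions instead of calling a model."""
    iterator = iter(completions)
    return lambda prompt: next(iterator)


if __name__ == "__main__":
    paths = [
        "She has 16 - 3 - 4 = 9 eggs left. 9 * 2 = 18. The answer is 18.",
        "16 - 4 - 3 = 9, and 9 * 2 = 18. The answer is 18.",
        "She sells 16 - 3 = 13 eggs. 13 * 2 = 26. The answer is 26.",
    ]
    print(self_consistency(make_canned_sampler(paths), "Q: ...", n_samples=3))
```

In this toy run two of the three paths agree, so the vote returns their shared answer even though one path contains a reasoning error. Greedy decoding, by contrast, commits to a single path and has no way to recover from such an error.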