Chain of Thought Prompting: Complete Guide 2026

Chain-of-thought (CoT) prompting is one of the most impactful prompt engineering techniques for tasks that require reasoning. By instructing the model to think through a problem step by step before giving a final answer, you can measurably improve accuracy on math problems, logical analysis, complex planning, and any task where the answer depends on intermediate conclusions. This guide covers CoT in depth: how it works, when to use it, and the advanced variants that push accuracy further.

What Is Chain-of-Thought Prompting and Why Does It Work?

Chain-of-thought prompting instructs the AI model to generate the intermediate reasoning steps that lead to a conclusion, rather than jumping directly to an answer. For complex reasoning tasks, explicit step-by-step thinking can substantially reduce errors. Each step is written out and becomes context for the next one, so the final answer is conditioned on the full chain of reasoning instead of on superficial pattern-matching against the question alone.

The simplest CoT prompt adds one phrase: “Let’s think step by step.” Kojima et al. (2022) showed that this zero-shot trigger alone improves accuracy on arithmetic, logical, and multi-step reasoning tasks. In the original few-shot CoT paper, Wei et al. (2022) reported that accuracy on the GSM8K grade-school math benchmark rose from 17.9% with standard prompting to 56.9% with CoT on PaLM 540B. Gains of this kind have been reported across several large models and reasoning benchmarks, though their size varies with the model and the task.

In summary, CoT prompting elicits step-by-step reasoning from a large language model before it commits to a final answer. Introduced by Wei et al. in 2022, it comes in several forms: zero-shot CoT appends “Let’s think step by step” or an equivalent instruction to the prompt; few-shot CoT provides examples of complete reasoning chains with final answers; automated CoT (Auto-CoT) generates those example chains programmatically from a pool of task questions.

Reported gains on arithmetic reasoning benchmarks are on the order of 30-40 percentage points over standard prompting with large models (roughly 100B+ parameters). The improvement is most pronounced for multi-step mathematical computation, logical deduction problems, commonsense reasoning that requires real-world knowledge, and any task where the correct answer depends on getting several intermediate steps right. CoT helps less on lookup-style tasks (retrieval, factual questions with simple answers) and on smaller models: in the original 2022 experiments, models below approximately 100B parameters often could not generate coherent reasoning chains.

The technique works best when combined with self-consistency, which means sampling multiple CoT chains and taking the majority answer. This is reported to add roughly 10-15 percentage points of accuracy over single-chain CoT on mathematical and logical reasoning tasks.
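Self-consistency is easy to picture in code. The following is a minimal sketch, not a reference implementation: `sample` is a hypothetical callable standing in for one temperature-sampled model call, and `extract_answer` assumes the final answer is the last number in the completion. Both are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the last number in a reasoning chain (assumed answer format)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def self_consistent_answer(
    sample: Callable[[str], str], prompt: str, n: int = 5
) -> Optional[str]:
    """Sample n reasoning chains and return the most common final answer."""
    votes: Counter = Counter()
    for _ in range(n):
        answer = extract_answer(sample(prompt))
        if answer is not None:
            votes[answer] += 1
    return votes.most_common(1)[0][0] if votes else None


# Stand-in for a real model: three canned chains, one with a reasoning slip.
canned = iter([
    "5 * 6 = 30, then 30 - 4 = 26. The answer is 26",
    "5 + 6 = 11, then 11 - 4 = 7. The answer is 7",
    "30 apples minus 4 is 26. The answer is 26",
])
result = self_consistent_answer(
    lambda _prompt: next(canned),
    "Tom has 5 boxes of 6 apples and eats 4. How many are left?",
    n=3,
)
print(result)  # 26
```

The single faulty chain is outvoted by the two correct ones. In practice the vote only helps if the chains are sampled with some randomness; identical chains carry identical errors.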

Four Chain-of-Thought Variants

Zero-Shot CoT

Append “Let’s think step by step” or “Let’s approach this systematically” to your prompt with no examples. This works well for models with strong reasoning capability (Claude Opus, GPT-4o). It is fast to implement and effective for tasks where the reasoning structure is standard. It tends to fall short in highly specialized domains, where the model needs examples of domain-appropriate reasoning to follow the right approach.
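As a minimal sketch, zero-shot CoT is a one-line change to the prompt. The function name and the `Q:`/`A:` layout below are illustrative choices, not a required format.

```python
COT_TRIGGER = "Let's think step by step."


def build_zero_shot_cot_prompt(question: str, trigger: str = COT_TRIGGER) -> str:
    """Return the question followed by a reasoning trigger phrase."""
    return f"Q: {question.strip()}\nA: {trigger}"


prompt = build_zero_shot_cot_prompt(
    "A train leaves at 14:10 and arrives at 16:45. How long is the trip?"
)
print(prompt)
```

The resulting string is sent to the model as-is; the model continues after the trigger phrase with its reasoning and then the answer.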

Few-Shot CoT

Provide 2-5 examples of complete reasoning chains before the actual task. Each example shows: the problem, the step-by-step reasoning process, and the final answer. This teaches the model not just to reason, but to reason in the specific style appropriate for your task type. Most effective for specialized domains where the reasoning approach differs from general patterns in training data.
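The structure described above (problem, reasoning, final answer, repeated for each example) can be sketched as a small prompt builder. The `Demo` type, the example problems, and the “The answer is …” convention are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    question: str
    reasoning: str
    answer: str


def build_few_shot_cot_prompt(demos: list[Demo], question: str) -> str:
    """Join worked examples, then leave the final answer slot open."""
    blocks = [
        f"Q: {d.question}\nA: {d.reasoning} The answer is {d.answer}."
        for d in demos
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


demos = [
    Demo(
        question="A shop sells pens at 3 for $2. How much do 12 pens cost?",
        reasoning="12 pens is 12 / 3 = 4 groups of three. "
                  "Each group costs $2, so 4 * 2 = 8.",
        answer="$8",
    ),
    Demo(
        question="Tom has 5 boxes with 6 apples each and eats 4. How many are left?",
        reasoning="5 boxes of 6 apples is 5 * 6 = 30 apples. "
                  "After eating 4, 30 - 4 = 26 remain.",
        answer="26",
    ),
]
prompt = build_few_shot_cot_prompt(
    demos, "A bus carries 40 people per trip. How many trips for 130 people?"
)
print(prompt)
```

Keeping every demonstration in the same format matters more than the exact wording: the model tends to imitate the layout of the examples, including where the final answer appears.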

Auto-CoT

Programmatically generate CoT examples by clustering the questions in your task dataset and using zero-shot CoT to generate reasoning chains for representative questions from each cluster. These auto-generated examples then serve as few-shot CoT demonstrations. Developed by Zhang et al. (2022) to reduce the manual work of creating high-quality few-shot CoT examples, Auto-CoT was reported to match or exceed human-crafted few-shot CoT performance while scaling much more easily to large task distributions.
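The pipeline can be sketched in a few functions. This is a simplified illustration under stated assumptions, not the published implementation: the bag-of-words `embed` and the tiny `kmeans` are stand-ins for a real sentence-embedding model and clustering library, and `generate` is a hypothetical callable wrapping a model call.

```python
from collections import Counter
from typing import Callable


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words vector (stand-in for a sentence embedding)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, iters: int = 10):
    """Tiny k-means with deterministic init (first k points as centroids)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def build_auto_cot_demos(
    questions: list[str], k: int, generate: Callable[[str], str]
):
    """Cluster questions, pick one per cluster, generate its chain zero-shot."""
    vocab = sorted({w for q in questions for w in q.lower().split()})
    vectors = [embed(q, vocab) for q in questions]
    labels, centroids = kmeans(vectors, k)
    demos = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Representative = question closest to the cluster centroid.
        rep = min(members, key=lambda i: dist(vectors[i], centroids[c]))
        stem = f"Q: {questions[rep]}\nA: Let's think step by step."
        demos.append(f"{stem} {generate(stem)}")
    return labels, demos


questions = [
    "how many apples are left after eating 4 of 30 apples",
    "what is the next day after monday",
    "how many apples fit in 5 boxes of 6 apples",
    "what is the day before friday",
]
# Hypothetical model call, replaced here by a fixed placeholder string.
labels, demos = build_auto_cot_demos(
    questions, k=2, generate=lambda prompt: "(model-written reasoning)"
)
print(labels)
print("\n\n".join(demos))
```

The returned `demos` can be prepended to a new question exactly like hand-written few-shot examples. Because the chains are model-generated, some will contain mistakes; picking one question per cluster keeps the demonstrations diverse, which limits the effect of any single bad chain.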

Tree of Thoughts (ToT)

Extends CoT by exploring multiple reasoning paths instead of a single chain. The model generates N different reasoning approaches, evaluates each for promise, and continues developing the most viable paths while abandoning dead ends. Yao et al. (2023) reported that ToT significantly outperforms standard CoT on planning and creative problem-solving tasks where the solution space has multiple viable paths. The trade-off is implementation complexity and higher API cost from multiple generation calls.
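The generate, evaluate, and prune loop can be sketched as a breadth-first search with a fixed beam. This is a toy illustration under stated assumptions: in a real system `propose` and `score` would be model calls, whereas here they are hypothetical arithmetic stand-ins (reach 20 from 3 using “+2” or “×2”).

```python
from typing import Callable

State = tuple[int, ...]  # the chain of intermediate values so far


def tree_of_thoughts(
    root: State,
    propose: Callable[[State], list[State]],
    score: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    """Expand every kept path, score the children, keep the best few."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]  # abandon the rest as dead ends
    return max(frontier, key=score)


TARGET = 20


def propose(state: State) -> list[State]:
    """Two possible next steps: add 2 or double the last value."""
    last = state[-1]
    return [state + (last + 2,), state + (last * 2,)]


def score(state: State) -> float:
    """Higher is better: negative distance from the target."""
    return -abs(TARGET - state[-1])


best = tree_of_thoughts((3,), propose, score, depth=3, beam_width=2)
greedy = tree_of_thoughts((3,), propose, score, depth=3, beam_width=1)
print(best)    # (3, 5, 10, 20)
print(greedy)  # (3, 6, 12, 24)
```

With a beam of one the search behaves like a single greedy chain and overshoots the target. Keeping two paths alive lets the initially less promising branch (3 → 5) win. The cost is visible too: each level calls `propose` and `score` for every kept path, which is where the extra API calls come from.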

When to Use CoT (and When Not To)

Use CoT for: multi-step mathematical calculations, logical deduction and inference, complex planning with interdependent constraints, code debugging requiring root cause analysis, business analysis requiring synthesis of multiple factors. Do not use CoT for: simple factual retrieval, single-step classification tasks, tasks where the answer is a direct lookup, and applications where latency is more important than accuracy (CoT generates longer responses and takes longer to produce).

For the full range of prompt engineering techniques CoT fits within, return to our complete prompt engineering guide. For advanced techniques that build on CoT, see our advanced prompt engineering techniques guide.

Authoritative source: “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” (Wei et al., 2022) is the original research paper establishing chain-of-thought prompting. It is one of the most-cited prompt engineering papers and the foundation for understanding why and how CoT improves model reasoning performance.