Chain-of-Thought Prompting: Teaching LLMs to Think Out Loud


“Chain-of-thought isn’t magic. It’s just the model thinking out loud because you asked nicely.”

Models That Think — or At Least Try To

Let’s face it — language models are great at sounding smart.

But sounding smart and thinking through a problem? Not the same thing.

That’s where chain-of-thought prompting (CoT) comes in. It’s a clever way to help LLMs not just answer a question, but show their reasoning step by step.

What’s the Problem With Just Asking?

Take this:

Q: If I have 3 apples and give away 2, how many do I have left?
A:

You might get the correct answer.

But give it a slightly harder question and the quick, intuitive answer ($0.10 in the example below) can slip out instead. Ask the model to show its work:

**Prompt:**
Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
Please explain your reasoning step by step.

**LLM Response:**
Let’s think step by step. Let x be the cost of the ball. Then the bat costs x + $1.00. So:

x + (x + 1.00) = 1.10
2x + 1.00 = 1.10
2x = 0.10
x = 0.05

So the ball costs $0.05.

Boom. By prompting the model to “think out loud,” we pushed it into structured reasoning.

Where CoT Shines

Chain-of-thought prompting is especially useful for multi-step arithmetic, logic puzzles, and other multi-hop questions.

It helps with any task where the answer isn’t a fact — it’s a conclusion.

🧩 Building Better Prompts With CoT

While the magic phrase “Let’s think step by step” helps, it’s not a silver bullet. For robust results, especially with complex reasoning tasks, consider these techniques in more depth:

Zero-shot CoT

This involves adding a reasoning instruction like “Please explain your reasoning step by step” to a single question prompt, without giving any prior examples.

**Prompt:**
Q: If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?
Please explain your reasoning step by step.

**LLM Response:**
Let’s think step by step. 5 machines take 5 minutes to make 5 widgets, which means 1 machine makes 1 widget in 5 minutes. So 100 machines can make 100 widgets in 5 minutes.

This works well when the model has been trained on similar reasoning tasks and you’re just nudging it to show its work.
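
If you’re scripting this, zero-shot CoT is just string concatenation: take the raw question, append the reasoning instruction, and send it off. Here’s a minimal sketch assuming the OpenAI Python SDK (any chat client works the same way); the model name is a placeholder, not a recommendation.

```python
from openai import OpenAI

COT_SUFFIX = "\nPlease explain your reasoning step by step."

def zero_shot_cot(question: str, model: str = "gpt-4o-mini") -> str:
    """Append a reasoning instruction to a bare question and query the model."""
    prompt = f"Q: {question}{COT_SUFFIX}"
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(zero_shot_cot(
    "If it takes 5 machines 5 minutes to make 5 widgets, "
    "how long would it take 100 machines to make 100 widgets?"
))
```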

Few-shot CoT

Instead of one prompt, show the model a few examples of structured reasoning — then ask your real question.

Q: Mary has 3 times as many apples as Tom. Tom has 4 apples. How many apples does Mary have?
Please explain your reasoning step by step.
A: Tom has 4 apples. Mary has 3 × 4 = 12 apples. So Mary has 12 apples.

Q: A train travels at 60 km/h for 2 hours. How far does it go?
Please explain your reasoning step by step.
A: Distance = speed × time = 60 × 2 = 120 km. So the train travels 120 km.

Q: A book costs $15 and you buy 3 of them. How much do you spend?
Please explain your reasoning step by step.
A:

Few-shot CoT tends to perform better than zero-shot, especially in multi-hop or numerical tasks.
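
In code, few-shot CoT is the same trick with worked demonstrations prepended. This sketch only builds the prompt string, so it runs without any API key; send the result with whatever client you already use. The demonstrations mirror the ones above.

```python
# Worked (question, reasoning) pairs to use as demonstrations.
EXAMPLES = [
    ("Mary has 3 times as many apples as Tom. Tom has 4 apples. "
     "How many apples does Mary have?",
     "Tom has 4 apples. Mary has 3 × 4 = 12 apples. So Mary has 12 apples."),
    ("A train travels at 60 km/h for 2 hours. How far does it go?",
     "Distance = speed × time = 60 × 2 = 120 km. So the train travels 120 km."),
]

INSTRUCTION = "Please explain your reasoning step by step."

def few_shot_cot_prompt(question: str) -> str:
    """Prepend the worked demonstrations, then ask the real question."""
    blocks = [f"Q: {q}\n{INSTRUCTION}\nA: {a}" for q, a in EXAMPLES]
    blocks.append(f"Q: {question}\n{INSTRUCTION}\nA:")
    return "\n\n".join(blocks)

print(few_shot_cot_prompt("A book costs $15 and you buy 3 of them. How much do you spend?"))
```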

Auto-CoT (Automatic Chain of Thought)

In Auto-CoT, you let the model generate its own few-shot examples from a cluster of similar tasks. Then you use those examples as the few-shot context for future prompts.

It looks something like this:

Step 1: Ask the model to generate step-by-step reasoning examples from task templates.

Generate a few step-by-step reasoning examples for arithmetic word problems.

Step 2: Use those examples as few-shot context in future prompts.

Q: John has 3 pencils and buys 2 more. How many pencils does he have?
Please explain your reasoning step by step.
A: John has 3 pencils. He buys 2 more. Total = 3 + 2 = 5 pencils.

Q: A bag contains 6 apples. You eat 2. How many apples are left?
Please explain your reasoning step by step.
A: There were 6 apples. 2 were eaten. Remaining = 6 - 2 = 4 apples.

Q: Alice has 2 cats and buys 4 more. How many cats does she have now?
Please explain your reasoning step by step.
A:

This technique is useful for automating reasoning examples at scale, especially when you don’t want to handwrite demonstrations.
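
Here’s a minimal sketch of the two-pass pipeline, again assuming the OpenAI Python SDK with a placeholder model name: the first call writes the demonstrations, the second call reuses them as few-shot context.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(prompt: str, model: str = "gpt-4o-mini") -> str:
    """One chat-completion call; swap in your own client here."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: have the model write its own step-by-step demonstrations.
demos = call_llm(
    "Generate three short step-by-step reasoning examples for arithmetic "
    "word problems. Format each as 'Q: ...' followed by 'A: ...'."
)

# Step 2: reuse those demonstrations as few-shot context for a new question.
question = "Alice has 2 cats and buys 4 more. How many cats does she have now?"
prompt = f"{demos}\n\nQ: {question}\nPlease explain your reasoning step by step.\nA:"
print(call_llm(prompt))
```

The full Auto-CoT recipe also clusters candidate questions and samples one demonstration per cluster so the examples stay diverse; the sketch above skips that and keeps just the two passes.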

There are no strict rules, but a sensible default is to start with zero-shot CoT, switch to few-shot CoT when answers get shaky, and reach for Auto-CoT when you need demonstrations at scale.

What Can You Combine It With?

Chain-of-thought prompting plays well with others. Pair it with the few-shot examples shown above, or with self-consistency, where you sample several reasoning chains and keep the majority answer.

When Not to Reach for CoT

Don’t use chain-of-thought prompting when the question is a simple factual lookup, when latency or token cost is tight, or when your audience only wants the final answer.

Verbose reasoning might be helpful for clarity, but in these cases it’s costly or unnecessary. Sometimes, too much thinking is overthinking.

🧵 Final Thread

Chain-of-thought prompting doesn’t make the model smarter — it makes the prompt smarter.

It’s not about teaching LLMs to reason like humans — it’s about nudging them toward reliable patterns of reasoning. Some models do this better than others.

And they’re getting better. Each generation of LLMs is refining its ability to follow reasoning cues, generate intermediate steps, and even self-correct.

Prompts like these won’t just help you get answers — they’ll help you understand how those answers are formed.

“To teach a model to reason, don’t shout the answer — walk it there.”