Reasoning Effort, Explained: Why "Always Use Max" Is Costing You (and Sometimes Hurting Your Answers)

There is a habit almost every heavy AI user falls into: when a model offers a "reasoning effort" or "extended thinking" setting, they crank it to the top and leave it there. The logic feels obvious: more thinking means better answers, so why not always think the most? It feels like the safe default. It is not. On most everyday tasks, maximum reasoning effort burns far more tokens and adds noticeable latency, and on genuinely simple tasks it can make the answer worse, not better.

This guide explains what reasoning effort actually does, gives you a simple rule for choosing it, walks through general, technical, and coding examples, and busts the five myths that keep people stuck on "max." It was written June 15, 2026 against current Anthropic, OpenAI, and Google documentation plus published research on reasoning-model behavior.

Quick answer

Reasoning effort controls how much the model "thinks" before answering. Higher effort can help on hard problems and does nothing, or hurts, on simple ones.
It is not free: thinking tokens are billed at the same rate as output tokens, and accuracy improves only logarithmically, so the cost climbs much faster than the benefit.
Default to the model's medium setting. Escalate to high only when you can see the answer is actually wrong or too shallow, not preemptively, "just in case."
Match effort to the shape of the task: recall and reformatting need almost none; multi-step reasoning, architecture, and hard debugging are where high effort earns its keep.

What "reasoning effort" actually is

Modern AI models can do two different kinds of work. One is fast retrieval and transformation: pulling a fact, reformatting text, classifying something, rewriting a sentence. The other is slow deliberation: working through a multi-step problem, weighing tradeoffs, planning, or debugging. "Reasoning effort" (OpenAI and Codex), "extended thinking" (Claude), and "thinking budget" (Gemini) are all knobs that control how much of that slow, deliberate work the model does before it answers.

Under the hood, higher effort means the model generates more internal "thinking" tokens, a private chain of reasoning, before producing your visible answer. Those thinking tokens are where the cost and the latency come from, and they are the part most people do not see. The model looks like it just took a little longer; what actually happened is it produced thousands of hidden tokens you paid for.

The one rule that fixes most of this

Here is the rule worth memorizing: match reasoning effort to the shape of the task, not to how important it feels. Importance makes people reach for "max": this matters, so think hard. But importance and difficulty are different things. A critical email still only needs the model to write clearly; it does not need a thousand tokens of deliberation. The question to ask is not "how important is this?" but "does this task require step-by-step reasoning, or just recall and transformation?"

If the task is recall or transformation (look something up, extract, classify, reformat, translate, rewrite), use minimal or low effort. Deliberation adds nothing.
If the task is light analysis (a short comparison, a quick judgment call, a small planning step), use low to medium.
If the task is genuinely hard (multi-step logic, architecture, novel problem-solving, debugging something subtle), that is where high (or xhigh) earns the extra cost.
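The rule above can be written down as a small lookup. This is a minimal sketch, not any vendor's API: `TaskShape`, `EFFORT_BY_SHAPE`, and `pick_effort` are hypothetical names invented for illustration, and the mapping picks the lower end of each range named in the list as a starting point.

```python
from enum import Enum


class TaskShape(Enum):
    """Hypothetical labels for the three task shapes described above."""
    RECALL_OR_TRANSFORM = "recall_or_transform"  # look up, extract, reformat, rewrite
    LIGHT_ANALYSIS = "light_analysis"            # short comparison, quick judgment
    HARD_MULTI_STEP = "hard_multi_step"          # architecture, subtle debugging


# Starting effort per shape. The level names follow this article's vocabulary;
# a given product may expose different labels.
EFFORT_BY_SHAPE = {
    TaskShape.RECALL_OR_TRANSFORM: "minimal",
    TaskShape.LIGHT_ANALYSIS: "low",
    TaskShape.HARD_MULTI_STEP: "high",
}


def pick_effort(shape: TaskShape, feels_important: bool = False) -> str:
    """Return a starting effort level based on task shape only.

    `feels_important` is accepted and deliberately ignored: importance
    and difficulty are different things.
    """
    return EFFORT_BY_SHAPE[shape]
```

For example, `pick_effort(TaskShape.RECALL_OR_TRANSFORM, feels_important=True)` still returns `"minimal"`: a critical email is a writing task, not a reasoning task.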

And one practical default that flips most people's instinct: start low and escalate on evidence, instead of starting high "just in case." OpenAI's own guidance for its reasoning models says to use the lowest reasoning effort that still gives acceptable quality, and to raise it only when you can measure a real quality gain that justifies the latency and cost. That is the opposite of the default-to-max habit.
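One way to picture "escalate on evidence" is a loop that tries the cheapest level first and moves up only when a check fails. The sketch below assumes two caller-supplied functions that are not part of any real SDK: `ask(prompt, effort)` sends the request, and `is_acceptable(answer)` is whatever quality check you trust (a test suite, a schema check, a human glance). `fake_ask` is a stand-in so the example runs without a network call.

```python
from typing import Callable, Sequence, Tuple


def answer_with_escalation(
    prompt: str,
    ask: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
    levels: Sequence[str] = ("low", "medium", "high"),
) -> Tuple[str, str]:
    """Try effort levels from lowest to highest; stop at the first acceptable answer.

    Returns (level_used, answer). If no level passes the check, the answer
    from the highest level is returned so the caller can decide what to do.
    """
    if not levels:
        raise ValueError("levels must contain at least one effort level")
    answer = ""
    for level in levels:
        answer = ask(prompt, level)
        if is_acceptable(answer):
            return level, answer
    return levels[-1], answer


def fake_ask(prompt: str, effort: str) -> str:
    """Stand-in for a real model call: 'low' is too shallow, anything higher passes."""
    return "shallow" if effort == "low" else "complete"


level_used, final_answer = answer_with_escalation(
    "Plan the refactor", fake_ask, lambda a: a == "complete"
)
```

With the stand-in above, the loop stops at `"medium"` and never pays for `"high"`. The failed `"low"` attempt is still billed, so the pattern pays off when most requests pass at the first level.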

Examples: the same model, three effort levels

General / everyday tasks

Minimal effort: "Reformat this list alphabetically." "Fix the grammar in this paragraph." "Summarize this email in two sentences." There is nothing to reason about; these are transformations. High effort just adds cost and delay.
Low to medium effort: "Which of these two subject lines is stronger and why?" "Draft a polite decline to this meeting request." A little judgment, not a research project.
High effort: "We are deciding whether to expand into the German market next quarter. Here is our revenue mix, headcount, and three constraints. Walk through the tradeoffs and recommend a path." This is genuine multi-factor reasoning. Here, more thinking helps.

Technical / data tasks

Minimal effort: "Extract every email address from this text." "Convert this JSON to CSV." "Classify these 50 support tickets into the five categories I gave you." This is deterministic work: the answer is mechanical, not deliberative.
Medium effort: "Here is a slow SQL query and the schema; suggest two likely causes." A bounded diagnostic ask.
High effort: "Design a rate-limiting strategy for our API that survives burst traffic, works across three regions, and degrades gracefully under load." Architecture with real tradeoffs, exactly what reasoning budget is for.

Coding tasks

Coding is where this matters most, because reasoning effort is exposed directly in tools like Claude Code (Low / Medium / High / Max) and the Codex CLI (the model_reasoning_effort setting, from none up to xhigh). It is also where people burn the most tokens by leaving it pinned high.

Low / minimal effort: rename a variable across a file, add a docstring, write a unit test for a pure function, apply a mechanical refactor. These are single-file, well-defined, mechanical changes; Codex guidance explicitly recommends low here.
Medium effort: typical feature work and interactive coding sessions. This is the recommended default for most day-to-day coding.
High / xhigh effort: debug an intermittent race condition in an async queue, untangle a circular dependency across modules, plan a non-trivial refactor, or run a security audit. These genuinely need deep, multi-step reasoning; reserve your most expensive setting for them.

The pattern across all three domains is identical: the majority of real tasks are recall and transformation, which need little or no reasoning, and a minority are hard problems where high effort pays for itself. Defaulting everything to max means you pay the hard-problem price on every easy problem.

Five myths about reasoning effort

Myth 1: "Higher reasoning always means higher quality."

It does not. Accuracy from extended thinking improves logarithmically with the number of thinking tokens: steep gains at first, then quickly flattening. And on simple tasks there is a documented failure mode researchers call "overthinking," where reasoning models apply unnecessarily long chains to easy problems and get diminishing or even negative returns. Longer reasoning does not equal better performance; the right amount depends on the task.
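To see what "logarithmic" implies, here is a toy model and nothing more. The function name and every constant in it (`base`, `gain_per_doubling`, `start`, `cap`) are made up for illustration and are not measurements of any model. The shape is the point: each doubling of thinking tokens buys the same small accuracy bump while the token bill doubles.

```python
import math


def toy_accuracy(
    thinking_tokens: int,
    base: float = 0.60,               # made-up accuracy at `start` tokens
    gain_per_doubling: float = 0.03,  # made-up gain for each doubling
    start: int = 1024,
    cap: float = 0.95,
) -> float:
    """Illustrative log-shaped curve: accuracy = base + gain * log2(tokens / start).

    This toy curve only flattens. It does not model the 'overthinking'
    downturn, where extra reasoning makes easy tasks worse.
    """
    if thinking_tokens <= 0:
        raise ValueError("thinking_tokens must be positive")
    doublings = math.log2(thinking_tokens / start)
    return min(cap, base + gain_per_doubling * doublings)


def gain_from_doubling(thinking_tokens: int) -> float:
    """Accuracy gained by going from `thinking_tokens` to twice that many."""
    return toy_accuracy(2 * thinking_tokens) - toy_accuracy(thinking_tokens)
```

In this toy curve, going from 1,024 to 2,048 tokens and going from 8,192 to 16,384 tokens add the same accuracy, but the second step costs eight times as many extra tokens.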

Myth 2: "Reasoning is basically free."

Thinking tokens are billed at the same rate as output tokens. That means a task that burns 4,000 thinking tokens before a 500-token answer is billed for 4,500 output tokens instead of 500, roughly nine times the output cost of the bare answer (input tokens aside). Multiply that across hundreds of daily requests and "always on max" quietly becomes one of the largest line items in an AI bill, and the one people notice last, because the thinking is hidden.
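The arithmetic behind that multiplier is easy to check. The sketch below counts output-side tokens only and ignores input tokens. The function names are hypothetical, and the price of 10.0 per million output tokens is a placeholder, not a quote from any provider.

```python
def output_cost_multiplier(thinking_tokens: int, answer_tokens: int) -> float:
    """How many times more output tokens are billed once thinking is included."""
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return (thinking_tokens + answer_tokens) / answer_tokens


def daily_output_cost(
    thinking_tokens: int,
    answer_tokens: int,
    requests_per_day: int,
    price_per_million_output_tokens: float,
) -> float:
    """Output-side cost per day, with thinking billed at the output rate."""
    billed_tokens = (thinking_tokens + answer_tokens) * requests_per_day
    return billed_tokens * price_per_million_output_tokens / 1_000_000


PLACEHOLDER_PRICE = 10.0  # hypothetical price per million output tokens

multiplier = output_cost_multiplier(4000, 500)                     # 9.0
with_thinking = daily_output_cost(4000, 500, 300, PLACEHOLDER_PRICE)
without_thinking = daily_output_cost(0, 500, 300, PLACEHOLDER_PRICE)
```

At 300 requests a day and the placeholder price, the same workload costs 13.50 with thinking and 1.50 without it. Swap in your own token counts and your provider's real rate.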

Myth 3: "More thinking makes it more accurate, even on lookups."

For factual lookups and formatting, extended thinking barely changes the answer; it is pure overhead. Worse, on tasks the model would have gotten right quickly, long reasoning chains can introduce "self-doubt," where the model second-guesses a correct answer into a wrong one. More deliberation is not a free safety margin.

Myth 4: "Set a huge thinking budget to be safe."

Bigger budgets hit diminishing returns fast, and models often will not even use the whole allocation: Anthropic notes Claude may not consume large thinking budgets, especially above about 32k tokens. You still pay in latency for the headroom you reserved. Anthropic's own advice is to start at the minimum budget and increase incrementally until quality stops improving.
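"Start at the minimum and increase until quality stops improving" can be sketched as a simple search. This is one possible reading, not Anthropic's procedure. Doubling at each step and the 0.01 stopping threshold are assumptions, and `score_at` is a hypothetical function you would supply that runs your own evaluation set at a given budget and returns a score. The 1,024 floor and the roughly 32k ceiling come from the figures quoted in this article.

```python
from typing import Callable


def find_thinking_budget(
    score_at: Callable[[int], float],
    start: int = 1024,        # minimum budget quoted in this article
    max_budget: int = 32768,  # "about 32k", above which budgets may go unused
    min_gain: float = 0.01,   # assumed threshold for "quality stopped improving"
) -> int:
    """Double the budget while each step still improves the score by min_gain."""
    budget = start
    best = score_at(budget)
    while budget * 2 <= max_budget:
        candidate = budget * 2
        score = score_at(candidate)
        if score - best < min_gain:
            break  # the extra budget no longer pays for itself
        budget, best = candidate, score
    return budget


# Made-up evaluation scores, only to exercise the search.
fake_scores = {1024: 0.70, 2048: 0.78, 4096: 0.80, 8192: 0.805}
chosen_budget = find_thinking_budget(lambda b: fake_scores[b])
```

With the made-up scores above, the search settles on 4,096 tokens: the step to 8,192 adds only half a point, which is below the threshold.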

Myth 5: "Reasoning effort is only a developer / API thing."

The knob is most explicit in APIs and coding tools, but the same choice shows up in consumer apps: Claude's extended-thinking toggle, ChatGPT's thinking modes and model picker, Gemini's thinking models. Every time you pick a "thinking" model for a one-line question, you are making the same default-to-max mistake, just without seeing the token meter.

Per-model cheat sheet (2026)

OpenAI GPT-5.x and Codex: a reasoning_effort parameter with levels none, minimal, low, medium, high, and xhigh. GPT-5.5 defaults to medium. In the Codex CLI, set model_reasoning_effort (and plan_mode_reasoning_effort for planning) in config.toml. Guidance: low for mechanical single-file work, medium as default, high for planning/architecture/hard debugging, xhigh for the hardest autonomous tasks.
Anthropic Claude: extended thinking with a token budget (minimum 1,024). Thinking bills at the output-token rate. Newer models support adaptive thinking, where Claude decides how much to think per request instead of a fixed budget. Claude Code exposes effort as Low / Medium / High / Max.
Google Gemini: a thinking budget you can set in tokens, set to dynamic (-1) to let the model decide, or turn off (0) on Flash-class models for the cheapest, fastest path.
Consumer apps (ChatGPT, Claude, Gemini): fewer knobs, same principle: do not pick a heavy "thinking" model for a task that is really just recall or formatting.
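The values in the cheat sheet can be captured as a few guard functions, which is handy if effort settings live in your own config. This is a sketch built only from the numbers and level names listed above. The function and constant names are hypothetical, nothing here calls a real SDK, and the allowed values may differ by model version, so check the current documentation before relying on them.

```python
# Level names and limits as listed in the cheat sheet above (may vary by model).
OPENAI_EFFORT_LEVELS = ("none", "minimal", "low", "medium", "high", "xhigh")
CLAUDE_MIN_THINKING_BUDGET = 1024


def check_openai_effort(level: str) -> str:
    """Return the level unchanged if it is one of the listed effort levels."""
    if level not in OPENAI_EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    return level


def check_claude_budget(budget_tokens: int) -> int:
    """Return the budget unchanged if it meets the listed minimum."""
    if budget_tokens < CLAUDE_MIN_THINKING_BUDGET:
        raise ValueError(
            f"thinking budget must be at least {CLAUDE_MIN_THINKING_BUDGET} tokens"
        )
    return budget_tokens


def describe_gemini_budget(budget_tokens: int) -> str:
    """Interpret a Gemini-style budget: -1 is dynamic, 0 is off, else fixed."""
    if budget_tokens == -1:
        return "dynamic"
    if budget_tokens == 0:
        return "off"
    if budget_tokens < -1:
        raise ValueError("budget must be -1, 0, or a positive token count")
    return f"fixed ({budget_tokens} tokens)"
```

Rejecting a bad value at config-load time is cheaper than finding out from a failed or silently defaulted request.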

A 10-second checklist before you crank it to max

Is this recall, extraction, formatting, or rewriting? Use minimal or low.
Is this a quick judgment or small planning step? Use low to medium.
Does it genuinely need multi-step reasoning, architecture, or hard debugging? Now high (or xhigh) is worth it.
Did medium actually fail, or did it just feel risky? Escalate on evidence, not on nerves.
Are you about to run this hundreds of times? The effort level is now a budget decision, not just a quality one.

The point is not to be stingy with reasoning. It is to spend it where it changes the answer. Used well, high effort is a precision tool for hard problems. Used as a default, it is a quiet tax on every easy one.

FAQ

Does higher reasoning effort always give better answers?

No. Gains from extra thinking improve logarithmically and flatten quickly, and on simple tasks high effort can cause "overthinking" that adds cost and sometimes degrades accuracy. Higher effort helps on hard, multi-step problems and does little or nothing on recall and formatting.

How much does high reasoning effort actually cost?

Thinking tokens bill at the same rate as output tokens, so a task that uses 4,000 thinking tokens before a 500-token answer costs roughly nine times the bare answer. Across many requests, defaulting to maximum effort is often one of the biggest and least-visible drivers of an AI bill.

What reasoning effort should I use by default?

Start at the model's medium setting (GPT-5.5 and most Codex sessions default to medium) and escalate to high only when you can see the answer is wrong or too shallow. For recall, extraction, and formatting, drop to minimal or low.

Is reasoning effort the same as the model I pick?

Related but not identical. Model choice sets the ceiling on capability; reasoning effort controls how much that model deliberates per request. You can run a top model at low effort for easy tasks, and that is usually the efficient move.

How do I know if a task needs high reasoning?

Ask whether it requires step-by-step reasoning (multi-factor decisions, architecture, novel problem-solving, subtle debugging) or only recall and transformation. The former benefits from high effort; the latter does not. When unsure, run medium first and only escalate if the answer falls short.