Claude's New Thinking Powers: How Extended Reasoning and Search Makes AI Smarter

Updated: March 22 2025 11:00

Claude 3.7 Sonnet has introduced capabilities that fundamentally change how AI models approach complex problems. Three distinct but complementary features, the Extended Thinking Mode, the "think" tool, and the Search, are revolutionizing Claude's problem-solving abilities. Let's explore how these innovations work and what they mean for users and developers.


The Power of Pause: Understanding Extended Thinking Mode

Imagine you're faced with a difficult problem—maybe it's a complex math equation or debugging tricky code. What do you do? You don't answer immediately; you pause and think it through carefully. This natural human approach to problem-solving is what Anthropic has now built into Claude 3.7 Sonnet with its Extended Thinking Mode.

Released in February 2025, this capability allows Claude to allocate additional time and computational resources to challenging questions. Unlike previous AI models that were forced to generate responses with fixed computational budgets, Claude can now "think longer" about problems that require deeper analysis.
What makes this particularly fascinating is that users can actually see Claude's thought process unfold in real-time. This visible reasoning reveals how Claude approaches problems—exploring different angles, catching potential mistakes, and refining its understanding before delivering a final answer.

The impact is substantial—Claude shows dramatic improvements in mathematical reasoning, scoring impressively on the 2024 American Invitational Mathematics Examination. As Anthropic's research demonstrates, accuracy improves logarithmically with the number of "thinking tokens" Claude is allowed to use, showing clear benefits to this approach.

Thinking in Parallel: How Claude Multiplies Its Intelligence

But Anthropic didn't stop at sequential thinking. Their researchers have also experimented with parallel thinking approaches, where Claude generates multiple independent thought processes simultaneously and selects the best one.

Using techniques like majority voting (selecting the most common answer) or employing a separate model to evaluate the quality of different solutions, Claude achieves remarkable results. In the challenging GPQA evaluation covering physics, biology, and chemistry, parallel thinking allowed Claude to reach an astonishing 84.8% overall score and a 96.5% score on physics questions.


This approach is similar to how a group of experts might tackle a problem together, bringing diverse perspectives and approaches to find the optimal solution. While this parallel thinking capability isn't yet available in the deployed model, it shows promising directions for future enhancements.

The "Think" Tool: Stopping to Reflect During Complex Tasks

Distinct from Extended Thinking Mode is the new "think" tool, designed specifically for complex, multi-step tasks where Claude needs to process external information throughout its problem-solving journey.

While Extended Thinking happens before Claude starts generating a response, the "think" tool creates dedicated space for structured thinking during the response generation process itself. This proves particularly valuable when Claude needs to:

  • Analyze outputs from previous tool calls
  • Navigate policy-heavy environments with detailed guidelines
  • Make sequential decisions where each step builds on previous ones

Testing on τ-Bench, a comprehensive benchmark for customer service scenarios, showed dramatic improvements with the "think" tool. In the airline domain, which involves complex policy adherence, the tool delivered a 54% relative improvement in performance.


The best performance in the airline domain was achieved by pairing the “think” tool with an optimized prompt that gives examples of the type of reasoning approaches to use when analyzing customer requests. Below is an example of the optimized prompt:


## Using the think tool

Before taking any action or responding to the user after receiving tool results, use the think tool as a scratchpad to:
- List the specific rules that apply to the current request
- Check if all required information is collected
- Verify that the planned action complies with all policies
- Iterate over tool results for correctness

Here are some examples of what to iterate over inside the think tool:

User wants to cancel flight ABC123
- Need to verify: user ID, reservation ID, reason
- Check cancellation rules:
* Is it within 24h of booking?
* If not, check ticket class and insurance
- Verify no segments flown or are in the past
- Plan: collect missing info, verify rules, get confirmation



User wants to book 3 tickets to NYC with 2 checked bags each
- Need user ID to check:
* Membership tier for baggage allowance
* Which payments methods exist in profile
- Baggage calculation:
* Economy class × 3 passengers
* If regular member: 1 free bag each → 3 extra bags = $150
* If silver member: 2 free bags each → 0 extra bags = $0
* If gold member: 3 free bags each → 0 extra bags = $0
- Payment rules to verify:
* Max 1 travel certificate, 1 credit card, 3 gift cards
* All payment methods must be in profile
* Travel certificate remainder goes to waste
- Plan:
1. Get user ID
2. Verify membership level for bag fees
3. Check which payment methods in profile and if their combination is allowed
4. Calculate total: ticket price + any bag fees
5. Get explicit confirmation for booking


Claude Can Now Search the Web

In a major update released on March 20, 2025, Claude gained the ability to search the internet to provide more up-to-date and relevant responses. This enhancement significantly expands Claude's knowledge beyond its training cutoff, giving it access to the latest events and information.


What makes Claude's web search particularly valuable is how it processes and delivers information. Rather than simply returning a list of search results, Claude analyzes web content and integrates it into conversational responses, complete with direct citations for easy fact-checking. This new capability is already transforming how different professionals leverage Claude:

  • Sales teams can transform account planning by analyzing real-time industry trends to identify key initiatives and pain points, driving higher win rates through more informed conversations with prospects.
  • Financial analysts can assess current market data, earnings reports, and industry trends to make better investment decisions and inform financial model assumptions.
  • Researchers can build stronger grant proposals and literature reviews by searching across primary sources, spotting emerging trends and identifying gaps in the current literature.
  • Shoppers can compare product features, prices, and reviews across multiple sources to make more informed purchase decisions.

Strategic Implementation: Getting the Most from Claude's Thinking Abilities

For developers looking to leverage these new capabilities, implementation guidance differs based on the specific use case:

For Extended Thinking:
  • Best suited for standalone complex reasoning tasks
  • Valuable for math, physics, and coding problems
  • Useful for non-sequential tool calls or straightforward instruction following

For the "think" tool:
  • Most beneficial in multi-step problem-solving scenarios
  • Critical for policy compliance in complex domains
  • Enhanced with domain-specific examples in prompts
  • Most effective when placed in system prompts rather than tool descriptions

For web search integration:
  • Most valuable for queries requiring current information
  • Helpful when comparing data from multiple sources
  • Powerful when combined with extended thinking for comprehensive analysis

Safety Considerations: The Balance of Transparency and Security

While the visible thought process provides valuable transparency, Anthropic has identified several potential concerns:

  • The thought process appears more detached and less personal than Claude's standard outputs
  • There are questions about "faithfulness"—whether the displayed thoughts truly represent the model's internal reasoning
  • Security risks exist, as malicious actors might use visible thought processes to develop better jailbreaking strategies

To address safety concerns, Anthropic has implemented encryption for portions of thought processes that might include potentially harmful content. This encryption doesn't prevent Claude from thinking about sensitive topics when necessary but keeps certain content hidden from users.

Real-World Applications: Beyond Mathematical Reasoning

The benefits of Extended Thinking Mode extend far beyond abstract problem-solving. In computer use tasks, Claude can now issue virtual mouse clicks and keyboard presses to solve tasks on a user's behalf with improved results.

Perhaps most surprisingly, these capabilities dramatically improved Claude's performance in playing Pokémon Red. While previous versions got stuck early in the game, Claude 3.7 Sonnet successfully navigated far into the game, defeating three Gym Leaders—a task requiring long-term planning, memory, and strategic decision-making.

While playing Pokémon might seem frivolous, it demonstrates Claude's enhanced ability to maintain focus and accomplish open-ended goals—capabilities that translate directly to real-world applications in various domains.


As Anthropic continues to refine these capabilities, the visible thought process in Claude 3.7 Sonnet remains a research preview, with the team weighing the pros and cons for future releases.

What's clear is that giving AI models the ability to "think longer," "think better," and access current information marks a fundamental shift in how we approach artificial intelligence. By incorporating more human-like reasoning processes—taking time to consider problems from multiple angles, reflect on intermediate steps, and incorporate the latest information—Claude is becoming a more capable partner for tackling complex challenges.

Claude 3.7: Claude 3.7 Sonnet now at Claude.ai
Claude API: Claude 3.7 Anthropic API
Blog: New Anthropic Engineering Blog

Recent Posts