I tried Claude Sonnet 5 with prompts that ask it to finish the job, not just answer the question, and that's where the AI war is going
Date:
Thu, 02 Jul 2026 14:12:05 +0000
Description:
Claude Sonnet 5 shows that the next AI battle isn't about better chatbot answers; it's about which assistant can actually get work done.
FULL STORY ======================================================================
Anthropic has just released Claude Sonnet 5 for all users, and I wanted to test what it was good at. But the game has changed now. Sonnet 5 doesn't feel dramatically different from Gemini or ChatGPT if you ask it ordinary chatbot questions. Instead, the difference should show up when you stop asking for answers and start asking for completed work.
Anthropic says Sonnet 5 is built for "multi-step software engineering work," sustained coding, tool use, debugging, and "messy technical contexts." It also says it can make plans, use browsers and terminals, and run more autonomously than smaller, cheaper models previously could. I'm not using Sonnet 5 for coding, but that doesn't mean I can't take advantage of its new abilities just like you can. So I stopped asking Claude for answers and started asking it to finish jobs, beginning with planning a trip to Bath, UK, for my family: my wife, me, and two teens.
A trip to Bath
When I tested it, Claude Sonnet 5 defaulted to its Medium level of effort, so that's what I used. Here's the first prompt I tried:
"I want to test whether you can act more like an agent than a chatbot.
My task is: Plan a weekend trip to Bath for two adults and two teenagers, including travel, lunch, one activity, estimated costs, and what still needs booking.
Don't just give me advice. First, make a brief plan. Then identify which parts of the task you can complete yourself right now, which parts require tools or information you don't have, and which parts need human judgment.
Then complete as much of the task as possible without stopping after the first obvious answer.
At the end, give me:
What you completed
What still needs human action
Any assumptions you made
A short checklist I can use to verify the result
The next best step"
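If you want to reuse this prompt structure for other jobs, the sketch below shows one way to do it. It is only an illustration: the function name build_agent_prompt is hypothetical and not part of any Claude or ChatGPT tool, and the wording is copied from the prompt above with the task swapped in.

```python
def build_agent_prompt(task: str) -> str:
    """Wrap any task in the 'finish the job' structure of the prompt above.

    Hypothetical helper for illustration; it only builds a string that
    you would paste into a chatbot yourself.
    """
    # The five end-of-task deliverables requested in the original prompt.
    deliverables = [
        "What you completed",
        "What still needs human action",
        "Any assumptions you made",
        "A short checklist I can use to verify the result",
        "The next best step",
    ]
    lines = [
        "I want to test whether you can act more like an agent than a chatbot.",
        f"My task is: {task}",
        "Don't just give me advice. First, make a brief plan. Then identify "
        "which parts of the task you can complete yourself right now, which "
        "parts require tools or information you don't have, and which parts "
        "need human judgment.",
        "Then complete as much of the task as possible without stopping "
        "after the first obvious answer.",
        "At the end, give me:",
        *deliverables,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_agent_prompt(
        "Plan a weekend trip to Bath for two adults and two teenagers, "
        "including travel, lunch, one activity, estimated costs, and "
        "what still needs booking."
    ))
```

Only the task line changes between uses; the plan-first instruction and the five closing deliverables stay fixed.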
What I really liked was that, as Claude tackled this task, it gave me the option to be notified when it had finished. In reality, it only took a few seconds to come back with a plan, which included travel options, an itinerary, and a suggestion for lunch and something to do: a trip to The Roman Baths.
To my delight, Claude gave me an interactive map showing where all the places it recommended were. It also gave me a useful list of what it had completed, what required human action, the assumptions it had made, a verification checklist, and a "next best step" action point. It felt ready to keep working with me as more details came in, rather than treating its first answer as final.
In fact, when I gave it more details, such as which day I was going to go, it gave me a visual weather report for the day. That was a really nice touch.
[Image: Claude Sonnet 5 produced a handy map showing where to go. (Image credit: Anthropic)]
Claude vs ChatGPT
I also tried this prompt with ChatGPT-5.5 Medium and got a similar result. It acted as an agent, just like Claude did, and notified me when it had finished its tasks. It just didn't look as nice.
There was no map, or any visual elements at all, and it felt more like I had been given a finished report than the start of a two-way conversation where it asked me for more details.
Both chatbots recommended lunch and a trip to The Roman Baths. Interestingly, ChatGPT assumed I'd get the train, while Claude assumed I'd drive. They also recommended different places to eat, but the core information they both provided was solid.
What was most impressive was that both models could adapt when I reframed the inputs. For example, when I gave them the ages of the kids, student status, a different mode of transport, or changed the day of the trip, both models could cope. Both also identified that since the oldest was a university student, he could get free entry to The Roman Baths.
This part of the test was probably the most meaningful, as it felt much more "multi-step" than simply providing one answer.
Overall, I'd give this test to Claude. You can clearly see that Sonnet 5 is set up for agentic actions. Neither Claude nor ChatGPT could actually do any of the booking for me at the moment, so we're still a long way from true personal-assistant-level autonomy. But for this kind of task, Claude currently has the edge.
A different domain
I wanted to test the models in a different domain that would let Claude show me it had genuinely improved, and that the Bath trip result was not just a fluke of the travel-planning use case. So I asked them both to:
"Build me a simple household budget tracker as a spreadsheet or small tool."
Both models thought for a while about this task, and churned through various options before opting to make a spreadsheet. ChatGPT produced a spreadsheet with a bar chart that tracked how much I'd spent on various household expenses against a budget. Claude, however, went for something simpler: dispensing with a budget, it just tracked actual expenses and created a pie chart showing where my money was going.
Claude's initial approach was simpler and easier to understand. Both models provided a .xlsx file, but only Claude provided a button to upload it straight to Google Drive so I could open it in Sheets.
I told ChatGPT, "I wanted the graph to be a pie chart," and it responded: "Absolutely, I'll update the spreadsheet itself so the dashboard uses a pie chart for spending by category, rather than the current graph style."
It ran into a few problems because it was trying to show both the budget and actual values in the same pie chart, but eventually it worked out that it could show only one and produced a new spreadsheet that did exactly what I asked for.
I then asked Claude to change its spreadsheet to provide a budget section too, and to change the graph into a bar chart. Again, it showed me its workings and added a budget section and bar charts perfectly.
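For readers curious about what a "budget versus actual" tracker computes, here is a minimal sketch of the arithmetic. It is not the spreadsheet either model produced: the function name summarize_budget, the category names, and all of the amounts are hypothetical values chosen for illustration.

```python
from collections import defaultdict


def summarize_budget(expenses, budget):
    """Total expenses per category and compare them with a budget.

    expenses: list of (category, amount) pairs
    budget:   dict mapping category -> budgeted amount
    Returns a dict of category -> {"budget", "actual", "remaining"}.
    """
    actual = defaultdict(float)
    for category, amount in expenses:
        actual[category] += amount

    summary = {}
    # Include categories that appear in either the budget or the expenses.
    for category in sorted(set(budget) | set(actual)):
        planned = budget.get(category, 0.0)
        spent = actual.get(category, 0.0)
        summary[category] = {
            "budget": planned,
            "actual": spent,
            "remaining": planned - spent,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical sample data, not figures from the article.
    sample_expenses = [("Groceries", 50.0), ("Groceries", 25.0), ("Energy", 80.0)]
    sample_budget = {"Groceries": 100.0, "Energy": 60.0}
    for category, row in summarize_budget(sample_expenses, sample_budget).items():
        print(category, row)
```

A pie chart of the "actual" column corresponds to Claude's first version, while a bar chart of "budget" next to "actual" corresponds to the budget-versus-spending view.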
I can't really separate the two AI models on this task. Both proved they can handle multi-step tasks well, and both were happy to revise the result when I changed the brief.
That, really, is the point. The most interesting AI tests now are not "which chatbot gives the best answer?" They are "which assistant keeps working until the job is actually done?"
On that front, Claude Sonnet 5 feels extremely capable. ChatGPT was close behind, and in some ways just as effective, but Claude felt more naturally organized around the idea of completing work rather than simply responding to prompts. It asked fewer invisible questions, presented its output more helpfully, and made the whole process feel more like collaborating with an assistant than interrogating a chatbot.
For now, neither model is ready to fully take over the job. I still had to check the details, make the decisions, and do the actual booking or uploading myself. But the direction of travel is obvious. The AI war is no longer just about who has the smartest chatbot. It's about who can build the assistant
that gets you closest to a finished task.
======================================================================
Link to news story:
https://www.techradar.com/ai-platforms-assistants/claude/i-tried-claude-sonnet-5-with-prompts-that-ask-it-to-finish-the-job-not-just-answer-the-question-and-thats-where-the-ai-war-is-going
--- Mystic BBS v1.12 A49 (Linux/64)
* Origin: tqwNet Technology News (1337:1/100)