• Even the most advanced AI models fail more often than you think on structured outputs, raising doubts about the effectiveness of coding assistants

    From TechnologyDaily@1337:1/100 to All on Sunday, March 22, 2026 17:15:26
    Even the most advanced AI models fail more often than you think on structured outputs raising doubts about the effectiveness of coding assistants

    Date:
    Sun, 22 Mar 2026 17:05:00 +0000

    Description:
    Large language models struggle with structured outputs, achieving only 75% accuracy on complex tasks, leaving reliability concerns for developers.

    FULL STORY ======================================================================

    • Report finds AI coding assistants regularly fail one in four structured-output tasks
    • Even advanced proprietary models only reach approximately 75% accuracy
    • Open-source AI models perform worse, averaging closer to 65% reliability

    The promise of artificial intelligence as a tireless coding assistant has encountered a significant roadblock: new research claims such tools fail far more often than expected.

    A recent study from the University of Waterloo found AI struggles with software development, with even the most advanced models failing on one in four structured-output tasks. The research evaluated 11 large language models across 18 different structured formats and 44 tasks to test how well the systems could follow predefined rules, and found a clear disparity between performance on text-based tasks and outputs involving multimedia or complex structures.

    Benchmarking reveals a troubling reliability gap

    While text-related tasks were generally handled with moderate success, tasks requiring image, video, or website generation proved far more problematic.

    Accuracy in these areas dropped sharply, raising questions about how these AI tools can be integrated safely into professional workflows.

    "With this kind of study, we want to measure not only the syntax of the code, that is, whether it's following the set rules, but also whether the outputs produced for various tasks were accurate," said Dongfu Jiang, a PhD student and co-first author of the study.

    Structured outputs, designed to impose format consistency through JSON, XML, or Markdown, were intended to make AI responses more reliable for developers.

    AI companies, including OpenAI, Google, and Anthropic, introduced structured outputs to force responses into predictable formats.

    The Waterloo research suggests this approach has not yet delivered the level of dependability developers require.
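    The reliability gap is why developers typically validate a model's structured reply before using it rather than trusting the format guarantee. The sketch below is a minimal, hypothetical example of that guard: the reply strings and field names are invented for illustration, and real pipelines often use full JSON Schema validation instead of a simple key check.

    ```python
    import json

    def validate_structured_reply(reply: str, required_keys: set) -> dict:
        """Parse a model's JSON reply and confirm the required fields exist.

        Raises ValueError if the reply is not valid JSON, is not an object,
        or is missing keys, so callers can retry or escalate to human review.
        """
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as exc:
            raise ValueError(f"model reply is not valid JSON: {exc}") from exc
        if not isinstance(data, dict):
            raise ValueError("model reply is not a JSON object")
        missing = set(required_keys) - data.keys()
        if missing:
            raise ValueError(f"model reply missing fields: {sorted(missing)}")
        return data

    # Hypothetical reply that conforms to the expected shape:
    good = validate_structured_reply(
        '{"name": "parse_csv", "language": "python"}', {"name", "language"})

    # A reply that drifts from the schema is caught instead of silently used:
    try:
        validate_structured_reply('{"name": "parse_csv"}', {"name", "language"})
    except ValueError as err:
        print(err)  # model reply missing fields: ['language']
    ```

    Failing loudly at this boundary is what lets a workflow retry or hand the task to a human, rather than passing a malformed structure downstream.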

    Waterloo's benchmarking revealed that even the most advanced proprietary models reached only about 75% accuracy, while open-source alternatives performed closer to 65%.

    These results suggest that, despite improvements, AI systems still make significant errors that cannot be ignored in professional development environments.
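    The headline figures translate directly into failure rates: 75% accuracy means roughly one in four tasks fails. A toy aggregation over a pass/fail grid makes the arithmetic concrete; the model names and scores below are made up for illustration, not taken from the study.

    ```python
    # Toy pass/fail grid (1 = task passed, 0 = failed), echoing the study's
    # model-by-task shape; all values are hypothetical.
    results = {
        "proprietary-model-a": [1, 1, 1, 0],
        "open-source-model-b": [1, 1, 0, 0],
    }

    # Accuracy is simply the fraction of tasks passed per model.
    accuracies = {model: sum(s) / len(s) for model, s in results.items()}

    for model, acc in accuracies.items():
        print(f"{model}: {acc:.0%} accuracy, about {1 - acc:.0%} of tasks failed")
    ```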

    The report emphasized the need for human oversight, noting, "Developers might have these agents working for them, but they still need significant human supervision."

    Although structured outputs are a step forward from free-form natural
    language responses, errors remain common.

    The technology is not yet robust enough to operate independently in complex development scenarios.

    One might reasonably question whether the industry's enthusiasm for AI and vibe coding assistants has outpaced the actual capabilities of the underlying technology.

    Even the most advanced models demonstrate a significant failure rate on structured tasks, revealing a wide gap between marketing claims and actual performance.

    Therefore, for now, developers should treat these tools as experimental aids rather than autonomous colleagues.




    ======================================================================
    Link to news story: https://www.techradar.com/pro/even-the-most-advanced-ai-models-fail-more-often-than-you-think-on-structured-outputs-raising-doubts-about-the-effectiveness-of-coding-assistants


    --- Mystic BBS v1.12 A49 (Linux/64)
    * Origin: tqwNet Technology News (1337:1/100)