OpenAI pushed out GPT-5 yesterday, describing the latest rev of its large language model as its “best AI system” yet and promising reduced hallucinations and less “sycophancy”.

Observers noted improved coding performance with the new model, but also flagged the possibility that it could enable more corporate fraud and tempt enterprises to generate more code without addressing technical debt.

When it comes to coding, OpenAI claimed “particular improvements in complex front‑end generation and debugging larger repositories”. It claimed the model could “often create beautiful and responsive websites, apps, and games with an eye for aesthetic sensibility” with a single prompt.

Hallucinations have been a concern for anyone thinking of implementing OpenAI's models in serious, real-world applications. The firm claimed “GPT‑5 is significantly less likely to hallucinate than our previous models.”

It explained: “With web search enabled on anonymized prompts representative of ChatGPT production traffic, GPT‑5’s responses are ~45% less likely to contain a factual error than GPT‑4o, and when thinking, GPT‑5’s responses are ~80% less likely to contain a factual error than OpenAI o3.”

When it comes to reasoning on “complex, open-ended questions” it claimed the latest rev served up “six times fewer” hallucinations.

It’s just a question, then, of how much risk an enterprise can live with when basing critical systems and decisions on AI.

It also serves up “more honest responses”, OpenAI claimed, “especially for tasks which are impossible, underspecified, or missing key tools.”

It added that “On a large set of conversations representative of real production ChatGPT traffic, we’ve reduced rates of deception from 4.8% for o3 to 2.1% of GPT‑5 reasoning responses.”
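For comparison with the relative figures OpenAI quotes elsewhere, the fall from 4.8 percent to 2.1 percent can be restated as a relative reduction. The following is a minimal sketch of that arithmetic; the function name is our own illustration and not anything published by OpenAI.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Return the percentage drop going from before_pct to after_pct."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return (before_pct - after_pct) / before_pct * 100


# Deception rates quoted by OpenAI: 4.8% for o3, 2.1% for GPT-5 reasoning responses.
drop = relative_reduction(4.8, 2.1)
print(f"Relative reduction: about {drop:.0f}%")
```

On those quoted figures, the rate falls by a little over half, or 2.7 percentage points in absolute terms.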

At the same time, it claimed GPT‑5 is “less effusively agreeable”. Apparently, “It should feel less like ‘talking to AI’ and more like chatting with a helpful friend with PhD‑level intelligence.”

OpenAI touted the model's performance on “economically important tasks”, according to its internal benchmarks. This covered tasks across “over 40 occupations including law, logistics, sales, and engineering.”

Peter van der Putten, head of the AI Lab at Pegasystems and assistant prof of AI at Leiden University, said the new release appeared to be able to understand large blocks of code and unpick the architectural decisions behind them.

That sounds like a gift for organizations looking to AI to build their way out of legacy systems, but he warned: “The idea that GPT-5 can build an entire application from scratch is appealing and quite an achievement. Yet for large enterprises this won’t be the way how to build mission critical applications. They need to understand the code that is being generated, and you run the risk of just increasing technical debt.”

Rather, enterprises should consider it for “low code assets that business people can understand, such as workflows and data models.”

More specifically, Gary Hall, chief product officer at Medius, said GPT-5 was “a gift to fraudsters. When AI-generated documents are indistinguishable from the real thing, legacy finance systems simply can’t cope.”

Medius said research showed nearly a third of respondents wouldn’t recognise a fake, AI-generated expense report if it came across their desk.

OpenAI this week also released a pair of open models, which it promised would “deliver strong real-world performance at low cost.”

gpt-oss-120b and gpt-oss-20b are “open-weight language models”, meaning the parameters, or weights, are publicly available, and the models are available under the Apache 2.0 license. However, the firm was careful not to describe them as open source.

OpenAI said the smaller model, gpt-oss-20b, “can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.”

However, users can turn to AWS to access the new models, with the cloud giant making them available via Bedrock. This is the first time it has offered OpenAI LLMs.
