OpenAI’s New Reasoning Models Show Increased AI Hallucinations

OpenAI's o3 and o4-mini AI models excel at reasoning but show higher hallucination rates, making up false information in up to 48% of answers on one benchmark.

By Chandra Mouli, Founder of CyberOven
[Image: The OpenAI logo encircled by various OpenAI icons.]
Highlights
  • OpenAI launched o3 and o4-mini, new advanced reasoning models.
  • o3 and o4-mini hallucinate more than earlier models, frequently making up incorrect information.
  • OpenAI is researching ways to reduce hallucinations and improve reliability.

OpenAI has recently launched new AI models called o3 and o4-mini that are better at reasoning but have a surprising problem: they make up false information more often than older models. According to TechCrunch, these advanced models are creating fake facts, fictional events, and even broken website links in their responses. This issue is most noticeable when the models answer questions about people.

What Are AI Hallucinations?

When we talk about AI “hallucinations,” we don’t mean the AI is seeing things that aren’t there. Instead, this term describes when AI models make up information that sounds real but is actually false. Think of it like a student who doesn’t know an answer but confidently makes something up instead of saying “I don’t know.”

AI hallucinations happen when models try to fill gaps in their knowledge by using patterns they’ve learned. Some common examples include:

  • Creating fake facts that sound believable
  • Making up events that never happened
  • Inventing quotes from real people
  • Providing links to websites that don’t exist
  • Describing fictional actions or capabilities

These made-up answers are dangerous because the AI presents them confidently as if they were true facts.
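
One symptom from the list above, links to websites that don't exist, is relatively easy to check automatically. Below is a minimal sketch of that idea, not a tool OpenAI provides: it pulls URLs out of an AI-generated answer (the sample answer and its links are invented for illustration) and tests whether each one actually resolves.

```python
import re
import urllib.error
import urllib.request

# An AI-generated answer invented for illustration; the links are placeholders.
answer = (
    "The study is summarized at https://example.com/o3-hallucination-report and "
    "the raw data lives at https://example.org/personqa-results-2025"
)

# Pull out anything that looks like an http(s) URL.
urls = re.findall(r"https?://[^\s\"')]+", answer)

for url in urls:
    try:
        # HEAD request: we only care whether the page exists, not its contents.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"OK   {response.status}  {url}")
    except (urllib.error.HTTPError, urllib.error.URLError) as error:
        # Dead or nonexistent links are a strong hint of a hallucinated citation.
        print(f"FAIL {error}  {url}")
```

A check like this only catches broken links, of course; a fabricated fact behind a real, working URL still needs a human reader to verify it.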

OpenAI’s New Reasoning Models: o3 and o4-mini

OpenAI launched these new models in April 2025 as successors to its earlier o1 model. These new AI systems are specifically designed to be better at reasoning – meaning they can work through problems step by step, more like humans do.

Key Features and Improvements

The new models have several impressive abilities:

  • They can use tools on their own, such as browsing the web and writing Python code
  • They can analyze images and “think with pictures”
  • The o3 model scores 87.7% on a difficult test called GPQA Diamond (much better than o1’s 32%)
  • The o4-mini model is cheaper to run but still very powerful, scoring 99.5% on a math test called AIME 2025
  • Both models are better at following instructions than previous versions
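
For readers who want to try one of these models directly, here is a minimal sketch using the official `openai` Python package. The question is a placeholder, and the exact model identifier ("o4-mini") and its availability depend on your account and API version, so treat this as an assumption-laden example rather than a guaranteed recipe.

```python
# pip install openai  (requires an OPENAI_API_KEY environment variable)
from openai import OpenAI

client = OpenAI()

# Ask a reasoning model a question about a person, the category where
# hallucination rates were reportedly highest.
response = client.chat.completions.create(
    model="o4-mini",  # assumed identifier; check the model list for your account
    messages=[
        {
            "role": "user",
            "content": "Who is the current CEO of OpenAI, and what is one "
                       "verifiable source for that claim?",
        }
    ],
)

print(response.choices[0].message.content)
```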

Rising Hallucination Rates in New Models

Despite being smarter in many ways, these new models are actually worse at sticking to facts. Researchers found this problem when testing them with something called the PersonQA benchmark.

The PersonQA Benchmark Results

PersonQA is a test created by OpenAI that asks AI models questions about real people to see if they make up false information. The results were concerning:

  • o3: 33% hallucination rate on PersonQA (made up false information in about 1 of every 3 answers)
  • o4-mini: 48% hallucination rate on PersonQA (made up false information in nearly half of all answers)

This means that when asked about people, the o4-mini model makes up fake information almost half the time. This is a serious problem for users who need reliable information.
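
The arithmetic behind a benchmark like PersonQA is simple: grade each answer as hallucinated or not, then divide by the number of questions. The sketch below applies the article's reported rates to an invented sample of 100 questions per model purely to show the calculation; it is not OpenAI's evaluation code.

```python
# Invented sample: the reported rates applied to 100 PersonQA-style
# questions per model, purely to show how the percentage is computed.
results = {
    "o3": {"hallucinated_answers": 33, "total_answers": 100},
    "o4-mini": {"hallucinated_answers": 48, "total_answers": 100},
}

for model, counts in results.items():
    rate = counts["hallucinated_answers"] / counts["total_answers"]
    print(f"{model}: {rate:.0%} of answers about people contained made-up information")
```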

Why Are Hallucinations Increasing?

You might wonder: if these models are smarter, why do they make things up more often? Researchers have some ideas:

  • The improved reasoning abilities might actually make the models more confident in making up answers
  • As models use broader patterns in data, they might create more detailed but incorrect responses
  • The way these models are trained to get good outcomes might encourage them to guess confidently rather than admit uncertainty

Neil Chowdhury from the research lab Transluce, Sarah Schwettmann (Transluce co-founder), and Stanford professor Kian Katanforoosh have all been studying this problem. OpenAI spokesperson Niko Felix acknowledged that more research is needed to understand why this is happening.


OpenAI’s Plans to Address Hallucination Issues

OpenAI isn’t ignoring this problem. They’re planning several research paths to fix these hallucination issues:

  • Developing “internal reasoning trace” methods to see how the AI models reach their conclusions
  • Investigating why scaling up reasoning models increases false information
  • Studying how their training methods might contribute to hallucinations
  • Improving how models connect to reliable data sources
  • Adding web search abilities so models can check facts before answering

The goal is to make future models both smart and truthful, which is essential for AI to be useful in important areas like healthcare, education, and business.
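
None of these research directions have been published as code, but the last two ideas (connecting models to reliable sources and checking facts with search) resemble the retrieval-grounded pattern many applications already use. Below is a minimal sketch of that general pattern, not OpenAI's actual approach: the `search_web` function is a placeholder, and the prompt wording and model name are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()

def search_web(query: str) -> str:
    """Placeholder: swap in a real search API (Bing, Brave, SerpAPI, and so on)."""
    # Canned text keeps the sketch self-contained; a real implementation would
    # return snippets retrieved for the query.
    return "OpenAI released o3 and o4-mini in April 2025, per its announcement."

def grounded_answer(question: str) -> str:
    # 1. Retrieve supporting text before the model answers.
    evidence = search_web(question)
    # 2. Ask the model to answer only from that evidence, and to admit
    #    uncertainty instead of guessing.
    response = client.chat.completions.create(
        model="o4-mini",  # assumed identifier
        messages=[
            {
                "role": "system",
                "content": "Answer using only the provided evidence. If the "
                           "evidence is insufficient, say you don't know.",
            },
            {
                "role": "user",
                "content": f"Evidence:\n{evidence}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("When did OpenAI release o3 and o4-mini?"))
```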

Why This Matters For Everyone

AI hallucinations aren’t just a technical problem. If you use AI tools for work, school, or personal tasks, you need to know if the information is reliable. Imagine making important decisions based on completely made-up facts! This challenge shows that making AI smarter doesn’t automatically make it more truthful.

As OpenAI continues developing these models, the balance between powerful reasoning and factual accuracy remains a key challenge. Until these hallucination issues are solved, users should double-check important information provided by these advanced AI systems, even when the answers sound very confident.
