OpenAI's Latest AI Models Hallucinate More Often

JAKARTA - o3 and o4-mini are the latest artificial intelligence (AI) models from OpenAI. Although billed as the company's most advanced reasoning models, both hallucinate more often than other OpenAI models.

Hallucination is one of the problems that AI developers, including OpenAI, try to avoid. When it occurs, the AI model tends to fabricate information rather than give a correct response that meets the user's needs.

Hallucinations remain the biggest challenge for AI because they directly affect a model's reliability and performance. Although most AI models have managed to reduce this problem, OpenAI's o3 and o4-mini have not.

According to OpenAI's internal testing, these two reasoning models hallucinate more often than o1, o1-mini, and o3-mini. OpenAI's non-reasoning model, GPT-4o, also posted much better test results than the two new models.

What is most concerning is that OpenAI does not know why o3 and o4-mini hallucinate so frequently. The company noted that it would 'require more training' to find out the cause of the hallucinations.

In OpenAI's testing, o3 hallucinated in response to 33 percent of questions on PersonQA, a benchmark that the company relies on. Meanwhile, o4-mini fared worse, hallucinating on 48 percent of questions.

These hallucination rates are much worse than those of the previous reasoning models. On the same benchmark, o1 and o3-mini scored only 16 percent and 14.8 percent, respectively. That is roughly half the rate of o3 and about a third of the rate of o4-mini.

Testing by Transluce, as cited by TechCrunch, showed similar results to OpenAI's own testing. The nonprofit AI research lab found that o3 often makes up actions it supposedly took in order to provide users with answers.

"Our hypothesis is that the type of reinforcement learning used for o-series models may amplify problems that are usually mitigated (but not completely removed) by standard post-training pipelines," said Neil Chowdhury, a Transluce researcher and former OpenAI employee.
