Research Reveals GPT-4 is More Trustworthy but Still Vulnerable to Jailbreaking and Bias
JAKARTA - Researchers from the University of Illinois Urbana-Champaign, Stanford University, University of California, Berkeley, the Center for AI Safety, and Microsoft Research have published a study of the large language model GPT-4. They found that although it is more trustworthy than GPT-3.5, GPT-4 remains vulnerable to jailbreaking and bias.
The study gives GPT-4 a higher trustworthiness score than its predecessor, meaning the model is better at protecting personal information, avoiding "toxic" outputs such as biased information, and resisting adversarial attacks. However, GPT-4 can also be instructed to ignore security measures and leak personal information and conversation histories.
The researchers found that users could bypass GPT-4's safeguards because the model "follows misleading information more closely" and is more likely to follow highly complex prompts verbatim.
The researchers emphasized that these vulnerabilities were tested for, and not found, in the GPT-4-based products offered to consumers, because "finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology."
The study measured trustworthiness by evaluating the models' outputs across several categories, including toxicity, stereotyping, privacy, machine ethics, fairness, and robustness to adversarial attacks.
The researchers first tested GPT-3.5 and GPT-4 with standard prompts, which included using potentially banned words. Next, they used prompts designed to push the models to violate content policy restrictions without appearing outwardly biased against specific groups, before finally challenging the models by deliberately trying to trick them into ignoring their safeguards altogether.
The researchers revealed that they had shared the results of this research with the OpenAI team.
"Our goal is to encourage other research communities to utilize and build on this work, which may prevent malicious actions by parties who would exploit this vulnerability to cause harm," said the research team, quoted by The Verge.
"This trust assessment is only the beginning, and we look forward to working with others to build stronger and more trustworthy models moving forward," the report added.
The researchers have published their framework so that others can replicate the findings.
AI models like GPT-4 are routinely subjected to "red teaming," in which developers test many prompts to see whether the model will produce undesirable outputs. When the model first launched, OpenAI CEO Sam Altman acknowledged that GPT-4 "still has flaws and limitations."
The US Federal Trade Commission (FTC) has since begun investigating OpenAI over potential consumer harms, such as the spread of false information.