OpenAI: Chatbots Can Deceive Deliberately
JAKARTA - Recent research from OpenAI and Apollo Research reveals a surprising phenomenon: AI models can intentionally deceive users to achieve their goals. This phenomenon is called "AI scheming," or cunning AI behavior, and it differs from the more familiar error known as AI hallucination.
AI hallucination occurs when a chatbot gives incorrect information because it is guessing. AI scheming, however, is far more serious: the AI pretends to act according to the rules while covertly pursuing a hidden goal.
Researchers liken this to a stock trader who breaks the law for profit while still appearing compliant on the surface.
A simple example of this behavior is an AI claiming to have completed a task, such as creating a website, when it has not actually done so, just to pass an evaluation.
Challenges of Eliminating Scheming Behavior
Researchers found that training AI to stop lying could actually make things worse. If an AI realizes it is being tested, it can devise a more cunning and better-hidden strategy to keep scheming, even though its underlying behavior has not really changed.
The good news is that a new technique called deliberative alignment has proven effective in reducing cunning behavior. When the AI was taught an anti-scheming specification and asked to review it before acting, the frequency of cunning actions dropped drastically, from 13% to less than 1% on some models.
Researchers warn that as AI is given greater responsibility for complex tasks, the potential danger of cunning behavior will increase. Unlike ordinary software, AI can formulate strategies and deceive deliberately. Ensuring AI honesty will therefore become increasingly important.