JAKARTA - Several prominent artificial intelligence (AI) models reportedly fall short of EU regulations on cybersecurity and discriminatory output. Generative AI models from technology companies such as Meta, OpenAI, and Alibaba have shown shortcomings in areas critical to compliance with the European AI Law (AI Act), which is expected to take effect in stages over the next two years.
This AI law was debated for years, and the discussion intensified after OpenAI launched ChatGPT at the end of 2022, sparking widespread debate about the potential existential risks of such models. Public concern pushed policymakers to draft stricter rules for "general-purpose" AI (GPAI), a category that includes generative AI technologies such as ChatGPT.
To test compliance with the regulation, a new tool developed by Swiss startup LatticeFlow AI with its partners at ETH Zurich and Bulgaria's INSAIT was used to evaluate generative AI models. The tool scores each model between 0 and 1 across various categories, including technical aspects such as resilience, security, and the risk of discriminatory output.
Test Results and AI Model Shortcomings
LatticeFlow published a leaderboard showing the results for the models tested. Models from big tech companies such as Alibaba, Meta, OpenAI, Anthropic, and Mistral all received average scores above 0.75. However, some models showed flaws in key categories that could put them at risk of violating the AI Law.
On discriminatory output, the tool gave OpenAI's "GPT-3.5 Turbo" model a low score of 0.46. Alibaba Cloud's "Qwen1.5 72B Chat" model scored even lower, at 0.37. Discriminatory output reflects human biases related to gender, race, and other attributes, which can surface when an AI model is asked to produce certain content.
In the "prompt hijacking" category, a type of cyberattack in which hackers disguise malicious prompts as legitimate ones to steal sensitive information, Meta's "Llama 2 13B Chat" model received a low score of 0.42, while Mistral's "8x7B Instruct" model scored lower still, at 0.38.
Claude 3 Opus, a model developed by Google-backed Anthropic, received the highest average score, 0.89 across the categories tested, making it the most resilient of the tested models in terms of compliance with security regulations and technical resilience.
Potential for Heavy Penalties
The checking tool is designed to follow the text of the AI Law and is expected to be updated as additional enforcement measures are introduced. According to LatticeFlow CEO and co-founder Petar Tsankov, the test results show where companies need to focus their efforts to ensure compliance with the AI Law.
He said that although the results were positive overall, there were still "gaps" that needed to be closed for these generative AI models to meet regulatory standards.
"The EU is still refining its compliance benchmarks, but we can already see some shortcomings in existing AI models," said Tsankov. "With a greater focus on optimizing for compliance, we believe model providers can be well prepared to meet regulatory requirements."
Companies that fail to comply with the AI Law can be fined up to 35 million euros (approximately 38 million US dollars) or 7% of their global annual turnover, whichever is higher. This puts great pressure on tech companies to correct the shortcomings exposed by the tests.
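The "whichever is higher" rule is simple arithmetic, and a short sketch shows how the ceiling scales with company size. The Python below uses only the two figures cited in this article; the function name and the sample turnover values are hypothetical, and it computes only the ceiling described here, not the fine a regulator would actually impose.

```python
# Illustrative sketch only: the two constants come from the figures quoted
# in this article; everything else (names, sample inputs) is hypothetical.
FIXED_CAP_EUR = 35_000_000   # fixed amount cited in the article
TURNOVER_PERCENT = 7         # share of global annual turnover cited in the article


def fine_ceiling_eur(global_annual_turnover_eur: int) -> int:
    """Return the larger of the fixed amount and 7% of global annual turnover."""
    turnover_based = global_annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based)


if __name__ == "__main__":
    # Hypothetical turnover figures chosen to fall on either side of the crossover.
    for turnover in (100_000_000, 500_000_000, 1_000_000_000):
        print(f"turnover {turnover:>13,} EUR -> ceiling {fine_ceiling_eur(turnover):>11,} EUR")
```

Under these assumptions the fixed 35 million euro amount applies until global annual turnover passes 500 million euros, after which the 7% figure is the larger of the two.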
The European Union is still working out how the rules of the AI Law will be enforced, particularly for generative AI tools such as ChatGPT. Experts are being convened to draw up a code of practice, which is expected to be completed in the spring of 2025.
Although the European Commission cannot verify external tools, it was kept informed throughout the development of the checking tool and called it an important first step in implementing the AI Law. A spokesman for the European Commission said, "The Commission welcomes this study and the AI model evaluation platform as the first step in translating the European Union AI Law into technical requirements."
Several technology companies whose models were tested, such as Meta and Mistral, declined to comment. Meanwhile, companies such as Alibaba, Anthropic, and OpenAI did not immediately respond to requests for comment regarding the test results.