The art of cheating on exams has come a long way since the days of scribbling notes on your wrist. In fact, a new study shows that AI chatbots make cheating more effective than ever.

Researchers from the University of Reading secretly slipped answers generated entirely by ChatGPT into real undergraduate psychology exams. Despite using AI in the simplest and most obvious way, unsuspecting examiners failed to recognise the AI-written answers in 94 percent of cases.

To see whether AI-assisted cheating could be detected, the researchers used a deliberately simple method. They gave ChatGPT-4 a standard prompt, for example: "Including references to academic literature but not a separate reference section, answer the following question in 160 words: XXX." The resulting text was then submitted directly through the university's exam system.
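The workflow was essentially copy-and-paste from a chatbot. For a concrete picture, the sketch below shows roughly how such a prompt could be sent programmatically using the OpenAI Python client; this is an illustration only, not the researchers' code, and the model name and placeholder question are assumptions.

    # Illustrative sketch only: generating an exam-style answer from a standard prompt.
    # The model name and the question text are placeholders, not the study's actual settings.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    question = "XXX"  # the exam question would be pasted in here
    prompt = (
        "Including references to academic literature but not a separate "
        f"reference section, answer the following question in 160 words: {question}"
    )

    response = client.chat.completions.create(
        model="gpt-4",  # the study used ChatGPT-4; the exact API model name is assumed
        messages=[{"role": "user", "content": prompt}],
    )

    # The returned text would then be submitted through the exam system unchanged.
    print(response.choices[0].message.content)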

In its own test, MailOnline used this prompt to answer undergraduate-level psychology essay questions. Even more worryingly, the AI actually scored higher than the average human student, achieving marks at the high 2:1 and first-class level.

The researchers created 33 fake student profiles and registered them to take at-home online exams in various undergraduate psychology modules. Using ChatGPT-4, they generated entirely artificial answers: 200-word responses to short-answer questions and full 1,500-word essays. These answers were then submitted alongside those of real students through the School of Psychology and Clinical Language Sciences exam system.

As an example, MailOnline produced essays using the same prompt. One of the essays was written by a real human, while the other was produced by ChatGPT. Can you tell the difference?


None of the examiners knew the experiment was taking place, and nothing indicated that the AI papers were any different. Of the 63 AI-generated submissions, only 6 percent were flagged by markers as suspicious; the other 94 percent went entirely undetected.

The AI achieved higher average grades than the real students, in some modules beating its human counterparts by a full grade boundary. In 83 percent of cases, the AI submissions scored better than a randomly selected set of student answers.

According to the researchers, these findings may force universities to adapt how they assess students, much as calculators gradually became accepted in exams. They suggest that the use of AI in exams may simply have to be permitted, since its spread looks inevitable.

"We may not fully return to the handwritten exam, but the global education sector needs to thrive on AI," said Dr. Scarfe.

In their paper, the researchers suggest that exams may need to start allowing the use of AI, arguing that the ability to use AI may be an essential skill in future workplaces.

"The new NORMal integrating AI seems inevitable. The form of an authentic assessment will be one where AI is used," said Prof. McCrum. "The solutions include moving away from the outdated scoring idea and towards something more in line with the skills students will need at work, including utilizing AI."

