JAKARTA - Not long ago, it was revealed that subtitles from hundreds of thousands of YouTube videos were used to train Artificial Intelligence (AI) systems. The companies that did so were Apple, Anthropic, NVIDIA, and Salesforce. According to a Proof News investigation, these technology companies used subtitles from 173,536 YouTube videos to train their AI models. The dataset, called YouTube Subtitles, was drawn from 48,000 different channels.

The subtitle transcripts came from large educational channels such as Harvard and MIT, as well as news media such as ABC News, the BBC, and the New York Times. There is also material from two videos on the MrBeast channel and seven videos on the Marques Brownlee channel. A larger number, 337 transcripts, came from the PewDiePie channel.

In response to these findings, Brownlee said that taking YouTube transcripts to train AI was a serious problem that would continue to occur. "One of them took a lot of data/transcripts from YouTube videos, including mine," Brownlee said via the X platform. "This will be an ever-evolving problem for a long time."

Some time ago, OpenAI allegedly used video content from YouTube to train its text-to-video generator, Sora. OpenAI CTO Mira Murati even admitted that she was not sure whether Sora had been trained on YouTube content.

As the issue emerged, YouTube CEO Neal Mohan issued a firm warning. Mohan asserted that using videos on his platform as AI training material is an act of theft and violates platform policies. "It does not allow things like transcripts or video bits to be downloaded," Mohan told Bloomberg. "It is clearly a violation of our terms of service. Those are the rules of the road in terms of content on our platform." YouTube, including Mohan, has not yet responded to the Proof News findings. The companies suspected of taking YouTube subtitles without permission, including Apple, have also neither denied nor responded to the allegations.

