Copyright Group Halts Distribution of Dutch-Language Dataset Used to Train AI
JAKARTA - BREIN, a Dutch copyright enforcement group, has halted the distribution of a large language dataset that had been made available for training artificial intelligence (AI) models. The dataset includes material collected without permission from tens of thousands of books, news sites, and Dutch-language subtitles taken from various films and TV series.
According to a statement released by BREIN on Tuesday, August 13, the data was collected without the approval of the copyright owners. BREIN's director, Bastiaan van Ramshorst, said that although it is not yet clear to what extent AI companies have used the dataset, the group is trying to act quickly to head off future lawsuits.
"It's very difficult to know, but we're trying to be on time," Van Ramshorst said. He added that the upcoming EU AI Act would require AI companies to disclose the datasets used to train their models.
In the United States, Microsoft-backed OpenAI has faced several lawsuits, including one from The New York Times, which accuses the company of using copyrighted material to train AI models without permission.
In Denmark, a copyright protection group called the Danish Rights Alliance succeeded last year in halting the distribution of another large dataset, known as "Books3".
The person offering the Dutch-language dataset has agreed to the terms of a cease-and-desist order and removed the dataset from the website where it had been available for download, according to BREIN. The organization did not disclose the person's identity, citing privacy rules in the Netherlands.
This action shows how important copyright monitoring and enforcement are in the digital era, especially given the rapid development of artificial intelligence technology, which often relies on large amounts of data to train its models.