JAKARTA - Social media platform Reddit announced on Tuesday, June 25, that it will update the web standard it uses to block automated data collection from its website. The move follows reports that AI startups have circumvented the rules to gather content for their systems.
The announcement comes at a time when artificial intelligence (AI) companies are accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking permission.
Reddit said it would update the Robots Exclusion Protocol, or "robots.txt," a widely accepted standard that tells crawlers which parts of a site they may index. The company will also maintain rate limits, a technique used to control the number of requests coming from a specific entity, and will block unknown bots and crawlers from collecting data on its website.
Robots.txt has recently become an important tool for publishers seeking to stop tech companies from using their content for free to train AI models and generate summaries in response to search queries.
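Compliance with robots.txt is voluntary: the file only declares which paths a site owner wants crawled, and it is up to each crawler to honor it, which is why circumvention is possible. As an illustration only, the following minimal Python sketch uses the standard library's urllib.robotparser to show how a well-behaved crawler checks a site's rules before fetching a page; the site, page, and user-agent names are hypothetical examples, not Reddit's actual configuration or any company's real bot.

```python
# Minimal sketch: how a compliant crawler consults robots.txt before scraping.
# The URLs and user-agent string below are hypothetical.
from urllib.robotparser import RobotFileParser

robots_url = "https://www.example.com/robots.txt"   # hypothetical site
page_url = "https://www.example.com/forum/news/"    # page the bot wants to fetch
user_agent = "ExampleBot/1.0"                       # hypothetical crawler name

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # downloads and parses the site's robots.txt

if parser.can_fetch(user_agent, page_url):
    print("robots.txt permits crawling this page")
else:
    print("robots.txt disallows this page; a compliant bot stops here")
```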
Last week, a letter to publishers from content licensing startup TollBit said that some AI companies were circumventing the web standard to collect data from publisher sites.
This follows an investigation by Wired, which found that AI search startup Perplexity likely bypassed attempts to block its web crawler via robots.txt.
In early June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without providing credit.
Reddit also stated on Tuesday that researchers and organizations such as the Internet Archive will still have access to its content for non-commercial use.