JAKARTA - Social media platform Reddit announced on Tuesday, June 25, that it will update the web standard it uses to block automated data collection from its website. The move follows reports that AI startups have circumvented the rules to gather content for their systems.
The announcement comes at a time when artificial intelligence (AI) companies are accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking permission.
Reddit said it would update the Robots Exclusion Protocol, or "robots.txt," a widely accepted standard that tells crawlers which parts of a site they may index. The company will also maintain rate limits, a technique used to control the number of requests coming from a specific entity, and will block unknown bots and crawlers from collecting data on its website.
Robots.txt has recently become an important tool for publishers seeking to stop tech companies from using their content for free to train AI models and generate summaries in response to search queries.
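Compliance with robots.txt is voluntary: the file only declares which paths a site owner wants crawled, and it is up to each crawler to honor it, which is why circumvention is possible. As an illustration only, the following minimal Python sketch uses the standard library's urllib.robotparser to show how a well-behaved crawler checks a site's rules before fetching a page; the site, page, and user-agent names are hypothetical examples, not Reddit's actual configuration or any company's real bot.

```python
# Minimal sketch: how a compliant crawler consults robots.txt before scraping.
# The URLs and user-agent string below are hypothetical.
from urllib.robotparser import RobotFileParser

robots_url = "https://www.example.com/robots.txt"   # hypothetical site
page_url = "https://www.example.com/forum/news/"    # page the bot wants to fetch
user_agent = "ExampleBot/1.0"                       # hypothetical crawler name

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # downloads and parses the site's robots.txt

if parser.can_fetch(user_agent, page_url):
    print("robots.txt permits crawling this page")
else:
    print("robots.txt disallows this page; a compliant bot stops here")
```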
Last week, a letter to publishers from content licensing startup TollBit said that some AI companies were circumventing the web standard to collect data from publisher sites.
This follows an investigation by Wired, which found that AI search startup Perplexity likely bypassed attempts to block its web crawler via robots.txt.
In early June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without providing credit.
Reddit also stated on Tuesday that researchers and organizations such as the Internet Archive will still have access to its content for non-commercial use.