JAKARTA - AI company Perplexity is back in the spotlight after recent reports showed that it continues to aggressively scrape data from websites and ignore robots.txt rules, despite warnings dating back to 2024.
According to a report from Cloudflare, Perplexity uses increasingly sophisticated techniques to access data from websites that explicitly prohibit bots from crawling them. Even when Perplexity's main bot is blocked by robots.txt, the company allegedly deploys new bots with different user agents, IP addresses, and ASNs (Autonomous System Numbers) to avoid detection and keep accessing protected content.
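Changing the user agent works because robots.txt rules are grouped by the name a crawler declares, and obeying them is voluntary. The minimal sketch below uses Python's standard `urllib.robotparser` module to make that concrete. The robots.txt content, the `PerplexityBot` name, the example URL and the browser-style user-agent string are all assumptions chosen for illustration, not details taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it blocks a crawler that
# identifies itself as "PerplexityBot" and allows every other agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page on the example site.
url = "https://example.com/private-article"

# A crawler that honestly declares its name matches the first group
# and is told to stay out.
declared = parser.can_fetch("PerplexityBot", url)

# The same request under a generic browser-style user agent (an assumed
# string) only matches the wildcard group, so robots.txt raises no objection.
# Nothing in the protocol itself prevents a crawler from doing this.
disguised = parser.can_fetch(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", url
)

print(declared, disguised)
```

Run as written, the parser reports the declared bot as disallowed and the disguised request as allowed. This shows why an operator that changes its declared identity can slip past robots.txt unless the site detects it by other means.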
Cloudflare tested this by creating a new site that had never been accessed by anyone. When it then asked Perplexity AI for information about the site, content that existed only on that page appeared in Perplexity's answer, an indication that Perplexity had covertly bypassed the robots.txt ban.
Perplexity Defends Itself
In response to the report, Perplexity published a post on its official blog defending its practices. The company claims that its web crawlers and its AI agents are different entities, and accuses Cloudflare of failing to distinguish between the two. It even claimed that Cloudflare is a threat to the openness of the web.
However, this defense has drawn strong criticism from the tech community. Many see Perplexity's arguments as wordplay that is irrelevant to the core of the problem. Websites have every right to decide who can access their content, and robots.txt rules are a form of ethical agreement between site owners and crawlers.
"If all human-run sites shut down because chatbots siphon off their traffic, in the end AI like Perplexity will have nothing left to read," wrote one observer.
Apple, Google, and OpenAI Respect robots.txt
Unlike Perplexity, Apple, Google, OpenAI (the maker of ChatGPT), and other big tech companies still respect robots.txt, even though it has no legal force. Apple itself drew attention when it was revealed that Applebot was used to collect data for training Apple Intelligence. However, Apple insists that it follows robots.txt rules and does not train its AI models on users' personal data.
Amid rumors that Apple might acquire Perplexity, this controversy could derail such a plan. Perplexity's reputation as a company that ignores scraping ethics could become a heavy burden for Apple, which is building an image as a pioneer of ethical AI.
Threat to the Open Web
This issue highlights a major dilemma of the modern internet: AI needs data, but websites need human traffic to survive. If site content is copied by AI and served without sending users back to the original source, the open digital ecosystem could collapse.
Reports from 404 Media and Ars Technica show that human traffic to news sites and blogs has dropped sharply as search engines and AI tools now answer questions directly instead of sending users to the source page.
"Perplexity is destroying the open web under the pretext of justice and freedom," wrote one analyst. But if no human-run sites are left, AI itself will run out of fuel.
This debate reflects the tension between AI's need for data and site owners' right to protect their content. If companies like Perplexity continue to ignore digital ethics, the internet of the future could become a world dominated by bots rather than humans.
In this context, Apple would seem to need to keep its distance from Perplexity if it wants to maintain its position as a responsible and transparent AI pioneer.
The English, Chinese, Japanese, Arabic, and French versions are generated automatically by AI, so the translation may still contain inaccuracies. Please always refer to the Indonesian version as our primary language. (System supported by DigitalSiber.id)