Meta Uses Facebook And Instagram Public Posts To Train Meta AI Virtual Assistant
JAKARTA - Meta Platforms Inc. used public posts from Facebook and Instagram to train parts of its new virtual assistant, Meta AI. To respect consumer privacy, however, it excluded private posts shared only with family and friends. The company's top policy executive told Reuters this in an interview.
Meta also did not use private chats on its messaging services as training data for the model, and it took steps to filter personal details from the public datasets used for training, according to Meta President of Global Affairs Nick Clegg, who spoke on the sidelines of the company's annual conference this week.
Clegg said the company has tried to exclude datasets dominated by personal information. He added that most of the data Meta used for training is publicly available.
He cited LinkedIn as an example of a website whose content Meta deliberately chose not to use because of privacy concerns.
Clegg's comments come as tech companies such as Meta, OpenAI, and Alphabet's Google have been criticized for using information taken from the internet without permission to train their AI models, which ingest large amounts of data to summarize information and generate images.
These companies are weighing how to handle private or copyrighted material that is swept up in that process and that their AI systems may reproduce, while facing lawsuits from authors accusing them of copyright infringement.
Meta AI was the most significant of the company's first consumer-facing AI tools, which Meta CEO Mark Zuckerberg unveiled on Wednesday, 27 September, at Meta's annual product conference, Connect. This year's event was dominated by talk of artificial intelligence, in contrast to previous conferences, which focused on augmented and virtual reality.
Meta built the assistant using a custom model based on the powerful Llama 2 large language model, which the company released for public commercial use in July, as well as a new model named Emu that generates images in response to text prompts, the company said.
This product will be able to generate text, audio, and images, and will have access to real-time information through a partnership with Microsoft's Bing search engine.
The public Facebook and Instagram posts used to train Meta AI include both text and photos, Clegg said.
Those posts were used to train Emu for the product's image generation elements, while the chat functions are based on Llama 2 with the addition of some publicly available and annotated datasets, a Meta spokesperson told Reuters.
"Interaction with Meta AI can also be used to improve future features," Clegg said.
Clegg said Meta imposes safety restrictions on the content that Meta AI tools can generate, such as a ban on creating photorealistic images of public figures.
On copyrighted material, Clegg said he expects "a number of lawsuits" over the question of "whether creative content is covered or not by existing fair use doctrine," which permits limited use of copyright-protected works for purposes such as commentary, research, and parody.
"We think it is, but I strongly suspect that's going to play out in litigation," Clegg said.
Some companies with image-generation tools facilitate the reproduction of iconic characters such as Mickey Mouse, while others have paid for the material or deliberately avoided including it in training data.
OpenAI, for example, signed a six-year deal with content provider Shutterstock this summer to use the company's image, video and music library for training.
Asked whether Meta had taken similar steps to avoid reproducing copyrighted imagery, a Meta spokesperson pointed to new terms of service that bar users from generating content that violates privacy and intellectual property rights.