JAKARTA - Meta wants to help artificial intelligence (AI) researchers make their tools and processes more universally inclusive by releasing a new dataset of face-to-face video clips and helping developers assess how well their models work across different demographic groups.
The dataset, called Casual Conversations v2, can be used by researchers to better evaluate the fairness and robustness of certain types of AI models.
"This comprehensive dataset offers a granular list of 11 self-provided and annotated categories to further measure the algorithmic fairness and robustness of these AI systems," Meta said in its official blog, as quoted on Friday, March 10.
"The release of this dataset is one of the main highlights of our civil rights progress, and it was created in consultation with internal experts in the field," the company added.
The Casual Conversations v2 dataset includes 26,467 video monologues recorded in seven countries and features 5,567 paid participants, accompanied by speech, visual, and demographic attribute data for measuring system effectiveness.
"With Casual Conversations v2, we want to use a multilingual dataset to support the development of inclusive natural language processing models," said Meta.
In addition to the expanded list of categories, Casual Conversations v2 differs from the first version by including participant monologues recorded outside the US. The seven countries included in v2 are Brazil, India, Indonesia, Mexico, Vietnam, the Philippines, and the US.
The data does not draw on Facebook information or include images from Instagram. Instead, the content in this dataset is designed to maximize inclusion by giving AI researchers more samples of people from various backgrounds to use in their models.
"In the future, we hope to expand the dataset to additional geographies. Another difference in the latest dataset is that participants were given the opportunity to speak in both their primary and secondary languages," said Meta.