JAKARTA - Meta has just introduced a generative artificial intelligence (AI) model called Voicebox, which it describes as a breakthrough in generating speech in a variety of styles. However, the company is not releasing it to the public.

Through in-context learning, Voicebox can perform speech-generation tasks it was not specifically trained for, such as editing, sampling, and stylizing.

In addition, Voicebox can produce high-quality audio clips and edit previously recorded audio.

The tool is also multilingual and can produce speech in six languages. Given a sample of a person's speech and a passage of text in English, French, German, Spanish, Polish, or Portuguese, Voicebox can produce a reading of the text in any of these languages, even when the speech sample and the text are in different languages.

In the future, this ability could help people communicate in a natural and authentic way even if they do not speak the same language.

Meta states that Voicebox is based on a method called Flow Matching, which has been shown to improve on diffusion models. However, Meta will not release it to the public because, the company says, it could cause serious harm and be open to abuse.

"There are many exciting use cases for generative speech models, but because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time," Meta said in its official blog, quoted Tuesday, June 20.

According to the company, Voicebox outperforms VALL-E, the current state-of-the-art English model, on zero-shot text-to-speech. It is more intelligible (a word error rate of 1.9 percent for Voicebox versus 5.9 percent for VALL-E), achieves higher audio similarity (0.681 versus 0.580), and is up to 20 times faster.

For cross-language style transfer, Voicebox outperforms YourTTS, reducing the average word error rate from 10.9 percent to 5.2 percent and raising audio similarity from 0.335 to 0.481.
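For readers unfamiliar with the metric, word error rate is commonly computed as the number of word-level substitutions, insertions, and deletions needed to turn a transcript of the generated speech into the reference text, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not Meta's evaluation code. The function name `word_error_rate` and the example sentences are hypothetical, and real evaluations typically apply additional text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: lowercases and splits on whitespace, with no
    punctuation handling or other text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{wer:.1%}")
```

A lower value means the generated speech was transcribed more faithfully, which is why a drop from 5.9 percent to 1.9 percent counts as an improvement in intelligibility.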

"While we believe it is important to be open with the AI community and to share our research to advance the state of the art in AI, it is also necessary to strike the right balance between openness and responsibility," said Meta.

