Meta Creates SeamlessM4T V2 With Expression Reading And Low Latency
Meta presents the SeamlessM4T V2 translator (photo: courtesy of Meta)

JAKARTA - In August, Meta introduced SeamlessM4T, an artificial intelligence (AI)-based translation tool. Just three months after that launch, Meta has announced its newest model.

Meta has updated SeamlessM4T and calls the result the second generation (V2). The service now offers a broader set of translation tools, as Meta has added two new capabilities to it.

The first new capability is SeamlessExpressive. This tool can carry over a speaker's expression, such as whispering, sadness, and joy, through pauses, speech rate, vocal style, and emotional tone.

The tool combines an expressive model with the base SeamlessM4T V2 model. Meta says it replaced an audio-processing component to support the reading of expression.

"Replacing the unit HiFi-GAN vocoder in SeamlessM4T v2 with an expressive unit-to-speech generator conditioned on the source speech allows tone, emotional expression, and vocal style to be transferred seamlessly," Meta said in its release.

The second capability is SeamlessStreaming, which performs automatic speech recognition and translates speech into text or speech. The tool offers high accuracy with a latency of about two seconds.

This latency is unavoidable because sentence structure differs from language to language. The AI in SeamlessM4T must examine the partial audio input to decide whether the words and sentences heard so far can already be translated or not.

"This is done through a learned read/write policy, which decides, based on the partial audio input, whether to write and generate output or keep waiting for more input," explains Meta.
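
The read-or-write decision described above can be illustrated with a small Python sketch. This is not Meta's method: Meta's policy is learned from data, while the example below swaps in a fixed "wait-k" rule that simply stays k chunks behind the speaker. The names wait_k_policy and run_streaming are hypothetical, and "translation" is faked by upper-casing each chunk.

```python
from typing import Callable, List, Tuple

READ, WRITE = "READ", "WRITE"

# A policy sees how many source chunks were read, how many outputs were
# written, and whether the source has ended; it returns READ or WRITE.
Policy = Callable[[int, int, bool], str]


def wait_k_policy(k: int) -> Policy:
    """Fixed rule standing in for a learned policy: stay k chunks behind."""
    if k < 1:
        raise ValueError("k must be at least 1")

    def decide(num_read: int, num_written: int, source_done: bool) -> str:
        # Write once enough input is buffered, or when no input is left.
        if source_done or num_read - num_written >= k:
            return WRITE
        return READ

    return decide


def run_streaming(source: List[str], policy: Policy) -> List[Tuple[str, str]]:
    """Return the (action, chunk) steps taken over a toy input stream.

    Assumption: upper-casing stands in for translation. A real system
    would run a translation model on everything read so far.
    """
    steps: List[Tuple[str, str]] = []
    num_read = num_written = 0
    while num_written < len(source):
        source_done = num_read == len(source)
        if policy(num_read, num_written, source_done) == READ:
            steps.append((READ, source[num_read]))
            num_read += 1
        else:
            steps.append((WRITE, source[num_written].upper()))
            num_written += 1
    return steps


if __name__ == "__main__":
    for action, chunk in run_streaming(["a", "b", "c"], wait_k_policy(2)):
        print(action, chunk)
```

A larger k means the system waits for more context before producing output, which tends to help when word order differs between languages but adds delay. A learned policy makes the same trade-off chunk by chunk instead of using one fixed k.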

SeamlessM4T V2 currently supports nearly 100 languages for speech-to-text translation and 36 languages for speech-to-speech translation. The service has been trained on 4.5 million hours of data, so the number of supported languages may grow.

The new capabilities in the Seamless Communication family, a set of cross-language communication services, could rival the translators developed by Google and Samsung. However, it is not yet known when SeamlessM4T V2 will be available to the public.

