JAKARTA - Google plans to build an artificial intelligence (AI) language model that supports 1,000 different languages. The technology is rumored to launch at the company's annual I/O event in a few months.

Before going public, Google shared more information about the Universal Speech Model (USM), a system the company describes as the first step in realizing its goals.

The technology giant describes USM as a family of state-of-the-art speech models with 2 billion parameters, trained on 12 million hours of speech and 28 billion sentences of text spanning more than 300 languages.

USM currently supports more than 100 languages and will serve as the foundation for building a wider system.

"We demonstrated that using large multilingual data sets without labels to pre-train model encoders and refine smaller labeled data sets allows us to recognize underrepresented languages," Google said in a blog post.

"In addition, our model training process is effective in adapting to new languages ​​and data," he added.

USM is already used by YouTube to generate closed captions, and it also performs automatic speech recognition (ASR).

It automatically detects and translates languages, including English, Chinese, Amharic, Cebuano, Assamese, and many more. Meta is also reportedly working on a similar AI translation tool which is still in its early stages.

"For speech translation, we refined USM on the CoVoST (large-scale multilingual speech-to-text translation corpus) data set. Our model includes text through the second stage of our channel, achieving state-of-the-art quality with limited supervised data," Google said.

To assess the model's performance across languages, Google groups the languages in the CoVoST dataset into high-, medium-, and low-resource segments based on resource availability and calculates a BLEU score (higher is better) for each segment. USM outperformed OpenAI's Whisper speech recognition model across all segments.
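The per-segment comparison described above can be illustrated with a minimal sketch. It assumes that a segment score is the plain average of per-language BLEU scores, which the blog post quoted here does not confirm. The language labels, tier assignments, and scores are hypothetical placeholders, not figures from Google's evaluation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tier assignments and BLEU scores, for illustration only.
# None of these values come from Google's published results.
RESOURCE_TIER = {
    "lang_a": "high",
    "lang_b": "high",
    "lang_c": "medium",
    "lang_d": "low",
}
BLEU_BY_LANGUAGE = {
    "lang_a": 40.0,
    "lang_b": 30.0,
    "lang_c": 20.0,
    "lang_d": 10.0,
}


def average_bleu_by_tier(scores, tiers):
    """Group per-language BLEU scores by resource tier and average each group."""
    grouped = defaultdict(list)
    for lang, score in scores.items():
        grouped[tiers[lang]].append(score)
    return {tier: mean(values) for tier, values in grouped.items()}


segment_scores = average_bleu_by_tier(BLEU_BY_LANGUAGE, RESOURCE_TIER)
print(segment_scores)
```

Running the same grouping for two models and comparing the results tier by tier is one way to check whether one model leads in every segment.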

According to The Verge, as quoted on Wednesday, March 8, one application of this technology could be augmented reality (AR) glasses like the concept Google showcased during last year's I/O event, which can detect speech and display real-time translations right before the wearer's eyes.

"The development of USM is an important endeavor to realize Google's mission to organize the world's information and make it universally accessible. We believe the USM base model architecture and training pipeline comprise the foundation we can build to extend speech modeling to the next 1.000 languages," Google said.

