Google Reveals More Details of Its AI Supercomputer, Says It Is Faster and More Power-Efficient Than Nvidia's
JAKARTA - Google released new details on Tuesday, April 4, about the supercomputers it uses to train its artificial intelligence models, saying the systems are both faster and more power-efficient than comparable systems from Nvidia Corp.
Google has designed its own custom chip called the Tensor Processing Unit, or TPU. It uses these chips for more than 90% of the company's work on artificial intelligence training, the process of feeding data through models to make them useful at tasks such as answering questions with human-like text or generating images.
Google's TPU is now in its fourth generation. Google on Tuesday published a scientific paper detailing how it has connected more than 4,000 of the chips into a supercomputer, using optical switches that it developed itself to help link individual machines.
Improving these connections has become a key point of competition among companies that build artificial intelligence supercomputers, because the large language models that power technologies like Google's Bard or OpenAI's ChatGPT have exploded in size, meaning they are far too large to store on a single chip.
The models must instead be split across thousands of chips, which must then work together for weeks or more to train the model. Google's PaLM model, its largest publicly disclosed language model to date, was trained by splitting it across two of the 4,000-chip supercomputers over 50 days.
Google says its supercomputers make it easy to reconfigure connections between chips on the fly, helping to avoid problems and to tune the system for better performance.
"Circuit switching makes it easy to route around failed components," Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson wrote in a blog post about the system. "This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model."
Although Google is only now releasing details about its supercomputer, the system has been online inside the company since 2020 at a data center in Mayes County, Oklahoma. Google says that the startup Midjourney used the system to train its model, which generates fresh images after being given a few words of text.
In the paper, Google says that for comparably sized systems, its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia's A100 chip, which was on the market at the same time as the fourth-generation TPU.
An Nvidia spokesperson declined to comment when asked by Reuters for a response.
Google says it did not compare its fourth-generation chip with Nvidia's current flagship chip, the H100, because the H100 came to market after Google's chip and is made with newer technology.
Google hinted that it may be working on a new TPU that would compete with the Nvidia H100 but provided no details, with Jouppi telling Reuters that Google has "a healthy pipeline of future chips."