
JAKARTA - Apple has announced a collaboration with NVIDIA to accelerate text generation with large language models (LLMs). The collaboration builds on Apple's recently published, open-source technique, Recurrent Drafter (ReDrafter).

ReDrafter offers a new method for generating text with LLMs more quickly while maintaining strong output quality. The technique combines beam search, which explores multiple candidate continuations, with dynamic tree attention, which handles those candidates efficiently.
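To illustrate the general idea behind speculative decoding, the family of techniques ReDrafter belongs to, here is a minimal toy sketch. A small, fast "draft" proposes several tokens ahead; the large "target" model verifies them and keeps the longest agreeing prefix, so multiple tokens can be emitted per expensive target step. This is not Apple's ReDrafter algorithm (which uses a recurrent draft head, beam search, and dynamic tree attention); `draft_model`, `target_model`, and the tiny vocabulary are illustrative stand-ins.

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]

def draft_model(prefix, k):
    # Toy stand-in for a small, cheap draft model: propose k tokens ahead.
    return [random.choice(VOCAB) for _ in range(k)]

def target_model(prefix):
    # Toy stand-in for the large target model: its (deterministic) next token.
    return VOCAB[len(prefix) % len(VOCAB)]

def speculative_decode(prefix, steps, k=4):
    """Greedy speculative decoding sketch: each round, the draft proposes
    k tokens; the target verifies them left to right, keeping the longest
    matching prefix and supplying one corrected token when they diverge."""
    out = list(prefix)
    for _ in range(steps):
        proposal = draft_model(out, k)
        for tok in proposal:
            expected = target_model(out)
            if tok == expected:
                out.append(tok)           # draft token accepted for free
            else:
                out.append(expected)      # target's correction ends the round
                break
        else:
            out.append(target_model(out))  # all k accepted: one bonus token
    return out

print(speculative_decode(["the"], steps=3))
```

Each round costs roughly one target-model verification pass but can emit up to k+1 tokens, which is where the speedup comes from when the draft agrees with the target often.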

Together with NVIDIA, Apple integrated ReDrafter into TensorRT-LLM, an NVIDIA tool designed to accelerate LLM inference on NVIDIA GPUs. The results of the collaboration are very promising.

To support ReDrafter, NVIDIA added new operators and improved TensorRT-LLM's ability to support more sophisticated models and decoding methods. This makes it easy for machine learning developers using NVIDIA GPUs to take advantage of ReDrafter's speedups.

Apple's machine learning researchers noted that as LLMs are increasingly used to power production applications, inference efficiency has become critically important.

"With ReDrafter's innovative approach to speculative decoding now integrated into the TensorRT-LLM framework, developers can enjoy faster token generation on NVIDIA GPUs for their production LLM applications," Apple said in its official blog.

The technology not only reduces computational costs but also delivers a more responsive user experience, making it well suited to a wide range of LLM-based applications.
