Netflix Leverages AMD Epyc Processors To Increase Bandwidth Up To 400 Gbps
JAKARTA - Netflix senior software engineer Drew Gallatin recently revealed details of the company's efforts to optimize its hardware and software architecture, which lets it stream vast amounts of video entertainment to more than 209 million subscribers.
The company can currently squeeze as much as 200 Gb per second out of a single server, and it wants to push that figure higher. The results of these efforts were presented at the EuroBSDCon 2021 conference, as quoted from Tom's Hardware, Wednesday, September 22.
Gallatin says Netflix is able to push content at up to 400 Gb per second using a combination of a 32-core AMD Epyc 7502P (Rome) CPU, 256 GB of DDR4-3200 memory, 18 2-terabyte Western Digital SN720 NVMe drives, and two PCIe 4.0 x16 Nvidia Mellanox ConnectX-6 Dx network adapters, each supporting two 100 Gb per second connections.
To get an idea of the maximum theoretical throughput of this system, there are eight memory channels that provide approximately 150 GB per second of bandwidth, and 128 PCIe 4.0 lanes that allow up to 250 GB per second of I/O bandwidth. In network units, that is about 1.2 Tb per second and 2 Tb per second, respectively. It is also worth noting that this is the configuration Netflix uses to serve its most popular content.
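The conversion between storage-style units (gigabytes) and network units (gigabits) is a factor of eight, and it can be checked with a few lines of Python. This is only an illustrative sketch: the function and variable names are made up for this example, and the PCIe figure assumes the standard PCIe 4.0 signaling rate of 16 GT/s per lane with 128b/130b encoding, per direction.

```python
BITS_PER_BYTE = 8

def gbytes_to_tbits(gigabytes_per_s: float) -> float:
    """Convert gigabytes per second to terabits per second (decimal units)."""
    return gigabytes_per_s * BITS_PER_BYTE / 1000

# Figures quoted in the article (gigabytes per second).
memory_tbps = gbytes_to_tbits(150)  # eight DDR4 channels
pcie_tbps = gbytes_to_tbits(250)    # 128 PCIe 4.0 lanes

# Assumption: PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding.
pcie_lane_gbytes = 16 * (128 / 130) / BITS_PER_BYTE
pcie_total_gbytes = 128 * pcie_lane_gbytes

# The 400 Gb/s serving target expressed in gigabytes per second.
target_gbytes = 400 / BITS_PER_BYTE

print(f"memory: {memory_tbps:.1f} Tb/s, PCIe: {pcie_tbps:.1f} Tb/s")
print(f"PCIe from lane math: {pcie_total_gbytes:.0f} GB/s")
print(f"400 Gb/s target: {target_gbytes:.0f} GB/s")
```

Seen this way, the 400 Gb per second target is only 50 GB per second of payload, well under both raw limits, which suggests the practical ceiling comes from how many times each byte has to cross memory rather than from the raw link speeds.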
This configuration can typically serve up to 240 Gb of content per second, mainly because of limited memory bandwidth. Netflix then tried different Non-Uniform Memory Architecture (NUMA) configurations, with one NUMA node delivering 240 Gb per second and four NUMA nodes delivering about 280 Gb per second.
However, this approach has a number of problems of its own, such as higher latency. Ideally, as much bulk data as possible should be kept off the NUMA Infinity Fabric to prevent congestion and CPU stalls caused by competition with normal memory accesses.
Gallatin explains that this limitation can be overcome through software optimization. By offloading TLS encryption to the two Mellanox adapters, the company increased total throughput to 380 Gb per second (up to 400 Gb per second with additional tuning), or 190 Gb per second per network interface card (NIC).
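To put those numbers in context, the following short Python sketch compares the reported throughput with the combined line rate of the network adapters. It assumes the hardware described earlier in the article (two adapters, each with two 100 Gb per second ports); the helper names are hypothetical and exist only for this example.

```python
ADAPTERS = 2           # two ConnectX-6 Dx cards
PORTS_PER_ADAPTER = 2  # each with two ports
PORT_GBPS = 100        # 100 Gb/s per port

# Combined network line rate of the server in Gb/s.
line_rate_gbps = ADAPTERS * PORTS_PER_ADAPTER * PORT_GBPS

def per_nic_gbps(total_gbps: float) -> float:
    """Throughput each adapter carries if traffic is split evenly."""
    return total_gbps / ADAPTERS

def line_rate_share(total_gbps: float) -> float:
    """Fraction of the combined line rate that is actually used."""
    return total_gbps / line_rate_gbps

for label, gbps in [("software TLS, 1 NUMA node", 240),
                    ("software TLS, 4 NUMA nodes", 280),
                    ("NIC TLS offload", 380)]:
    print(f"{label}: {per_nic_gbps(gbps):.0f} Gb/s per NIC, "
          f"{line_rate_share(gbps):.0%} of line rate")
```

Under these assumptions, the TLS offload result corresponds to 95 percent of what the four ports can carry in total, compared with 60 to 70 percent when encryption runs on the CPU.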
With the CPU no longer having to perform any encryption, overall utilization drops to 50 percent with four NUMA nodes and 60 percent without NUMA. In addition, Netflix is exploring configurations based on other platforms, including one with an Intel Xeon Platinum 8352V (Ice Lake) CPU and one with Ampere's Altra Q80-30, an 80-core Arm Neoverse N1 chip that runs at up to 3 GHz.
The Xeon testbed reaches up to 230 Gb per second without TLS offloading, and the Altra system reaches up to 320 Gb per second. Not satisfied with the 400 Gb per second result, the company has built a new system with 800 Gb per second of network connectivity. However, some of the necessary components did not arrive in time to carry out any tests.