Tom's Hardware

GPUs can now use PCIe-attached memory or SSDs to boost VRAM capacity — Panmnesia's CXL IP claims double-digit nanosecond latency

By Anton Shilov, 2 days ago


Modern GPUs for AI and HPC applications ship with a fixed amount of high-bandwidth memory (HBM) on the package, which caps their performance in memory-hungry workloads. New technology promises to lift that cap: instead of being limited to on-device memory, GPUs could expand their capacity with DRAM, or even SSDs, attached over the PCIe bus. Panmnesia, a company backed by South Korea's renowned KAIST research institute, has developed a low-latency CXL IP that could be used to expand GPU memory with CXL memory expanders.

The memory requirements of advanced AI training datasets are growing rapidly, leaving AI companies to choose between buying more GPUs, using less sophisticated datasets, or spilling over to CPU memory at a significant performance cost. CXL is a protocol that formally runs on top of a PCIe link, so it can in principle attach additional memory to a system over the PCIe bus, but the protocol has to be recognized by the ASIC and its memory subsystem. Simply bolting on a CXL controller is not enough to make the technology work, especially on a GPU.

Integrating CXL for GPU memory expansion posed challenges for Panmnesia because GPUs lack a CXL logic fabric and subsystems that support DRAM and/or SSD endpoints. In addition, GPU cache and memory subsystems do not recognize any expansion other than unified virtual memory (UVM), which tends to be slow.

(Image credit: Panmnesia)

To address this, Panmnesia developed a CXL 3.1-compliant root complex (RC) equipped with multiple root ports (RPs) that support external memory over PCIe, and a host bridge with a host-managed device memory (HDM) decoder that connects to the GPU's system bus. The HDM decoder, which manages the address ranges of system memory, essentially makes the GPU's memory subsystem 'think' that it is dealing with system memory when, in reality, it is accessing PCIe-connected DRAM or NAND. That means either DDR5 or SSDs can be used to expand the GPU's memory pool.
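To make the address-routing idea concrete, here is a minimal host-side sketch of what an HDM decoder does conceptually: it checks whether a physical address falls inside the CXL-backed window and, if so, routes the access to the root port behind which the expander sits. The names, the single-window layout, and the address values are hypothetical illustrations for this article, not Panmnesia's actual RTL.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical model of one HDM decoder entry: a contiguous window of
// host-managed device memory backed by CXL-attached DRAM or an SSD.
struct HdmDecoder {
    uint64_t hdm_base;   // start of the CXL-backed address window
    uint64_t hdm_size;   // size of the window
    int      root_port;  // root port the backing endpoint sits behind
};

// Returns the root port that owns `addr`, or -1 if the address belongs
// to the GPU's local HBM rather than the CXL-backed window.
int decode(const HdmDecoder& d, uint64_t addr) {
    if (addr >= d.hdm_base && addr < d.hdm_base + d.hdm_size)
        return d.root_port;  // forward the request over PCIe as CXL traffic
    return -1;               // served by the on-package HBM as usual
}

int main() {
    // Hypothetical layout: a 16 GB CXL window mapped above the HBM range.
    HdmDecoder d{0x400000000ULL, 16ULL << 30, 0};
    printf("18 GB mark -> port %d\n", decode(d, 0x480000000ULL)); // CXL window
    printf(" 4 GB mark -> port %d\n", decode(d, 0x100000000ULL)); // local HBM
    return 0;
}
```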

(Image credit: Panmnesia)

According to Panmnesia, the solution (tested on a custom GPU and marked CXL-Opt in the graphs below) underwent extensive testing and showed a double-digit nanosecond round-trip latency, including the time needed to convert standard memory operations into CXL flit transmissions, versus roughly 250 ns for prototypes developed by Samsung and Meta (marked CXL-Proto). The IP has been successfully integrated into both memory expanders and GPU/CPU prototypes at the hardware RTL level, demonstrating its compatibility with various computing hardware.

(Image credit: Panmnesia)

In Panmnesia's tests, UVM delivered the worst performance across all tested GPU kernels, owing to the overhead of host runtime intervention on page faults and of moving data at page granularity, which often transfers more than the GPU actually needs. In contrast, CXL lets the GPU access the expanded memory directly via load/store instructions, eliminating both issues.
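The difference between those two access paths can be sketched in CUDA. The UVM half below uses the real cudaMallocManaged API; the CXL half is described only in comments, since exposing a CXL window as an ordinary device-accessible pointer is precisely what Panmnesia's root complex enables, and no public driver API for it is assumed here.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Each thread scales one element. The kernel just issues plain loads and
// stores; it neither knows nor cares what physically backs `data`.
__global__ void scale(float* data, float f, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

int main() {
    const size_t n = 1 << 20;
    float* buf = nullptr;

    // UVM path (real CUDA): the GPU page-faults on first touch and the
    // host runtime migrates whole pages, which is the overhead the
    // article attributes to UVM.
    cudaMallocManaged(&buf, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;

    scale<<<(int)((n + 255) / 256), 256>>>(buf, 2.0f, n);
    cudaDeviceSynchronize();

    // With a CXL-backed window, `buf` would instead fall inside the
    // HDM-decoded address range, and the same kernel would issue
    // cache-line-granularity loads/stores over CXL with no page-fault
    // round trips (this half is the article's claim, not runnable here).
    printf("buf[0] = %f\n", buf[0]);
    cudaFree(buf);
    return 0;
}
```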

Consequently, CXL-Proto cuts execution time to roughly half that of UVM (a 1.94-times speedup), and Panmnesia's CXL-Opt reduces it by a further factor of 1.66, thanks to an optimized controller that achieves double-digit nanosecond latency and minimizes read/write overhead. The same pattern shows up in a second figure plotting IPC values recorded during GPU kernel execution: CXL-Opt comes out 3.22 times faster than UVM and 1.65 times faster than CXL-Proto, consistent with the compounded execution-time gains (1.94 × 1.66 ≈ 3.22).

In general, CXL support could do a lot for AI/HPC GPUs, but real-world performance remains a big question, and whether companies like AMD and Nvidia will add CXL support to their GPUs remains to be seen. If PCIe-attached memory for GPUs does gather steam, only time will tell whether the industry heavyweights license IP blocks from companies like Panmnesia or simply develop their own tech.
