NVIDIA Reportedly Plans GPU-Initiated Direct Storage Access for Vera Rubin, Raising Expectations for HBF Beyond HBM

By cnadmin

NVIDIA plans to introduce GPU-Initiated Direct Storage Access (GIDS) with its Vera Rubin AI platform, a move that could fundamentally reshape AI memory architectures by enabling GPUs to bypass CPUs and directly control storage devices.

Architectural Shift

GIDS represents a significant evolution from the existing GPUDirect Storage (GDS) architecture. Under GDS, data moves directly between storage and GPU memory, but the CPU must still issue each data request to the storage device before the transfer can begin. GIDS eliminates this intermediary step: the GPU itself issues requests to NAND-based storage, with no CPU or DRAM involvement.

This change addresses a core inefficiency in traditional von Neumann computing: a CPU can run only a limited number of threads at once, while a GPU can generate tens of thousands of parallel threads. With the CPU removed from the request path, storage requests can be issued at a rate that keeps pace with GPU processing.
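The link between thread count and storage throughput can be sketched with Little's Law (throughput equals requests in flight divided by per-request latency). The snippet below is only an illustration: the 100-microsecond latency and both in-flight request counts are hypothetical values chosen for the example, not figures from NVIDIA or from this report, and the model ignores the point at which the storage device itself saturates.

```python
def sustained_iops(outstanding_requests: int, latency_us: int) -> float:
    """Little's Law: throughput = requests in flight / per-request latency."""
    return outstanding_requests * 1_000_000 / latency_us


# Hypothetical values, for illustration only (not from the article).
READ_LATENCY_US = 100        # assumed NAND read latency in microseconds
CPU_IN_FLIGHT = 64           # assumed requests a CPU-driven path keeps in flight
GPU_IN_FLIGHT = 16_384       # assumed requests issued by parallel GPU threads

cpu_iops = sustained_iops(CPU_IN_FLIGHT, READ_LATENCY_US)
gpu_iops = sustained_iops(GPU_IN_FLIGHT, READ_LATENCY_US)

print(f"CPU-issued: {cpu_iops:,.0f} IOPS")
print(f"GPU-issued: {gpu_iops:,.0f} IOPS")
print(f"Ratio: {gpu_iops / cpu_iops:.0f}x")
```

Under these assumptions, throughput scales linearly with the number of requests the issuer can keep in flight, which is why moving request generation onto the GPU matters more than any change to the storage medium's latency.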

Memory Capacity Implications

The shift could dramatically expand GPU memory capacity. NAND flash offers roughly 30 times the bit density of DRAM. According to Yonsei University Professor Song Ki-hwan, combining six high-bandwidth flash (HBF) units with two HBM units could increase GPU memory capacity more than 16 times, from 192GB to 3,120GB. That could in turn support AI models with roughly 16 times as many parameters as current architectures can hold.
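The quoted totals can be reproduced with simple arithmetic. The per-unit capacities below are assumptions, not figures given in the article: 24GB per HBM unit (so that a baseline of eight HBM units yields 192GB) and 512GB per HBF unit. They are chosen only because they match both quoted totals exactly.

```python
# Assumed per-unit capacities (NOT stated in the article); chosen because
# they reproduce the quoted 192GB and 3,120GB totals exactly.
HBM_UNIT_GB = 24
HBF_UNIT_GB = 512


def package_capacity_gb(hbm_units: int, hbf_units: int) -> int:
    """Total capacity of a GPU package with the given mix of memory units."""
    return hbm_units * HBM_UNIT_GB + hbf_units * HBF_UNIT_GB


# Assumed baseline: eight HBM units and no HBF.
baseline_gb = package_capacity_gb(hbm_units=8, hbf_units=0)
# Configuration described by Professor Song: two HBM units plus six HBF units.
hybrid_gb = package_capacity_gb(hbm_units=2, hbf_units=6)
ratio = hybrid_gb / baseline_gb

print(baseline_gb, hybrid_gb, ratio)  # 192 3120 16.25
```

With these assumed values, almost all of the hybrid capacity (3,072GB of 3,120GB) comes from the HBF units, while HBM shrinks from 192GB to 48GB.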

HBF stacks NAND flash vertically using through-silicon vias (TSVs), similar to HBM construction. This approach places ultra-fast NAND closer to GPUs, addressing future AI bottlenecks while easing pressure on HBM capacity.

Endurance and Workload Considerations

NAND flash has inherent endurance limits, with even high-endurance parts typically supporting around 100,000 write-and-erase cycles, versus DRAM's near-unlimited write capability. Consequently, HBF is best suited to storing AI model parameters, which remain largely static during inference and are therefore read far more often than they are written.
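A rough calculation shows why the write pattern matters so much. The sketch below takes the 100,000-cycle figure from the text and assumes ideal wear leveling, so that each full rewrite of the device costs every cell exactly one cycle. The two rewrite rates are hypothetical workloads picked for contrast, not measurements.

```python
PE_CYCLES = 100_000  # write-and-erase cycle figure cited in the article


def lifetime_days(full_rewrites_per_day: float) -> float:
    """Days until the cycle budget is spent, assuming ideal wear leveling."""
    return PE_CYCLES / full_rewrites_per_day


# Hypothetical workloads, for illustration only.
weights_reload_daily = lifetime_days(1)        # model weights replaced once a day
scratch_every_second = lifetime_days(86_400)   # device fully rewritten every second

print(f"Static weights: {weights_reload_daily / 365:.0f} years")
print(f"Per-second rewrites: {scratch_every_second:.2f} days")
```

Under these assumptions, a device that only has its weights replaced once a day would outlast any realistic deployment, while one used as constantly rewritten working memory would exhaust its cycle budget in little more than a day.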

GPU-HBM data transfer already accounts for roughly half of total system power, strengthening the case for HBF architectures. Memory makers are responding: Samsung Electronics is reportedly developing both next-generation high-performance Z-NAND and GIDS technology that would allow GPUs to directly access Z-NAND-based storage devices.

Outlook

The emergence of GIDS and HBF signals a structural shift in AI memory design, from CPU-mediated data flows to GPU-native storage access. Adoption will require higher-performance NAND and careful workload partitioning, but the potential to increase memory capacity by an order of magnitude while reducing power consumption positions this architecture as a critical enabler for next-generation AI models. If realized, GIDS could redefine the memory hierarchy for AI computing, making storage a first-class participant in GPU-centric systems.