Startup boosts scale-up to 1000+ GPUs in a single domain

Delos Data has introduced a cluster management software stack and server design that enables practical GPU scale-up domains exceeding 1,000 GPUs, addressing the shift from training to inference workloads.

By cnadmin

Scale-up for inference

Inference workloads are sensitive to latency at the nanosecond scale, must run always-on, and need modularity beyond full-rack GPU systems. Delos CEO Ed Doe noted that distributed inference requires a fundamentally different approach from HPC-style training, which runs over weeks or months. The startup’s Nonstop AI platform delivers scale-up connectivity, in which GPUs connect directly for lower latency and better consistency, to systems that previously relied on slower scale-out networking.

Server design and topology flexibility

Delos’ server, built with a Taiwanese OEM partner, brings scale-up to the front panel via nine OSFPs per GPU, offering 72 × 200 Gb/s ports per server. These servers connect over copper or optical fiber through Ethernet or circuit switches, making scale-up domains of 1,000 GPUs practical and domains of up to 10,000 GPUs possible with topology changes. CTO Dan Daly explained that building on the OSFP ecosystem gives customers a choice of vendors rather than locking them into rack-level configurations.
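
To put the quoted port figures in perspective, the short Python sketch below works out the raw aggregate bandwidth they imply: 72 ports at 200 Gb/s come to 14,400 Gb/s, or 14.4 Tb/s, before any protocol overhead. The article does not say how many GPUs a server holds, so the `gpus_per_server` parameter and the value of 8 used in the example are placeholders chosen for illustration, not Delos specifications.

```python
import math

# Figures quoted in the article.
PORTS = 72            # "72 x 200 Gb/s ports"
PORT_RATE_GBPS = 200  # per-port line rate in Gb/s


def aggregate_gbps(ports: int = PORTS, rate_gbps: int = PORT_RATE_GBPS) -> int:
    """Raw aggregate front-panel bandwidth in Gb/s (ignores protocol overhead)."""
    return ports * rate_gbps


def servers_needed(target_gpus: int, gpus_per_server: int) -> int:
    """Servers required to reach a given domain size.

    gpus_per_server is an assumption supplied by the caller; the article
    does not state it.
    """
    return math.ceil(target_gpus / gpus_per_server)


if __name__ == "__main__":
    total = aggregate_gbps()
    print(f"{total} Gb/s = {total / 1000} Tb/s")
    # HYPOTHETICAL: 8 GPUs per server is a placeholder, not a quoted figure.
    print(servers_needed(1000, gpus_per_server=8), "servers for a 1,000-GPU domain")
```

Changing `gpus_per_server` shows how quickly the server count, and with it the number of cables and switch ports to manage, grows as the domain approaches the 10,000-GPU range.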

Larger scale-up domains improve inference speed and enable heterogeneous clusters with different GPU types or AI accelerators. However, scaling introduces new failure modes: cables can loosen, and remotely located switches may fail or update asynchronously. Delos’ Mosaic software stack, demonstrated at GTC, manages these risks by re-routing data through parallel paths when a cable is unplugged, with only a temporary performance dip until full throughput is restored.
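
The general idea of re-routing over parallel paths can be illustrated with a toy model. The Python sketch below is not Mosaic's algorithm, which the article does not describe. It assumes a set of identical parallel links between two endpoints, with traffic split evenly across whichever links are up. The `ParallelLinks` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelLinks:
    """Hypothetical model of identical parallel links between two endpoints."""
    link_gbps: float
    healthy: dict[str, bool] = field(default_factory=dict)  # link name -> is up

    def add_link(self, name: str) -> None:
        self.healthy[name] = True

    def unplug(self, name: str) -> None:
        self.healthy[name] = False

    def restore(self, name: str) -> None:
        self.healthy[name] = True

    def shares(self) -> dict[str, float]:
        """Split traffic evenly across the links that are currently up."""
        up = [name for name, ok in self.healthy.items() if ok]
        if not up:
            raise RuntimeError("no healthy path left between endpoints")
        return {name: 1.0 / len(up) for name in up}

    def capacity_gbps(self) -> float:
        """Usable capacity: one link's rate times the number of links that are up."""
        return self.link_gbps * sum(self.healthy.values())


if __name__ == "__main__":
    links = ParallelLinks(link_gbps=200.0)
    for name in ("a", "b", "c", "d"):
        links.add_link(name)

    links.unplug("b")               # a cable works loose
    print(links.shares())           # traffic now rides on a, c and d only
    print(links.capacity_gbps())    # reduced capacity while the link is down

    links.restore("b")              # cable reseated
    print(links.capacity_gbps())    # back to full capacity
```

In this model the connection survives the unplugged cable at reduced capacity and returns to full capacity once the link is back, which mirrors the temporary dip described above. A production stack would also have to detect the failure, drain in-flight traffic, and cope with switches that fail or update at different times.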

Target applications and availability

Although validated primarily for inference clusters, the architecture also suits training and HPC workloads. Delos has early-access deployments, with broader availability planned for Q4 2026. The company advocates workload-optimized topologies rather than prescribed GPU, interconnect, or cable configurations.

The industry’s pivot to inference demands rethinking interconnect architecture at scale. Delos’ approach—combining modular hardware with resilient software—offers a path to larger, more flexible GPU domains that reduce cost and power per token. If adopted broadly, it could shift how hyperscalers and enterprises build AI infrastructure, moving away from rigid rack-scale designs toward disaggregated, topology-aware clusters.

Source: EE Times