Majestic Labs raises $100m for memory pooling AI server

Majestic Labs has raised $100 million in Series A funding for a memory-pooled AI server architecture that delivers up to 100 TB of DRAM per accelerator, addressing the growing memory bandwidth bottleneck in large-scale inference.

Contents

The memory bandwidth problem
Architecture and silicon
Economic and operational advantages
Software and ecosystem
Forward-looking conclusion

The memory bandwidth problem

Current AI inference workloads are increasingly memory bandwidth-limited, not compute-bound. As models grow in size and context lengths extend, the economics of traditional GPU servers degrade sharply. Majestic’s founders—former Google and Meta silicon veterans—identified that compute is scaling faster than memory bandwidth, making most inference runs suboptimal.

HBM provides high bandwidth but limited capacity. CXL offers large capacity but low bandwidth. Neither solution adequately serves the demands of next-generation AI models. Majestic’s approach disaggregates memory from compute, allowing each to scale independently.

Architecture and silicon

Majestic is developing two chips: a memory interface chiplet and a many-core AI accelerator. The server pools over 100 TB of standard LPDDR memory, connected via a proprietary high-bandwidth, low-latency interface. Up to 12 accelerator chips access the entire memory pool as a single, flat, contiguous address space with uniform bandwidth and latency.

This design eliminates the multi-tier memory hierarchy found in GPU clusters (local HBM, peer HBM, host LPDDR), which requires complex software optimization. The accelerator is fully programmable, using a large array of CPU cores and matrix multiplication engines, with licensed IP from a third party that also supplies the compiler and low-level software stack.

Economic and operational advantages

By using off-the-shelf LPDDR instead of HBM, Majestic reduces both cost and supply chain risk. The server supports flexible compute-to-memory ratios, configurable from one to 12 compute chips and 8 to 128 TB of memory. Customers can adjust ratios post-deployment, or mix Majestic servers with Nvidia GPUs for different phases of inference (prefill vs. decode).

GPU-based systems often over-provision GPUs solely to increase memory capacity, leading to low utilization and higher power consumption. Majestic’s architecture claims to support significantly more concurrent users per server, directly lowering operational costs. The company has already received substantial orders from hyperscalers, neoclouds, and large enterprises.

Software and ecosystem

Majestic’s software stack can already lower HuggingFace models to executable code running on a simulated server. The company is leaning toward open-source projects like Triton and vLLM to accelerate adoption. The founders emphasize that success depends not just on hardware performance, but on how quickly customers can deploy and integrate the system into existing workflows.

Forward-looking conclusion

Majestic Labs is targeting a fundamental inefficiency in AI infrastructure: the mismatch between compute and memory scaling. By pooling memory independently and simplifying the programming model, the company could offer a more cost-effective alternative for inference at scale. The $100 million raise and early customer orders suggest the market sees real demand for architectures that decouple memory from compute, especially as model sizes continue to grow. If Majestic delivers on its bandwidth and latency targets, it may carve out a significant niche alongside—rather than in direct competition with—Nvidia’s GPU ecosystem.

Majestic Labs raises $100m for memory pooling AI server

Majestic Labs has raised $100 million in Series A funding for a memory-pooled AI server architecture that delivers up to 100 TB of DRAM per accelerator, addressing the growing memory bandwidth bottleneck in large-scale inference.

The memory bandwidth problem

Architecture and silicon

Economic and operational advantages

Software and ecosystem

Forward-looking conclusion

You May also Like

NVIDIA’s long-awaited N1/N1X SoC specs leak ahead of Computex launch

AI in design verification: from experimentation to measurable capability

IQE completes £81m fundraising

Photonics becomes a foundational scaling layer for AI-era computing

About ChipNews