AMD’s new Instinct MI350P packs 144GB of HBM3E memory and cranks out roughly 40% more theoretical FP16 and FP8 compute than its green rival, Nvidia’s H200 NVL.
This isn’t a flashy flagship; it’s a workhorse. The MI350P slides into existing air-cooled servers as a drop-in upgrade, with no liquid-cooling plumbing required. It runs at 600W in a fanless dual-slot design, relying on chassis fans to keep it chill, or you can dial it back to 450W for tighter racks.
Under the hood, AMD’s CDNA4 architecture is built on TSMC’s 3nm and 6nm nodes. You get 8,192 cores across 128 compute units, 512 matrix cores, and a 2.2GHz clock. The 144GB of HBM3E memory offers 4TB/s of bandwidth, enough to feed even the hungriest large language models. And yes, it natively supports MXFP6 and MXFP4: low-precision, block-scaled number formats that speed up AI inference and shrink memory use, with the aim of keeping accuracy loss small.
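For a sense of what MXFP4 means in practice, here is a minimal sketch of block-scaled 4-bit quantization. It assumes the layout described in the OCP Microscaling (MX) formats specification: blocks of 32 values share one power-of-two scale, and each value is stored as a 4-bit E2M1 float. The function name and the tie-breaking rule are illustrative choices, not AMD's hardware implementation.

```python
import math

# Magnitudes a 4-bit E2M1 element can represent (the sign is a separate bit).
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32          # values sharing one scale (assumed from the MX spec)
E2M1_MAX_EXPONENT = 2    # the largest E2M1 value is 6 = 1.5 * 2**2


def quantize_block_mxfp4(block):
    """Quantize one block and return (shared_scale, dequantized_values)."""
    if not block or len(block) > BLOCK_SIZE:
        raise ValueError("block must hold between 1 and 32 values")
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    # Pick a power-of-two scale that puts the largest value inside FP4 range.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_MAX_EXPONENT
    scale = 2.0 ** shared_exp
    dequantized = []
    for x in block:
        # Clamp to the largest representable magnitude, then snap to the grid.
        scaled = min(abs(x) / scale, FP4_E2M1_VALUES[-1])
        # Simplification: ties go to the smaller magnitude, not round-to-even.
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - scaled))
        dequantized.append(math.copysign(nearest * scale, x))
    return scale, dequantized


scale, values = quantize_block_mxfp4([0.1, -0.7, 3.0, 12.0])
print(scale, values)
```

In this layout each value costs 4 bits plus its share of the 8-bit block scale, or 4.25 bits in all, against 16 for FP16. The example also shows the catch: small values sitting next to a large one in the same block get rounded coarsely, which is why accuracy has to be checked per model rather than assumed.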
The card targets inference and RAG pipelines, from small setups up to eight-card clusters. AMD claims it’s the fastest enterprise PCIe AI accelerator out there, with peak MXFP4 throughput hitting 4,600 TFLOPS. That’s not just marketing fluff: in theoretical peak compute, it beats Nvidia’s H200 NVL by 20% in FP64, 43% in FP16, and 39% in FP8.
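To put the "small setups up to eight-card clusters" range in numbers, the back-of-the-envelope sketch below uses only the per-card figures quoted in this article (144GB, 600W or 450W). The helper names and the example model sizes are hypothetical, and the MX bit widths assume an 8-bit scale shared by each 32-value block. It counts weights only, so KV cache and activations would need headroom on top.

```python
import math

# Per-card figures quoted in the article.
MEMORY_GB = 144
POWER_W = {"default": 600, "reduced": 450}
MAX_CARDS = 8

# Bits stored per weight. The MX entries include each value's share of the
# 8-bit block scale (assumption: one scale per 32 values).
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "MXFP6": 6 + 8 / 32, "MXFP4": 4 + 8 / 32}


def weights_gb(params_billions, fmt):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[fmt] / 8


def cards_needed(params_billions, fmt):
    """Smallest number of cards whose combined memory holds the weights."""
    return math.ceil(weights_gb(params_billions, fmt) / MEMORY_GB)


def cluster_totals(cards, power_mode="default"):
    """Aggregate memory (GB) and board power (W) for a multi-card server."""
    if not 1 <= cards <= MAX_CARDS:
        raise ValueError("the article describes setups of one to eight cards")
    return {"memory_gb": cards * MEMORY_GB, "power_w": cards * POWER_W[power_mode]}


# Hypothetical model sizes, in billions of parameters.
for size in (70, 405):
    for fmt in ("FP16", "MXFP4"):
        print(size, fmt, weights_gb(size, fmt), cards_needed(size, fmt))
print(cluster_totals(8), cluster_totals(8, "reduced"))
```

A full eight-card box works out to 1,152GB of HBM3E at 4.8kW of board power, or 3.6kW in the 450W mode. Board power excludes CPUs, fans, and the rest of the chassis, so the real rack budget will be higher.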
Nvidia hasn’t announced a PCIe version of its B200 Blackwell yet. That leaves AMD with the bleeding edge in this form factor, at least for now. The real question isn’t specs, though. It’s adoption. CUDA’s grip is tight, and AMD’s ROCm software stack still has ground to cover. But with moves like this, the gap is narrowing. The MI350P isn’t just a card; it’s a signal that AI hardware is finally a two-horse race.
