The full-stack SuperClusters include air-cooled and liquid-cooled training and cloud-scale inference rack configurations with the latest NVIDIA Tensor Core GPUs, networking, and NVIDIA AI Enterprise software.
Supermicro, Inc., a provider of total IT solutions for AI, cloud, storage, and 5G/Edge, announces its latest portfolio to accelerate the deployment of generative AI. The Supermicro SuperCluster solutions provide foundational building blocks for current and future large language model (LLM) infrastructure.
Three powerful Supermicro SuperCluster solutions are now available for generative AI workloads. The 4U liquid-cooled and 8U air-cooled systems are purpose-built for high-performance LLM training as well as large-batch, high-volume LLM inference. A third SuperCluster, built with 1U air-cooled Supermicro NVIDIA MGX™ systems, is optimized for cloud-scale inference.
“In the age of AI, compute is now measured by clusters, rather than just the number of servers. With our extensive global production capacity of 5,000 racks per month, we can deliver complete generative AI clusters to our customers faster than ever,” says Charles Liang, president and CEO of Supermicro. “A 64-node cluster enables 512 NVIDIA HGX H200 GPUs with 72 TB of HBM3e over a pair of our scalable cluster building blocks with 400 Gb/s NVIDIA Quantum-2 InfiniBand and Spectrum-X Ethernet networking. Supermicro’s SuperCluster solutions, in combination with NVIDIA AI Enterprise software, are ideal for enterprise and cloud infrastructures to train today’s LLMs with up to trillions of parameters. The interconnected GPUs, CPUs, memory, storage, and networking deployed across multiple nodes in racks are the foundation of today’s AI. Supermicro’s SuperCluster solutions provide foundational building blocks for rapidly evolving generative AI and LLMs.”
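As a rough sanity check on the quoted cluster figures, a back-of-envelope calculation reproduces them, assuming 8 GPUs per HGX H200 node and 141 GB of HBM3e per H200 GPU (the published per-GPU capacity); this is an illustrative sketch, not a Supermicro specification.

```python
# Back-of-envelope check of the quoted 64-node cluster totals.
# Assumptions: 8 HGX H200 GPUs per node, 141 GB of HBM3e per H200 GPU.
nodes = 64
gpus_per_node = 8
hbm3e_per_gpu_gb = 141  # published HBM3e capacity of one NVIDIA H200

total_gpus = nodes * gpus_per_node                      # 64 * 8 = 512 GPUs
total_hbm3e_tb = total_gpus * hbm3e_per_gpu_gb / 1000   # ~72.2 TB

print(f"GPUs: {total_gpus}, HBM3e: {total_hbm3e_tb:.1f} TB")
```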
“NVIDIA’s latest GPU, CPU, networking and software technologies enable system makers to accelerate a range of next-generation AI workloads for global markets,” said Kaustubh Sanghani, vice president of GPU Product Management at NVIDIA. “By leveraging the NVIDIA accelerated computing platform with Blackwell architecture-based products, Supermicro provides customers with the advanced server systems they need that can be easily deployed in data centers.”
Supermicro 4U NVIDIA HGX H100/H200 8-GPU systems double the density of the 8U air-cooled system by using liquid cooling, reducing energy consumption and lowering the total cost of ownership of data centers. These systems are designed to support the next generation of GPUs based on the NVIDIA Blackwell architecture. The Supermicro cooling distribution unit (CDU) and cooling distribution manifold (CDM) are the main arteries for distributing chilled liquid to Supermicro’s custom direct-to-chip (D2C) cold plates, keeping GPUs and CPUs at optimal temperatures so they deliver maximum performance. This cooling technology can reduce electricity costs for the entire data center by up to 40% and requires less space in the building.
Systems with NVIDIA HGX H100/H200 8-GPU technology are ideal for training generative AI. The fast GPU interconnect via NVIDIA® NVLink®, together with the high bandwidth and capacity of the GPU memory, is essential for running LLMs cost-effectively. The Supermicro SuperCluster creates a massive pool of GPU resources that works as a single AI supercomputer.
Whether customizing a massive foundation model trained on a dataset with trillions of tokens or building a cloud-scale LLM inference infrastructure, the spine-and-leaf topology with non-blocking 400 Gb/s network fabrics makes it possible to scale seamlessly from 32 nodes to thousands of nodes. With fully integrated liquid cooling, operational effectiveness and efficiency are thoroughly verified in Supermicro’s testing processes before shipment.
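To illustrate how a non-blocking spine-and-leaf fabric scales with node count, the sketch below estimates leaf and spine switch counts. The 64-port switch radix, one 400 Gb/s fabric port per GPU, and the even split between downlinks and uplinks are illustrative assumptions for the sketch, not Supermicro or NVIDIA specifications.

```python
import math

def nonblocking_spine_leaf(nodes, gpus_per_node=8, switch_ports=64):
    """Estimate switch counts for a non-blocking two-tier spine-leaf fabric.

    Assumptions (illustrative only):
    - one 400 Gb/s fabric port per GPU,
    - identical leaf and spine switches with `switch_ports` ports each,
    - non-blocking: each leaf splits its ports evenly between
      downlinks (to GPU endpoints) and uplinks (to spine switches).
    """
    endpoints = nodes * gpus_per_node
    downlinks_per_leaf = switch_ports // 2
    leaves = math.ceil(endpoints / downlinks_per_leaf)
    # Every leaf uplink must terminate on a spine port.
    uplinks_total = leaves * (switch_ports - downlinks_per_leaf)
    spines = math.ceil(uplinks_total / switch_ports)
    return leaves, spines

for n in (32, 64, 256):
    leaves, spines = nonblocking_spine_leaf(n)
    print(f"{n:>4} nodes -> {leaves} leaf, {spines} spine switches")
```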
Supermicro’s NVIDIA MGX™ system designs with the NVIDIA GH200 Grace Hopper Superchips create a blueprint for future AI clusters that address a critical bottleneck in generative AI: the bandwidth and capacity of GPU memory for running large language models (LLMs) with large inference batches to reduce operational costs. The 256-node cluster is a high-volume cloud inference powerhouse that is easy to deploy and scale.
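Why GPU memory capacity gates large-batch inference can be seen with a rough back-of-envelope estimate: FP16 weights are fixed, but the key/value cache grows linearly with batch size and context length. The model shape below (a hypothetical 70B-parameter model with 80 layers, a 1024-wide K/V projection, and an 8K context) is an illustrative assumption, not a measurement of any specific system.

```python
def inference_memory_gb(params_b, layers, kv_dim, seq_len, batch, bytes_per_elem=2):
    """Rough GPU memory estimate for LLM batch inference (FP16 weights + KV cache).

    All model numbers are illustrative assumptions.
    kv_dim = num_kv_heads * head_dim (width of one K or V projection).
    """
    weights = params_b * 1e9 * bytes_per_elem                          # model weights
    kv_cache = 2 * layers * kv_dim * seq_len * batch * bytes_per_elem  # K and V per token
    return (weights + kv_cache) / 1e9

# Hypothetical 70B-parameter model, 80 layers, kv_dim 1024, 8K context:
for batch in (1, 32, 128):
    gb = inference_memory_gb(70, 80, 1024, 8192, batch)
    print(f"batch {batch:>3}: ~{gb:.0f} GB")
```

Under these assumptions, moving from a batch of 1 to a batch of 128 adds hundreds of gigabytes of KV cache on top of the weights, which is why pooled, high-capacity GPU memory across many nodes matters for high-volume cloud inference.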
SOURCE: PRNewswire