Amplify Partners, Canva, Leonardo.Ai, and OctoML are among customers looking forward to using Amazon EC2 Capacity Blocks for ML
Advancements in ML have unlocked opportunities for organizations of all sizes and across all industries to invent new products and transform their businesses. Traditional ML workloads demand substantial compute capacity, and with the advent of generative AI, even greater compute capacity is required to process the vast datasets used to train foundation models (FMs) and large language models (LLMs). Clusters of GPUs are well suited for this task because their combined parallel processing capabilities accelerate the training and inference processes. However, with more organizations recognizing the transformative power of generative AI, demand for GPUs has outpaced supply. As a result, customers who want to leverage the latest ML technologies, especially those whose capacity needs fluctuate depending on where they are in the adoption phase, may face challenges accessing the clusters of GPUs necessary to run their ML workloads. Alternatively, customers may commit to purchasing large amounts of GPU capacity for long durations, only to have it sit idle when they aren’t actively using it. Customers are looking for ways to provision the GPU capacity they require with more flexibility and predictability, without having to make a long-term commitment.
With EC2 Capacity Blocks, customers can reserve the amount of GPU capacity they need for short durations to run their ML workloads, eliminating the need to hold onto GPU capacity when not in use. EC2 Capacity Blocks are deployed in EC2 UltraClusters, interconnected with second-generation Elastic Fabric Adapter (EFA) petabit-scale networking, which delivers low-latency, high-throughput connectivity and enables customers to scale up to hundreds of GPUs. Customers can reserve EC2 UltraClusters of P5 instances powered by NVIDIA H100 GPUs for a duration of one to 14 days, at a future start date up to eight weeks in advance, and in cluster sizes of one to 64 instances (512 GPUs), giving customers the flexibility to run a broad range of ML workloads and only pay for the amount of GPU time needed. EC2 Capacity Blocks are ideal for training and fine-tuning ML models, short experimentation runs, and handling temporary future surges in inference demand to support customers’ upcoming product launches as generative applications become mainstream. Once an EC2 Capacity Block is scheduled, customers can plan for their ML workload deployments with certainty, knowing they will have the GPU capacity when they need it.
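The reservation limits described above (one to 14 days, a start date up to eight weeks out, one to 64 instances) can be illustrated with a short sketch. This is not the AWS API: the names `BlockRequest`, `validate`, and `gpu_hours` are hypothetical, and the figure of 8 GPUs per P5 instance is inferred from the stated 64 instances totaling 512 GPUs. The actual console, CLI, or SDK may apply additional rules that this release does not describe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Limits taken from the announcement; GPUs per instance is inferred
# from "64 instances (512 GPUs)", i.e. 512 / 64 = 8.
GPUS_PER_P5_INSTANCE = 8
MIN_DAYS, MAX_DAYS = 1, 14
MAX_INSTANCES = 64
MAX_LEAD_TIME = timedelta(weeks=8)


@dataclass(frozen=True)
class BlockRequest:
    """Hypothetical description of a desired Capacity Block."""
    instance_count: int
    start: datetime
    duration_days: int


def validate(request: BlockRequest, now: datetime) -> List[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if not 1 <= request.instance_count <= MAX_INSTANCES:
        problems.append("instance count must be between 1 and 64")
    if not MIN_DAYS <= request.duration_days <= MAX_DAYS:
        problems.append("duration must be between 1 and 14 days")
    if request.start < now:
        problems.append("start date must be in the future")
    elif request.start - now > MAX_LEAD_TIME:
        problems.append("start date must be within eight weeks")
    return problems


def gpu_hours(request: BlockRequest) -> int:
    """Total GPU-hours reserved: instances x GPUs per instance x hours."""
    return request.instance_count * GPUS_PER_P5_INSTANCE * request.duration_days * 24


if __name__ == "__main__":
    now = datetime(2023, 11, 1)
    largest = BlockRequest(instance_count=64, start=now + timedelta(weeks=2), duration_days=14)
    print(validate(largest, now))   # no problems for the largest allowed block
    print(gpu_hours(largest))       # 512 GPUs for 336 hours
```

Under these assumptions, the largest block the announcement describes (64 instances for 14 days) amounts to 512 GPUs held for 336 hours, which is the upper bound on the GPU time a single reservation would be billed for.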
“AWS and NVIDIA have collaborated for more than 12 years to deliver scalable, high-performance GPU solutions, and we are seeing our customers build incredible generative AI applications that are transforming industries,” said David Brown, vice president of Compute and Networking at AWS. “AWS has unmatched experience delivering NVIDIA GPU-based compute in the cloud, in addition to offering our own Trainium and Inferentia chips. With Amazon EC2 Capacity Blocks, we are adding a new way for enterprises and startups to predictably acquire NVIDIA GPU capacity to build, train, and deploy their generative AI applications—without making long-term capital commitments. It’s one of the latest ways AWS is innovating to broaden access to generative AI capabilities.”
Since its founding in 1993, NVIDIA has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI, and is fueling industrial digitalization across markets. “Demand for accelerated compute is growing exponentially as enterprises around the world embrace generative AI to reshape their business,” said Ian Buck, vice president of Hyperscale and HPC Computing at NVIDIA. “With AWS’s new EC2 Capacity Blocks for ML, the world’s AI companies can now rent H100 not just one server at a time but at a dedicated scale uniquely available on AWS—enabling them to quickly and cost-efficiently train large language models and run inference in the cloud exactly when they need it.”
Customers can use the AWS Management Console, Command Line Interface, or SDK to find and reserve available Capacity Blocks. With EC2 Capacity Blocks, customers only pay for the amount of time they reserve. EC2 Capacity Blocks are available in the AWS US East (Ohio) Region, with availability planned for additional AWS Regions and Local Zones.
Amplify Partners works with engineers, professors, researchers, and open-source project creators to help turn their bold ideas into beloved products and companies. “We have partnered with several founders who leverage deep learning and large language models to bring ground-breaking innovations to market,” said Mark LaRosa, partner at Amplify Partners. “We believe that predictable and timely access to GPU compute capacity is fundamental to enabling founders to not only quickly bring their ideas to life but also continue to iterate on their vision and deliver increasing value to their customers. Availability of up to 512 NVIDIA H100 GPUs via EC2 Capacity Blocks is a game-changer in the current supply-constrained environment, as we believe it will provide startups with the GPU compute capacity they need, when they need it, without making long-term capital commitments. We are looking forward to supporting founders building on AWS by leveraging GPU capacity blocks and its industry-leading portfolio of machine learning and generative AI services.”
SOURCE: BusinessWire