Enabling Seamless Connectivity for Distributed AI Applications from Edge to Private Datacenters to Multi-Cloud Environments
Arrcus, the hyperscale networking software company and a leader in core, edge, and multi-cloud routing and switching infrastructure, announces the enhancement of its ACE-AI platform to address the growing demand for unified networking fabric for distributed AI workloads.
As AI workloads become increasingly distributed, driven by economic considerations and application requirements, Arrcus ACE-AI is designed to network them seamlessly across locations and deliver applications at the edge with high-speed, lossless connectivity. The emerging federated learning model for AI allows multiple entities to collaboratively train a model on decentralized data. Training may run in hyperscale environments while inference runs at the edge for various use cases. Arrcus recognizes the need for a unified networking fabric that interconnects these workloads, regardless of where they reside. Modern data center applications demand high throughput (400-800Gbps) and ultra-low latency (< 10μs per hop), and Arrcus ACE-AI meets these demands while minimizing CPU overhead.
“The future of AI lies in its ubiquity, and Arrcus has built the industry’s most flexible and intelligent fabric that connects and orchestrates distributed AI workloads,” said Shekar Ayyar, Chairman and CEO of Arrcus. “With the enhanced ACE-AI platform, we are giving enterprises and service providers the power to unlock the full potential of AI, across clouds, data centers, and the edge.”
Emerging artificial intelligence, high-performance computing, and storage workloads pose new challenges for large-scale datacenter networking. Arrcus addresses these challenges by supporting new features that build a lossless Ethernet fabric, including RoCEv2, Priority Flow Control (PFC), Explicit Congestion Notification (ECN), Enhanced Transmission Selection (ETS), Adaptive Routing (AR), Dynamic Load Balancing, and Global Load Balancing.
One of the significant challenges in achieving high-performance networking for AI workloads is that traditional TCP/IP stacks incur high CPU overhead at these speeds. Arrcus addresses this challenge by incorporating RDMA (Remote Direct Memory Access) technology, which offloads transport processing from the CPU to hardware and gives applications direct access to remote memory. The second version of RDMA over Converged Ethernet (RoCEv2) encapsulates RDMA traffic in UDP/IP headers, which makes it routable across Layer 3 networks.
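To make the RoCEv2 encapsulation concrete, the following is a minimal Python sketch of the headers involved. It is not Arrcus code: the helper names `pack_bth` and `pack_udp_header` are hypothetical, and the sketch assumes the standard layout of a 12-byte InfiniBand Base Transport Header (BTH) carried in a UDP datagram addressed to port 4791. The IP and Ethernet headers and the trailing ICRC are omitted.

```python
import struct

# UDP destination port registered for RoCEv2 traffic.
ROCEV2_UDP_DPORT = 4791

# Illustrative opcode: Reliable Connection, SEND Only.
RC_SEND_ONLY = 0x04


def pack_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=False):
    """Pack a 12-byte Base Transport Header (hypothetical helper).

    Layout: opcode (8 bits), flags (SE, MigReq, PadCnt, TVer; all zero here),
    partition key (16), reserved (8) + destination queue pair (24),
    AckReq (1) + reserved (7) + packet sequence number (24).
    """
    if not 0 <= dest_qp < (1 << 24):
        raise ValueError("dest_qp is a 24-bit field")
    if not 0 <= psn < (1 << 24):
        raise ValueError("psn is a 24-bit field")
    flags = 0
    word_qp = dest_qp                       # top 8 bits stay reserved (zero)
    word_psn = (int(ack_req) << 31) | psn   # AckReq is the top bit
    return struct.pack("!BBHII", opcode, flags, pkey, word_qp, word_psn)


def pack_udp_header(src_port, udp_payload_len):
    """Pack an 8-byte UDP header for a RoCEv2 datagram (hypothetical helper).

    The checksum is left at zero, which IPv4 treats as "not computed".
    """
    length = 8 + udp_payload_len
    return struct.pack("!HHHH", src_port, ROCEV2_UDP_DPORT, length, 0)


bth = pack_bth(RC_SEND_ONLY, dest_qp=0x12, psn=7, ack_req=True)
udp = pack_udp_header(src_port=49152, udp_payload_len=len(bth))
```

Because the RDMA transport header rides inside an ordinary UDP/IP packet, standard IP routers can forward it between subnets. The UDP source port can also vary per flow, which gives load-balancing mechanisms a field to hash on.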
In addition to these feature enhancements, Arrcus is also pleased to announce support for new industry-leading platforms from Broadcom that are state-of-the-art 800G switching platforms – Tomahawk5, Jericho3, Ramon3 – in partnership with device manufacturers like Ufispace and Edgecore.
“Broadcom is very excited to collaborate with Arrcus to deliver industry-leading switching solutions that are optimized to meet the performance demands of next-generation AI workloads. Together, Arrcus and Broadcom are enabling customers to build high-performance, scalable, and intelligent data center networks,” said Ram Velaga, senior vice president and general manager, Core Switching Group, Broadcom.
Key features that make Arrcus the industry’s leading networking platform for AI:
- High-Performance Networking for AI: Support for industry-leading hardware, including Broadcom switches such as Tomahawk5 and Jericho3-AI and SmartNICs from NVIDIA and Intel, together with RoCEv2, delivers the ultra-low latency (< 10μs per hop) and high throughput (400-800Gbps) required by modern AI applications.
- Multi-Cloud Cost Optimization: Enhanced FlexMCN with Egress Cost Control (ECC) technology empowers users to analyze, allocate, and dynamically route traffic across clouds for optimal cost efficiency, reducing egress charges by up to 15%.
- Automated 5G Network Slicing for AI: Integration of SRv6 Mobile User Plane (MUP) and Flex Algo enables automated delivery of AI applications with 5G network slicing, streamlining service deployment and resource allocation.
- End-to-End Network Visibility with ArcIQ: New ArcIQ capabilities extend real-time network insights to user equipment, providing actionable data for proactive incident management.
With these latest capabilities and dedication to innovation, Arrcus is enabling organizations to break through the limitations of traditional networking and embrace the full potential of distributed AI. The enhanced ACE-AI platform signifies a pivotal step in building the intelligent fabric for the AI-powered future.
SOURCE: BusinessWire