Monday, December 23, 2024

Tachyum Solidifies Reliability, Availability and Serviceability of Prodigy Universal Processor


Tachyum® has added to its extensive white paper library with an overview of its Reliability, Availability and Serviceability (RAS) strategy, including a detailed look at the key RAS features being built into Prodigy®, the world’s first Universal Processor, to help it meet the demands of data centers.

RAS is a set of related attributes that must be considered when designing, manufacturing, purchasing and utilizing a computer product or component. Designed from the ground up, Tachyum’s comprehensive RAS strategy encompasses multiple facets at the silicon, platform and system levels to ensure Prodigy deployments provide high performance along with high reliability and availability at all levels.

Prodigy’s RAS strategy comprises Device RAS, which includes advanced error detection and correction in all functional blocks; System RAS, which includes critical features such as machine check and recovery working with the Linux EDAC driver; and Platform RAS, which encompasses features such as redundant power supplies and ease of serviceability.

Prodigy’s memory hierarchy provides robust error detection and correction for all memory subsystems. Both the L1 I-Cache and D-Cache are protected with SECDED (single error correction double error detection), and the L2/L3 block utilizes DECTED (double error correction triple error detection), exceeding Arm’s current parity offerings. In addition to Prodigy’s memory hierarchy, other functional blocks integrate significant amounts of memory that require protection to ensure Prodigy runs error-free and maintains high RAS standards.
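The release does not describe Prodigy's actual ECC circuits. As a generic illustration of what SECDED means, the sketch below implements the textbook extended Hamming(8,4) code: a Hamming(7,4) code plus one overall-parity bit, which corrects any single flipped bit and detects (but cannot correct) any two. The function names `encode` and `decode` are hypothetical; real cache and memory designs typically protect wider words (for example 64 data bits with 8 check bits), and DECTED needs a stronger code, such as a BCH code, which is not shown here.

```python
from typing import List, Optional, Tuple

# Hamming(7,4) layout uses positions 1..7; position 0 holds the extra
# overall-parity bit that upgrades single-error correction to SECDED.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of 0/1)."""
    word = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (nibble >> i) & 1
    # Parity bit p (1, 2 or 4) makes every position sharing bit p have even parity.
    for p in (1, 2, 4):
        for pos in range(1, 8):
            if pos & p and pos != p:
                word[p] ^= word[pos]
    # Overall parity so that the whole 8-bit word has even parity.
    word[0] = sum(word[1:]) % 2
    return word


def decode(word: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos  # for one flipped bit, this equals its position
    overall = sum(w) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        w[syndrome] ^= 1  # syndrome 0 means the overall-parity bit itself flipped
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: a double error.
        return None, "uncorrectable"
    nibble = sum(w[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return nibble, status
```

For example, `decode(encode(0b1011))` returns `(11, "ok")`. Three or more flipped bits can be silently miscorrected, which is one reason designs move to DECTED for larger arrays.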

Additional RAS features built into Prodigy include:

  • Error-correcting code (ECC) scrubbing and data poisoning
  • Watchdog timer
  • RAID for booting
  • PCIe 5.0 RAS features
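The release lists ECC scrubbing and data poisoning without describing how Prodigy implements them. The sketch below is a generic, hypothetical software model of the two ideas (the names `Memory`, `scrub`, `inject_error` and `PoisonedDataError` are invented for illustration): scrubbing periodically walks memory and repairs single-bit errors before a second error in the same word makes them uncorrectable, and poisoning marks uncorrectable data so that an error is raised only if software actually consumes it. The ECC decoder is abstracted to a count of flipped bits, assuming a SECDED-style code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical toy model: the ECC decoder is reduced to "how many bits are flipped".
# SECDED corrects 1 flipped bit and detects 2; treating every count >= 2 as
# detected is a simplification (real SECDED can miscorrect 3 or more flips).


class PoisonedDataError(Exception):
    """Raised when software consumes a line that was marked poisoned."""


@dataclass
class Line:
    data: int
    flipped_bits: Set[int] = field(default_factory=set)
    poisoned: bool = False


class Memory:
    def __init__(self, n_lines: int) -> None:
        self.lines: List[Line] = [Line(0) for _ in range(n_lines)]

    def write(self, addr: int, value: int) -> None:
        # A fresh write replaces the line, clearing any errors and poison.
        self.lines[addr] = Line(value)

    def inject_error(self, addr: int, bit: int) -> None:
        # Simulate a soft error; flipping the same bit twice undoes it.
        self.lines[addr].flipped_bits ^= {bit}

    def _check(self, line: Line) -> str:
        if line.poisoned:
            return "poisoned"
        flips = len(line.flipped_bits)
        if flips == 1:
            line.flipped_bits.clear()  # correct and write back the clean value
            return "corrected"
        if flips >= 2:
            line.poisoned = True  # uncorrectable: poison instead of failing now
            return "poisoned"
        return "ok"

    def scrub(self) -> Dict[str, int]:
        """Background pass over every line, repairing single-bit errors."""
        stats = {"ok": 0, "corrected": 0, "poisoned": 0}
        for line in self.lines:
            stats[self._check(line)] += 1
        return stats

    def read(self, addr: int) -> int:
        line = self.lines[addr]
        if self._check(line) == "poisoned":
            raise PoisonedDataError(f"poisoned data consumed at line {addr}")
        return line.data
```

In this model, two soft errors that land in the same line between scrub passes become a poisoned line, while a scrub pass in between turns them into two separate, correctable events; that is why scrubbing frequency matters more as soft-error rates rise.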


Tachyum has incorporated redundant power supply units (PSUs), fans and network interface cards (NICs), along with support for efficient maintenance, into its Prodigy evaluation platform. At launch, Tachyum will offer data center family SKUs with a 5-year warranty/support period and enterprise/telco family SKUs with a 10-year warranty/support period.

“A comprehensive approach to RAS becomes increasingly important as process shrinks drive higher density for components and platforms,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “In addition to the ever-increasing density, manufacturing chips on shrinking process nodes increases the risk of soft errors. Prodigy addresses these increased risks with a thorough approach to device and system reliability, ensuring that Prodigy-based systems function with maximum uptime to address the performance and demands of today’s data centers.”

In its recent GTC 24 keynote, Nvidia stressed the importance of RAS, devoting valuable keynote time to the RAS features of its new products, and highlighted RAS again while presenting a large potential data center deployment that would provide 645 EF of AI performance.

The Tachyum RAS paper complements an earlier Tachyum white paper that showcased a large data center for a Prodigy lead customer, designed to deliver 8,000 EF of AI performance, in which RAS will be a critical component.

As a Universal Processor offering industry-leading performance for all workloads, Prodigy allows data center servers to switch seamlessly and dynamically between computational domains (such as AI/ML, HPC, and cloud) on a single homogeneous architecture. By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy significantly reduces CAPEX and OPEX while delivering unprecedented data center performance, power, and economics. Prodigy integrates 192 high-performance, custom-designed 64-bit compute cores to deliver up to 4.5x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and up to 6x for AI applications.

SOURCE: BusinessWire
