Wednesday, July 3, 2024

Groq Smashes LLM Performance Record Again Using an LPU™ System With No Response From GPU Companies


Groq, an artificial intelligence (AI) solutions provider, announced it has more than doubled the inference performance of the Large Language Model (LLM) Llama-2 70B in just three weeks and is now running at more than 240 tokens per second (T/s) per user on its LPU™ system. As noted in its previous press release, Groq was the first to achieve 100 T/s per user on Llama-2 70B.

Now that Groq has smashed the performance record twice, delivering language responses at more than 240 T/s per user, is there still room for further performance improvement on its first-generation 14nm silicon, fabricated in the US?
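For readers who want a concrete sense of the tokens-per-second-per-user metric, the sketch below shows one way it can be measured client-side against any streaming inference endpoint. It is illustrative only and is not based on Groq's API; measure_tokens_per_second and fake_stream are hypothetical names introduced for this example.

```python
import time

def measure_tokens_per_second(stream):
    """Measure per-user token throughput from a streaming LLM response.

    `stream` is assumed to be any iterable that yields generated tokens
    one at a time (for example, from a streaming inference client).
    """
    start = time.perf_counter()
    token_count = 0
    for _token in stream:
        token_count += 1
    elapsed = time.perf_counter() - start
    return token_count / elapsed if elapsed > 0 else 0.0

# Stand-in generator that emits tokens at roughly 240 T/s;
# a real client would stream tokens from the model instead.
def fake_stream(n_tokens=480, tokens_per_second=240):
    for _ in range(n_tokens):
        time.sleep(1 / tokens_per_second)
        yield "tok"

print(f"{measure_tokens_per_second(fake_stream()):.1f} tokens/s per user")
```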

Jonathan Ross, CEO and founder of Groq, shared, “Groq broke a record a few weeks ago by being the first to hit 100 tokens per second per user on Llama-2 70B–a record that no one has responded to with competitive performance. We announce 240T/s per user! It’s becoming unclear if GPUs can keep up with the Groq Language Processing Unit™ (LPU™) system on Large Language Models.”

Jay Zaveri, Social Capital partner, founder of Dropbox-acquired CloudOn, and Groq Board Member, commented, “The ultimate language processing system combines great software, programmability, ease of use, scalability, wrapped over a best-in-class processor. Groq has been building such a system quietly the last few years and has superior token throughput, token per dollar, and token per watt. While others may try to catch up, Groq is well on its way to roll out its systems to the people and customers who matter–developers who are building the future of AI.”


In private demos, Groq customers are seeing a new world of possibilities, going as far as to say that Groq solutions are prompting them to consider new low-latency LLM use cases for their verticals. For example, LLMs deployed to monitor large volumes of text data from sources such as online forums and social media can help rapidly detect potential cyberattacks or security breaches. Ultra-low latency is essential for real-time analysis and response, playing a pivotal role in safeguarding sensitive information, critical infrastructure, and national security interests.
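As a rough illustration of the monitoring use case described above, here is a minimal sketch of a pipeline that pulls posts from a feed and flags potential incidents. The classify_threat function is a hypothetical stand-in for a call to a low-latency LLM endpoint (the keyword check is only a placeholder, not a real detection method), and none of the names come from Groq's software.

```python
import queue
import threading

def classify_threat(text: str) -> float:
    """Placeholder for an LLM call that scores how likely `text`
    describes a cyberattack or breach (0.0 = benign, 1.0 = critical).
    A real deployment would call a low-latency inference endpoint here."""
    keywords = ("breach", "exfiltration", "ransomware", "zero-day")
    return 1.0 if any(k in text.lower() for k in keywords) else 0.0

def monitor(feed: "queue.Queue[str]", threshold: float = 0.8) -> None:
    """Continuously pull posts from a feed and alert on high-risk items.
    End-to-end latency is dominated by the model call, which is why
    per-user token throughput matters for this kind of pipeline."""
    while True:
        post = feed.get()
        if post is None:  # sentinel value used to stop the monitor
            break
        if classify_threat(post) >= threshold:
            print(f"ALERT: possible incident -> {post[:80]}")

# Minimal usage: feed a few posts through the monitor on a worker thread.
feed: "queue.Queue[str]" = queue.Queue()
worker = threading.Thread(target=monitor, args=(feed,))
worker.start()
for post in ["routine maintenance tonight", "ransomware note found on host-42"]:
    feed.put(post)
feed.put(None)
worker.join()
```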

Additionally, LLMs can be deployed to transform local emergency response during natural disasters. Using real-time data from social media, emergency calls, or weather reports, the models can identify critical geographic areas in need of assistance, predict threats, and provide accurate guidance to first responders and affected communities. Ultra-low latency can mean faster delivery of life-saving information, better-prepared disaster management, and increased public trust. With real-time, fluid user experiences built on the most current and valuable data available, LLMs will continue to capture more of the AI market and create impact through real-world applications.

SOURCE: PRNewswire
