EvolutionaryScale, a frontier AI research lab for biology, launched with ESM3, a milestone AI model capable of generating novel proteins. ESM3 generated a new green fluorescent protein (GFP), a result that would take an estimated 500 million years of evolution to arise naturally. This generative AI model allows interactive prompting to create proteins, empowering scientists to advance applications from drug discovery and materials science to carbon capture.
The founding team at EvolutionaryScale, the team behind ESM3, are pioneers in applying AI to biology, having built what is widely considered the first transformer language model for proteins, ESM1. The ESM models have empowered groundbreaking scientific research, including a breakthrough in protein folding that helped reveal the structures of hundreds of millions of metagenomic proteins, and they have been used by scientists across the world to model and understand proteins.
EvolutionaryScale described ESM3 in a scientific preprint and released an open version of the model for scientific researchers.
The Frontier Language Model for Biology
ESM3 was trained with 1 trillion teraflops of compute (on the order of 10^24 floating-point operations), more than any other known model in biology, on a dataset of 2.78 billion proteins drawn from across the Earth’s natural diversity. It is the first generative model for biology that simultaneously reasons over the sequence, structure, and function of proteins. This enables scientists to understand and create new proteins, making biology programmable.
“ESM3 takes a step toward a future of biology where AI is a tool to engineer from first principles, the way we engineer structures, machines, and microchips, and write computer programs,” said EvolutionaryScale co-founder and chief scientist, Alexander Rives. “We’ve been working on this for a long time, and we’re excited to share it with the scientific community and see what they do with it.”
With this capability, the model has the potential to accelerate discovery across a broad range of applications, from developing new cancer treatments to creating proteins that could help capture carbon.
Simulating 500 Million Years of Evolution with a Language Model
Prompted through a chain of thought to reason over possible sequences and structures of GFP, ESM3 stepped across 500 million years of evolution to create a new fluorescent protein. GFP is one of the most beautiful and unique proteins in nature, responsible for the glowing of jellyfish and the vivid fluorescent colors of coral. It is the only protein that emits light, and the biological mechanism for this is unique: the protein transforms itself, forming a light-emitting chromophore out of its own atoms.
GFP has become an important tool in molecular biology, helping scientists to see molecules inside cells. The mechanism that powers this phenomenon is incredibly complex, and generating a variant this distant by computational or experimental laboratory techniques has not been scientifically documented. New fluorescent proteins this distant from known ones have only been found through the discovery of new GFPs in the natural world. Our analysis suggests that under natural evolution it could take more than 500 million years for a protein this different to evolve.
ESM3: A Tool for Scientists
ESM3’s success in generating a new GFP underscores the model’s potential for advancements in biological research and life sciences.
EvolutionaryScale will be opening an API for closed beta, and code and weights are available for a small open version of ESM3 for non-commercial use. EvolutionaryScale is also collaborating with Amazon Web Services (AWS) and NVIDIA to accelerate applications from drug discovery to synthetic biology with AI.
By working with AWS, EvolutionaryScale is making the full ESM3 model family easily accessible to hundreds of thousands of researchers around the world and to nine of the top ten global pharma companies, who already use AWS’s generative AI and health services: Amazon SageMaker, Amazon Bedrock, and AWS HealthOmics. This move will make it easier for researchers to fine-tune the ESM3 models securely and at scale using their own proprietary data.
All versions of ESM3 will be optimized for training and inference performance through the company’s ongoing collaboration with NVIDIA. This includes NVIDIA BioNeMo NIMs to accelerate runtime performance, with support available through the NVIDIA AI Enterprise software license and at ai.nvidia.com.
EvolutionaryScale Closes More Than $142 Million in Seed Funding
EvolutionaryScale also announced a seed round of more than $142 million, led by Nat Friedman, Daniel Gross, and Lux Capital, with participation from Amazon, NVentures (NVIDIA’s venture capital arm), and angel investors. The funding will be used to further expand the capabilities of its models.
Source: AWS