While discovering a novel therapeutic protein can be ground-breaking, manufacturing the biologic drug is often a major barrier to bringing it to market, especially for next-generation modalities that don’t exist in nature. Producing novel proteins at high titers and with high quality is a crucial challenge, and one that has long been an area of expertise for Absci.
Here, we provide a practical example of how we employed AI technology to identify novel chaperones, including one that helped double the production titer of a hard-to-produce protein. Interestingly, this protein sequence wasn’t characterized as a chaperone in the public databases, and it had less than 24% sequence homology to any of the canonical chaperones. The Denovium™ AI engine described here aims to predict the functionality of any protein sequence (rather than just its structure), and this example describes a powerful application of such a technology.
Tasked with this effort was Absci’s Denovium™ Engine, an artificial intelligence platform focused on biological sequence and functional data, including DNA and proteins. In fact, the name Denovium came from the vision of designing proteins completely from scratch, i.e., de novo, using artificial intelligence.
Most relevant to this case study was our deep learning model of protein function, which is the most comprehensive model for determining a protein’s function directly from its sequence. It was trained on a massive curated dataset of more than 100M proteins, each of which was annotated with up to 30 distinct functional tasks, and more than 700 thousand functional labels. This includes functional ontologies, sequence homology, structural information, enzymatic activity, taxonomy, transmembrane regions, signal peptides, subcellular location and much more. It allows for real-time annotation of protein sequences, including those with unknown function and even those with no known sequence homologs.
Thanks to the speed of deep learning inference, we were able to use this model to reannotate the entirety of the UniProt protein database over the course of a weekend on consumer-grade hardware, essentially distilling and enhancing decades of academic research.
A key feature of our AI protein model is the transformation of protein sequence data into a high-dimensional functional representation, or embedding. You can think of an embedding as a numeric vector, which represents a summary of a protein’s collective function. These embeddings serve as a mathematically powerful tool for replacing much of traditional bioinformatics approaches, while also greatly increasing the power to generalize to novel proteins.
An important example of this is the ability to search for novel functional homologs. This is done by first organizing the entire protein universe into the functional embedding space. The picture on the top right shows a 3D representation of this for half a million distinct proteins. Once indexed, proteins of interest can be used as search queries to find novel functional homologs by searching for neighbors in the functional space. This is similar to how other state-of-the-art search engines operate.
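The neighbor search described above can be illustrated with a minimal sketch. It assumes embeddings are plain numeric vectors and that similarity is measured by cosine similarity; the protein names, the three-dimensional toy vectors, and the `nearest_neighbors` helper are all hypothetical and are not part of the Denovium™ Engine, whose actual index and distance measure are not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Return the k (name, similarity) pairs most similar to the query.

    This is a brute-force scan; a real index over millions of proteins
    would more likely use an approximate nearest-neighbor structure.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: names and vectors are made up for illustration.
toy_index = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.8, 0.2, 0.1],
    "protein_C": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a known protein used as the search query.
query_embedding = [1.0, 0.0, 0.0]

hits = nearest_neighbors(query_embedding, toy_index, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

In this toy setting, the two proteins whose vectors point in nearly the same direction as the query are returned ahead of the unrelated one, regardless of how similar their underlying sequences might be.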
Our AI-based search technology was used to characterize the chaperone universe and identify useful proteins to coexpress in the SoluPro™ cell line to achieve higher titers and quality. We did this by using a comprehensive list of known chaperones as queries in the functional homolog search.
The resulting candidates were too many to synthesize and test individually, so the model was tasked with organizing the candidate proteins into 1000 distinct functional groups. The most representative protein from each group was then synthesized for laboratory validation. We knew this approach of organizing hits functionally would better cover the test space and have a much higher chance of success than traditional top-N ranking approaches.
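The idea of grouping candidates by function and testing one representative per group can be sketched as follows. The grouping method actually used is not stated above; this sketch substitutes plain k-means clustering on toy two-dimensional vectors, with a deterministic initialization chosen only to keep the example reproducible. All names, vectors, and helper functions are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign(vectors, centroids):
    """Index of the closest centroid for each vector."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=20):
    """Plain k-means with evenly spaced initial centroids (a simplification)."""
    n = len(vectors)
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        assignments = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return centroids, assign(vectors, centroids)


def pick_representatives(names, vectors, k):
    """For each cluster, pick the member closest to the cluster centroid."""
    centroids, assignments = kmeans(vectors, k)
    representatives = {}
    for c in range(k):
        members = [(n, v) for n, v, a in zip(names, vectors, assignments) if a == c]
        if members:
            best = min(members, key=lambda m: squared_distance(m[1], centroids[c]))
            representatives[c] = best[0]
    return representatives


# Hypothetical candidates forming two obvious functional groups.
names = ["cand_1", "cand_2", "cand_3", "cand_4", "cand_5", "cand_6"]
vectors = [
    [0.0, 0.0], [0.2, 0.0], [0.1, 0.1],
    [5.0, 5.0], [5.2, 5.0], [5.1, 5.1],
]

reps = pick_representatives(names, vectors, k=2)
print(reps)
```

Compared with taking the top-N hits by similarity, which can return many near-duplicates of the same function, choosing one representative per group spreads a fixed synthesis budget across the whole candidate space.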
We used our proprietary ACE assay to screen chaperone candidates for the production of a very difficult-to-express Fab. We soon found an exciting hit, which we named XYZ. It was a putative alkyl hydroperoxide reductase C, originally discovered in a root bacterium, and had not previously been considered a chaperone by traditional screening approaches. When coexpressed with our protein of interest, it was found to nearly double the titer and improve the quality of the product, meeting a key milestone for our partner.
This use case serves as an exciting practical demonstration towards our vision of deploying AI for proteins and strain engineering.