Demystifying codon optimization with artificial intelligence
Absci creates the largest expression database of its kind to overcome a longstanding codon optimization problem.
Trained on ultra-high-quality expression data from just three well-studied proteins, Absci’s AI model generalized to make accurate predictions relating codon optimization to protein yield.
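Codon optimization is the process of finding the DNA sequence that maximizes production of the desired protein therapeutic in a host cell. Because most amino acids can be encoded by more than one codon, many different DNA sequences encode the same protein, and they do not all express equally well. There are a lot of codon optimization tools out there, but what are they actually doing?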
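To give a sense of the size of the search space, here is a minimal Python sketch that counts and enumerates the DNA sequences encoding a given protein under the standard genetic code. It is an illustration only, not the method described in the manuscript; the function names (`translate`, `count_synonymous`, `enumerate_synonymous`) are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Map each amino acid (or "*" for stop) to its synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate a coding sequence (length divisible by 3) into protein."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def count_synonymous(protein):
    """Number of distinct DNA sequences that encode this protein."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def enumerate_synonymous(protein):
    """Yield every synonymous CDS; only practical for short peptides."""
    for combo in product(*(SYNONYMS[aa] for aa in protein)):
        yield "".join(combo)


if __name__ == "__main__":
    peptide = "MKW"  # Met and Trp have one codon each, Lys has two
    print(count_synonymous(peptide))            # 2
    print(list(enumerate_synonymous(peptide)))  # ['ATGAAATGG', 'ATGAAGTGG']
```

The count is the product of the number of codons available at each position, so it grows exponentially with protein length. Exhaustive search is therefore impractical for real proteins, which is why a predictive model is useful.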
In this preprint manuscript, we describe how we used the largest expression database of its kind to train our AI models to make accurate predictions relating codon optimization to protein yield. The result is a robust, accurate AI model that optimizes DNA codon sequences to maximize therapeutic protein yield.
The technical details are in the manuscript below. Here are a few takeaways:
We used our scalable wet lab technologies to generate the largest synonymous mutant expression dataset we know of – a feat on its own.
We used large language models (LLMs) to learn natural patterns of codon usage, predict expression levels, and ultimately design high-expressing coding sequences (CDSs) on proteins outside our training set.
We measured functional activity of three different proteins to ensure we produced properly folded, soluble molecules — not misfolded ones.
Our model outperformed commercially available algorithms, suggesting it had learned fundamental rules governing codon optimization.
AI-based codon optimization could theoretically be applied across protein classes to save significant time and money by maximizing protein production at scale.
In drug creation, this is an exciting tool for increasing expression levels of recombinant proteins, including biologics such as antibodies. Increasing production yields of therapeutic antibodies can increase the availability and accessibility of drugs to patients.
The bottom line: Absci has demonstrated that AI can address another longstanding challenge in the field by creating a robust, accurate model that optimizes DNA codon sequences to maximize therapeutic protein yield. This could save significant resources in drug creation.
Absci recently announced that it was the first to design and validate de novo therapeutic antibodies with zero-shot generative AI, creating novel antibodies whose in-silico designs were tested and validated in the wet lab without further lab optimization or affinity maturation. You can read more about that work here.
Absci also recently showed its ability to simultaneously optimize multiple parameters important to drug developability, including binding affinity and Naturalness score – a measure associated with drug developability and immunogenicity. More details on that work can be found here.
Please note that the preprint manuscript has not undergone peer review, the findings are provisional, and the conclusions may change.