Abstract
Deep generative models have demonstrated success in learning the protein sequence-to-function relationship and designing synthetic sequences with engineered functionality. We introduce the Protein Transformer Variational AutoEncoder (ProT-VAE) as an accurate, generative, fast, and transferable model for data-driven protein design that blends the merits of variational autoencoders, which learn interpretable, low-dimensional latent embeddings for conditional sequence design, with the expressive, alignment-free featurization offered by transformer-based protein language models. We implement the model using NVIDIA’s BioNeMo framework and validate its performance in retrospective functional prediction and prospective functional design. The model identifies a phenylalanine hydroxylase enzyme with 2.5× the catalytic activity of the wild type, and a γ-carbonic anhydrase enzyme with a melting temperature elevation of ΔTm = +61 °C relative to the most thermostable sequence reported to date and activity in 23% v/v methyl diethanolamine at pH 11.25 and 93 °C, conditions that are industrially relevant for enzymatic carbon capture technologies. The ProT-VAE model presents a powerful and experimentally validated platform for machine learning-guided directed evolution campaigns to discover synthetic proteins with engineered function.
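The abstract describes the core architectural idea: a transformer protein language model supplies alignment-free featurization of sequences, and a variational autoencoder compresses those features into a low-dimensional latent space from which new sequences can be decoded. The sketch below is a minimal, hypothetical PyTorch illustration of that general pattern only; the vocabulary, layer sizes, pooling, and decoder here are illustrative assumptions and do not reproduce the published ProT-VAE architecture or its BioNeMo implementation.

```python
# Hypothetical sketch: transformer featurization feeding a VAE bottleneck.
# All hyperparameters are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn

AA_VOCAB = 21      # 20 amino acids + padding token (assumption)
MAX_LEN = 256      # fixed sequence length for this sketch (assumption)
D_MODEL = 128
LATENT_DIM = 16    # low-dimensional latent for conditional design

class ProtTransformerVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(AA_VOCAB, D_MODEL)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # VAE bottleneck: mean-pooled transformer features -> (mu, logvar)
        self.to_mu = nn.Linear(D_MODEL, LATENT_DIM)
        self.to_logvar = nn.Linear(D_MODEL, LATENT_DIM)
        # Decoder: latent vector -> per-position amino-acid logits
        self.decode = nn.Sequential(
            nn.Linear(LATENT_DIM, D_MODEL), nn.ReLU(),
            nn.Linear(D_MODEL, MAX_LEN * AA_VOCAB))

    def forward(self, tokens):                     # tokens: (B, MAX_LEN)
        h = self.encoder(self.embed(tokens)).mean(dim=1)  # (B, D_MODEL)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.decode(z).view(-1, MAX_LEN, AA_VOCAB)
        return logits, mu, logvar

def vae_loss(logits, tokens, mu, logvar):
    # Reconstruction (cross-entropy over residues) + KL regularizer
    recon = nn.functional.cross_entropy(logits.transpose(1, 2), tokens)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld

# Usage: reconstruct a random batch, then decode novel latent samples.
model = ProtTransformerVAE()
batch = torch.randint(0, AA_VOCAB, (4, MAX_LEN))
logits, mu, logvar = model(batch)
loss = vae_loss(logits, batch, mu, logvar)
new_seqs = model.decode(torch.randn(4, LATENT_DIM)) \
               .view(4, MAX_LEN, AA_VOCAB).argmax(-1)
```

Decoding from points sampled or interpolated in the latent space is what enables the generative design campaigns the abstract reports; the sketch's `argmax` decoding stands in for whatever sampling strategy the authors actually use.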
| Field | Value |
|---|---|
| Original language | English |
| Article number | e2408737122 |
| Journal | Proceedings of the National Academy of Sciences of the United States of America |
| Volume | 122 |
| Issue number | 41 |
| DOIs | |
| State | Published - Oct 2025 |
| Externally published | Yes |
Keywords
- generative modeling
- protein design
- protein language models
- transformers
- variational autoencoders