"We decode the 'language' of proteins through AI models”

November 2024

Driven by the mystery surrounding protein function and its link to diseases, Thomas Lemmin explores the world of these tiny machines that power life. As Assistant Professor at the Institute of Biochemistry and Molecular Medicine at the University of Bern, Switzerland, he leads a team that currently uses a linguistic approach in which Artificial Intelligence (AI) deciphers the functions encoded within protein sequences. By cracking this code, far-reaching possibilities for designing new functions into proteins with specific benefits for health and society are unlocked.

Thomas leads a dynamic team of 8 young researchers who focus on molecular modeling of proteins. (© CAIM, University of Bern)

Thomas, why is understanding proteins important?
Imagine your body as a complex symphony of biochemical reactions in which proteins are tiny musicians, each playing its own notes and each with its own function. When someone is in good health, the proteins play in harmony. When there is illness or disease, they produce a cacophony. By understanding how these proteins interact, we can develop new therapeutics to fine-tune their function, potentially leading to groundbreaking treatments for a wide range of diseases.

How are you studying proteins at UniBE?
In recent years, the field of molecular biology has witnessed impressive advances, fueled by groundbreaking techniques, such as cryo-electron microscopy (cryo-EM) and Next-Generation Sequencing (NGS). As part of the Dubochet Center for Imaging, the University of Bern is at the forefront of cryo-EM, a cutting-edge technique that allows us to observe proteins in their natural environments, providing sharper insights into their behavior. Simultaneously, NGS revolutionizes genomics with its comprehensive, high-throughput, and cost-effective methods for analyzing genetic material. But both cryo-EM and NGS generate vast amounts of data. This is where AI comes in as a powerful analytical tool.

AI-based protein modeling is a promising new world waiting to be explored, whose limits have yet to be determined

AI can help analyze the complexity of proteins and their functions, allowing for the discovery of exciting new biotechnology applications. Here, Thomas is discussing an AI-generated 3D protein model with two researchers from his group. (© CAIM, University of Bern)

Does AI push the envelope of current protein modeling?
Absolutely. AI is revolutionizing protein modeling. In particular, open AI initiatives that share large, pre-trained models (foundation models) are a game-changer. They allow us to build specialized protein modeling tools for our research much faster than starting from scratch. Furthermore, the massive datasets becoming available in biomedical research are a perfect fit for AI's strengths. We thus believe AI can help us tackle more complex problems and significantly accelerate our research.
However, AI models are somewhat like black boxes: their predictions are powerful, but understanding the "why" behind them can be challenging. This interpretability issue is a key focus in my lab. We are convinced that traditional methods, based on physical molecular models, can offer a crucial scientific foundation for interpreting and validating AI's predictions. Conversely, AI can be used to refine and improve traditional models, creating a feedback loop that fuels further advancements.
We are also looking into using techniques from unrelated fields, such as linguistics, to analyze and model proteins in innovative ways. This cross-disciplinary approach could lead to groundbreaking discoveries in protein function and design.
In short, AI-based protein modeling is a promising new world waiting to be explored, whose limits have yet to be determined.

The University of Bern is at the forefront of providing sharper insights into protein behavior

Protein design using AI techniques can offer the opportunity to develop effective vaccines and treatments for diverse “elusive” diseases. At the same time, as is the case with all new technology, AI harbors the potential for abuse and thus should be regulated, Thomas cautions. (© CAIM, University of Bern)

What would be a real-world benefit of such research?
Take, for example, HIV, a virus notorious for its rapid mutations and complex glycan shield, which I started working on at the National Institutes of Health (NIH), in Bethesda (USA), before coming here. Some people naturally develop a stronger immune response against diverse HIV strains. By understanding why this happens at the protein level, we hope to design an effective HIV vaccine that mimics this mechanism. This is just one example of how protein research translates into real-world benefits for health.

What drives you in your research?
Here in Bern, where I have been these past two years, I get to share my passion! I teach biochemical principles to biomedical engineering students, trying to highlight the incredible world of proteins from an engineering perspective. In addition, I enjoy demystifying AI for biology and medical students so that they can understand it and build an “informed trust” in it.
The interdisciplinary research environment at the Institute of Biochemistry and Molecular Medicine here is particularly stimulating. The constant exchange with experimental groups is what excites me most. It allows us to close the loop between computational predictions and real-world applications, which is crucial for determining the strengths and limitations of our computational models and for improving them.
Unraveling how things work has always driven me to push beyond theory. It is one thing to understand a system, but it's another to create or manipulate it – that's where true understanding lies.

(© CAIM, University of Bern)

Dr. Thomas Lemmin is an Assistant Professor at the Institute of Biochemistry and Molecular Medicine (IBMM) of the University of Bern, Switzerland, where he established his first research group with the help of an SNSF Eccellenza Fellowship. He is dedicated to bridging the gap between computational and experimental methods to unlock the mysteries of complex biological systems at the molecular level. At present, his research uses Deep Learning to decipher the "language" of proteins, combining it with traditional modeling methods to develop novel protein design strategies.

Thomas Lemmin's research journey began with a PhD in Molecular Dynamics (MD) of membrane proteins at the Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland. An SNSF fellowship then led him to the University of California, San Francisco (UCSF), USA, where he investigated the molecular basis of Alzheimer's disease and designed proteins that bind to unstable small molecules. Notably, he pioneered the use of MD simulations and graph theory to study the dynamics of the HIV-1 Env glycan shield in collaboration with the Vaccine Research Center of the National Institutes of Health (NIH), USA.

Currently, Dr. Lemmin focuses on modeling proteins as a language, a crucial step in data-driven protein engineering. By utilizing large language models to learn the biomolecular "grammar," he aims to refine our understanding of the relationship between the structure, dynamics and function of proteins. This in turn raises the question of whether new functions can be computationally predicted and designed into proteins.

Publications

  • Lazaridi, S., Yuan, J., & Lemmin, T. (2024). Atomic insights into the signaling landscape of E. coli PhoQ Histidine Kinase from Molecular Dynamics simulations. Scientific Reports, 14(1), 17659.
  • Gut, J. A., & Lemmin, T. (2024). Dissecting AlphaFold's Capabilities with Limited Sequence Information. bioRxiv, 2024-03.
  • Rodella, C., Lazaridi, S., & Lemmin, T. (2024). TemBERTure: Advancing protein thermostability prediction with Deep Learning and attention mechanisms. Bioinformatics Advances, 4(1).
  • Hassan-Harrirou, H., Zhang, C., & Lemmin, T. (2020). RosENet: improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. Journal of Chemical Information and Modeling, 60(6), 2791-2802.