Alex Bateman¹
¹EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
agb [at] ebi.ac.uk
Abstract
We are living through a revolution in AI approaches that is transforming molecular biology and computational biology. I will discuss how the advent of high-accuracy structural models has had a large impact on our ability to classify protein domains completely and accurately. I will also talk about how deep learning models such as ProtENN, developed by Google Research, have expanded our ability to find distant homologues for known protein families. I will argue that these models represent the most significant change in protein classification in three decades. Even more recently, we have seen the arrival of Large Language Models such as ChatGPT, which may now enable us to develop high-throughput tools for annotating proteins, non-coding RNAs and families, if only we can stop them hallucinating! I will talk about our efforts to harness these models to write accurate and verifiable annotation at scale.
Keywords: AI, protein domains, molecular biology databases