Venue: BBS
Ana M. Rojas
CSIC
Andalusian Center for Developmental Biology (CABD), Sevilla, Spain
Invited by Dr Harald Wodrich
Laboratoire Microbiologie Fondamentale et Pathogénicité (MFP), UMR 5234
Title
Annotating “dark” proteomes using AI-based approaches.
Abstract
Gene/protein function annotation is a challenging problem in computational biology. Traditional methods relying on sequence homology do not scale in the absence of
identifiable sequence similarity. We have extensively tested various deep learning-based methods (CNNs and general Protein Language Models) on full proteomes to assess their performance at the organism level. We found that the ProtTrans language model performed best for protein annotation in our benchmark and is suitable for downstream functional genomics (e.g., recovering functional information from RNAseq experiments). We next applied the best methods to annotate 24 million genes from 1,000 species across the animal phyla. These organisms contain a substantial fraction of “dark” proteomes (~50% annotation on average by homology-based methods). Using a computational pipeline that we devised, we successfully annotated all of them and identified genes that are biologically relevant and coherent with the organism’s biology.
Biosketch
Ana M. Rojas is a CSIC Research Scientist. Her main expertise is in protein evolution and function. Trained as a biologist in Madrid and the USA, she specialized in
bioinformatics and computational biology in various labs in the USA (Prof. Doolittle’s lab at UCSD, under a NASA-NSCORT fellowship, and Prof. Adam Godzik’s lab at The Burnham
Institute in La Jolla) and in Spain (as a Marie Curie fellow at CNB-CSIC under Prof. Valencia). Later, she accepted a staff scientist position at CNIO in Madrid, where she
stayed from 2006 to 2009, and moved to Barcelona in 2009, where she established her independent group. In 2013, she moved to the Institute of Biomedicine of Seville and
subsequently relocated her group to the Andalusian Center for Developmental Biology (CABD) in Seville in 2018. She has been a Track Chair of the International Society for
Computational Biology (ISCB) Function COSI since 2024. She is also a founding member of the Spanish Society of Computational Biology (SEBIBiC), launched in 2020. She is a
researcher at the ENIA-Chair Univ. of Seville / Google Spain for Artificial Intelligence and is also very active in outreach. Her current research interests focus on
understanding the complex relationships among sequence, structure, and function, particularly addressing multifunctional aspects of proteins relevant to biotechnology
(biosensors) and biomedicine (therapeutic drugs) using AI-based techniques.