Advancing Genomics with OrthoDB, BUSCO, and the LEM Framework

EV Kriventseva*, M Manni, M Seppey, F Tegenfeldt, M Berkeley, D Kuznetsov, EM Zdobnov

Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland.

Evgenia.Kriventseva [at] unige.ch

Abstract

The rapid growth of genomics data necessitates continuous advancements in bioinformatics tools. This presentation highlights the latest updates to our toolbox, including OrthoDB v11, BUSCO v5, and the LEM benchmarking framework.

OrthoDB (https://www.orthodb.org) is a leading resource for gene orthology and functional annotations across diverse eukaryotes, prokaryotes, and viruses. Orthology enables precise transfer of gene function knowledge across genomes. OrthoDB v11 encompasses over 100 million genes from 18,000 prokaryotes and nearly 2,000 eukaryotes, providing extensive species coverage. The open-source OrthoLoger software (https://orthologer.ezlab.org) allows mapping of novel gene sets to precomputed orthologs, linking them to relevant annotations.

BUSCO (https://busco.ezlab.org) serves as a standard tool for assessing the completeness of genome assemblies, transcriptomes, and predicted gene sets, complementing assembly contiguity measures such as N50. A spin-off of OrthoDB, BUSCO evaluates the presence and coverage of marker genes, offering an evolutionarily grounded expectation of gene content completeness. BUSCO v5 now automatically selects the most suitable dataset for evaluation and outperforms the popular CheckM tool. BUSCO's efficiency is particularly evident in large eukaryotic genomes, and it is uniquely capable of assessing both eukaryotic and prokaryotic species, making it applicable to metagenome-assembled genomes of unknown origin.
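To make the contrast with contiguity measures concrete, the following is a minimal sketch of how N50 is conventionally computed from contig lengths. The function name and the example lengths are illustrative assumptions and are not taken from BUSCO or any other tool mentioned here.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly (hypothetical helper for illustration).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the cumulative sum
        # to half of the total assembly length or more.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Illustrative contig lengths only, not real assembly data.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

N50 depends only on contig lengths, so a highly contiguous assembly can still lack expected genes. That gap is what a gene-content measure such as BUSCO is meant to address.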

LEMMI (https://lemmi.ezlab.org), the Live Evaluation of Methods (LEM) for Metagenome Investigation, is a benchmarking framework, now in version 2, that helps users make an informed choice of software tools. It uses a container-based approach for continuous benchmarking and effective end-user distribution. The versatile framework can be extended to other procedures, such as gene orthology inference with LEMOrtho (https://lemortho.ezlab.org). The LEM benchmarking approach aims to become a community-driven effort, allowing developers to showcase novel methods and users to access standardized, easy-to-use software. We encourage researchers to apply this framework in their own domains and welcome feedback.

Keywords: genomics, genomes, orthologs, genes, continuous benchmarking
