Bioinformatics tools for reconstruction of gene networks of complex diseases

Yuriy L. Orlov1,3*, Ekaterina A. Savina1, Vasilisa A. Turkina1, Anastasia A. Anashkina1,2

1 Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, Russia

2 Engelhardt Institute of Molecular Biology RAS, Moscow, Russia

3 Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia

orlov [at] d-health.institute

Abstract

The study of gene networks of complex diseases is an important biomedical task that requires data integration. Although a wide range of software already exists, these programs need to be adapted for clinical data analysis. Such adaptation requires tutorials, textbooks, and training materials for students, interns, and staff of medical institutions who have no background in mathematics or computer science. These materials should be based on available online bioinformatics tools. Here we discuss a research project on complex diseases whose genetic components are difficult to identify, such as cancers and mental and neurological disorders, including schizophrenia and Parkinson’s disease.

We have collected the available software tools and prepared a data processing pipeline that compiles a list of genes associated with a given complex disease. The list can be built from queries to NCBI databases (GEO, RefSeq), OMIM (Online Mendelian Inheritance in Man), GeneCards, and MalaCards. It can then be refined using additional data, such as differential gene expression (GEO Dataset Browser resource), including expression of non-coding RNAs (from the TCGA database), and information from published papers. Gene Ontology (GO) enrichment analysis can be performed with open bioinformatics resources such as PANTHER (http://pantherdb.org) and DAVID (https://davidbioinformatics.nih.gov), and with the g:GOSt tool (http://biit.cs.ut.ee/gprofiler/gost), which also visualizes the enriched GO terms.
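The gene-list step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the source lists and the expression table are hard-coded placeholders standing in for database query results, and the support threshold (`min_support`) and expression cut-offs (`min_abs_lfc`, `max_padj`) are hypothetical parameters. The enrichment function is the standard one-sided hypergeometric over-representation test, which is the basic idea behind GO enrichment; PANTHER, DAVID, and g:GOSt each apply their own variant of the statistics and their own multiple-testing corrections.

```python
from math import comb

# HYPOTHETICAL placeholder inputs: the source names mirror the databases named
# in the text, but the gene identifiers (G1..G7) are invented, not query results.
SOURCE_LISTS = {
    "OMIM": {"G1", "G2", "G3", "G4"},
    "GeneCards": {"G1", "G2", "G4", "G5", "G6"},
    "MalaCards": {"G1", "G3", "G5", "G7"},
}

# HYPOTHETICAL differential-expression table: gene -> (log2 fold change, adjusted p-value).
DE_TABLE = {
    "G1": (1.8, 0.001),
    "G2": (2.5, 0.0001),
    "G3": (0.3, 0.4),
    "G4": (-1.2, 0.01),
    "G5": (-0.9, 0.03),
    "G7": (-1.5, 0.02),
}


def merge_gene_lists(sources, min_support=2):
    """Count how many sources report each gene; keep genes with >= min_support sources."""
    support = {}
    for genes in sources.values():
        for gene in genes:
            support[gene] = support.get(gene, 0) + 1
    return {gene: n for gene, n in support.items() if n >= min_support}


def refine_by_expression(genes, de_table, min_abs_lfc=1.0, max_padj=0.05):
    """Keep genes that pass both the fold-change and the adjusted p-value cut-off."""
    kept = []
    for gene in genes:
        if gene not in de_table:
            continue  # no expression evidence for this gene
        lfc, padj = de_table[gene]
        if abs(lfc) >= min_abs_lfc and padj <= max_padj:
            kept.append(gene)
    return sorted(kept)


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value P(X >= k).

    k: query genes annotated to the GO term; n: query size;
    K: background genes annotated to the term; N: background size.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    merged = merge_gene_lists(SOURCE_LISTS)
    refined = refine_by_expression(merged, DE_TABLE)
    print(sorted(merged), refined)
    # Hypothetical case: all refined genes carry a term annotating 10 of 100 background genes.
    print(hypergeom_enrichment_p(3, len(refined), 10, 100))
```

With real data, the per-term p-values would still need a multiple-testing correction (for example Benjamini-Hochberg) across all GO terms tested, which is one reason to rely on the dedicated tools rather than a hand-written test.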

A second set of tools for gene target search relies on gene expression data. The computational analysis of sequencing data integrates available RNA-seq and ChIP-seq data with public resources such as ArrayExpress, TCGA, CCGA, and ENCODE (Encyclopedia of DNA Elements), as well as locally developed resources: ANDSystem, TRRD, GeneNet (wwwmgs.bionet.nsc.ru), and ICGenomics (https://www-bionet.sscc.ru/icgenomics/).
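One common way to combine ChIP-seq and RNA-seq evidence is to treat genes that are both bound near their transcription start site (TSS) and differentially expressed as candidate direct targets. The sketch below illustrates that idea only; it is not taken from any of the resources listed above. It assumes peak summits and TSS coordinates are already available as plain tuples (in practice they would be parsed from annotation and peak files), and the ±5 kb `window` is an arbitrary, hypothetical default.

```python
from collections import defaultdict

# HYPOTHETICAL toy coordinates: (chromosome, TSS position, gene) and ChIP-seq peak
# summits (chromosome, position). Real inputs would be parsed from annotation/peak files.
TSS = [("chr1", 10_000, "G1"), ("chr1", 50_000, "G2"),
       ("chr2", 20_000, "G3"), ("chr2", 80_000, "G4")]
PEAKS = [("chr1", 11_200), ("chr2", 19_000), ("chr2", 60_000)]
DE_GENES = {"G1", "G2", "G4"}  # hypothetical RNA-seq differentially expressed genes


def genes_near_peaks(peaks, tss, window=5_000):
    """Genes whose TSS lies within `window` bp of at least one peak summit."""
    by_chrom = defaultdict(list)
    for chrom, pos, gene in tss:
        by_chrom[chrom].append((pos, gene))
    bound = set()
    for chrom, summit in peaks:
        # Linear scan per peak: fine for a sketch, too slow for genome-scale data.
        for pos, gene in by_chrom.get(chrom, []):
            if abs(pos - summit) <= window:
                bound.add(gene)
    return bound


def candidate_direct_targets(peaks, tss, de_genes, window=5_000):
    """Genes supported by both binding (ChIP-seq) and expression change (RNA-seq)."""
    return sorted(genes_near_peaks(peaks, tss, window) & set(de_genes))


if __name__ == "__main__":
    print(candidate_direct_targets(PEAKS, TSS, DE_GENES))
    print(candidate_direct_targets(PEAKS, TSS, DE_GENES, window=25_000))
```

The window size strongly changes which genes are called targets, so it is worth reporting alongside any result; for real peak sets, a sorted index or interval tree would replace the linear scan.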

As the main application examples, we consider the analysis of brain tumors (glioma and meningioma) and the study of complications associated with viral infections, including data published after the coronavirus pandemic. We also consider the computer reconstruction of gene networks for several cancers (glioma, breast cancer, colorectal cancer) and for neurological and mental disorders such as Parkinson’s disease.
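Gene network reconstruction of the kind mentioned above can start from pairwise interactions retrieved from knowledge bases. The sketch below is one simple illustration under explicit assumptions, not the algorithm used by ANDSystem, GeneNet, or this project: the interaction pairs and candidate genes are hypothetical placeholders, the network is treated as undirected, and a "hub" is simply a gene with high degree.

```python
from collections import defaultdict

# HYPOTHETICAL interaction pairs standing in for edges exported from an interaction database.
INTERACTIONS = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"),
                ("G4", "G5"), ("G6", "G7"), ("G1", "G8")]
# HYPOTHETICAL candidate genes, e.g. the refined disease gene list.
CANDIDATES = {"G1", "G2", "G3", "G4", "G5", "G6"}


def induced_network(edges, genes):
    """Undirected adjacency sets, keeping only edges with both endpoints in `genes`."""
    adj = defaultdict(set)
    for a, b in edges:
        if a in genes and b in genes and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hubs(adj, top=3):
    """Genes ranked by degree, ties broken alphabetically."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:top]


def components(adj):
    """Connected components found by iterative depth-first search."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps


if __name__ == "__main__":
    net = induced_network(INTERACTIONS, CANDIDATES)
    print(hubs(net))
    print(components(net))
```

Candidate genes with no interactions among the other candidates (G6 here) drop out of the network entirely, which is a useful signal that either the interaction source or the candidate list is incomplete.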

Keywords: bioinformatics, gene networks, complex diseases, education, online tools

Acknowledgement: The study was supported by the Russian Science Foundation (grant 24-24-00563).