MONFIT: Multi-omics factorization-based integration of time-series data sheds light on Parkinson’s disease

Katarina Mihajlović1*, Noël Malod-Dognin1, Corrado Ameli2, Alexander Skupin2,3,4, and Nataša Pržulj1,5,6

1 Barcelona Supercomputing Center (BSC), Barcelona, Spain

2 The Integrative Cell Signalling Group, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg

3 Luxembourg Institute of Health (LIH), Esch-sur-Alzette, Luxembourg

4 University of California San Diego, La Jolla, CA, USA

5 Department of Computer Science, University College London, London, United Kingdom

6 ICREA, Pg. Lluís Companys 23, Barcelona, Spain

katarina.mihajlovic [at] bsc.es

Abstract

Parkinson’s disease (PD) is a severe and complex multifactorial neurodegenerative disease whose elusive pathophysiology prevents the development of curative treatments. Studying PD using longitudinal multi-omics data is a promising approach to identifying its mechanisms of etiology and progression. However, heterogeneous data require new analysis frameworks that can utilize the complementary information captured by diverse data types and further the understanding of PD across biological entities and processes.

We present MONFIT, a holistic analysis pipeline that integrates and mines time-series single-cell RNA-sequencing data of disease and control cell lines, along with bulk proteomics and metabolomics data, using non-negative matrix tri-factorization, which also enables the integration of prior knowledge from molecular networks. MONFIT first integrates (fuses) time-point-specific data, producing time-point-specific gene embeddings, which it then collectively mines across time points.
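For readers unfamiliar with non-negative matrix tri-factorization (NMTF), the following is a minimal sketch of the generic technique only, not the MONFIT objective: it omits the network-based prior knowledge and the joint factorization of several data matrices. It approximates a single non-negative matrix X as G S H^T using standard multiplicative updates. The function name `nmtf`, the ranks, the iteration count, and the random input are illustrative assumptions.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF: X (n x m) ~ G (n x k1) @ S (k1 x k2) @ H.T (k2 x m).

    Hypothetical helper for illustration; uses unconstrained multiplicative
    updates that keep all three factors non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialization
    G = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    H = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies a factor by a ratio of non-negative terms
        G *= (X @ H @ S.T) / (G @ S @ H.T @ H @ S.T + eps)
        H *= (X.T @ G @ S) / (H @ S.T @ G.T @ G @ S + eps)
        S *= (G.T @ X @ H) / (G.T @ G @ S @ H.T @ H + eps)
        # Track the Frobenius reconstruction error after each sweep
        errors.append(np.linalg.norm(X - G @ S @ H.T))
    return G, S, H, errors

# Toy input (assumption): a random non-negative 30 x 20 matrix
X = np.random.default_rng(1).random((30, 20))
G, S, H, errors = nmtf(X, k1=4, k2=3)
```

In a setting like the one described above, the rows of a factor such as G would play the role of low-dimensional embeddings of the row entities (for example, genes), which is what makes the decomposition usable for downstream mining.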

We apply MONFIT to longitudinal, multi-omics data of PD and control cells obtained from patient-derived induced pluripotent stem cells that were differentiated into dopaminergic neurons. We predict 123 genes related to PD, which we validate by network analysis to be specific to the PINK1 mutation causing PD. We investigate the top 30 gene predictions and propose five novel PD gene candidates: CENPF, CRABP1, TOP2A, TMSB10, and NASP. In addition, we emphasize molecular pathways that play important roles in PD pathology and suggest new intervention opportunities by drug repurposing. We demonstrate that MONFIT goes beyond standard differential analysis approaches of single-omics data by predicting PD-associated genes that would otherwise elude discovery. MONFIT is a generic method and can be modified to accommodate data from tissue samples and other multi-omics data types.

Keywords: data fusion, data mining, single-cell data, network biology, multi-omics data

Acknowledgement: This project has received funding from the European Union’s EU Framework Programme for Research and Innovation Horizon 2020, Grant Agreement No 860895, the European Research Council (ERC) Consolidator Grant 770827, the Spanish State Research Agency and the Ministry of Science and Innovation MCIN grant PID2022-141920NB-I00 / AEI /10.13039/501100011033/ FEDER, UE, and the Department of Research and Universities of the Generalitat de Catalunya code 2021 SGR 01536.