Detecting Genetic Interactions with Visible Neural Networks

Arno van Hilten1*, Federico Melograna2,3*, Bowen Fan4, Wiro Niessen1,5, Kristel van Steen2,3+, Gennady Roshchupkin1,6,+

1 Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands

2 Department of Human Genetics, KU Leuven, Leuven, Belgium

3 GIGA-R Molecular and Computational Biology, University of Liège, Liège, Belgium

4 Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland

5 Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands

6 Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands

* Authors contributed equally and share first authorship

+ Authors contributed equally and share last authorship

federico.melograna [at] kuleuven.be

Abstract

Non-linear interactions among single nucleotide polymorphisms (SNPs), genes, and pathways play an important role in human disease, but identifying these interactions is challenging. Neural networks are state-of-the-art predictors in many domains because they can analyze large datasets and model complex patterns, including non-linear interactions. In genetics, visible neural networks are gaining popularity because they reveal which SNPs, genes, and pathways are most important for prediction. Visible neural networks use prior knowledge (e.g., gene and pathway annotations) to define the connections between nodes in the network, making them sparse and interpretable. Currently, most of these networks provide measures of the importance of SNPs, genes, and pathways but offer little detail on the nature of their interactions. Here, we explore different methods to detect non-linear interactions with visible neural networks. We adapt and speed up existing methods, create a comprehensive benchmark with simulated data from GAMETES and EpiGEN, and demonstrate that these methods can extract multiple types of interactions from trained visible neural networks. We also highlight the strengths and weaknesses of the various methods in different settings, providing guidelines for general use cases. Finally, we apply these methods to a genome-wide case-control study of inflammatory bowel disease and find highly consistent epistasis signals. Follow-up association testing revealed seven statistically significant epistatic SNP pairs. The results and the code to reproduce the analysis are available at https://github.com/ArnovanHilten/GenNet.
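To make the idea of annotation-defined connections concrete, the following is a minimal sketch of a single SNP-to-gene layer in NumPy. It is not the GenNet implementation: the toy annotation `snp_to_gene`, the `tanh` activation, and the function name `gene_layer` are hypothetical choices for illustration. A binary mask built from the annotation zeroes every weight that the prior knowledge does not support, so each gene node receives input only from its own SNPs; this is what makes the layer sparse and its nodes interpretable.

```python
import numpy as np

# Hypothetical toy annotation: index = SNP, value = the gene that SNP maps to.
snp_to_gene = np.array([0, 0, 1, 1, 1, 2])
n_snps = len(snp_to_gene)
n_genes = int(snp_to_gene.max()) + 1

# Binary connectivity mask: mask[i, j] == 1 only if SNP i is annotated to gene j.
mask = np.zeros((n_snps, n_genes))
mask[np.arange(n_snps), snp_to_gene] = 1.0

rng = np.random.default_rng(0)
weights = rng.normal(size=(n_snps, n_genes))  # dense parameters before masking
bias = np.zeros(n_genes)


def gene_layer(genotypes):
    """Map genotypes (samples x SNPs, coded 0/1/2) to gene-node activations."""
    # Element-wise masking removes every connection not in the annotation,
    # so each gene node only sees the SNPs annotated to it.
    return np.tanh(genotypes @ (weights * mask) + bias)


# Usage: four hypothetical samples with random genotypes.
x = rng.integers(0, 3, size=(4, n_snps)).astype(float)
h = gene_layer(x)
print(h.shape)  # (4, 3): one activation per sample and gene
```

Because the mask is fixed, perturbing a SNP can only change the activation of the gene it is annotated to, which is the property that lets importance and interaction scores be traced back to specific genes.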

Keywords: epistasis, non-linear, interactions, visible, neural networks

Acknowledgement: We would like to acknowledge all the investigators and participants in the International Inflammatory Bowel Disease Genetics Consortium. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreements No 813533 (MLFPM) and No 860895 (TranSYS), the FNRS convention PDR T.0294.24 “Expanded PRS embracing pathways and interactions for increased clinical utility” and through the 2005 Simon Stevin Meester grant 2015 to W.J. Niessen by the Dutch Technology Foundation (STW). Work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative (application number 17610). Gennady V. Roshchupkin is supported by the ZonMw Veni grant (Veni 1936320).