Graphlet-based higher-order network embeddings: the past, the present and the future

Sam F. L. Windels1, Noël Malod-Dognin1 and Nataša Pržulj1,2,3,*

1 Barcelona Supercomputing Center (BSC), Barcelona, Spain

2 Department of Computer Science, University College London, London, UK

3 ICREA, Pg. Lluís Companys 23, Barcelona, Spain

* natasha [at] bsc.es

Abstract

At a high level, there are two approaches to mining networks. Neighbourhood-based approaches uncover groups (i.e., clusters) of tightly connected neighbouring nodes in a network and make predictions based on guilt by association: two nodes are assumed to be more likely to interact or share attributes if they belong to the same group(s) in the network. Topology-based (i.e., structure-based) approaches instead make predictions based on structural similarity: two nodes are compared by the wiring patterns around them, regardless of whether they are close to each other in the network. The state-of-the-art methods to quantify local topology are based on graphlets, which are small, connected, non-isomorphic induced subgraphs. To combine neighbourhood and graphlet-based information, we defined graphlet adjacency, which weighs the adjacency of two nodes by how frequently they co-occur on a given graphlet (i.e., there is one type of adjacency for each graphlet). In this talk, we provide an overview of the methodologies we have generalised using graphlet adjacency, including graphlet spectral embedding, graphlet eigencentrality, graphlet diffusion and hyperbolic graphlet coalescent embedding, and show how we applied them to better describe the functional organisation of various molecular networks and to better capture disease mechanisms. More recently, we have used graphlet-based symmetries to improve random-walk-based approaches. We conclude by presenting future research directions for new graphlet adjacency-based methods and applications.
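As an illustration of the graphlet adjacency idea described above, the following is a minimal sketch for a single graphlet, the triangle. It assumes an undirected, unweighted graph without self-loops given as a dense NumPy adjacency matrix; the function name `triangle_graphlet_adjacency` and the toy graph are hypothetical and chosen for illustration only, not taken from the authors' software. Each entry counts the triangles on which two nodes co-occur, which for the triangle equals the number of common neighbours of two adjacent nodes.

```python
import numpy as np


def triangle_graphlet_adjacency(A):
    """Count, for every node pair, the triangles on which both nodes co-occur.

    Assumes A is a symmetric 0/1 adjacency matrix with a zero diagonal.
    This covers the triangle graphlet only; other graphlets need their own
    counting scheme.
    """
    A = np.asarray(A, dtype=int)
    # (A @ A)[u, v] is the number of common neighbours of u and v.
    # Multiplying elementwise by A keeps only pairs that are themselves
    # adjacent, so each surviving common neighbour closes a triangle.
    return (A @ A) * A


# Toy graph (an assumption for illustration): triangle 0-1-2 plus a
# pendant node 3 attached to node 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

T = triangle_graphlet_adjacency(A)
print(T)
```

In this toy graph the edge between nodes 2 and 3 lies on no triangle, so it receives weight 0 in `T` even though the two nodes are adjacent in the ordinary sense. This is the sense in which each graphlet yields its own type of adjacency.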

Keywords: higher-order network topology, network biology, data mining

Acknowledgement: This project has received funding from the European Union’s EU Framework Programme for Research and Innovation Horizon 2020, Grant Agreement No 860895, the European Research Council (ERC) Consolidator Grant 770827, the Spanish State Research Agency and the Ministry of Science and Innovation MCIN grant PID2022-141920NB-I00/AEI/10.13039/501100011033/FEDER, UE, and the Department of Research and Universities of the Generalitat de Catalunya code 2021 SGR 01536.