Exploration of Intrinsic Disorder Regions through Classification of Intrinsically Disordered Proteins Using PPI Network Structure and Sequence Attributes: A Case Study

Milana Grbić*, Milan Predojević, Nenad Vilendečić and Dragan Matić

Faculty of Natural Science and Mathematics, University of Banja Luka, Banja Luka, Bosnia and Herzegovina

milana.grbic [at] pmf.unibl.org

Abstract

In this study, the prediction of Intrinsically Disordered Proteins (IDPs) was explored using the structure of Protein-Protein Interaction (PPI) networks together with sequence characteristics. Attributes describing the topological properties of each protein were extracted with the node2vec+ tool from a weighted PPI network, in which the weight of an edge represents the gene co-expression between the two interacting proteins. In addition, attributes derived from the primary sequence were incorporated, based on amino acid properties: order/disorder promotion (Type A attributes) and the physicochemical properties aromatic/aliphatic, polar/non-polar, non-zero/zero, hydrophobic/hydrophilic, and positive/negative (Type B attributes).
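To illustrate how sequence attributes of this kind can be derived, the sketch below computes residue-group fractions for a protein sequence. It is a minimal sketch under stated assumptions. The order- and disorder-promoting sets follow the widely cited Dunker et al. grouping. The physicochemical groups shown (aromatic/aliphatic, positive/negative) are hypothetical choices. The exact groupings and encoding used in the study may differ, and the function names are illustrative only.

```python
# Assumed residue groups (one-letter codes). The order/disorder-promoting sets
# follow the commonly cited Dunker et al. grouping; the physicochemical sets
# below are illustrative and may not match the study's definitions.
DISORDER_PROMOTING = frozenset("ARGQSPEK")
ORDER_PROMOTING = frozenset("WCFIYVLN")
AROMATIC = frozenset("FWY")
ALIPHATIC = frozenset("AILV")
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")


def fraction(seq: str, group: frozenset) -> float:
    """Fraction of residues in `seq` that belong to `group` (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in group) / len(seq)


def type_a_attributes(seq: str) -> list[float]:
    """Hypothetical Type A vector: [disorder-promoting fraction, order-promoting fraction]."""
    seq = seq.upper()
    return [fraction(seq, DISORDER_PROMOTING), fraction(seq, ORDER_PROMOTING)]


def type_b_attributes(seq: str) -> list[float]:
    """Hypothetical Type B vector covering two of the property pairs:
    [aromatic, aliphatic, positive, negative] fractions."""
    seq = seq.upper()
    return [
        fraction(seq, AROMATIC),
        fraction(seq, ALIPHATIC),
        fraction(seq, POSITIVE),
        fraction(seq, NEGATIVE),
    ]


if __name__ == "__main__":
    print(type_a_attributes("MKRWE"))  # [0.6, 0.2]
    print(type_b_attributes("MKRWE"))  # [0.2, 0.0, 0.4, 0.2]
```

The remaining property pairs (polar/non-polar, hydrophobic/hydrophilic, non-zero/zero) can be added in the same way once their residue sets are fixed.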

Proteins were classified into IDP and non-IDP categories using a K-Nearest Neighbors (KNN) classifier under four scenarios: (1) based solely on network attributes, (2) incorporating network attributes and sequence Type A attributes, (3) incorporating network attributes and sequence Type B attributes, and (4) considering network attributes along with both sequence Type A and Type B attributes. Proteins misclassified as IDPs in these scenarios were further examined using the IUPred2 tool, which revealed that only a subset of these proteins indeed possessed intrinsic disorder regions (IDRs) along their sequences.
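The four scenarios differ only in which attribute groups are concatenated into each protein's feature vector. The following minimal sketch, in plain Python, illustrates this idea with a hand-rolled KNN (Euclidean distance, majority vote). It also collects non-IDPs predicted as IDPs, which are the candidates that would then be checked with IUPred2. The dictionary layout, the value of k, and the leave-one-out evaluation are assumptions made for illustration. They do not describe the study's actual KNN settings or validation protocol.

```python
import math
from collections import Counter


def build_features(protein: dict, scenario: int) -> list[float]:
    """Concatenate attribute groups for scenarios 1-4 (hypothetical dict layout)."""
    features = list(protein["network"])      # node2vec+ embedding (all scenarios)
    if scenario in (2, 4):
        features += protein["type_a"]        # sequence Type A attributes
    if scenario in (3, 4):
        features += protein["type_b"]        # sequence Type B attributes
    return features


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    neighbours = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def false_positives(proteins, scenario, k=3):
    """Leave-one-out KNN (an assumed protocol); return ids of non-IDPs (label 0)
    that are predicted as IDPs (label 1)."""
    flagged = []
    for i, p in enumerate(proteins):
        train = [q for j, q in enumerate(proteins) if j != i]
        X = [build_features(q, scenario) for q in train]
        y = [q["is_idp"] for q in train]
        if p["is_idp"] == 0 and knn_predict(X, y, build_features(p, scenario), k) == 1:
            flagged.append(p["id"])
    return flagged


if __name__ == "__main__":
    # Toy, made-up 2-D "embeddings"; G is a non-IDP lying among IDPs.
    coords = {"A": (0, 0, 0), "B": (0, 1, 0), "C": (1, 0, 0),
              "D": (5, 5, 1), "E": (5, 6, 1), "F": (6, 5, 1), "G": (5.5, 5.5, 0)}
    proteins = [{"id": pid, "network": [x, y], "type_a": [], "type_b": [], "is_idp": lab}
                for pid, (x, y, lab) in coords.items()]
    print(false_positives(proteins, scenario=1))  # ['G']
```

Attribute groups with very different scales can dominate Euclidean distances, so in practice each group would typically be standardized before the vectors are concatenated.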

The case study was conducted on the yeast PPI network from the BioGRID database, with the list of yeast IDPs taken from the DisProt database. Gene co-expression information was obtained with the SPELL tool.
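Before node2vec+ can be applied, the BioGRID interactions and the SPELL co-expression scores must be merged into a single weighted, undirected edge list. The sketch below shows one way to do this, under assumptions that are not taken from the study:

- duplicate and self interactions are dropped;
- a pair's weight is looked up in either orientation;
- pairs without a co-expression score are skipped unless a default weight is supplied.

The protein identifiers and scores in the example are placeholders. The tab-separated output is a generic edge-list format, so the input format expected by a particular node2vec+ implementation should be checked in its documentation.

```python
def build_weighted_edges(interactions, coexpression, default_weight=None):
    """Merge PPI pairs with co-expression scores into a sorted list of (u, v, weight).

    interactions: iterable of (protein_a, protein_b) pairs, possibly with duplicates.
    coexpression: dict mapping (protein_a, protein_b) to a score, in either orientation.
    default_weight: weight for pairs lacking a score; None means such pairs are dropped.
    """
    edges = {}
    for a, b in interactions:
        if a == b:
            continue                      # ignore self-interactions
        u, v = sorted((a, b))             # canonical order for an undirected edge
        if (u, v) in edges:
            continue                      # same pair reported more than once
        w = coexpression.get((u, v), coexpression.get((v, u), default_weight))
        if w is None:
            continue                      # no co-expression score and no default
        edges[(u, v)] = w
    return [(u, v, w) for (u, v), w in sorted(edges.items())]


def to_edge_list_text(edges):
    """Serialize edges as tab-separated 'u<TAB>v<TAB>weight' lines."""
    return "\n".join(f"{u}\t{v}\t{w}" for u, v, w in edges)


if __name__ == "__main__":
    # Placeholder identifiers and made-up scores, for illustration only.
    interactions = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "P3"), ("P1", "P4")]
    coexpression = {("P1", "P2"): 0.8, ("P3", "P2"): 0.35}
    print(to_edge_list_text(build_weighted_edges(interactions, coexpression)))
```

Node2vec-style random walks choose the next node with probability proportional to edge weight. If the co-expression scores can be negative, they would therefore need to be transformed or excluded before embedding, a step not shown in this sketch.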

Keywords: Intrinsically Disordered Proteins (IDPs), Protein-Protein Interaction (PPI) Networks, Sequence Attributes, IDP prediction

Acknowledgement: This research is supported by a project titled “Support to COST action: Information, Coding and Biological Function: The Dynamics of Life”, funded by the Ministry of Scientific and Technological Development and Higher Education, Government of the Republic of Srpska, B&H. It is also supported by two projects funded by the Ministry of Civil Affairs, B&H: “Support for the Participation of the Research Team of Bosnia and Herzegovina in COST Action CA21169: Information, Coding, and Biological Function: the Dynamics of Life” and “Support for the Implementation of COST Action CA21160 Non-globular Proteins in the Era of Machine Learning in Bosnia and Herzegovina (ML4NGPB&H)”.