Integrated relational database of human protein-protein interactions

Bojana Jošić1*, Jovana Kovačević1, Vladimir Perović2, Nevena Veljković2

1Faculty of Mathematics, University of Belgrade, Studentski trg 16, Belgrade, Serbia

2Institute of Nuclear Sciences Vinča, University of Belgrade, Mike Petrovića Alasa 12-14, Belgrade, Serbia

mr17128 [at] matf.bg.ac.rs

Abstract

Protein-protein interaction (PPI) data are stored in various publicly available databases of different types and formats. In this work, a new PPI database is created by integrating data from several existing databases. This task is not trivial, since different databases annotate proteins with different gene or protein identifiers. Additionally, they use different methods to compute interaction scores, and the interactions themselves are obtained through diverse experimental or predictive methods. As a result, two databases may store different data about the same interaction.

To integrate data from BioGRID, STRING, HIPPIE, IntAct, and Reactome into a single PPI database, the following process is used. First, data are downloaded from each database in the MITAB format, which contains all relevant interaction information, such as protein identifiers, publication sources, and other annotations. Then, to obtain a single, unique identifier for every protein across all PPIs in the database, the UniProt ID mapping tool is used to map the source identifiers to UniProt IDs. Finally, since scoring systems differ among databases, a new score is calculated for every interaction using the MISCORE tool; this score serves as an additional metric that is uniform across all PPIs in the database. The resulting database contains tens of millions of human PPIs from five different sources.
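The integration step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; it assumes MITAB 2.5 rows (15 tab-separated columns, values separated by "|", "-" for empty) and uses only the interactor ID columns (1 to 4) and the publication column (9). The dictionary `ID_MAP` is a hypothetical stand-in for the output of the UniProt ID mapping tool, and `parse_field`, `to_uniprot`, `merge` and `toy_row` are illustrative names. Interactions are keyed by an unordered pair of UniProt accessions, so A-B and B-A reported by different sources collapse into one record. MISCORE scoring is not reproduced here.

```python
from collections import defaultdict

# Hypothetical stand-in for UniProt ID mapping output:
# non-UniProt identifier (as written in MITAB) -> UniProt accession.
ID_MAP = {
    "entrez gene/locuslink:7157": "P04637",  # TP53
    "entrez gene/locuslink:4193": "Q00987",  # MDM2
}


def parse_field(field):
    """Split a MITAB field like 'db:id(desc)|db:id' into 'db:id' strings.

    Parenthesised descriptions are dropped; '-' denotes an empty field.
    """
    if field == "-":
        return []
    return [item.split("(", 1)[0] for item in field.split("|")]


def to_uniprot(identifiers):
    """Return the first UniProt accession found for an interactor, or None."""
    for ident in identifiers:
        if ident.startswith("uniprotkb:"):
            return ident.split(":", 1)[1]
        if ident in ID_MAP:
            return ID_MAP[ident]
    return None  # interactor could not be mapped


def merge(rows_by_source):
    """Merge MITAB rows from several sources into one table.

    rows_by_source maps a source name (e.g. 'IntAct') to a list of MITAB lines.
    Returns {(acc_a, acc_b): {'sources': set, 'publications': set}}.
    """
    merged = defaultdict(lambda: {"sources": set(), "publications": set()})
    for source, lines in rows_by_source.items():
        for line in lines:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 15:
                continue  # not a MITAB 2.5 row
            # Primary IDs (cols 1-2) plus alternative IDs (cols 3-4).
            a = to_uniprot(parse_field(cols[0]) + parse_field(cols[2]))
            b = to_uniprot(parse_field(cols[1]) + parse_field(cols[3]))
            if a is None or b is None:
                continue  # skip rows with an unmapped interactor
            key = tuple(sorted((a, b)))  # unordered pair: A-B == B-A
            merged[key]["sources"].add(source)
            merged[key]["publications"].update(parse_field(cols[8]))
    return dict(merged)


def toy_row(a, b, pub):
    """Build a toy MITAB 2.5 line; unused columns are set to '-'."""
    cols = ["-"] * 15
    cols[0], cols[1], cols[8] = a, b, pub
    return "\t".join(cols)


if __name__ == "__main__":
    # Toy data: the same TP53-MDM2 pair reported with different IDs and order.
    table = merge({
        "BioGRID": [toy_row("entrez gene/locuslink:7157",
                            "entrez gene/locuslink:4193", "pubmed:1")],
        "IntAct": [toy_row("uniprotkb:Q00987", "uniprotkb:P04637", "pubmed:2")],
    })
    for pair, info in table.items():
        print(pair, sorted(info["sources"]), sorted(info["publications"]))
```

Keying on a sorted accession pair is one simple way to treat interactions as undirected; a real pipeline would also need to decide how to handle isoform suffixes, self-interactions and complexes, which this sketch ignores.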

Keywords: bioinformatics, protein-protein interaction, database, computer science
