An agnostic analysis of the human AlphaFold2 proteome using local protein conformations

Alexandre G. de Brevern1*

1 DSIMB Bioinformatics Team, INSERM UMR_S 1134, BIGR, Université Paris Cité and Université de la Réunion, 75014 Paris, France

alexandre.debrevern [at] univ-paris-diderot.fr

Abstract

For more than 30 years, computational approaches have been developed to propose 3D structural models of proteins from their amino acid sequences. Using deep learning, AlphaFold 2 obtained particularly remarkable results; some models were within the uncertainties of the experimental resolution (Jumper et al., Nature 2021). The AlphaFold 2 code is freely available, and EBI provides databases of structural models (Tunyasuvunakool et al., Nature 2021) that cover 98.5% of the human proteome. 36% of these models are predicted with atomistic quality.

The human protein models provided by AlphaFold were analyzed by combining the AlphaFold confidence index (pLDDT score) with classic secondary structure assignment and a finer analysis of local protein conformations, e.g. γ-turns, β-turns and bends, β-turn types, PolyProline II (PPII), helix curvatures, β-bulges, and a structural alphabet, namely Protein Blocks (PB).
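As an illustration of how a per-residue confidence index can be attached to a model, the sketch below reads pLDDT values from an AlphaFold model in PDB format, where they are stored in the B-factor column, and groups them into the four confidence bands used by the AlphaFold Database. It is a minimal sketch and not the pipeline used in this work: the function names (`plddt_per_residue`, `confidence_class`, `class_fractions`) are hypothetical, and only the fixed-column PDB layout is assumed.

```python
from collections import Counter


def plddt_per_residue(pdb_text):
    """Return [(chain, residue_number, pLDDT)] read from the CA atoms.

    Assumes fixed-column PDB ATOM records, with pLDDT stored in the
    B-factor field (columns 61-66), as in AlphaFold model files.
    """
    scores = []
    for line in pdb_text.splitlines():
        if len(line) < 66 or not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        resseq = int(line[22:26])
        plddt = float(line[60:66])
        scores.append((chain, resseq, plddt))
    return scores


def confidence_class(plddt):
    """Map a pLDDT value to an AlphaFold Database confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def class_fractions(scores):
    """Return the fraction of residues falling in each confidence band."""
    counts = Counter(confidence_class(p) for _, _, p in scores)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

The per-residue classes obtained this way could then be cross-tabulated with any local conformation assignment (secondary structure, turn type, Protein Block) made on the same residues.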

As expected, the large majority of α-helices are well predicted, with high pLDDT scores. However, some points are intriguing and could lead to future improvements: (i) PPII helices are too often associated with a low confidence index. They represent 4-5% of all residues and are important in protein-protein interactions, so their poor approximation could be an issue. (ii) Very surprisingly, while β-turns (turns of 4 residues) are well predicted, 55% of γ-turns (3 residues) have very low pLDDT scores. (iii) Even more strikingly, 94.8% of cis ω angles are associated with low pLDDT scores, i.e. AlphaFold is clearly unable to propose proper cis ω angles. (iv) β-sheet occurrence is lower than expected, while the occurrence of PB d (i.e. β-sheet core geometry) is fully in accordance with the expected frequencies. There are thus potentially β-sheets that were not completely formed, which would explain this low frequency (de Brevern, Biochimie 2023). AlphaFold 2 has impacted the structural modeling field, but work remains to be done (Tourlet et al., BioMedInformatics 2023).
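For readers unfamiliar with point (iii), the ω angle is the dihedral around the peptide bond, defined by the atoms Cα(i), C(i), N(i+1) and Cα(i+1); it is close to 180° for a trans peptide and close to 0° for a cis peptide. The sketch below computes this dihedral from four 3D coordinates. The function names and the 30° cutoff used to call a bond cis are assumptions made for illustration, not the criteria used in the study.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def is_cis(omega_deg, cutoff=30.0):
    """Call a peptide bond cis when |omega| is below the (assumed) cutoff."""
    return abs(omega_deg) < cutoff
```

Here `dihedral(ca_i, c_i, n_next, ca_next)` would give ω for the bond between residues i and i+1, given the four atom coordinates as `(x, y, z)` tuples.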

Keywords: bioinformatics, deep learning, computer science, protein structure, secondary structures.
