Characterizing Somatic Mutation Clusters in Cancers Enriched with APOBEC Mutagenesis

Gennady V. Ponomarev1, Fedor M. Kazanov2 and Marat D. Kazanov1,3,4,5*

1 A.A. Kharkevich Institute for Information Transmission Problems, Moscow, Russia

2 “Foxford” High School, Moscow, Russia

3 Skolkovo Institute of Science and Technology, Moscow, Russia

4 Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia

5 Sabanci University, Istanbul, Turkey

marat.kazanov [at] sabanciuniv.edu

Abstract

The APOBEC family of cytidine deaminases plays an active role in the human immune system, combating viruses and transposable elements. These enzymes convert cytosine to uracil, which often leads to C->T or C->G mutations in a specific nucleotide context (TCN). Clusters of such mutations have been identified in the genomes of various cancer types, including bladder, breast, cervical, head and neck, and lung cancers. Previous studies have shown that these APOBEC-induced mutation clusters are not uniformly distributed across the genome. To analyze their distribution, we developed a method for detecting APOBEC-induced clusters in cancer genomes.
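As an illustration of the signature definition above, the sketch below labels a single-base substitution as APOBEC-signature when it is a C->T or C->G change in a TCN context. A mutated G is treated as a mutated C on the opposite strand. The function names and the trinucleotide input format are hypothetical and chosen for this example only; they are not taken from the authors' implementation.

```python
# Hypothetical helper (not the authors' code): flag APOBEC-signature SNVs.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_apobec_signature(context: str, alt: str) -> bool:
    """Return True for C->T or C->G substitutions in a TCN context.

    context: reference trinucleotide centered on the mutated base, e.g. "TCA".
    alt: substituted base, reported on the same strand as `context`.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3:
        raise ValueError("context must be a trinucleotide")
    # A G>A or G>C change is a C>T or C>G change on the opposite strand,
    # so express it relative to the C-containing strand.
    if context[1] == "G":
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return context[:2] == "TC" and alt in ("T", "G")
```

For example, `is_apobec_signature("TGA", "A")` is true because the reverse complement, TCA with C->T, matches the TCN pattern.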

The method was developed using a subset of PCAWG cancer samples that exhibited strong enrichment of APOBEC-signature mutations. Mutations were iteratively merged into clusters using a distance threshold derived from chromosome-specific simulations in which the number of simulated mutations matched the observed number. From these simulations, we constructed distributions of distances between neighboring mutations and derived distance thresholds for several significance levels (5%, 1%, 0.1%). We also separately estimated the heterogeneity of APOBEC mutagenesis along the genome and adjusted the distance threshold accordingly. Applying the method to the entire PCAWG dataset revealed traces of APOBEC mutagenesis in additional cancer types.
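The threshold-and-merge procedure described above can be sketched as follows, under simplifying assumptions. Mutations are placed uniformly at random on a chromosome of length L, with the same count n as observed. The threshold for significance level alpha is taken as the alpha-quantile of the simulated neighbor distances. Sorted mutations are then merged single-linkage style whenever consecutive positions lie within that threshold. The function names, the number of simulations, and the two-mutation minimum cluster size are hypothetical choices. The heterogeneity adjustment mentioned in the text is not modeled here.

```python
import random
from typing import List


def simulate_threshold(chrom_length: int, n_mutations: int, alpha: float,
                       n_sim: int = 100, seed: int = 0) -> int:
    """Alpha-quantile of neighbor distances under uniform random placement.

    Assumption: a simple uniform null model on one chromosome, with the
    simulated mutation count equal to the observed count.
    """
    if n_mutations < 2:
        raise ValueError("need at least two mutations")
    rng = random.Random(seed)
    distances: List[int] = []
    for _ in range(n_sim):
        # Distinct random positions, sorted so neighbors are adjacent.
        positions = sorted(rng.sample(range(chrom_length), n_mutations))
        distances.extend(b - a for a, b in zip(positions, positions[1:]))
    distances.sort()
    # About a fraction `alpha` of null distances are <= the returned value.
    idx = max(0, int(alpha * len(distances)) - 1)
    return distances[idx]


def merge_into_clusters(positions: List[int], threshold: int,
                        min_size: int = 2) -> List[List[int]]:
    """Merge consecutive mutations closer than or equal to `threshold`.

    Assumption: a cluster must contain at least `min_size` mutations.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        if current and pos - current[-1] <= threshold:
            current.append(pos)  # extend the running cluster
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [pos]  # start a new candidate cluster
    if len(current) >= min_size:
        clusters.append(current)
    return clusters
```

As a sanity check on the simulation, neighbor distances between uniformly placed mutations are approximately exponential with mean L/n. The alpha-quantile should therefore be close to -(L/n)ln(1 - alpha). For L = 1,000,000 and n = 1,000 at alpha = 0.05, this is about 51 bp.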

Keywords: cancer bioinformatics, APOBEC, mutagenesis, mutation clusters.

Acknowledgement: This study was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant Number 123E476. The authors thank TUBITAK for its support.