Detecting somatic copy number variations in 245,388 participants from the All of Us biobank

Milovan Suvakov1, Zhiyv Niu2 and Alexej Abyzov1

1 Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, USA

2 Department of Laboratory Medicine and Pathology, Mayo Clinic College of Medicine, Rochester, Minnesota, USA

suvakov.milovan [at] mayo.edu

Abstract

In recent years, the study of mosaic mutations in human tissues has gained significant attention due to advances in methodology and the availability of large data biobanks. Within this field, clonal hematopoiesis of indeterminate potential (CHIP) and its implications in age-related diseases have attracted considerable interest. CHIP, a prevalent phenomenon in aging individuals, is linked to increased risk of all-cause mortality, blood cancer, and cardiovascular disease, but also appears to be protective against conditions such as Alzheimer’s disease.

We have developed a new methodology, implemented in CNVpytor, that detects mosaic copy number variations (mCNVs) from whole-genome sequencing (WGS) data by leveraging two independent signals: (1) depth of mapped reads and (2) B-allele frequency of SNPs and small indels. This technique allows the detection of somatic mCNVs down to 1% cell frequency. To improve the quality of our detection, we also considered evidence from discordant read pairs and SNP genotyping array data.
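To illustrate why combining the two signals is informative, the sketch below computes the read depth and B-allele frequency expected under a textbook mixture model for a heterozygous site in a diploid region. It is not the CNVpytor implementation; the function name `expected_signals` and the event labels are hypothetical, and the model assumes that a fraction f of cells carries a single-copy event while the remaining cells are normal.

```python
def expected_signals(event, cell_fraction):
    """Idealized expectation for a mosaic event at a heterozygous site.

    Assumption (not taken from CNVpytor): a fraction `cell_fraction` of cells
    carries the event on one haplotype; all other cells are diploid.
    Returns (relative_depth, minor_allele_frequency), where relative depth
    is normalized so that a normal diploid region equals 1.0.
    """
    f = cell_fraction
    if not 0.0 <= f <= 1.0:
        raise ValueError("cell_fraction must be between 0 and 1")

    if event == "deletion":
        # Average copies per cell: 2 - f; the deleted haplotype contributes 1 - f.
        return 1.0 - f / 2.0, (1.0 - f) / (2.0 - f)
    if event == "duplication":
        # Average copies per cell: 2 + f; the non-duplicated haplotype contributes 1.
        return 1.0 + f / 2.0, 1.0 / (2.0 + f)
    if event == "cnloh":
        # Copy-neutral loss of heterozygosity: depth unchanged, alleles shift.
        return 1.0, (1.0 - f) / 2.0
    raise ValueError(f"unknown event type: {event}")


if __name__ == "__main__":
    # Expected signals at the 1% cell frequency quoted in the abstract.
    for event in ("deletion", "duplication", "cnloh"):
        depth, baf = expected_signals(event, 0.01)
        print(f"{event:12s} depth={depth:.4f} minor-allele BAF={baf:.4f}")
```

Under this model, a deletion and a duplication at low cell fraction shift the B-allele frequency by nearly the same amount but move the depth in opposite directions, while copy-neutral events leave the depth unchanged. This is one reason the two signals complement each other.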

Our initial analysis of data from 245,388 individuals in the All of Us (AoU) cohort identified 2,607 large (>10 Mbp), confident somatic mCNVs. As expected, older individuals exhibited a higher number of somatic copy number alterations (CNAs), consistent with the understanding that detectable clonal hematopoiesis increases with age. Our investigation of chromosome Y loss (LOY) among male samples revealed that over 20% exhibit LOY, making it more prevalent than other somatic mCNVs. Additionally, we found hundreds of thousands of smaller mCNVs. The discovery of small mCNVs in young individuals, which presumably originated during development, indicates that analysis of all AoU samples can help clarify how the occurrence and nature of CNAs arising during development differ from those arising in aging.

This comprehensive analysis is expected to result in a shared computational resource, offering mCNV calls for the wider research community. By providing these resources, we aim to not only augment the value of AoU data but also establish a foundation for future research methodologies as the AoU’s sample collection expands.

Keywords: clonal hematopoiesis, mosaic copy number variation, copy number alterations, somatic mutations, aging

Acknowledgement: This work is supported by the National Institutes of Health (grant no. 1R03AG085705) and the Mayo Clinic DLMP Scholarly Clinician Award.