Impact of 3D chromatin structure on cancer mutation patterns and tissue-of-origin prediction

Paula Štancl* and Rosa Karlić

Faculty of Science, University of Zagreb, Croatia

pstancl [at] bioinfo.hr

Abstract

Tissue-of-origin (TOO) detection is a challenge in carcinoma of unknown primary site (CUP), whose heterogeneity and distinct genomic and molecular characteristics complicate effective therapy selection. Machine-learning algorithms that combine mutational landscape data from whole-genome sequencing with epigenetic features of normal tissues have shown promise in predicting TOO. These models leverage the association between regional mutation density and histone modifications, as well as the non-uniform distribution of mutations across 1 Mb genomic regions and tumor types. However, some cancer types remain difficult to classify accurately. To address this, we developed TOO models based on tissue-specific topologically associated domains (TADs). Since TADs represent fundamental units of 3D genome architecture, we investigated whether TOO prediction can be improved by using TADs (TAD model) or genes clustered according to their location in TADs (TAD gene model) as the units of analysis.

We analyzed publicly available liver cancer cohorts from the International Cancer Genome Consortium and obtained ChIP-seq data for six histone modifications and input controls from the Roadmap Epigenomics project. Tissue-specific TADs were downloaded from TADKB and the 3D Genome Browser. We used a multiple linear regression model with 10-fold cross-validation to compute how much of the variance in aggregated mutations across TADs or TAD-based gene clusters is explained by the epigenome of each normal tissue. The normal tissue whose epigenome explains the most variance is taken as the predicted TOO for a given cancer type.
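The selection rule described above can be illustrated with a minimal sketch. It assumes one feature matrix per normal tissue (rows are TADs, columns are six histone-mark signals) and a vector of aggregated mutation counts per TAD. The function names, the synthetic data, and the choice to compute the variance explained from pooled out-of-fold predictions are illustrative assumptions, not the pipeline used in this work.

```python
import numpy as np

def cv_variance_explained(X, y, n_folds=10, seed=0):
    """Cross-validated R^2 of an ordinary least-squares fit of y on X.

    Assumption: R^2 is computed once from pooled out-of-fold predictions.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        # Add an intercept column and fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred[test_idx] = A_test @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def predict_too(features_by_tissue, mutation_counts):
    """Return the tissue whose epigenome explains the most variance."""
    scores = {
        tissue: cv_variance_explained(X, mutation_counts)
        for tissue, X in features_by_tissue.items()
    }
    return max(scores, key=scores.get), scores

# Synthetic demonstration (hypothetical data): mutation counts are
# generated from the "liver" features plus noise.
rng = np.random.default_rng(42)
n_tads = 500
features_by_tissue = {
    name: rng.normal(size=(n_tads, 6)) for name in ("liver", "lung", "breast")
}
true_weights = np.array([1.0, -0.5, 0.8, 0.3, -1.2, 0.6])
mutation_counts = (
    features_by_tissue["liver"] @ true_weights
    + rng.normal(scale=0.5, size=n_tads)
)

predicted_tissue, scores = predict_too(features_by_tissue, mutation_counts)
print(predicted_tissue, scores)
```

In this toy setting the unrelated tissues yield a cross-validated variance explained close to zero, so the tissue that generated the signal is selected.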

TOO was predicted correctly and consistently across tissue-specific TAD sets identified with different tools. Although the overall accuracy of TAD models did not differ significantly from that of the original model based on 1 Mb regions, TAD gene models showed a significant increase in correct TOO prediction compared to the other gene models we developed. Genes located in TADs where the number of predicted mutations was lower than the observed number were associated with cancer development and progression, indicating that this type of analysis can facilitate the identification of structural units that influence carcinogenesis.
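The comparison of observed and predicted mutation counts can be sketched as a simple residual screen. The z-score cutoff, the function name, and the example counts below are hypothetical and serve only to illustrate flagging TADs that carry more mutations than the model predicts.

```python
import numpy as np

def excess_mutation_tads(tad_ids, observed, predicted, z_cutoff=2.0):
    """List TADs whose observed count exceeds the prediction.

    Assumption: a TAD is flagged when its standardized residual
    (observed minus predicted) is above z_cutoff.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    z = (residuals - residuals.mean()) / residuals.std()
    order = np.argsort(-z)  # largest excess first
    return [(tad_ids[i], float(residuals[i])) for i in order if z[i] > z_cutoff]

# Hypothetical example: one TAD carries far more mutations than predicted.
tad_ids = [f"TAD_{i}" for i in range(8)]
observed = [10, 12, 9, 11, 40, 10, 8, 12]
predicted = [11, 11, 10, 10, 12, 11, 9, 11]

flagged = excess_mutation_tads(tad_ids, observed, predicted)
print(flagged)
```

Genes located in the flagged TADs would then be the candidates for follow-up, for example through functional annotation.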

Overall, the results show that a normal tissue’s epigenome and a cancer’s TAD-based mutation profile can be used to predict the tissue of origin, and that the resulting models can be used to analyze the mechanisms of cancer initiation and progression.

Keywords: tissue-of-origin, machine learning, topologically associated domains, mutational landscape, epigenome