Background
Breast cancer is a leading cause of cancer-related mortality in women worldwide, with recurrence rates of 10–15% within 5 years of diagnosis [
1,
2]. Currently, the 70-gene expression profile [
3] and 21-gene recurrence score assays [
4] are recommended in clinical practice to predict the risk of recurrence and guide decisions regarding adjuvant chemotherapy [
5]. However, the high cost of these assays and the limited availability of tissue samples hinder their widespread adoption, and assays based on a single tissue sample may overlook the spatial heterogeneity of breast tumors. Furthermore, these assays are suitable only for patients with luminal subtypes, leaving patients with non-luminal subtypes at risk of over- or undertreatment. In current clinical practice, patients with hormone receptor (HR)-positive or human epidermal growth factor receptor 2 (HER2)-positive tumors receive endocrine therapy or HER2-targeted therapy, respectively. However, survival varies considerably among patients receiving the same treatment strategy. Therefore, a more broadly applicable and accurate method is needed to identify patients at high or low risk of recurrence, facilitating personalized treatment decisions and precision therapy.
In recent years, deep learning methods, in particular convolutional neural networks, have become widely used for analyzing unstructured image data and have proven effective at capturing image features [
6]. For instance, a previous study proposed a multi-task deep learning approach for segmenting tumors and predicting treatment response based on magnetic resonance imaging (MRI) scans of rectal cancer patients [
7]. Moreover, in the field of survival analysis, a deep learning survival neural network (DeepSurv) has been developed, which combines the Cox proportional hazards model with deep learning techniques [
8]. That work also demonstrated that DeepSurv has the potential to provide treatment recommendations that lead to improved survival outcomes. Together, these studies indicate that incorporating deep learning techniques into radiomics could lead to significant advances in personalized medicine.
Although radiomic features have been widely utilized for predicting outcomes in cancer patients, the underlying biological mechanisms are still not well understood. A recent study demonstrated that radiomic features differ between treated and untreated tumors [
9], suggesting that these features may reflect changes in the tumor microenvironment. Consequently, it is imperative to investigate the relationship between radiomic features and therapeutic response. Additionally, there is a growing research interest in the epigenetic changes that occur in cancer, with long non-coding RNAs (lncRNAs) gaining recognition for their clinical value. However, the detection methods for lncRNAs currently limit their clinical application. A previous study proposed an artificial intelligence system that employed CT images to predict the epidermal growth factor receptor (EGFR) genotype and prognosis with EGFR-tyrosine kinase inhibitors [
10], which points to the potential for quantifying lncRNA expression using radiomics. Because the association between radiomic features and therapeutic response or epigenetics remains uncertain, and prior findings lack robust validation, it is worthwhile to explore the possible biological basis of radiomics and to develop noninvasive tools for detecting lncRNA expression.
In this multicenter study, we constructed the interpretable deep-learning-based Radiomic DeepSurv Net (RDeepNet) model to predict recurrence risk, and evaluated the changes in radiomics before and after therapy with consideration of the therapy response status. The association between radiomic features and lncRNAs was further assessed to explore the potential epigenetic biological underpinning of nonmetastatic invasive breast cancer.
Methods
Study design and patients
This study was conducted in accordance with the STROBE guideline checklist [
11]. This study included three phases to train and validate the RDeepNet model for prediction of recurrence-free survival (RFS) and to explore the association of radiomics with treatment and with its epigenetic biological underpinning. In the RDeepNet model construction and validation phase (phase 1), the RDeepNet model was constructed with a combination of intra- and peritumoral radiomic features from contrast-enhanced T1-weighted imaging (T1 + C) and T2-weighted imaging (T2WI) sequences, with the aim of identifying patients at high or low risk of recurrence. The RDeepNet model was validated in an independent external validation cohort and a testing cohort. RNA-sequencing (RNA-seq) was performed to preliminarily explore the potential molecular mechanisms of radiomics. In phase 2, correlation and variance analyses were conducted to examine changes in radiomic features before and after neoadjuvant chemotherapy in relation to response status. Based on the above findings, the association and quantitative relationship between radiomics and epigenetic molecular characteristics were further analyzed with RNA-seq data in phase 3.
A total of 1,186 nonmetastatic invasive breast cancer patients were retrospectively recruited from four institutions in China, of whom 73 did not pass quality control (55 patients were not histologically confirmed to have stage I–III invasive breast cancer [
12], and 18 patients lacked an MRI before surgery), and 1,113 patients were finally enrolled. A total of 698 patients recruited from the national hospitals Sun Yat-sen Memorial Hospital of Sun Yat-sen University (Guangzhou, China) and Sun Yat-sen University Cancer Center (Guangzhou, China) between March 23, 2011, and August 26, 2019, were assigned to the training cohort. Then, 171 patients from the Shunde Hospital of Southern Medical University (Foshan, China) and the Tungwah Hospital of Sun Yat-sen University (Dongguan, China) between March 09, 2012, and September 21, 2019, were used as the validation cohort. A total of 244 patients from the Sun Yat-sen Memorial Hospital of Sun Yat-sen University (Guangzhou, China) between April 19, 2013, and December 05, 2018, were assigned to the testing cohort. We retrospectively collected 92 formalin-fixed paraffin-embedded (FFPE) biopsy tissues from patients treated at the Sun Yat-sen Memorial Hospital of Sun Yat-sen University. All samples were reassessed by two pathologists and were found to contain more than 70% tumor cells. A total of 72 patients from The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA) who had both T1 + C and T2WI sequences were assigned to the TCGA cohort for assessing the efficacy of the deep learning prediction model.
Eligible patients were women aged at least 18 years with histologically confirmed stage I–III invasive breast cancer [
12] who underwent breast tumor and axillary MRI scans before surgery and axillary lymph node dissection and who received perioperative therapy. Patients with other previous or concurrent tumors, incomplete pathological information, or unavailable standard MRI scans with or without contrast enhancement were excluded. The outcomes were RFS, calculated from the date of surgery until the date of the most recent medical review or the diagnosis of recurrence or metastasis, and the association of radiomics with lncRNAs.
The four molecular subtypes of breast tumors were defined according to the St. Gallen Consensus Conference 2013 [
13], with biomarkers measured by immunohistochemistry or in situ hybridization. The luminal A subtype was defined as estrogen receptor (ER)- and progesterone receptor (PR)-positive and HER2-negative, with a Ki-67 level < 14%. The luminal B subtype was defined as either ER-positive with over-expressed/amplified HER2, or ER-positive and HER2-negative with a Ki-67 level > 14% or PR-negative/low status. Among ER- and PR-negative tumors, those with over-expressed/amplified HER2 were classified as the HER2-positive subtype, and those that were HER2-negative as the triple-negative breast cancer (TNBC) subtype.
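The subtype rules above can be expressed as a small decision function. The following is only an illustrative sketch, not the authors' code: the function name and arguments are hypothetical, a Ki-67 level of exactly 14% is assumed to count as high, PR "low" is not modelled separately from PR-negative, and ER-negative/PR-positive tumors (not covered by the definitions in the text) are returned as "unclassified".

```python
def molecular_subtype(er_pos, pr_pos, her2_pos, ki67_pct):
    """Assign a surrogate molecular subtype from IHC/ISH results.

    Assumptions (not stated in the text): Ki-67 of exactly 14% is treated
    as high, and ER-negative/PR-positive tumors are left unclassified.
    """
    if er_pos:
        if her2_pos:
            # ER-positive with over-expressed/amplified HER2
            return "Luminal B"
        if pr_pos and ki67_pct < 14:
            # ER+/PR+, HER2-negative, low Ki-67
            return "Luminal A"
        # ER+, HER2-negative with high Ki-67 or PR-negative
        return "Luminal B"
    if not pr_pos:
        # ER- and PR-negative: split on HER2 status
        return "HER2-positive" if her2_pos else "TNBC"
    return "unclassified"


print(molecular_subtype(True, True, False, 10))    # Luminal A
print(molecular_subtype(False, False, False, 30))  # TNBC
```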
Procedures of transcriptome RNA sequencing
Total RNA was extracted from FFPE samples using the QIAGEN FFPE RNeasy kit (QIAGEN GmbH, Hilden, Germany). RNA was analyzed using an Agilent RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA, USA), and RNA integrity numbers were determined on an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) to evaluate RNA integrity. An input of 500 ng of total RNA was amplified using an Ovation FFPE WTA System (NuGEN, San Carlos, CA, USA), and a NEBNext® Ultra™ II DNA Library Prep Kit (Illumina) was used for fragmentation and labeling. The quality and quantity of the amplified libraries were evaluated using Qubit (Invitrogen, Carlsbad, CA, USA) and the Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). All libraries were sequenced on a DNBSEQ-T7RS (MGI) with 100 bp paired-end reads. Base call files were converted to the fastq format using cal2Fastq. Raw data were then processed using fastp (version 0.20.1).
The acquisition protocol of the multiparametric MRI (including T1 + C, and T2WI) used across all institutions and the MR scanner parameters are described in Additional file
1: eAppendix 1 and Additional file
1: Table S1. All of the MRIs were normalized to obtain a standard normal distribution of image intensities using the N4ITK Bias Correction code. The 3D regions of interest (ROIs) in the breast intratumoral area and the peritumoral area (10-mm extension outward of the tumor parenchyma) were semi-automatically segmented using the 3D Slicer software (
https://www.slicer.org/, version 4.10.2) [
14]. The intra- and peritumoral 3D ROIs (DICOM format) were transferred to the SlicerRadiomics extension, a texture extraction platform based on the Python package “PyRadiomics” [
15]. For each patient, 3,452 quantitative radiomic features (863 features from each ROI in each sequence, including 12 diagnostic features, 107 original features, and 744 wavelet features) were extracted to characterize shape, size, intensity, morphology, and texture. Apart from the diagnostic features, the radiomic features were categorized into seven groups: shape descriptors, first-order statistics, gray-level co-occurrence matrix (GLCM), gray-level size zone matrix (GLSZM), gray-level run-length matrix (GLRLM), gray-level dependence matrix (GLDM), and neighboring gray tone difference matrix (NGTDM). More details regarding the radiomic feature extraction are described in Additional file
1: eAppendix 2.
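The feature counts quoted above can be checked with a few lines of arithmetic. The per-ROI counts are taken from the text; the per-class breakdown of the 107 original features and the eight wavelet sub-bands are assumptions based on the default PyRadiomics feature classes and are not itemized in the article.

```python
# Per-ROI, per-sequence feature counts as reported in the text.
FEATURES_PER_ROI = {"diagnostic": 12, "original": 107, "wavelet": 744}
SEQUENCES = ("T1+C", "T2WI")
ROIS = ("intratumoral", "peritumoral")

# Assumed breakdown of the 107 original features (PyRadiomics defaults;
# not itemized in the article).
ORIGINAL_BY_CLASS = {
    "shape": 14, "firstorder": 18, "glcm": 24, "glszm": 16,
    "glrlm": 16, "gldm": 14, "ngtdm": 5,
}
# Assumed 3D wavelet sub-bands: LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH.
WAVELET_SUBBANDS = 8

per_roi = sum(FEATURES_PER_ROI.values())
total = per_roi * len(SEQUENCES) * len(ROIS)

# Shape features are computed on the mask only, so they are not repeated
# for each wavelet-filtered image.
non_shape = sum(n for cls, n in ORIGINAL_BY_CLASS.items() if cls != "shape")
wavelet = non_shape * WAVELET_SUBBANDS

print(per_roi, total, wavelet)  # 863 3452 744
```

Under these assumptions, the 744 wavelet features correspond to the 93 non-shape features computed on each of eight wavelet decompositions.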
RDeepNet model building and validation
The Cox proportional hazards deep neural network, DeepSurv [
8], was applied to construct the RDeepNet model for predicting individual recurrence risk. The network took 3,452 radiomic features as input for each patient. For the recurrence risk, the RDeepNet score was calculated with a single output node based on the negative log-partial likelihood function. The RFS predicted from the RDeepNet model was then assessed in the validation cohort and the testing cohort, respectively. More details about the network were described previously [
8].
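As a rough illustration of the loss mentioned above, the negative log partial likelihood of the Cox model can be computed from the network's single output (a risk score per patient), the follow-up time, and the event indicator. This is a minimal pure-Python sketch, not the RDeepNet implementation: the function name is hypothetical, tied event times are not handled specially, and the regularization term used in DeepSurv is omitted.

```python
import math


def neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative log Cox partial likelihood over observed events.

    risk_scores: network outputs (log-risk), one per patient
    times:       follow-up times
    events:      1 if recurrence was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for h_i, t_i, e_i in zip(risk_scores, times, events):
        if not e_i:
            continue  # censored patients only contribute via risk sets
        # Risk set: patients still under observation at time t_i.
        risk_set = sum(math.exp(h_j)
                       for h_j, t_j in zip(risk_scores, times) if t_j >= t_i)
        total += h_i - math.log(risk_set)
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events


times, events = [1.0, 2.0, 3.0], [1, 1, 1]
concordant = neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
discordant = neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
print(concordant < discordant)  # True
```

The loss is lower when patients who recur earlier receive higher risk scores, which is what training by gradient descent on this function encourages.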
Radiomic features varied among patients with different responses and after neoadjuvant chemotherapy
In total, 127 (52%) of the 244 patients in the testing cohort had radiomic features available from both before and after neoadjuvant chemotherapy. Of these, 72 (57%) patients were evaluated as responsive (complete response + partial response) to therapy according to the Response Evaluation Criteria in Solid Tumors (RECIST), and the other 55 (43%) patients were defined as unresponsive (stable disease + progressive disease). Differential therapy-related radiomic features were identified using the limma package together with the t test (responsive versus unresponsive patients) or the paired-samples t test (before versus after neoadjuvant chemotherapy). Heatmaps of the differential radiomic features were drawn with the R package pheatmap. Correlation matrix maps of the radiomic features extracted from the intratumoral region were generated with the R packages ggplots and RColorBrewer.
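For a single radiomic feature measured in the same patients before and after neoadjuvant chemotherapy, the paired-samples t statistic is the mean of the within-patient differences divided by its standard error. The sketch below is illustrative only; the function name and the toy values are hypothetical. It returns the statistic and degrees of freedom, and the P value would then be read from the t distribution.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Paired-samples t statistic and degrees of freedom for one feature."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    # Within-patient change in the feature value.
    diffs = [a - b for a, b in zip(after, before)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Toy values for one feature in four patients (hypothetical data).
t_stat, df = paired_t_statistic([1, 2, 3, 4], [2, 4, 5, 7])
print(round(t_stat, 3), df)  # 4.899 3
```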
Exploration of the molecular mechanisms of radiomics
To explore the related biological mechanisms of radiomics, we performed RNA-seq for 92 patients from the training cohort. Additional file
1: Table S2 shows the clinicopathological characteristics of these patients. The reference annotation files were downloaded from
https://www.ensembl.org/index.html and annotated with Perl scripts according to the Ensembl IDs of the sequencing results. Next, gene lengths were obtained from the Gencode27 database for the genes in the counts data. The counts data were then converted into transcripts per million (TPM), and the lncRNAs were identified in accordance with the Ensembl database.
The t test and limma package were used to identify differentially expressed genes between high- and low-risk patients according to the RDeepNet score. Then, the relative abundance of immune cell populations in the tumor immune microenvironment was quantified in the 92 patients with the ssGSEA algorithm, which allows highly sensitive and specific discrimination of 28 human immune cell phenotypes, including B cells, T cells, natural killer cells, macrophages, dendritic cells, and myeloid subsets. Spearman’s rank correlation analysis and the limma package were then used to compare high- and low-risk patients and further explore the association between radiomics and the tumor immune microenvironment.
To explore the potential epigenetic biological underpinning of radiomics, 15 lncRNAs were selected using the Spearman’s rank correlation analysis and univariable Cox proportional hazards regression model in 92 patients with RNA-seq data. The limma package was utilized to identify the differential radiomic features between patients with high and low expression of the key lncRNA. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed using the clusterProfiler R package [
16]. Pathways were also identified by gene set variation analysis (GSVA) with the R package GSVA. Pathway enrichment results were considered statistically significant at
P values and false discovery rates of less than 0.05. Next, a deep learning prediction model of lncRNA expression was built from the intratumoral radiomic features using a multilayer perceptron (MLP) [
17,
18]. A total of 92 patients with RNA-seq data were included for training the model, and 72 patients with both T1 + C and T2WI sequences from TCGA and TCIA were assigned to the TCGA cohort for assessing the efficacy of the model.
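The Spearman's rank correlation used above to relate lncRNA expression to radiomic features is the Pearson correlation of the ranked values, with tied values sharing their average rank. The following is a minimal pure-Python sketch with hypothetical function names and toy data; it is not the authors' R code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient between two samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Toy example: a feature value and an expression level in five patients.
print(round(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 4))  # 0.8208
```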
Statistical analysis
Fisher’s exact tests were performed to examine differences in categorical variables, while independent t tests were used to compare continuous variables between two groups. Survival was estimated using the Kaplan–Meier method and compared with the log-rank test. Hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated using Cox regression analysis. Patients were categorized into high- and low-risk groups with the optimal cutoff values defined by the R package survminer. The prognostic or predictive accuracy of the RDeepNet model and of the prediction model of lncRNA expression was assessed by receiver operating characteristic (ROC) curve analysis, with performance summarized by sensitivity, specificity, and the area under the ROC curve (AUC). For all analyses, two-sided P values less than 0.05 were considered statistically significant. Statistical analyses were performed using R software (version 4.0.0).
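To make the ROC-based metrics concrete, the AUC can be computed as the probability that a randomly chosen patient with the event receives a higher score than a randomly chosen patient without it, and sensitivity and specificity follow from a chosen cutoff. This is an illustrative pure-Python sketch with hypothetical function names and toy data, not the R code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called high risk."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= cutoff)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < cutoff)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < cutoff)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


labels = [0, 0, 1, 1]            # 1 = recurrence (toy data)
scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical risk scores
print(roc_auc(labels, scores))                       # 0.75
print(sensitivity_specificity(labels, scores, 0.5))  # (0.5, 1.0)
```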
Discussion
In this multicenter study, a deep learning model based on T1 + C and T2WI sequences that combined intratumoral and peritumoral radiomic features was significantly associated with RFS and showed high predictive value for RFS. The RDeepNet model successfully classified patients with different breast cancer molecular subtypes or different therapy regimens into high- and low-recurrence-risk categories. Furthermore, some radiomic features varied among patients with different response statuses and changed after neoadjuvant chemotherapy. More importantly, radiomics showed a significant association with lncRNAs according to the RNA-seq results, and lncRNA expression could be quantified by radiomics. Overall, this study developed and validated a prognostic network for individualized prediction of high and low recurrence risk, which may serve as an effective tool for survival prediction and clinical decision-making in patients with nonmetastatic invasive breast cancer. Moreover, the potential epigenetic biological underpinning of radiomics was preliminarily revealed, and a noninvasive method was established to predict the expression of an epigenetic molecule.
While previous studies [
22,
23] showed the potential of MRI-based radiomics for predicting breast cancer recurrence, their clinical value was limited because they used small, single-center cohorts, extracted radiomic features only from the tumor region, and relied on conventional machine learning algorithms. A previous study [
24] constructed a radiomics nomogram based on intratumoral features in 294 invasive breast cancer patients from a single center and estimated disease-free survival (DFS) with a C-index of 0.76. To our knowledge, our study is the first to build a deep learning network with both intratumoral and peritumoral radiomic features in multicenter cohorts of more than 1,000 breast cancer patients. Furthermore, we analyzed the efficacy of the RDeepNet model in patients treated with different therapy regimens, as well as the changes in radiomics according to therapeutic response and before and after therapy. We also performed RNA-seq to explore the potential epigenetic biological underpinning of radiomics and achieved noninvasive prediction of lncRNA expression using radiomic features.
In current clinical practice, patients with positive HR status are considered for endocrine therapy, and HER2-targeted therapy is selected for HER2-positive patients. However, some patients still experience disease progression owing to therapy resistance [
25,
26]. The Oncotype DX 21-gene assay [
27] and the PAM50 risk score [
28] have been used to predict the response to endocrine therapy, but these methods are invasive and suitable only for a subset of the population. As for HER2-targeted therapy, at present only HER2 amplification or overexpression predicts an enhanced survival benefit. Although a previous study presented an MRI-based signature that could noninvasively characterize HER2-positive tumor biological factors and estimate the response to HER2-targeted neoadjuvant therapy, the small sample size and highly heterogeneous data limited its application [
29]. Therefore, methods for predicting therapy response beyond HR or HER2 status are urgently needed. In this study, the RDeepNet model could identify recurrence risk among patients treated with endocrine therapy or HER2-targeted therapy, with all AUCs exceeding 0.90. These results indicate that the RDeepNet model has the potential to assist in treatment decisions.
In the present study, differentially expressed genes between the high- and low-risk groups were identified from the RNA-seq data. Pathway enrichment analyses showed that these genes might be involved in the regulation of host immune responses. Further evaluation demonstrated that the RDeepNet score was significantly related to most immune cell types, and high-risk patients showed lower levels of CD56dim natural killer cells. CD56dim natural killer cells account for more than 90% of natural killer cells and mainly play a cytotoxic role, with stronger killing activity [
30]. In addition, the RDeepNet model could distinguish high and low risk of recurrence in the testing cohort, in which all of the patients underwent neoadjuvant chemotherapy. Notably, some radiomic features differed before and after neoadjuvant chemotherapy and between responsive and unresponsive patients. These radiomic features were defined as therapy-related features. These findings suggest that radiomics can reflect changes in the tumor microenvironment or molecular characteristics.
In recent years, emerging evidence has suggested that abnormal expression of lncRNAs is a frequent biological phenomenon in tumors and is closely associated with the prognosis of cancer patients. Several studies have indicated that the MRI radiomic profile of cancer patients can predict prognosis, but the potential biological underpinning of MRI radiomics remains unclear. We hypothesized that MRI radiomics can reflect the expression of lncRNAs and thereby provide prognostic information. In this study, based on patients who had both RNA-seq and preoperative MRI data, we screened 15 lncRNAs related to both radiomic features and RFS to test our hypothesis. Among these lncRNAs, KRT7-AS was significantly correlated with the therapy-related radiomic features, and the KRT7-AS-based differentially expressed genes were enriched in lncRNA-mediated mechanisms of therapeutic resistance and various metastasis- or metabolism-associated pathways. Previous research has found that increased stability of lncRNA KRT7-AS could promote breast cancer lung metastasis by regulation of
N6-methyladenosine [
19]. KRT7-AS also supports gastric cancer and colorectal cancer progression by modulating KRT7 expression [
20,
21]. Therefore, the lncRNA KRT7-AS appears to play an important role in tumor progression, and examining KRT7-AS expression may help to predict survival.
However, the clinical application of lncRNAs as biomarkers is severely limited owing to the lack of detection methods. Our results suggest that MRI radiomic profiles can help identify potential targets for molecular-based therapy of breast cancer, and MRI examination may be used to monitor the expression level of molecular features during therapy. Based on the above findings, a deep learning prediction model of KRT7-AS expression was further constructed with an MLP and showed high predictive efficacy in both the training and testing cohorts. This approach may allow noninvasive detection of molecular expression from radiomic features alone, which could facilitate convenient monitoring of dynamic changes in tumors. Furthermore, the exploration of the association between lncRNAs and MRI radiomics is only a starting point, and the potential biological relationship of MRI radiomic profiles with other molecular features, such as DNA methylation, DNA copy number, and sequence variation, should be evaluated in the future.
This study has several limitations. Heterogeneity among the MRI scans from multiple clinical centers was inevitable. The median follow-up was about 40 months; therefore, the outcomes were limited, and the RDeepNet model could not be applied to predict overall survival. It is necessary to evaluate the radiomic changes with longer follow-up. Owing to the relatively low incidence of TNBC among breast cancer patients and the retrospective design of this study, TNBC patients may be under-represented. Previous studies have shown an association between radiomic features and the tumor environment [
31,
32]. In this study, we performed RNA-seq for only a subset of patients, and owing to the lack of available data on gene expression or MRI sequences, we were unable to further analyze and validate the association between radiomic features and lncRNAs. In particular, the mechanisms by which radiomic features predict recurrence and lncRNA expression need to be further explored. It may be beneficial to combine the RDeepNet model with molecular signatures, such as genomic and transcriptomic signatures, which might improve recurrence prediction and clinical applicability.