Background
It is estimated that over 2.2 million breast cancer cases occurred worldwide in 2020, representing 1 in 4 cancers diagnosed among women [
1,
2]. The molecular classification of breast cancer is based on human epidermal growth factor receptor-2 (HER2) gene amplification and the expression of the sex hormone receptors, namely the estrogen and progesterone receptors. Triple-negative breast cancer (TNBC), which accounts for approximately 15% of invasive breast cancers, is the most aggressive breast cancer subtype; it lacks estrogen receptor (ER) expression, progesterone receptor (PR) expression, and HER2 gene amplification [
3,
4]. Patients with TNBC have the highest mortality rates compared to the other breast cancer subtypes and are treated with chemotherapy prior to surgery, known as neoadjuvant chemotherapy (NAC) [
5,
6]. NAC reduces the tumor burden and nodal involvement. Most importantly, the response of TNBC to NAC is associated with long-term patient outcomes [
7]. However, nearly 50% of patients with TNBC have residual disease after NAC [
6]. Of those, 20–30% of TNBC patients develop early disease progression within three years and exhibit high metastatic recurrence rates and poor long-term outcomes [
7,
8].
With advances in high-throughput technologies, large-scale breast cancer genomics and transcriptomics data have been collected and analyzed to identify biomarkers associated with prognosis as well as treatment resistance [
7,
9‐
16]. Most TNBC studies have focused on subtyping or developing signatures associated with prognosis or disease recurrence, mainly utilizing pretreatment gene expression data from tumor samples. Few studies evaluated transcriptomics data from pre-NAC and post-NAC TNBC tumors [
7,
13‐
16]. These studies have examined changes in gene expression post-NAC to understand the molecular underpinnings driving treatment resistance in TNBCs. Identifying genes involved in chemotherapy-resistant disease (residual disease post-NAC) could lead to better stratification of patients and better therapeutic strategies for TNBC. Balko et al. measured gene expression for 450 transcripts from pre- and post-NAC breast tumor tissues using the NanoString platform and identified DUSP4 deficiency as an important mechanism of TNBC drug resistance [
15]. The I-SPY1 breast cancer clinical trial study by Magbanua et al. has shown an association of response with cell cycle and immune pathways [
13]. Hancock et al. generated pre- and post-NAC transcriptomics data using the Ion Torrent platform. They identified an association of increased SMAD2 expression, TP53 loss, and MYC-driven amplification with chemorefractory TNBC tumors [
7]. Importantly, the I-SPY1 study and Hancock et al.'s study are complementary in that they describe the depletion of the immune microenvironment and up regulation of markers related to stemness in chemoresistant tumors [
7,
13].
In our study, we investigated the topological differences in chemoresistant TNBC tumors. Specifically, we investigated transcriptome sequencing data from treatment-resistant TNBC tumors. Since TNBC comprises a heterogeneous group of subtypes, we focused on androgen receptor (AR) negative TNBC (i.e., ER-, PR-, HER2- and AR-). We developed a signature associated with early recurrence using post-NAC gene expression data from chemoresistant TNBC tumors. That signature was then compared with the patients' pre-NAC transcriptomics, whole-exome sequencing (WES), and reverse-phase protein array (RPPA) data. Moreover, we validated the signature using I-SPY1 pre- and post-NAC TNBC data. Finally, we confirmed that the resulting 17-gene signature could identify post-NAC patients with an elevated risk of recurrence.
Discussion
Triple-negative breast cancers are phenotypically heterogeneous diseases that lack therapeutic targets, and as such neoadjuvant chemotherapy is the standard of care. For the 25–30% of patients who achieve a pathologic complete response (pCR), the 3-year overall survival rate is 94%. However, approximately 50% of patients with residual disease after NAC will develop recurrent disease within the first 3–4 years [
7]. We have previously reported the up regulation of genes (including
ERBB4,
EGF,
MAPK10,
KIT, and
FGFR2) involved in hormone receptor (HR) cross-talk and the androgen receptor signaling pathway (
AR,
FOXA1,
FOXA2, and
FOXA3) associated with resistance to NAC treatment in TNBC patients using pre-NAC multiomics data [
6]. Hence, to investigate potential biomarkers delineating early recurrence among TNBC NAC-resistant patients, we evaluated longitudinal TNBC tumor biopsies. To ensure a clean signal, we removed luminal androgen receptor (LAR) samples and confirmed that post-NAC biopsies contained more than the tumor bed using an
in-silico deconvolution model.
Our initial comparison of pre-NAC transcriptome gene expression between the ERC and NRC groups identified only 11 differentially expressed genes (FDR < 0.05), whereas analysis of the pre-NAC RPPA data identified 20 differentially regulated proteins and 10 differentially regulated phosphorylation sites. Conversely, there was minimal change in the post-NAC RPPA data. It should be noted that the RPPA assay is biased toward signaling-related proteins. This would suggest a transcriptional and translational temporal lag in signaling pathways, and that the post-NAC data reflect a common response to therapeutic intervention. However, comparison of ERC and NRC in the post-NAC transcriptome identified 660 differentially expressed genes, which suggests that therapeutic selection pressure drives post-NAC differences. Among the 660 differentially expressed genes (FDR < 0.05 and |logFC| > 1), we observed enrichment of genes in the post-NAC data in a few cytoband regions (Fig.
1C-E, Fig.
2). Two of these regions (17q25 and 1q23-24) were confirmed to demonstrate copy gains in the pre-NAC biopsy, which coincided with the elevated expression observed in the post-NAC early recurrent samples (Fig.
2). Three genes in the 17q25 region are involved in sumoylation:
NUP85,
CBX2, and
CBX4. Similarly, the cytoband 4p15 presented a copy loss in the early recurrent tumors (pre-NAC), and the post-NAC data also demonstrated decreased expression (not shown). Contrary to the transcriptional observations, the protein expression levels were more altered in the pre-NAC data (20 of 232 total proteins and 10 of 63 phosphorylation sites,
p-value < 0.05). In contrast, four protein changes were observed in the post-NAC data.
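The gene-level filter described above (FDR < 0.05 and |logFC| > 1) can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure for the FDR adjustment, which the text does not specify; the function names and the toy gene labels are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used as the FDR procedure; the study may have used
    a different adjustment.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(genes, p_values, log_fcs, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes passing both the FDR and the absolute log fold-change cutoffs."""
    fdr = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, lfc in zip(genes, fdr, log_fcs)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    ]


# Toy input (hypothetical gene labels, not study data).
toy_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
toy_p = [0.001, 0.01, 0.03, 0.5]
toy_lfc = [2.0, -1.5, 0.4, 3.0]
print(select_de_genes(toy_genes, toy_p, toy_lfc))  # ['GENE_A', 'GENE_B']
```

Requiring both cutoffs removes genes with a statistically detectable but small change (GENE_C) as well as genes with a large but unsupported change (GENE_D).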
A parallel track of analysis using topological gene set measurements was adopted to navigate the challenges we faced, given the relatively low sampling power. To reduce the number of genomic features (20,543 genes), we conducted a GSVA analysis (with 2282 gene sets) on the post-NAC BEAUTY gene expression data and leveraged our observations with the independent I-SPY1 study post-NAC microarray data. We observed that 251 gene sets were altered between the ERC and NRC BEAUTY TNBC tumors (Fig.
3). Cluster analysis of the 251 gene sets identified two clusters, with the first cluster including
TUBB4A, the therapeutic target of paclitaxel. The gene sets in this cluster were up regulated in the ERC samples and included metastasis-promoting gene sets, DNA mismatch repair, and
TP53 gene sets. In contrast, the second cluster of gene sets that were down regulated in the ERC samples consisted of tumor suppressor gene sets, including
FOXO signaling,
TGF-β signaling, and apoptosis (Fig.
3).
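To illustrate the idea of collapsing roughly 20,000 genes into a few thousand gene-set scores, the sketch below computes a per-sample score for each gene set as the mean z-score of its member genes. This is a deliberately simplified stand-in and not the GSVA algorithm, which uses kernel-based rank statistics; the function names, gene labels, and set names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene across samples. expr maps gene -> list of values."""
    standardized = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no information; score it as 0.
        standardized[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return standardized


def gene_set_scores(expr, gene_sets):
    """Return gene-set name -> per-sample score (mean z-score of member genes).

    Simplified stand-in for GSVA: genes absent from expr are ignored, and
    gene sets with no measured member are dropped.
    """
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, members in gene_sets.items():
        present = [g for g in members if g in z]
        if not present:
            continue
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
    return scores


# Toy matrix with two samples (hypothetical values, not study data).
toy_expr = {"G1": [1.0, 3.0], "G2": [2.0, 6.0], "G3": [5.0, 5.0]}
toy_sets = {"SET_X": ["G1", "G2", "MISSING"], "SET_Y": ["G3"], "SET_EMPTY": ["NOPE"]}
print(gene_set_scores(toy_expr, toy_sets))
```

The resulting sample-by-gene-set matrix can then be compared between groups in the same way as a gene-level matrix, with far fewer features to test.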
Due to the technology differences between the microarray (I-SPY1) and RNA-Seq (BEAUTY) data, only 188/251 gene sets from the BEAUTY study were also investigated in the I-SPY1 study. We confirmed 56/118 gene sets in the I-SPY1 data to be significantly and concordantly altered, among which 113 genes were significantly and concordantly DE in both study cohorts. We refined the 113 genes by evaluating their individual prognostic value with the publicly available TNBC cohort provided by KM-plotter. We identified top candidate genes (
n = 17) associated with recurrence-free survival in the KM-plotter TNBC cohort, with a log-rank test
p-value < 0.05 and adjusted
p-value < 0.3 (Fig.
4, Additional file
1: Figure S2). Among these 17 genes, four (
FHL2,
TANK,
PDE2A, and
RSPO3) are well documented to be associated with breast cancer and were down regulated in the post-NAC early recurrent samples.
PDE2A is a phosphodiesterase that regulates mitochondrial respiration and mitogenic clearance [
48,
49].
FHL2 is a zinc finger transcription factor associated with several cancers, including ovarian and cervical cancers [
50‐
52]. We note that the I-SPY1 trial observed an increase in interferon signaling associated with shorter recurrence-free survival among nonresponding patients [
13]. Recent research has suggested that two molecules,
RSPO3 and
TANK, are related to
NF-κB signaling and survival response through the induction of inflammatory molecules.
RSPO3 down regulation is involved in prostate cancer invasiveness and interacts with the inflammatory mediator
IL-1β [
53,
54]. Two SNPs (
rs17705608 and
rs7309) in the
TANK gene have been associated with breast cancer risk [
55‐
57], involving TNF-mediated signal transduction. Most importantly,
TANK, a member of the
TRAF family, binds to
NEMO (
IKKγ) to induce inflammation through the
IKK complex and
NF-κB signal transduction [
58]. These findings suggest that initiating inflammatory signaling via
NF-κB signal transduction might be integral to recurrence-free disease. Moreover, we observed significant down regulation (−2.91,
p-value = 0.044) of the key apoptotic protein,
BCL2, suggesting that inflammation might be accompanied by immune infiltration and subsequent cell death. We also compared this 17-gene signature of early recurrence with the DE genes for chemoresistance (pCR vs non-pCR) in the BEAUTY TNBC cohort previously described in [
6]. We observed only one gene (
COL4A6) present in both DE analyses.
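The survival-based filtering of candidate genes described earlier in this section relies on a log-rank test between patient groups. The sketch below shows a two-group log-rank test written from the standard formula, assuming patients have already been split into high- and low-expression groups for a single gene. KM-plotter performs this analysis internally, so the function name and the toy follow-up times are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    events_* hold 1 for an observed recurrence and 0 for a censored patient.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy follow-up times in months (hypothetical, not study data).
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [10, 11, 12, 13], [1, 1, 1, 1])
print(round(chi2, 2), p < 0.05)
```

Running this test once per candidate gene yields one raw p-value per gene, which is why a multiple-testing adjustment is applied alongside the raw threshold.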
Cross-validation analysis of these 17 genes demonstrated a robust ability to predict recurrence, particularly with random forest, kNN, and kernel SVM algorithms (Fig.
5), achieving an AUC of 0.88. We investigated several datasets and confirmed that our 17-gene signature was specific to non-LAR tumors and post-NAC non-LAR tumor biopsies (Table
3). We also systematically analyzed the in-house and publicly available TNBC gene expression NAC datasets using computational biology and machine learning methods. We concluded that the post-NAC non-LAR TNBCs changed significantly during treatment compared to the baseline pre-NAC tumors. We have shown that the 17 genes identified in this study are novel biomarkers that predict recurrence in post-NAC residual tumors. Because our research required clinical and omics data from paired pre- and post-NAC tumors from the same TNBC patient cohort, our ability to robustly evaluate the data was limited. However, we consciously tried to reduce the potential for false discoveries with our limited sampling population by applying
p-value cutoffs along with fold change thresholds. Although we observed an average AUC of 0.88 using cross-validation analysis of the publicly available dataset and our Mayo Clinic BEAUTY study, further validation of the 17-gene signature in large TNBC cohorts is necessary.
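The cross-validated AUC reported above can be illustrated with a minimal sketch that scores each held-out sample with a k-nearest-neighbour classifier and then computes the AUC from the held-out scores. It assumes leave-one-out cross-validation and Euclidean distance, neither of which is stated in the text; the function names and toy feature vectors are hypothetical.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_x, train_y, query, k=3):
    """Fraction of positive labels among the k nearest training samples."""
    neighbours = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in neighbours[:k]) / k


def loo_cv_auc(features, labels, k=3):
    """Leave-one-out cross-validated AUC (assumed scheme, see lead-in)."""
    scores = []
    for i in range(len(features)):
        # Hold out sample i and score it using all remaining samples.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        scores.append(knn_score(train_x, train_y, features[i], k))
    return roc_auc(labels, scores)


# Toy data: two well-separated groups (hypothetical values, not study data).
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
         [5.0, 5.1], [5.2, 4.9], [4.8, 5.3], [5.1, 5.2]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
print(loo_cv_auc(toy_x, toy_y))  # 1.0
```

Because every sample is scored by a model that never saw it, the resulting AUC is less optimistic than one computed on the training data, although with small cohorts it still has a wide uncertainty.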