Background
In addition to being a major risk for breast cancer, age was also shown to be associated with increased mortality in women with breast cancer [
1]. The positive correlation between age and breast cancer incidence and mortality is believed to be due to the progressive accumulation of genetic and epigenetic alterations in breast epithelial cells as well as epigenetic changes in their microenvironment, which lead to changes of gene expression. However, the mechanism underlying the age effect on breast cancer remains poorly understood.
The genomic and epigenetic data of human adjacent normal tissues and tumor tissues, from public databases including The Cancer Genome Atlas (TCGA), have facilitated numerous findings in various cancers. Because age is a major risk for cancer and epigenetic alteration is one of hallmarks for both cancer and aging [
2‐
4], it appears logical that if the expression of a gene is upregulated during aging and higher in tumor tissues than in their matched adjacent normal tissues, it is likely oncogenic. Conversely, if its expression is downregulated during aging and lower in tumors than their matched adjacent normal tissues, it is likely tumor-suppressive. Following this logic, we should be able to identify the genes associated with both age and breast cancer.
However, whether these in silico-identified age- and cancer-associated genes contribute to cancer development and progression, or whether they are no more than accidental correlation as passengers instead of drivers, remains inconclusive from statistical data mining. To answer these questions, experiments, where potential tumor-promoting genes are knocked down and potential tumor-suppressive genes are overexpressed in various in vitro and in vivo models, are needed to validate their direct effect on cancer progression.
In this study, we identified 38 novel age- and breast cancer-associated genes, which we call them ABC genes, by mining TCGA breast cancer data and validating using GTEx normal breast tissue data. We selected two upregulated and two downregulated ABC genes with significant prognostic values for various subtypes of breast cancer and further demonstrated that the upregulated ABC genes, DYNLT3 and P4HA3, have a tumor-promoting function whereas the downregulated gene ALX4, but not WDR86, showed tumor-suppressive function by in vitro and in vivo experiments. As such, our study provides novel molecular links between aging and breast cancer development and progression.
Materials and methods
Processing of TCGA RNAseq data
The RSEM estimated raw read counts data were downloaded from TCGA Data Portal on July 25, 2014, and normalized by library sizes, log2 transformed, and further normalized by quantile normalization. Removal of 25% of genes with the lowest mean expressions resulted in subsequent analyses on 15,398 genes. In identifying genes associated with age (birth_days_to), the genes whose expression were considered to be correlated with age were selected by three criteria: (1) R2 value among the highest 5%, (2) absolute value of slope for age among the highest 5%, and (3) adjusted P value for the slope among the lowest 5%. In identifying genes differentially expressed comparing post-menopausal normal against pre-menopausal normal, and comparing matched tumors against match normal samples, a combination of methods using moderate hierarchical t test (R/limma) and moderate fitting based on negative binomial distribution (R/DESeq2), with and without surrogate variable analysis for potential confounders (R/SVA) with cutoff as FDR < 0.05, were implemented. The common overlapping genes from four methods were determined as the final list for differentially expressed genes.
Correlation between gene expression and subtypes of breast cancer
Two metrics were implemented to test the correlation between gene expression pattern of the 38 ABC genes and subtypes of breast cancer. The first metric was the sum of gene expression levels of 14 upregulated genes minus the sum of gene expression of 24 downregulated genes.
\( \mathrm{Score}=\sum \limits_{\mathrm{g}\;\mathrm{in}\kern0.17em \mathrm{genes}}\mathrm{Expression}\left(\mathrm{g}\right)\times \mathrm{Direction}\left(\mathrm{g}\right) \)
The second metric was Pearson’s correlation coefficient between expression levels of 38 genes with their expression direction, 1 for upregulated and − 1 for downregulated genes.
Score = Pearson Correlation (Expressions, Directions)
Both metrics were calculated for each patient then both metrics from all patients were examined against their subtypes.
Gene set enrichment analysis
Normalized log2 expression data from TCGA breast cancer cohort were rank-ordered by fold change between any two subtypes. The signature comprised two sets, one for 14 upregulated genes and the other one for 24 downregulated genes. Genes in the expression data were not collapsed but directly compared with the signature. Gene signatures were tested using default enrichment GSEA statistics with 1000 random permutations.
Survival analyses
Survival analyses, including disease/progression-free survival, were performed with “survival” package from R for patients in TCGA cohort. The split ratio to group patients into high and low groups based on gene expression was searched from 30% quantile to 70% quantile, and the best quantile which yields the lowest Cox regression P value was used as the best split ratio. To determine the prognostic value, Cox regression adjusted by diagnosis age was performed on two groups determined by the best split ratio.
For relapse-free survival on patients with aggregated microarray data,
kmplot.com was used [
5]. “Auto select best cutoff” was checked, and “only JetSet best probe set” was used to choose probeset for each queried gene. For the analysis on panel of genes, the weighted mean expression of all probeset was used to split patients.
For survival analysis of each individual gene, analyses were conducted on all patients and patients stratified by subtypes, in both TCGA RNASeq cohort and
kmplot.com microarray cohort. For analysis on a panel of genes, analyses were conducted on all patients.
Validation on GTEx dataset
The age information in unit of year of all donors in the GTEx database (
www.gtexportal.org/home/) was applied and downloaded via dbGaP-controlled access. Among them, RNAseq gene counts data from 90 donors with normal breast tissue donation were extracted. This expression data was then normalized by library sizes, log2 transformed, and further normalized by quantile normalization. The 38 ABC genes identified from TCGA dataset were then queried in the normalized GTEx data, by studying the correlation between the gene expression and age. Multi-testing was addressed by Benjamini-Hochberg correction.
Cell lines and culture conditions
The human breast cancer cell lines (MDA-MB-231, MDA-MB-468, ZR-75-1, SK-BR-3, Hs 578T, and BT-474) were purchased from the American Type Culture Collection (ATCC, Manassas, VA, USA), and MCF-7 cells were obtained from the Michigan Cancer Foundation in 1994. They were cultured in McCoy’s 5A medium supplemented with sodium pyruvate, amino acids, vitamins, penicillin, streptomycin, sodium bicarbonate, and 10% fetal bovine serum (FBS) in a humidified atmosphere containing 5% CO2 at 37 °C as previously described. These cells were authenticated in late 2016 with short tandem repeat assay by the DNA Lab in our university. They were routinely tested to be mycoplasma-free at least once a year with a luminescence-based mycoplasma detection kit from Lonza (Cat. # LT07-118).
Transfection of siRNAs and screening
Three siRNAs per gene were purchased from Sigma-Aldrich. A scrambled siRNA (Sigma-Aldrich) was used as a negative control. Dried siRNAs were reconstituted with DEPC-treated water (10 μM Stock). Lipofectamine RNAiMAX reagent (Invitrogen) was diluted by adding 0.3 to 25 μl basic medium in each well. siRNAs (0.75 μl at 10 μM) were diluted in 25 μl basic medium and then mixed with suspended cells to reach 50 nM working concentration. Diluted RNAiMAX and siRNAs were incubated for 20 min at 37 °C. Cell suspension in 100 μl containing 3000 MDA-MB-231 cells, 3000 MDA-MB-468 cells, 7000 MCF7 cells, 5000 ZR-75-1 cells, 7000 SK-BR-3 cells, or 9000 Hs 578T cells in culture medium with 15% FBS were added to pre-incubated transfection reagent. Cell proliferation was monitored, and confluency was analyzed every 3 h in IncuCyte for 1 week.
Molecular cloning and lentiviral vectors
MISSION® TRC2 pLKO.5-puro plasmids containing shRNAs against DYNLT3 and P4HA3 and MISSION® TRC2 pLKO.5-puro Empty Vector Control Plasmid were purchased from Sigma-Aldrich. MISSION® TRC2 pLKO.5-puro Empty Vector Control Plasmid was used as a control for these knockdowns. Plasmids harboring full-length CDS of ALX4 (HsCD00294887) and WDR86 (HsCD00436836) were purchased from DNASU Plasmid Repository (
dnasu.org). Full-length CDS of ALX4 and WDR86 were then cloned by PCR with primers. The plasmid pWPI (addgene: 12254) was inserted at PmeI site with linker harboring BamHI, MluI, and NdeI sites, denoted as pLenti-Control. The cloned CDS of ALX4 and WDR86 were then inserted into plasmid pLenti-Control linearized by BamHI and PmeI, to construct pLenti-ALX4 and pLenti-WDR86 correspondingly. The plasmid pLenti-Control was used as a control for the overexpression.
Viral production and infection of cell lines
293T packaging cells were transfected lentiviral vectors and packing vectors to generate lentiviral shRNAs (DYNLT3, P4HA3, and control) or CDS (ALX4, WDR86, and control) to infect MDA-MB-231 and BT-474 cells. Knockdown and overexpression were then examined with quantitative RT-PCR and Western blot analyses.
RNA isolation and quantitative RT-PCR
Cells were lysed with QiaGen RLT Lysis buffer, and total RNAs were extracted using RNeasy Mini Kit (QiaGen: Cat No./ID: 74104) following the manufacturer’s protocol. cDNA was generated using random primers and M-MLV reverse transcriptase from Invitrogen (Grand Island, NY), and real-time PCR was performed using SYBR reagent (Invitrogen) as described before [
6].
Western blot analysis
Whole cell lysates were obtained in Laemmli buffer containing protease inhibitors and processed as described previously [
6]. Antibody to DYNLT3 (Abcam, catalog # ab121209) was diluted 1:200 in 3% milk, to P4HA3 (Abcam, catalog # ab101657) was diluted 1:500 in 3% milk, to ALX4 (Biologicals, catalog # NBP2-45490) was diluted 1:1000 in 3% milk, to WDR86 (Thermo Fisher, catalog # PA5-48389) was diluted 1:1000 in 3% milk, to GAPDH (Calbiochem, catalog # 80602-840) was diluted 1:2500 in 3% BSA, and to tubulin (Sigma, Catalog # T4026) was diluted 1:5000 in TBST.
Cell viability assay
Knockdown of DYNLT3 and P4HA3 or overexpression of ALX4 and WDR86 in MDA-MB-231 and BT-474, and corresponding knockdown or overexpression control cells were plated in at least triplicate wells at 2000 MDA-MB-231 cells or 5000 BT-474 cells per well into 96-well plates. In the following 1 week, MTT solution (50 μl, 2 mg/ml in PBS) was added to each well and allowed to incubate for 2 to 4 h in a cell culture incubator. After removing the culture medium, 100 μl DMSO was then added into each well with the plates gently agitated on a shaker. The absorbance was measured at 595 nm with a microplate reader (BioTek Instrument, Winooski, VT), which is proportional to the number of viable cells.
Transwell migration assay
Cells were seeded at a density of 20,000 cells per well for MDA-MB-231 and 80,000 cells per well for BT-474 in serum-free medium in the upper 24-well Boyden chambers with 8-μm-size inserts (BD Biosciences, Durham, NC, USA). A serum-containing medium was added to the lower chamber. After allowing the cells to migrate for 16 to 18 h in cell culture incubator at 37 °C, the cells on the top of the membrane were removed and the migrated cells were fixed and stained with HEMA 3 STAIN SET from the Fisher Scientific Company. Picture of the stained cells on the insert was captured under a microscope at × 4 magnification for counting [
7].
Soft agar assay
Soft agar assays were carried out as described previously [
7]. Growth of cells in soft agar was determined by plating 10,000 cells in triplicate in 0.3% Noble agar in 6-well tissue culture plates. Three weeks after plating, soft agar plates were stained and counted as described previously [
7].
Xenograft experiment in mice
Animal experiments were conducted following appropriate guidelines. They were approved by the Institutional Animal Care and Use Committee and monitored by the Department of Laboratory Animal Resources at the University of Texas Health Science Center at San Antonio. Five million exponentially growing luciferase- and GFP-expressing BT-474 control or DYNLT3 knockdown, or P4HA3 knockdown cells in 0.1 ml volume were injected into the left inguinal mammary fat pad of ten 6-week-old female NOD scid gamma (NSG) mice purchased from The Jackson Laboratory. Similarly, nine 6-week-old female NSG mice were inoculated with 5 × 106 exponentially growing luciferase- and GFP-expressing BT-474 control or ALX4 OE, or WDR86 OE cells. About 10 days after inoculation, tumor size was recorded twice a week. Tumor volumes were calculated as tumor volume (mm3) = length × width × width/2. For the detection and quantification of lung metastasis biofluorescence imaging with IVIS, mice were euthanized at the termination of the experiment. d-luciferin (200 μl at 15 mg/ml) was injected intraperitoneally. Ten minutes later, the lung tissues from each mouse were excised and transferred to wells of a 24-well plate and were immediately imaged to obtain photo flux. Living image software was used for analysis. Tumor tissues were then excised, weighted, and cut into pieces to be frozen at − 80 °C or fixed in 10% formalin for various analyses.
Statistics and reproducibility
Results are represented as described in the figure legends, with mean ± sem (standard error of mean). For the experiments with two groups and normally distributed data, the difference of means was tested for significance with two-tailed unpaired Student’s t test with the assumption of unequal variance. For multiple independent groups, one-way ANOVA was used. All experiments were repeated at least three times, and representative images and figures were shown.
Discussion
Age is a well-known risk factor for breast cancer, and published studies have implicated potential relation between age and breast cancer [
12,
13]. Similar to our study, Pirone et al. collected 96 tissue specimens from patients with reduction mammoplasty with age ranging from 14 to 70 years old and studied the transcriptome profile by microarray [
14]. The age-associated 802 probes (481 increased, 321 decreased with increasing age) they identified were enriched in biological processes including cancer, immune response, and cellular proliferation. However, only 31 upregulated and 16 downregulated genes overlap between our study and that of Pirone et al. (Supplementary Figure
11 A, B). The small number of overlapping genes could be due to the difference between tumor-adjacent normal tissue and mammoplasty reduction tissue. The latter likely contained more adipose tissue. In addition, the transcriptome profile of the tumor-matched adjacent normal samples in TCGA dataset may be influenced by the presence of cancer [
15], which may contribute to the difference between the genes we identified and the age-associated genes by others. Nonetheless, while the age-associated genes we identified may be confounded by the cancer field effect, our main focus was to identify age-associated genes that contribute to tumorigenesis. Thus, in the last step of our transcriptome analysis, we contrasted the age-associated genes with genes differentially expressed in tumor versus their matched adjacent normal tissues resulting in a list of novel ABC genes with predictive power for the disease outcome as well as other unique properties discussed below.
In order to remove the menopause effect in age-related genes, we identified and removed menopause-associated differentially expressed genes from the age-associated genes. However, due to the wide range of age in the post-menopausal patients in the TCGA cohort, such comparison of gene expression between pre- and post-menopausal cohorts may be confounded by age. Therefore, we may lose some important targets in the list of “pure” age-dependent genes (148 upregulated and 189 downregulated genes in Supplementary Figure
1A), resulting in a higher false-negative rate. Menopause has been shown to elevate inflammation response [
16,
17]. Aging is associated with chronic inflammation which is also associated with the early development of breast cancer [
18]. Upon menopause, systemic hormone level alterations, especially estrogen, lead to the upregulation of estrogen receptor whose activity is critical in breast cancer. Thus, the delineation of the relationship among menopause, aging, and breast cancer development remains an important and challenging topic for further study.
In this study, we identified 38 genes that are both age and breast tumorigenesis related. Among them, fourteen genes are potential tumor-promoting genes that are upregulated during aging while twenty-four of them are potential tumor-suppressive genes that are downregulated during aging. While most of the upregulated genes have previously been shown to be involved in carcinogenesis, most of the downregulated genes have either not been studied for their biological functions or not been implicated in carcinogenesis. For example, the upregulated CLEC3A has been shown highly expressed on the tumor cell surface, and the cleavage of CLEC3A by MMP7 was reported to weaken tumor cell adhesion and migration [
19]. The frequently amplified gene CTHRC1 has been reported to be involved in cell proliferation, migration, invasion, and metastasis by regulating multiple signaling pathways including the TGFβ pathway and collagen synthesis pathway [
20‐
23]. The gene HSD11B2 regulated by progesterone, also known as hydroxysteroid (11-beta) dehydrogenase 2, has been frequently studied as a tumor promoter which promotes cancer progression by regulating interconversion of cortisol and cortisone [
24‐
26]. Among the downregulated ABC genes, DLK1 has been shown in many recent studies to function as a tumor suppressor [
27,
28]. On the other hand, it was recently shown as non-canonical Notch ligand to promote tumorigenesis by promoting epithelial-mesenchymal transition in ovarian cancer [
29]. The gene MAGI1 is short for membrane-associated guanylate kinase inverted 1. Previous studies have suggested its potential tumor suppressor role by interacting with beta-catenin to regulate cell-cell adhesion [
30,
31] and by regulating PTEN to inhibit cancer cell migration and invasion [
32]. Among the ABC genes that have not been studied in breast cancer, we used the selection criteria described above and selected four novel genes, DYNLT3, P4HA3, ALX4, and WDR86, for further investigation of their functional role in regulating the malignant properties of breast cancer cells for the first time.
With in vitro experiments, we showed that DYNLT3 and P4HA3 promoted breast cancer malignancy. DYNLT3 is short for dynein light chain Tctex-type 3, which is a member of a subclass of dynein light chains. DYNLT3 protein has been reported to form the light chain component of the cytoplasmic dynein motor protein complex [
33,
34]. Its paralog, DYNLT1, has been found to promote tumor progression [
35‐
37]. Thus, our current study and the published studies indicate that cytoplasmic dynein motor proteins play a critical role in tumor progression. P4HA3 is short for prolyl 4-hydroxylase subunit alpha 3, which is a component of prolyl 4-hydroxylase involved in collagen synthesis. Its paralog, P4HA1, has been found to promote tumor progression [
38,
39]. Alterations in collagen synthesis pathways have been shown in many cancer progressions [
40‐
43]. A recent study showed that P4HA3 appears to be a target effector of the TGFβ pathway mediating the tumor/migration-promoting activity of TGFβ [
44]. Consistently, TGFβ1 was found in our study to be significantly upregulated in post-menopausal patients in comparison with pre-menopausal patients in the TCGA dataset (data not shown). On the other side, we showed that overexpression of ALX4, a potential tumor suppressor, significantly inhibited the malignant properties of breast cancer cells. ALX4 is short for aristaless-like homeobox 4, which is a homeobox domain transcription factor. It has been shown in lung cancer [
9] and colorectal neoplasia [
10] that ALX4 is epigenetically silenced via hypermethylation. Moreover, its expression was previously reported to be significantly downregulated in breast cancer [
11]. As we were completing our experiments involving ALX4, Yang and co-workers also reported the tumor-suppressive function of ALX4 in breast cancer cells [
45]. These published studies together with ours support the conclusion that ALX4 acts as a putative tumor suppressor in subsets of breast cancer. Lastly, we also studied the extent to which WDR86 could play as a tumor suppressor. WDR86 is a novel gene with little known function. WDR86 is short for WD domain repeat 86. It has been characterized with 4 validated transcription isoforms and 11 predicted isoforms [
46]. It is proposed that one of the paralogs of WDR86 is TRAF7, which has been implicated as a tumor suppressor by inducing apoptosis [
47,
48]. Therefore, WDR86 may function as a potential tumor suppressor although we did not observe a malignancy-inhibitory activity of WDR86 with the assays we used. Further studies are needed to ascertain the role of WDR86 in breast carcinogenesis. In summary, our current study provides evidence demonstrating tumor-regulatory functions for the three of the four ABC genes. It is possible that some ABC genes such as these three genes may act in concert to regulate breast cancer progression because we found that in about 25% of the 82 cases, they appear co-regulated with a reasonable concurrence score (Supplementary Table
3). However, whether the expression alterations of these genes during aging truly contribute to mammary tumor initiation and progression requires further in vivo experiments using animal models.
One of the potential limitations of our study is that the genes we identified could be limited to the features of 82 patients in TCGA database. Among these 82 patients, there were 45 luminal A subtypes, 17 luminal B subtypes, 13 basal subtypes, 6 Her2 subtypes, and 1 unknown as of the date of analysis. Because of a relatively large variation in human samples, it was impossible to use data of only 45 luminal A subtypes to identify menopause-independent age-associated genes with enough power. Thus, we included all 82 patients for identifying age-associated genes, which could result in a bias toward more luminal-like samples. This may explain why the 38 ABC genes were more positively enriched in luminal A and B and Her2 subtypes of tumors compared to basal and normal-like subtypes. Thus, while the up- or downregulated ABC genes are interestingly associated with poor or favorable patient outcomes, respectively, because of high variation in human samples, more data and studies are needed to confirm the findings in our study.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.