Background
Immunotherapy with anti-programmed cell death ligand 1 (anti-PD-L1, atezolizumab) was recently approved by the Food and Drug Administration (FDA) for the indication of PD-L1-positive metastatic triple negative breast cancer (TNBC) [1, 2]. However, novel combination immuno-oncology (I-O) therapies will be required to improve efficacy in other therapeutic settings, such as for PD-L1-negative disease, or for less immunogenic breast cancer subtypes such as luminal-type hormone receptor-positive cancers. In an era when numerous I-O agents are being developed clinically [3–5], one promising avenue to accelerate drug development is to develop biomarkers to characterize immune cell (IC) and tumor cell (TC) infiltrates, enabling a comparison of pharmacodynamic effects of various I-O strategies. Here, we describe a methodology that employs multiplex immunofluorescence (mIF) in conjunction with statistical modeling to characterize IC infiltration and PD-L1 expression in the context of early-stage breast cancer (ESBC) I-O clinical trials.
The mIF assay is of particular interest in breast cancer because it may serve to complement two clinically developed I-O biomarkers: PD-L1 expression (by the Ventana SP142 assay) and the hematoxylin and eosin (H&E) stromal tumor-infiltrating lymphocyte (sTIL) score. The SP142 PD-L1 assay is FDA-approved to identify patients with PD-L1+ TNBCs who could potentially benefit from the addition of atezolizumab to nab-paclitaxel [1, 6]. The SP142 assay categorizes tumors as PD-L1+ if at least 1% of the tumor area is occupied by PD-L1+ immune cells (ICs). PD-L1 expression is thought to be dynamic, with biological conditions and/or therapeutic interventions potentially modifying the extent of PD-L1 expression. While the binary designation of PD-L1 status is clinically useful, its ability to serve as a pharmacodynamic biomarker of dynamic changes in PD-L1 is limited by its semi-quantitative nature and operator dependency. Likewise, the H&E sTIL score uses a pathologist's estimate of the proportion of stromal area occupied by TILs on a single H&E slide as a general gauge of tumor immunogenicity [7]. Similar to PD-L1 testing, sTIL may be clinically useful (as it correlates with survival, chemotherapy response, and potentially immunotherapy response) [8–12]; however, several barriers limit its use as a pharmacodynamic biomarker, including suboptimal inter-observer concordance related to underlying intratumoral heterogeneity of sTILs [13]. Both the PD-L1 and sTIL assays are semi-quantitative and require a pathologist to visually estimate ICs, which may sometimes be present in low abundance.
mIF enables estimation of IC counts in high resolution across numerous high-powered magnification fields (hereafter called regions of interest, ROI) and therefore has the potential to produce more accurate and precise estimates of sTILs and PD-L1 expression, relative to the clinical assays. Furthermore, mIF permits more detailed characterization of IC/TC interactions via single-cell quantification of numerous phenotypic surface markers, and spatial localization of cells into various tissue compartments (i.e., tumor versus stroma). Here we use a 6-marker panel of CK, CD3, CD8, FoxP3, CD163, and PD-L1 to visualize, quantify, and phenotype ICs and TCs in ESBC specimens. IC densities and PD-L1 expression are repeatedly sampled across multiple ROIs on a single slide, providing the opportunity to characterize spatial heterogeneity. With appropriate statistical modeling, the repeated sampling of ROIs can be used to improve both accuracy and precision of IC density and PD-L1 expression estimates.
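To make the per-ROI quantities concrete, the following is a minimal sketch of how single-cell mIF output could be summarized into a density (cells/mm^2) and a PD-L1+ fraction for one phenotype in each ROI. The record layout, the `roi_summaries` helper, and all values are hypothetical and are not taken from the study's analysis pipeline.

```python
from collections import defaultdict

# Hypothetical single-cell records: (roi_id, phenotype, is_pdl1_positive).
cells = [
    ("roi1", "CD8", True),
    ("roi1", "CD8", False),
    ("roi1", "CD163", True),
    ("roi1", "CK", False),
    ("roi2", "CD8", False),
    ("roi2", "FoxP3", False),
    ("roi2", "CK", True),
    ("roi2", "CK", False),
]

# Hypothetical ROI areas in mm^2.
roi_area_mm2 = {"roi1": 0.5, "roi2": 0.25}


def roi_summaries(cells, roi_area_mm2, phenotype):
    """Per-ROI density (cells/mm^2) and PD-L1+ fraction for one phenotype."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for roi, pheno, pdl1 in cells:
        if pheno == phenotype:
            counts[roi] += 1
            positives[roi] += int(pdl1)
    out = {}
    for roi, area in roi_area_mm2.items():
        n = counts[roi]
        out[roi] = {
            "density": n / area,
            # The fraction is undefined when the ROI has no cells of this phenotype.
            "pdl1_fraction": positives[roi] / n if n else None,
        }
    return out


cd8 = roi_summaries(cells, roi_area_mm2, "CD8")
```

Keeping one row per ROI, rather than a single pooled number per slide, is what preserves the within-sample variability that later modeling can exploit.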
Here, we report mIF data from a phase Ib study of IRX-2, a loco-regional cytokine therapy in ESBC [14]. IRX-2 contains various cytokines (including interleukin (IL)-2, IL-1β, interferon-γ, and tumor necrosis factor-α, among others) delivered subcutaneously in the distribution of regional lymphatics, and was previously shown to increase tumor lymphocyte infiltration in pre-operative head and neck carcinomas [15–17]. In the phase Ib breast cancer trial, IRX-2 was injected in the peri-areolar tissue (modeled after sentinel lymph node mapping) and was found to be well tolerated, achieving the primary safety endpoint as well as showing evidence of enhanced IC infiltration and lymphocyte activation (measured by RNA sequencing) [14]. Using the paired biopsy and surgical excision specimens from this trial, our objectives were (1) to propose a method for harmonizing mIF with the PD-L1 SP142 and H&E sTIL clinical assays; (2) to illustrate how hierarchical linear modeling can enhance statistical precision of IC density/PD-L1 expression estimates; and (3) to evaluate the influence of ROI sample size on overall statistical power to detect changes in ICs/PD-L1 in the context of a clinical trial.
Discussion
Numerous I-O strategies show promise in preclinical breast cancer models either as monotherapy or in combination with approved therapies (T cell agonists, trastuzumab, chemotherapy, radiotherapy, or targeted therapy) [1–4, 31–33]. Pre-operative I-O clinical trials in ESBC provide the opportunity to efficiently compare pharmacodynamic activity using serial tissue-based comparative biomarkers, while also providing pathologic outcomes as a meaningful surrogate of disease-free recurrence [34]. mIF has been proposed as a promising biomarker, as it has been shown to be concordant with clinical PD-L1 assays in ESBC [35], and in a recent meta-analysis it outperformed clinical PD-L1 testing, quantification of tumor mutational burden, and gene expression profiling in predicting immunotherapy response [36]. Here, we provide additional guidance on how mIF can be used as a pharmacodynamic biomarker in the context of ESBC I-O pre-operative clinical trials. We show that mIF estimates of PD-L1 expression and sTIL/IC density correlate with the validated clinical assays, but with higher resolution to measure treatment-related pharmacodynamic changes. It has recently been suggested that both PD-L1 and sTIL clinical assays be co-analyzed to enhance predictive/prognostic performance [37]. As illustrated in this manuscript, mIF provides granular detail on single-cell PD-L1 expression across cellular phenotypes, which can be used to categorize tumors based upon ratios of PD-L1-expressing cells, phenotypic predominance patterns of PD-L1+ cells (i.e., macrophage versus lymphocyte), and spatial patterning of PD-L1. As a future direction of investigation, we propose that clinical investigators prioritize the inclusion of mIF in clinical trials in tandem with the clinical assays, so that the unique predictive/prognostic utility of these added data can be adequately interrogated.
We identified several aspects of mIF that might be useful in addressing the pitfalls of clinical H&E sTIL assessment, which were recently summarized from the RING studies [13]. First, by H&E, it was found that non-lymphocyte cells or intraepithelial TILs could be misclassified as sTILs by pathologists. mIF could substantially mitigate this source of error by employing multiple cell surface markers to accurately classify lymphocytes. Second, it was found that pathologists exhibited different set-points/scales for quantifying sTILs by H&E, resulting in substantial inter-observer discordance. This pitfall could in the future be mitigated by mIF once the staining, imaging, and analysis workflow becomes harmonized across institutions. Efforts are ongoing via the National Institutes of Health Cancer Immune Monitoring and Analysis Centers (CIMAC) to standardize and validate a mIF workflow [38]. A third source of error was heterogeneity of sTIL counts across areas of the tumor. One proposed solution to mitigate this error is to sample and average sTIL counts across multiple ROIs [13]. Using mIF, it is feasible to estimate sTIL counts across a large number of ROIs, and we demonstrate that adequate ROI sampling is important for stabilizing estimates of treatment-related changes in sTILs in the context of clinical trials.
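The link between ROI sampling and trial size can be illustrated with a simple variance decomposition. The sketch below is not the power analysis used in the study: it assumes independent ROIs within a specimen, the same number of ROIs before and after treatment, and a normal approximation for a paired pre/post comparison. The function names and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def change_variance(sd_between, sd_within, n_rois):
    """Variance of one patient's observed pre-to-post change in mean density.

    sd_between: SD of the true change across patients.
    sd_within:  SD of ROI-level densities around a specimen's true mean.
    n_rois:     ROIs sampled per specimen (same number pre and post).
    """
    # Each specimen mean carries sd_within^2 / n_rois of sampling noise,
    # and the change is the difference of two such means.
    return sd_between**2 + 2 * sd_within**2 / n_rois


def patients_needed(delta, sd_between, sd_within, n_rois, alpha=0.05, power=0.8):
    """Normal-approximation number of patients for a two-sided paired test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = change_variance(sd_between, sd_within, n_rois)
    return math.ceil(z**2 * var / delta**2)


# Hypothetical inputs (cells/mm^2): true mean change 100, SD of true change 100,
# ROI-to-ROI SD 300 within a specimen.
sizes = {k: patients_needed(100, 100, 300, k) for k in (1, 10, 50)}
```

Under these assumptions the ROI-sampling term shrinks as more ROIs are analyzed, so the required number of patients falls toward the floor set by between-patient variability alone.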
Statistical modeling has been underexplored as a method for improving accuracy and precision of sTIL/IC density estimation. To date, there is no universally adopted approach for the statistical treatment of mIF output data. By convention, many investigators collapse ROI IC density estimates into a mean per-sample score, which does not fully utilize the added information derived from repeated sampling across ROIs. As an alternative, we demonstrate that statistical modeling can improve statistical power and minimize the potentially detrimental confounding effects of intratumoral heterogeneity. As illustrated in Table 3, statistical modeling was associated with a narrowing of confidence intervals of IC estimates, and smaller observed p values. Furthermore, compared to conventional t-testing of means, the hierarchical linear modeling method reduced the required patient enrollment size from n = 25 to n = 13 to show an effect of IRX-2 on sTILs.
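This section names hierarchical linear modeling without restating its specification, so the following is only a sketch of one common form, a random-intercept model fitted to ROI-level densities. The symbols ($y_{ijk}$, $T_{ij}$, $u_i$, $\tau^2$, $\sigma^2$) are illustrative assumptions and may differ from the model actually fitted, which could, for example, include transformations or additional random effects.

```latex
% Assumed indices: patient i, time point j (0 = pre, 1 = post), ROI k.
y_{ijk} = \beta_0 + \beta_1 T_{ij} + u_i + \varepsilon_{ijk},
\qquad u_i \sim \mathcal{N}(0, \tau^2),
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

In this form $\beta_1$ is the treatment-associated change, $u_i$ absorbs between-patient differences in baseline, and every ROI contributes an observation rather than being collapsed into a per-sample mean.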
We also illustrate how mIF can be used to evaluate more complex hypotheses related to I-O treatment effect. For example, based upon preclinical models and previous trial data, it was hypothesized that locoregional cytokine perfusion (IRX-2) would increase lymphocyte trafficking and facilitate PD-L1 upregulation within the breast tumor via modulation of the JAK-STAT pathway [17]. Using mIF, we confirmed that IRX-2 is associated with increases in sTILs and PD-L1 upregulation in the tumor microenvironment, as well as a shift in the ratio of cytotoxic T cells to CD163+ macrophages and regulatory T cells. These findings are corroborated by previously published data from gene expression profiling, clinical SP142/H&E sTIL assays, and T cell receptor DNA sequencing [14]. Based upon these encouraging findings, we are conducting a trial to compare single-dose anti-PD-1 +/− IRX-2 (n = 15 per arm) as an induction therapy to potentially enhance immune infiltration prior to neoadjuvant chemotherapy plus pembrolizumab in stage II-III TNBC (NCT04373031). In the future, the spatial output data derived from mIF can be used to evaluate spatial hypotheses, such as whether cytokine therapy permits aggregation or penetration of tumors into the tumor/stromal interface. Such a hypothesis could be evaluated by comparing pre- versus post-treatment densities of buffer zones surrounding the tumor/stromal interface.
Our approach is not without limitations. First, because the assay is limited to 7 markers, B cell markers were not included, and this may have influenced the overall estimation of sTILs (since B cells would be included in the H&E sTIL score). It is possible to customize mIF with different markers; however, careful attention must be paid to ensure that each panel is properly validated using ESBC specimens. For this pilot study, we therefore opted to use a previously validated panel with which we have extensive experience and prior publications [21, 39]. Future improvements in technology are anticipated to allow for simultaneous measurement of additional markers. A second limitation is the lack of a treatment control, which precluded assessment of potential confounding effects of time and/or biopsy trauma. This will be addressed in the ongoing randomized phase II trial. A third limitation is the resource-intensive nature of our approach, which requires acquisition and analysis of all lymphocyte-bearing ROIs in a given sample. This process may require 24 h or more of processing per specimen; however, we illustrate that the efforts are worthwhile in the context of clinical research as they may reduce sample size requirements. In clinical trials, per-patient expenses, time, and effort are likely to far outweigh the added time and cost required to sample more ROIs. Future advances in technology may permit more rapid acquisition and analysis of whole-slide data, for example using the PerkinElmer Polaris system, which is being validated by our group and others. A fourth limitation is that breast cancer subtypes and/or clinical settings may have unique histologic and immunologic features, and therefore our power calculations may not be externally valid in other settings. For example, baseline sTIL levels and PD-L1 expression are lower in hormone-sensitive breast cancers relative to TNBC, and therefore when designing a clinical trial, the power analyses would have to be repeated or modified to account for these expected differences in baseline.
As a future direction, the described statistical modeling can be amended to incorporate data on spatial locations of each ROI and/or each cell, which has further potential to improve estimation. For example, because IC densities of immediately adjacent ROIs may be correlated, the accuracy of the model could be improved after adjustment for spatial autocorrelation. Similarly, topographical features such as leading invasive margin of the tumor are expected to influence IC densities and may be accounted for in the model [13]. We are piloting advanced spatial modeling that would enable adjustment of IC densities according to spatial distance from observed topographical landmarks, as well as more advanced methods to exclude non-interpretable areas within ROI to improve accuracy.
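One standard way to check whether adjacent ROIs carry correlated IC densities is Moran's I. The sketch below is a generic illustration and not the spatial model being piloted. It assumes ROIs tile a rectangular grid and treats edge-sharing tiles as neighbours with equal weight. The `morans_i` helper and the example grids are hypothetical.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using rook (edge-sharing) neighbours."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    cross = 0.0
    weight_total = 0
    for r in range(rows):
        for c in range(cols):
            # Visit each in-bounds neighbour; every adjacent pair is counted
            # once in each direction, matching a symmetric 0/1 weight matrix.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_total += 1
    return (n / weight_total) * (cross / denom)


# Hypothetical ROI density grids: one clustered, one alternating.
clustered = [[10, 10, 0, 0] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
```

Values near zero suggest little spatial structure, positive values indicate that neighbouring ROIs resemble each other (as in `clustered`), and negative values indicate alternation (as in `checkerboard`). A clearly positive value would argue for adding a spatial correlation term to the model.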
Acknowledgements
We acknowledge the following for contributing to patient enrollment: Kelly S. Perlewitz, MD, Janet Ruzich, MD, Alison Conlin, MD, Anupama Acheson, MD, Kristen Massimino, MD, Shaghayegh Aliabadi-Wahle, MD, and James Imatani, MD. We acknowledge Nicole Moxon, RN, Staci Mellinger, RN, and Tracy Kelly for caring for patients on this trial. We acknowledge the Brooklyn ImmunoTherapeutics team (Lynn Sadowski-Mason, Monil Shah, and others) for their ongoing support and contributions to the study.
Competing interests
KS has no competing interests to declare.
IK has no competing interests to declare.
BC has no competing interests to declare.
JP has no competing interests to declare.
WR has received research funding from Bristol-Myers Squibb, Merck, Nektar Therapeutics, GSK, Galectin Therapeutics, Inhibrx, Oncosec, Aeglea Biotherapeutics, Veana Therapeutics, and MiNA Therapeutics. WR has also received advisory board honoraria from Nektar Therapeutics and serves on the advisory board of Vesselon, Inc.
WU has no competing interests to declare.
MM has no competing interests to declare.
YW has no competing interests to declare.
MC has no competing interests to declare.
ZS has no competing interests to declare.
GG has no competing interests to declare.
SC has no competing interests to declare.
BB has no competing interests to declare.
DP has received advisory board honoraria and institutional research funding support from Brooklyn ImmunoTherapeutics, Bristol-Myers Squibb, and Merck Laboratories. DP receives speaker bureau honoraria from Genentech and Novartis. Unrelated to this work, DP has received additional advisory board honoraria from other entities.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.