Predicting cancer outcomes from histology and genomics using convolutional networks

Pooya Mobadersany^a, Safoora Yousefi^a, Mohamed Amgad^a, David A. Gutman^b, Jill S. Barnholtz-Sloan^c, José E. Velázquez Vega^d, Daniel J. Brat^e, and Lee A. D. Cooper^a,f,g,1

^aDepartment of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA 30322; ^bDepartment of Neurology, Emory University School of Medicine, Atlanta, GA 30322; ^cCase Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106; ^dDepartment of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA 30322; ^eDepartment of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611; ^fWinship Cancer Institute, Emory University, Atlanta, GA 30322; and ^gDepartment of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA 30322

Edited by Bert Vogelstein, Johns Hopkins University, Baltimore, MD, and approved February 13, 2018 (received for review October 4, 2017)

Cancer histology reflects underlying molecular processes and disease progression and contains rich phenotypic information that is predictive of patient outcomes. In this study, we show a computational approach for learning patient outcomes from digital pathology images using deep learning to combine the power of adaptive machine learning algorithms with traditional survival models. We illustrate how these survival convolutional neural networks (SCNNs) can integrate information from both histology images and genomic biomarkers into a single unified framework to predict time-to-event outcomes and show prediction accuracy that surpasses the current clinical paradigm for predicting the overall survival of patients diagnosed with glioma. We use statistical sampling techniques to address challenges in learning survival from histology images, including tumor heterogeneity and the need for large training cohorts. We also provide insights into the prediction mechanisms of SCNNs, using heat map visualization to show that SCNNs recognize important structures, like microvascular proliferation, that are related to prognosis and that are used by pathologists in grading. These results highlight the emerging role of deep learning in precision medicine and suggest an expanding utility for computational analysis of histology in the future practice of pathology.

artificial intelligence | machine learning | digital pathology | deep learning | cancer

Significance

Predicting the expected outcome of patients diagnosed with cancer is a critical step in treatment. Advances in genomic and imaging technologies provide physicians with vast amounts of data, yet prognostication remains largely subjective, leading to suboptimal clinical management. We developed a computational approach based on deep learning to predict the overall survival of patients diagnosed with brain tumors from microscopic images of tissue biopsies and genomic biomarkers. This method uses adaptive feedback to simultaneously learn the visual patterns and molecular biomarkers associated with patient outcomes. Our approach surpasses the prognostic accuracy of human experts using the current clinical standard for classifying brain tumors and presents an innovative approach for objective, accurate, and integrated prediction of patient outcomes.

Author contributions: P.M., S.Y., M.A., D.A.G., D.J.B., and L.A.D.C. designed research; P.M., S.Y., J.E.V.V., and L.A.D.C. performed research; P.M., J.S.B.-S., and L.A.D.C. analyzed data; and P.M., M.A., D.A.G., J.S.B.-S., J.E.V.V., D.J.B., and L.A.D.C. wrote the paper.
Conflict of interest statement: L.A.D.C. leads a research project that is financially supported by Ventana Medical Systems, Inc. While this project is not directly related to the manuscript, it is in the general area of digital pathology.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
Data deposition: Software and other resources related to this paper have been deposited at GitHub, https://github.com/CancerDataScience/SCNN.
1To whom correspondence should be addressed. Email: Lee.Cooper@Emory.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1717139115/-/DCSupplemental.
Published online March 12, 2018. www.pnas.org/cgi/doi/10.1073/pnas.1717139115
Histology has been an important tool in cancer diagnosis and prognostication for more than a century. Anatomic pathologists evaluate histology for characteristics, like nuclear atypia, mitotic activity, cellular density, and tissue architecture, incorporating cytologic details and higher-order patterns to classify and grade lesions. Although prognostication increasingly relies on genomic biomarkers that measure genetic alterations, gene expression, and epigenetic modifications, histology remains an important tool in predicting the future course of a patient's disease. The phenotypic information present in histology reflects the aggregate effect of molecular alterations on cancer cell behavior and provides a convenient visual readout of disease aggressiveness. However, human assessments of histology are highly subjective and are not repeatable; hence, computational analysis of histology imaging has received significant attention. Aided by advances in slide scanning microscopes and computing, a number of image analysis algorithms have been developed for grading (1-4), classification (5-10), and identification of lymph node metastases (11) in multiple cancer types.

Deep convolutional neural networks (CNNs) have emerged as an important image analysis tool and have shattered performance benchmarks in many challenging applications (12). The ability of CNNs to learn predictive features from raw image data is a paradigm shift that presents exciting opportunities in medical imaging (13-15). Medical image analysis applications have heavily relied on feature engineering approaches, where algorithm pipelines are used to explicitly delineate structures of interest using segmentation algorithms to measure predefined features of these structures that are believed to be predictive and to use these features to train models that predict patient outcomes. In contrast, the feature learning paradigm of CNNs adaptively learns to transform images into highly predictive features for a specific learning objective. The images and patient labels are presented to a network composed of interconnected layers of convolutional filters that highlight important patterns in the images, and the filters and other parameters of this network are mathematically adapted to minimize prediction error. Feature learning avoids biased a priori definition of features and does not require the use of segmentation algorithms that are often confounded by artifacts and natural variations in image color and intensity. While feature learning has become the dominant paradigm in general image analysis tasks, medical applications pose unique challenges. Large amounts of labeled data are needed to train CNNs, and medical applications often suffer from data deficits that limit performance. As "black box" models, CNNs are also difficult to deconstruct, and therefore, their prediction mechanisms are difficult to interpret. Despite these challenges, CNNs have been successfully used extensively for medical image analysis (9, 11, 16-26).

Many important problems in the clinical management of cancer involve time-to-event prediction, including accurate prediction of overall survival and time to progression. Despite overwhelming success in other applications, deep learning has not been widely applied to these problems. Survival analysis has often been approached as a binary classification problem by predicting dichotomized outcomes at a specific time point (e.g., 5-y survival) (27). The classification approach has important limitations, as subjects with incomplete follow-up cannot be used in training, and binary classifiers do not model the probability of survival at other times. Time-to-event models, like Cox regression, can utilize all subjects in training and model their survival probabilities for a range of times with a single model. Neural network-based Cox regression approaches were explored in early machine learning work using datasets containing tens of features, but subsequent analysis found no improvement over basic linear Cox regression (28). More advanced "deep" neural networks that are composed of many layers were recently adapted to optimize Cox proportional hazard likelihood and were shown to have equal or superior performance in predicting survival using genomic profiles containing hundreds to tens of thousands of features (29, 30) and using basic clinical profiles containing 14 features (31).
Learning survival from histology is considerably more difficult, and a similar approach that combined Cox regression with CNNs to predict survival from lung cancer histology achieved only marginally better than random accuracy (0.629 c index) (32). Time-to-event prediction faces many of the same challenges as other applications where CNNs are used to analyze histology. Compared with genomic or clinical datasets, where features have intrinsic meaning, a "feature" in an image is a pixel with meaning that depends entirely on context. Convolution operations can learn these contexts, but the resulting networks are complex, often containing more than 100 million free parameters, and thus, large cohorts are needed for training. This problem is intensified in time-to-event prediction, as clinical follow-up is often difficult to obtain for large cohorts. Data augmentation techniques have been adopted to address this problem, where randomized rotations and transformations of contrast and brightness are used to synthesize additional training data (9, 11, 14, 15, 17, 19, 25, 26, 33). Intratumoral heterogeneity also presents a significant challenge in time-to-event prediction, as a tissue biopsy often contains a range of histologic patterns that correspond to varying degrees of disease progression or aggressiveness. The method for integrating information from heterogeneous regions within a sample is an important consideration in predicting outcomes. Furthermore, risk is often reflected in subtle changes in multiple histologic criteria that can require years of specialized training for human pathologists to recognize and interpret. Developing an algorithm that can learn the continuum of risks associated with histology can be more challenging than for other learning tasks, like cell or region classification.

In this paper, we present an approach called survival convolutional neural networks (SCNNs), which provide highly accurate prediction of time-to-event outcomes from histology images. Using diffuse gliomas as a driving application, we show how the predictive accuracy of SCNNs is comparable with manual histologic grading by neuropathologists. We further extended this approach to integrate both histology images and genomic biomarkers into a unified prediction framework that surpasses the prognostic accuracy of the current WHO paradigm based on genomic classification and histologic grading. Our SCNN framework uses an image sampling and risk filtering technique that significantly improves prediction accuracy by mitigating the effects of intratumoral heterogeneity and deficits in the availability of labeled data for training. Finally, we use heat map visualization techniques applied to whole-slide images to show how SCNNs learn to recognize important histologic structures that neuropathologists use in grading diffuse gliomas and suggest relevance for patterns with prognostic significance that is not currently appreciated. We systematically validate our approaches by predicting overall survival in gliomas using data from The Cancer Genome Atlas (TCGA) Lower-Grade Glioma (LGG) and Glioblastoma (GBM) projects.

Results

Learning Patient Outcomes with Deep Survival Convolutional Neural Networks. The SCNN model architecture is depicted in Fig. 1 (Fig. S1 shows a detailed diagram). H&E-stained tissue sections are first digitized to whole-slide images. These images are reviewed using a web-based platform to identify regions of interest (ROIs) that contain viable tumor with representative histologic characteristics and that are free of artifacts (Methods) (34, 35). High-power fields (HPFs) from these ROIs are then used to train a deep convolutional network that is seamlessly integrated with a Cox proportional hazards model to predict patient outcomes. The network is composed of interconnected layers of image processing operations and nonlinear functions that sequentially transform the HPF image into highly predictive prognostic features. Convolutional layers first extract visual features from the HPF at multiple scales using convolutional kernels and pooling operations. These image-derived features feed into fully connected layers that perform additional transformations, and then, a final Cox model layer outputs a prediction of patient risk. The interconnection weights and convolutional kernels are trained by comparing risk predicted by the network with survival or other time-to-event outcomes using a backpropagation technique to optimize the statistical likelihood of the network (Methods).

Fig. 1. The SCNN model. The SCNN combines deep learning CNNs with traditional survival models to learn survival-related patterns from histology images. (A) Large whole-slide images are generated by digitizing H&E-stained glass slides. (B) A web-based viewer is used to manually identify representative ROIs in the image. (C) HPFs are sampled from these regions and used to train a neural network to predict patient survival. The SCNN consists of (i) convolutional layers that learn visual patterns related to survival using convolution and pooling operations, (ii) fully connected layers that provide additional nonlinear transformations of extracted image features, and (iii) a Cox proportional hazards layer that models time-to-event data, like overall survival or time to progression. Predictions are compared with patient outcomes to adaptively train the network weights that interconnect the layers.

To improve the performance of SCNN models, we developed a sampling and risk filtering technique to address intratumoral heterogeneity and the limited availability of training samples (Fig. 2). In training, new HPFs are randomly sampled from each ROI at the start of each training iteration, providing the SCNN model with a fresh look at each patient's histology and capturing heterogeneity within the ROI. Each HPF is processed using standard data augmentation techniques that randomly transform the field to reinforce network robustness to tissue orientation and variations in staining (33). The SCNN is trained using multiple transformed HPFs for each patient (one for each ROI) to further account for intratumoral heterogeneity across ROIs. For prospective prediction, we first sample multiple HPFs within each ROI to generate a representative collection of fields for the patient. The median risk is calculated within each ROI, and then, these median risks are sorted and filtered to predict a robust patient-level risk that reflects the aggressiveness of their disease while rejecting any outlying risk predictions. These sampling and filtering procedures are described in detail in Methods.

Fig. 2. SCNN uses image sampling and filtering to improve the robustness of training and prediction. (A) During training, a single 256 × 256-pixel HPF is sampled from each region, producing multiple HPFs per patient. Each HPF is subjected to a series of random transformations and is then used as an independent sample to update the network weights. New HPFs are sampled at each training epoch (one training pass through all patients). (B) When predicting the outcome of a newly diagnosed patient, nine HPFs are sampled from each ROI, and a risk is predicted for each field. The median HPF risk is calculated in each region, these median risks are then sorted, and the second highest value is selected as the patient risk. This sampling and filtering framework was designed to deal with tissue heterogeneity by emulating manual histologic evaluation, where prognostication is typically based on the most malignant region observed within a heterogeneous sample. Predictions based on the highest risk and the second highest risk had equal performance on average in our experiments, but the maximum risk produced some outliers with poor prediction accuracy.
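As a rough illustration of this prediction rule (described precisely in Methods), the sketch below reduces nine fields per ROI to one patient-level risk. The names `hpfs_by_roi` and `predict_risk` are hypothetical placeholders for the ROI field collections and a trained SCNN, and the fallback to the maximum when only one ROI is available is our assumption, not a detail specified in the paper.

```python
import numpy as np

def patient_risk(hpfs_by_roi, predict_risk, fields_per_roi=9, rng=None):
    """Reduce per-field risks to a single patient-level risk.

    hpfs_by_roi: dict mapping ROI id -> list of candidate 256x256 HPF images.
    predict_risk: callable returning a scalar risk for one HPF (e.g., a trained SCNN).
    """
    rng = rng or np.random.default_rng()
    median_risks = []
    for roi_fields in hpfs_by_roi.values():
        # Sample nine HPFs from the ROI and score each field with the model.
        idx = rng.choice(len(roi_fields), size=fields_per_roi, replace=True)
        risks = [predict_risk(roi_fields[i]) for i in idx]
        # The median within each ROI rejects outlying field-level risks.
        median_risks.append(np.median(risks))
    # Sort ROI medians from highest to lowest and take the second highest
    # (assumption: fall back to the maximum when only one ROI exists).
    ordered = sorted(median_risks, reverse=True)
    return ordered[1] if len(ordered) > 1 else ordered[0]
```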
Assessing the Prognostic Accuracy of SCNN. To assess the prognostic accuracy of SCNN, we assembled whole-slide image tissue sections from formalin-fixed, paraffin-embedded specimens and clinical follow-up for 769 gliomas from the TCGA (Dataset S1). This dataset comprises lower-grade gliomas (WHO grades II and III) and glioblastomas (WHO grade IV), contains both astrocytomas and oligodendrogliomas, and has overall survivals ranging from less than 1 to 14 y or more. A summary of demographics, grades, survival, and molecular subtypes for this cohort is presented in Table S1. The Digital Slide Archive was used to identify ROIs in 1,061 H&E-stained whole-slide images from these tumors.

The prognostic accuracy of SCNN models was assessed using Monte Carlo cross-validation. We randomly split our cohort into paired training (80%) and testing (20%) sets to generate 15 training/testing set pairs. We trained an SCNN model using each training set and then evaluated the prognostic accuracy of these models on the paired testing sets, generating a total of 15 accuracy measurements (Methods and Dataset S1). Accuracy was measured using Harrell's c index, a nonparametric statistic that measures concordance between predicted risks and actual survival (36). A c index of 1 indicates perfect concordance between predicted risk and overall survival, and a c index of 0.5 corresponds to random concordance.
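Harrell's c index follows directly from its definition as the fraction of usable patient pairs whose predicted risks are ordered consistently with their survival. The pairwise reference sketch below is ours, not the evaluation code used in the study; tied risk predictions receive half credit, and no special treatment of tied times is attempted.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Concordance between predicted risk and observed survival.

    time:  follow-up or event times
    event: 1 if death was observed, 0 if right-censored
    risk:  predicted risk scores (higher = worse predicted outcome)
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, permissible = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i dies before patient j is last seen.
            if event[i] == 1 and time[i] < time[j]:
                permissible += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # shorter survival received the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions get half credit
    return concordant / permissible

# Example: perfectly ordered risks give c = 1.0; random scores give roughly 0.5.
print(harrell_c_index([2, 5, 7, 10], [1, 1, 0, 1], [0.9, 0.6, 0.5, 0.1]))
```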
For comparison, we also assessed the prognostic accuracy of baseline linear Cox models generated using the genomic biomarkers and manual histologic grades from the WHO classification of gliomas (Fig. 3A). The WHO assigns the diffuse gliomas to three genomic subtypes defined by mutations in the isocitrate dehydrogenase (IDH) genes (IDH1/IDH2) and codeletion of chromosomes 1p and 19q. Within these molecular subtypes, gliomas are further assigned a histologic grade based on criteria that vary depending on cell of origin (either astrocytic or oligodendroglial). These criteria include mitotic activity, nuclear atypia, the presence of necrosis, and the characteristics of microvascular structures (microvascular proliferation). Histologic grade remains a significant determinant in planning treatment for gliomas, with grades III and IV typically being treated aggressively with radiation and concomitant chemotherapy.

SCNN models showed substantial prognostic power, achieving a median c index of 0.754 (Fig. 3B). SCNN models also performed comparably with manual histologic-grade baseline models (median c index 0.745, P = 0.307) and with molecular subtype baseline models (median c index 0.746, P = 4.68e-2). Baseline models representing WHO classification that integrate both molecular subtype and manual histologic grade performed slightly better than SCNN, with a median c index of 0.774 (Wilcoxon signed rank P = 2.61e-3).

We also evaluated the impact of the sampling and ranking procedures shown in Fig. 2 in improving the performance of SCNN models. Repeating the SCNN experiments without these sampling techniques reduced the median c index of SCNN models to 0.696, significantly worse than for models where sampling was used (P = 6.55e-4).

Fig. 3. Prognostication criteria for diffuse gliomas. (A) Prognosis in the diffuse gliomas is determined by genomic classification and manual histologic grading. Diffuse gliomas are first classified into one of three molecular subtypes based on IDH1/IDH2 mutations and the codeletion of chromosomes 1p and 19q. Grade is then determined within each subtype using histologic characteristics. Subtypes with an astrocytic lineage are split by IDH mutation status, and the combination of 1p/19q codeletion and IDH mutation defines an oligodendroglioma. These lineages have histologic differences; however, histologic evaluation is not a reliable predictor of molecular subtype (37). Histologic criteria used for grading range from nuclear morphology to higher-level patterns, like necrosis or the presence of abnormal microvascular structures. (B) Comparison of the prognostic accuracy of SCNN models with that of baseline models based on molecular subtype or molecular subtype and histologic grade. Models were evaluated over 15 independent training/testing sets with randomized patient assignments and with/without training and testing sampling. (C) The risks predicted by the SCNN models correlate with both histologic grade and molecular subtype, increasing with grade and generally trending with the clinical aggressiveness of genomic subtypes. (D) Kaplan–Meier plots comparing manual histologic grading and SCNN predictions. Risk categories (low, intermediate, high) were generated by thresholding SCNN risks. N/A, not applicable.
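The baseline WHO models are ordinary linear Cox regressions on subtype and grade covariates. A minimal sketch with the lifelines package follows; the synthetic data frame is a placeholder standing in for the TCGA clinical table (it is not drawn from the study cohort), and the column names and hazard coefficients are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic stand-in for the clinical table: dummy-coded molecular subtype and
# histologic grade with exponential survival times (not the real TCGA cohort).
rng = np.random.default_rng(0)
n = 300
idh_mutant = rng.integers(0, 2, n)                 # 1 = IDH mutant, 0 = IDH wild type
codel_1p19q = rng.integers(0, 2, n) * idh_mutant   # codeletion only with IDH mutation
grade = rng.integers(2, 5, n)                      # WHO grade II-IV
hazard = np.exp(1.2 * (1 - idh_mutant) - 0.8 * codel_1p19q + 0.5 * (grade - 2))
time = rng.exponential(60.0 / hazard)              # months to event
censor = rng.exponential(80.0, n)                  # months to loss of follow-up
df = pd.DataFrame({
    "time": np.minimum(time, censor),
    "event": (time <= censor).astype(int),
    "idh_mutant": idh_mutant,
    "codel_1p19q": codel_1p19q,
    "grade": grade,
})

# Baseline linear Cox model on subtype and grade, analogous to the WHO baselines.
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cph.print_summary()                     # hazard ratios, confidence intervals, p values
print("c index:", cph.concordance_index_)
```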
SCNN Predictions Correlate with Molecular Subtypes and Manual Histologic Grade. To further investigate the relationship between SCNN predictions and the WHO paradigm, we visualized how risks predicted by SCNN are distributed across molecular subtype and histologic grade (Fig. 3C). SCNN predictions were highly correlated with both molecular subtype and grade and were consistent with expected patient outcomes. First, within each molecular subtype, the risks predicted by SCNN increase with histologic grade. Second, predicted risks are consistent with the published expected overall survivals associated with molecular subtypes (37). IDH WT astrocytomas are, for the most part, highly aggressive, having a median survival of 18 mo, and the collective predicted risks for these patients are higher than for patients from other subtypes. IDH mutant astrocytomas are another subtype with considerably better overall survival ranging from 3 to 8 y, and the predicted risks for patients in this subtype are more moderate. Notably, SCNN risks for IDH mutant astrocytomas are not well-separated for grades II and III, consistent with reports of histologic grade being an inadequate predictor of outcome in this subtype (38). Infiltrating gliomas with the combination of IDH mutations and codeletion of chromosomes 1p/19q are classified as oligodendrogliomas in the current WHO schema, and these have the lowest overall predicted risks consistent with overall survivals of 10+ y (37, 39). Finally, we noted a significant difference in predicted risks when comparing the IDH mutant and IDH WT grade III astrocytomas (rank sum P = 6.56e-20). These subtypes share an astrocytic lineage and are graded using identical histologic criteria. Although some histologic features are more prevalent in IDH-mutant astrocytomas, these features are not highly specific or sensitive to IDH mutant tumors and cannot be used to reliably predict IDH mutation status (40). Risks predicted by SCNN are consistent with worse outcomes for IDH WT astrocytomas in this case (median survival 1.7 vs. 6.3 y in the IDH mutant counterparts), suggesting that SCNN models can detect histologic differences associated with IDH mutations in astrocytomas.

We also performed a Kaplan–Meier analysis to compare manual histologic grading with "digital grades" based on SCNN risk predictions (Fig. 3D). Low-, intermediate-, and high-risk categories were established by setting thresholds on SCNN predictions to reflect the proportions of manual histologic grades in each molecular subtype (Methods). We observed that, within each subtype, the differences in survival captured by SCNN risk categories are highly similar to manual histologic grading. SCNN risk categories and manual histologic grades have similar prognostic power in IDH WT astrocytomas (log rank P = 1.23e-12 vs. P = 7.56e-11, respectively). In IDH mutant astrocytomas, both SCNN risk categories and manual histologic grades have difficulty separating Kaplan–Meier curves for grades II and III, but both clearly distinguish grade IV as being associated with worse outcomes. Discrimination for oligodendroglioma survival is also similar between SCNN risk categories and manual histologic grades (log rank P = 9.73e-7 vs. P = 8.63e-4, respectively).
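Thresholding predicted risks into categories and comparing their survival curves can be sketched with lifelines. The arrays and the tertile cutoffs below are arbitrary placeholders, not the subtype-matched thresholds defined in Methods, and the data are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test

# Placeholder arrays: follow-up time (months), event indicator, and predicted risk.
time  = np.array([10, 14, 22, 30, 36, 48, 60, 72, 90, 120])
event = np.array([1,  1,  1,  0,  1,  0,  1,  0,  0,  0])
risk  = np.array([2.1, 1.8, 1.5, 0.9, 1.1, 0.4, 0.7, 0.2, 0.1, -0.3])

# Arbitrary tertiles standing in for the grade-matched risk thresholds.
labels = np.digitize(risk, bins=np.quantile(risk, [1/3, 2/3]))  # 0=low, 1=int, 2=high

fig, ax = plt.subplots()
for g, name in zip([0, 1, 2], ["low risk", "intermediate risk", "high risk"]):
    mask = labels == g
    KaplanMeierFitter().fit(time[mask], event[mask], label=name).plot_survival_function(ax=ax)

# Log rank test across the three risk categories.
result = multivariate_logrank_test(time, labels, event)
print("log rank p =", result.p_value)
plt.show()
```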
Improving Prognostic Accuracy by Integrating Genomic Biomarkers. To integrate both histologic and genomic data into a single unified prediction framework, we developed a genomic survival convolutional neural network (GSCNN model). The GSCNN learns from genomics and histology simultaneously by incorporating genomic data into the fully connected layers of the SCNN (Fig. 4). Both data are presented to the network during training, enabling genomic variables to influence the patterns learned by the SCNN by providing molecular subtype information.

We repeated our experiments using GSCNN models with histology images, IDH mutation status, and 1p/19q codeletion as inputs and found that the median c index improved from 0.754 to 0.801. The addition of genomic variables improved performance by 5% on average, and GSCNN models significantly outperform the baseline WHO subtype-grade model trained on equivalent data (signed rank P = 1.06e-2). To assess the value of integrating genomic variables directly into the network during training, we compared GSCNN with a more superficial integration approach, where an SCNN model was first trained using histology images, and then, the risks from this model were combined with IDH and 1p/19q variables in a simple three-variable Cox model (Fig. S2). Processing genomic variables in the fully connected layers and including them in training provided a statistically significant benefit; models trained using the superficial approach performed worse than GSCNN models with median c index decreasing to 0.785 (signed rank P = 4.68e-2).

To evaluate the independent prognostic power of risks predicted by SCNN and GSCNN, we performed a multivariable Cox regression analysis (Table 1). In a multivariable regression that included SCNN risks, subtype, grade, age, and sex, SCNN risks had a hazard ratio of 3.05 and were prognostic when correcting for all other features, including manual grade and molecular subtype (P = 2.71e-12). Molecular subtype was also significant in the SCNN multivariable regression model, but histologic grade was not. We also performed a multivariable regression with GSCNN risks and found GSCNN to be significant (P = 9.69e-12) with a hazard ratio of 8.83. In the GSCNN multivariable regression model, molecular subtype was not significant, but histologic grade was marginally significant. We also used Kaplan–Meier analysis to compare risk categories generated from SCNN and GSCNN (Fig. S3). Survival curves for SCNN and GSCNN were very similar when evaluated on the entire cohort. In contrast, their abilities to discriminate survival within molecular subtypes were notably different.

Table 1. Hazard ratios for single- and multiple-variable Cox regression models

Variable | Single-variable c index | Single-variable HR (95% CI) | P value | Multivariable (SCNN) HR (95% CI) | P value | Multivariable (GSCNN) HR (95% CI) | P value
SCNN | 0.741 | 7.15 (5.64, 9.07) | 2.08e-61 | 3.05 (2.22, 4.19) | 2.71e-12 | — | —
GSCNN | 0.781 | 12.60 (9.34, 17.0) | 3.08e-64 | — | — | 8.83 (4.66, 16.74) | 9.69e-12
IDH WT astrocytoma | 0.726 | 9.21 (6.88, 12.34) | 3.48e-52 | 4.73 (2.57, 8.70) | 3.49e-7 | 0.97 (0.43, 2.17) | 0.93
IDH mutant astrocytoma | — | 0.23 (0.170, 0.324) | 2.70e-19 | 2.35 (1.27, 4.34) | 5.36e-3 | 1.67 (0.90, 3.12) | 0.10
Histologic grade IV | 0.721 | 7.25 (5.58, 9.43) | 2.68e-51 | 1.52 (0.839, 2.743) | 0.159 | 1.98 (1.11, 3.51) | 0.017
Histologic grade III | — | 0.44 (0.332, 0.591) | 1.66e-08 | 1.57 (0.934, 2.638) | 0.0820 | 1.78 (1.07, 2.97) | 0.024
Age | 0.744 | 1.77 (1.63, 1.93) | 2.52e-42 | 1.33 (1.20, 1.47) | 9.57e-9 | 1.34 (1.22, 1.48) | 9.30e-10
Sex, female | 0.552 | 0.89 (0.706, 1.112) | 0.29 | 0.85 (0.67, 1.08) | 0.168 | 0.86 (0.68, 1.08) | 0.18

Bold indicates statistical significance (P < 5e-2).

Fig. 4. GSCNN models integrate genomic and imaging data for improved performance. (A) A hybrid architecture was developed to combine histology image and genomic data to make integrated predictions of patient survival. These models incorporate genomic variables as inputs to their fully connected layers. Here, we show the incorporation of genomic variables for gliomas; however, any number of genomic or proteomic measurements can be similarly used. (B) The GSCNN models significantly outperform SCNN models as well as the WHO paradigm based on genomic subtype and histologic grading.
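The idea of feeding genomic indicator variables into the fully connected layers alongside image-derived features can be sketched with the Keras functional API. The small convolutional stack and layer sizes below are illustrative assumptions and are much smaller than the VGG19-based network described in Methods; the Cox layer is reduced here to a linear risk output, with the partial likelihood loss omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_gscnn(hpf_size=256, n_genomic=2):
    """Toy GSCNN: image features and genomic variables meet in the dense layers."""
    image_in = layers.Input(shape=(hpf_size, hpf_size, 3), name="hpf")
    genomic_in = layers.Input(shape=(n_genomic,), name="genomics")  # e.g., IDH, 1p/19q

    # Small illustrative convolutional stack (the paper uses a VGG19-style network).
    x = image_in
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Genomic variables are concatenated with image features before the fully
    # connected layers, so they can shape the visual patterns that are learned.
    h = layers.Concatenate()([x, genomic_in])
    h = layers.Dense(256, activation="relu")(h)
    h = layers.Dense(128, activation="relu")(h)

    # Single linear unit: the predicted risk fed to a Cox partial likelihood loss.
    risk = layers.Dense(1, activation="linear", use_bias=False, name="risk")(h)
    return Model(inputs=[image_in, genomic_in], outputs=risk)

model = build_gscnn()
model.summary()
```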
Visualizing Histologic Patterns Associated with Prognosis. Deep learning networks are often criticized for being black box approaches that do not reveal insights into their prediction mechanisms. To investigate the visual patterns that SCNN models associate with poor outcomes, we used heat map visualizations to display the risks predicted by our network in different regions of whole-slide images. Transparent heat map overlays are frequently used for visualization in digital pathology, and in our study, these overlays enable pathologists to correlate the predictions of highly accurate survival models with the underlying histology over the expanse of a whole-slide image. Heat maps were generated using a trained SCNN model to predict the risk for each nonoverlapping HPF in a whole-slide image. The predicted risks were used to generate a color-coded transparent overlay, where red and blue indicate higher and lower SCNN risk, respectively.

A selection of risk heat maps from three patients is presented in Fig. 5, with inlays showing how SCNNs associate risk with important pathologic phenomena. For TCGA-DB-5273 (WHO grade III, IDH mutant astrocytoma), the SCNN heat map clearly and specifically highlights regions of early microvascular proliferation, an advanced form of angiogenesis that is a hallmark of malignant progression, as being associated with high risk. Risk in this heat map also increases with cellularity, heterogeneity in nuclear shape and size (pleomorphism), and the presence of abnormal microvascular structures. Regions in TCGA-S9-A7J0 have varying extents of tumor infiltration ranging from normal brain to sparsely infiltrated adjacent normal regions exhibiting satellitosis (where neoplastic cells cluster around neurons) to moderately and highly infiltrated regions. This heat map correctly associates the lowest risks with normal brain regions and can distinguish normal brain from adjacent regions that are sparsely infiltrated. Interestingly, higher risks are assigned to sparsely infiltrated regions (region 1, Top) than to regions containing relatively more tumor infiltration (region 2, Top). We observed a similar pattern in TCGA-TM-A84G, where edematous regions (region 1, Bottom) adjacent to moderately cellular tumor regions (region 1, Top) are also assigned higher risks. These latter examples provide risk features embedded within histologic sections that have been previously unrecognized and could inform and improve pathology practice.

Fig. 5. Visualizing risk with whole-slide SCNN heat maps. We performed SCNN predictions exhaustively within whole-slide images to generate heat map overlays of the risks that SCNN associates with different histologic patterns. Red indicates relatively higher risk, and blue indicates lower risk (the scale for each slide is different). (Top) In TCGA-DB-5273, SCNN clearly and specifically predicts high risks for regions of early microvascular proliferation (region 1) and also, higher risks with increasing tumor infiltration and cell density (region 2 vs. 3). (Middle) In TCGA-S9-A7J0, SCNN can appropriately discriminate between normal cortex (region 1 in Bottom) and adjacent regions infiltrated by tumor (region 1 in Top). Highly cellular regions containing prominent microvascular structures (region 3) are again assigned higher risks than lower-density regions of tumor (region 2). Interestingly, low-density infiltrate in the cortex was associated with high risk (region 1 in Top). (Bottom) In TCGA-TM-A84G, SCNN assigns high risks to edematous regions (region 1 in Bottom) that are adjacent to tumor (region 1 in Top).
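Generating such a heat map amounts to scoring every nonoverlapping HPF and coloring a coarse grid. The sketch below assumes an RGB numpy array `wsi` already loaded (e.g., via OpenSlide) and a `predict_risk` callable; both names are placeholders, and the matplotlib alpha-blended overlay is a simple substitute for whatever rendering the authors used, with a blue-to-red colormap matching the stated convention.

```python
import numpy as np
import matplotlib.pyplot as plt

def risk_heatmap(wsi, predict_risk, tile=256):
    """Score each nonoverlapping tile of an RGB whole-slide array; return a risk grid."""
    rows, cols = wsi.shape[0] // tile, wsi.shape[1] // tile
    grid = np.full((rows, cols), np.nan)
    for r in range(rows):
        for c in range(cols):
            patch = wsi[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            grid[r, c] = predict_risk(patch)   # scalar risk for one HPF
    return grid

# Demo with random pixels in place of a real slide and model.
wsi = np.random.randint(0, 255, size=(2048, 2048, 3), dtype=np.uint8)
grid = risk_heatmap(wsi, predict_risk=lambda p: float(p.mean()) / 255.0)

plt.imshow(wsi)
# Upsample the coarse risk grid to slide resolution and blend it as a transparent
# overlay: red for high predicted risk, blue for low.
plt.imshow(np.kron(grid, np.ones((256, 256))), cmap="coolwarm", alpha=0.4)
plt.axis("off")
plt.show()
```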
Discussion

We developed a deep learning approach for learning survival directly from histological images and created a unified framework for integrating histology and genomic biomarkers for predicting time-to-event outcomes. We systematically evaluated the prognostic accuracy of our approaches in the context of the current clinical standard based on genomic classification and histologic grading of gliomas. In contrast to a previous study that achieved only marginally better than random prediction accuracy, our approach rivals or exceeds the accuracy of highly trained human experts in predicting survival. Our study provides insights into applications of deep learning in medicine and the integration of histology and genomic data and provides methods for dealing with intratumoral heterogeneity and training data deficits when using deep learning algorithms to predict survival from histology images. Using visualization techniques to gain insights into SCNN prediction mechanisms, we found that SCNNs clearly recognize known and time-honored histologic predictors of poor prognosis and that SCNN predictions suggest prognostic relevance for histologic patterns with significance that is not currently appreciated by neuropathologists.

Our study investigated the ability to predict overall survival in diffuse gliomas, a disease with wide variations in outcomes and an ideal test case where histologic grading and genomic classifications have independent prognostic power. Treatment planning for gliomas is dependent on many factors, including patient age and grade, but gliomas assigned to WHO grades III and IV are typically treated very aggressively with radiation and concomitant chemotherapy, whereas WHO grade II gliomas may be treated with chemotherapy or even followed in some cases (41). Histologic diagnosis and grading of gliomas have been limited by considerable intra- and interobserver variability (42). While the emergence of molecular subtyping has resolved uncertainty related to lineage, criteria for grading need to be redefined in the context of molecular subtyping. For example, some morphologic features used to assess grade (e.g., mitotic activity) are no longer prognostic in IDH mutant astrocytomas (38). The field of neuro-oncology is currently awaiting features that can better discriminate more aggressive gliomas from those that are more indolent. Improving the accuracy and objectivity of grading will directly impact patient care by identifying patients who can benefit from more aggressive therapeutic regimens and by sparing those with less aggressive disease from unnecessary treatment.
tively discriminate outcomes within each molecular subtype, ef- Remarkably, SCNN performed as well as manual histologic fectively performing digital histologic grading. Furthermore, grading or molecular subtyping in predicting overall survival in SCNN can effectively recognize histologic differences associated our dataset, despite using only a very small portion of each his- with IDH mutations in astrocytomas and predict outcomes for tology image for training and prediction. Additional investigation these patients accordingly. SCNNs correctly predicted lower E2976 | www.pnas.org/cgi/doi/10.1073/pnas.1717139115 Mobadersany et al. risks for WHO grade III IDH mutant astrocytomas compared cell density and nuclear pleomorphism were also associated with with WHO grade III IDH WT astrocytomas, consistent with the increased risk in all examples. SCNN also assigned high risks to considerably longer median survival for patients with IDH mu- regions that do not contain well-recognized features associated tant astrocytoma (6.3 vs. 1.7 y). While there are histologic fea- with a higher grade or poor prognosis. In region 1 of slide tures of astrocytomas that are understood to be more prevalent TCGA-S9-A7J0, SCNN assigns higher risk to sparsely infiltrated in IDH mutant astrocytomas, including the presence of micro- cerebral cortex than to region 2, which is infiltrated by a higher cysts and the rounded nuclear morphology of neoplastic nuclei, density of tumor cells (normal cortex in region 1 is properly these are not reliable predictors of IDH mutations (40). assigned a very low risk). Widespread infiltration into distant To integrate genomic information in prognostication, we de- sites of the brain is a hallmark of gliomas and results in treatment veloped a hybrid network that can learn simultaneously from failure, since surgical resection of visible tumor often leaves re- both histology images and genomic biomarkers. The GSCNN sidual neoplastic infiltrates. Similarly, region 1 of slide TCGA- presented in our study significantly outperforms the WHO TM-A84G illustrates a high risk associated with low-cellularity standard based on identical inputs. We compared the perfor- edematous regions compared with adjacent oligodendroglioma mance of GSCNN and SCNN in several ways to evaluate their with much higher cellularity. Edema is frequently observed ability to predict survival and to assess the relative importance of within gliomas and in adjacent brain, and its degree may be re- histology and genomic data in GSCNN. GSCNN had signifi- lated to the rate of growth (43), but its histologic presence has cantly higher c index scores due to the inclusion of genomic not been previously recognized as a feature of aggressive be- variables in the training process. Performance significantly de- havior or incorporated into grading paradigms. While it is not clined when using a superficial integration method that combines entirely clear why SCNN assigns higher risks to the regions in the genomic biomarkers with a pretrained SCNN model. sparsely infiltrated or edematous regions, these examples con- In multivariable regression analyses, GSCNN has a much firm that SCNN risks are not purely a function of cellular density higher hazard ratio than SCNN (8.83 vs. 3.05). Examining the or nuclear atypia. 
In multivariable regression analyses, GSCNN has a much higher hazard ratio than SCNN (8.83 vs. 3.05). Examining the other variables in the regression models, we noticed an interesting relationship between the significance of histologic-grade and molecular subtype variables. In the SCNN regression analysis, histologic-grade variables were not significant, but molecular subtype variables were highly significant, indicating that SCNN could capture histologic information from image data but could not learn molecular subtype information entirely from histology. In contrast, molecular subtype information was not significant in the GSCNN regression analysis. Interestingly, histologic-grade variables were marginally significant, suggesting that some prognostic value in the histology images remained untapped by GSCNN.

Kaplan–Meier analysis showed remarkable similarity in the discriminative power of SCNN and GSCNN. Additional Kaplan–Meier analysis of risk categories within molecular subtypes revealed interesting trends that are consistent with the regression analyses presented in Table 1. SCNN clearly separates outcomes within each molecular subtype based on histology. Survival curves for GSCNN risk categories, however, overlap significantly in each subtype. Since SCNN models do not have access to genomic data when making predictions, their ability to discriminate outcomes was worse in general when assessed by c index or multivariable regression.

Integration of genomic and histology data into a single prediction framework remains a challenge in the clinical implementation of computational pathology. Our previous work in developing deep learning survival models from genomic data has shown that accurate survival predictions can be learned from high-dimensional genomic and protein expression signatures (29). Incorporating additional genomic variables into GSCNN models is an area for future research and requires larger datasets that combine histology images with rich genomic and clinical annotations.
While deep learning methods frequently deliver outstanding performance, the interpretability of black box deep learning models is limited and remains a significant barrier in their validation and adoption. Heat map analysis provides insights into the histologic patterns associated with increased risk and can also serve as a practical tool to guide pathologists to tissue regions associated with worse prognosis. The heat maps suggest that SCNN can learn visual patterns known to be associated with histologic features related to prognosis and used in grading, including microvascular proliferation, cell density, and nuclear morphology. Microvascular prominence and proliferation are associated with disease progression in all forms of diffuse glioma, and these features are clearly delineated as high risk in the heat map presented for slide TCGA-DB-5273. Likewise, increases in cell density and nuclear pleomorphism were also associated with increased risk in all examples. SCNN also assigned high risks to regions that do not contain well-recognized features associated with a higher grade or poor prognosis. In region 1 of slide TCGA-S9-A7J0, SCNN assigns higher risk to sparsely infiltrated cerebral cortex than to region 2, which is infiltrated by a higher density of tumor cells (normal cortex in region 1 is properly assigned a very low risk). Widespread infiltration into distant sites of the brain is a hallmark of gliomas and results in treatment failure, since surgical resection of visible tumor often leaves residual neoplastic infiltrates. Similarly, region 1 of slide TCGA-TM-A84G illustrates a high risk associated with low-cellularity edematous regions compared with adjacent oligodendroglioma with much higher cellularity. Edema is frequently observed within gliomas and in adjacent brain, and its degree may be related to the rate of growth (43), but its histologic presence has not been previously recognized as a feature of aggressive behavior or incorporated into grading paradigms. While it is not entirely clear why SCNN assigns higher risks to the sparsely infiltrated or edematous regions, these examples confirm that SCNN risks are not purely a function of cellular density or nuclear atypia. Our human interpretations of these findings provide possible explanations for why SCNN unexpectedly predicts high risks in these regions, but these findings need additional investigation to better understand what specific features the SCNN network perceives in these regions. Nevertheless, this shows that SCNN can be used to identify potentially practice-changing features associated with increased risk that are embedded within pathology images.

Although our study provides insights into the application of deep learning in precision medicine, it has some important limitations. A relatively small portion of each slide was used for training and prediction, and the selection of ROIs within each slide required expert guidance. Future studies will explore more advanced methods for automatic selection of regions and for incorporating a higher proportion of each slide in training and prediction to better account for intratumoral heterogeneity. We also plan to pursue the development of enhanced GSCNN models that incorporate additional molecular features and to evaluate the value added of histology in these more complex models. In our Kaplan–Meier analysis, the thresholds used to define risk categories were determined in a subjective manner using the proportion of manual histologic grades in the TCGA cohort, and a larger dataset would permit a more rigorous definition of these thresholds to optimize survival stratification. The interpretation of risk heat maps was based on subjective evaluation by neuropathologists, and we plan to pursue studies that evaluate heat maps in a more objective manner to discover and validate histologic features associated with poor outcomes. Finally, while we have applied our techniques to gliomas, validation of these approaches in other diseases is needed and could provide additional insights. In fact, our methods are not specific to histology imaging or cancer applications and could be adapted to other medical imaging modalities and biomedical applications.

Methods

Data and Image Curation. Whole-slide images and clinical and genomic data were obtained from TCGA via the Genomic Data Commons (https://gdc.cancer.gov/). Images of diagnostic H&E-stained, formalin-fixed, paraffin-embedded sections from the Brain LGG and the GBM cohorts were reviewed to remove images containing tissue-processing artifacts, including bubbles, section folds, pen markings, and poor staining. Representative ROIs containing primarily tumor nuclei were manually identified for each slide that passed a quality control review. This review identified whole-slide images with poor image quality arising from imaging artifacts or tissue processing (bubbles, significant tissue section folds, overstaining, understaining) where suitable ROIs could not be selected. In the case of grade IV disease, some regions include microvascular proliferation, as this feature was exhibited throughout tumor regions. Regions containing geographic necrosis were excluded. A total of 1,061 whole-slide images from 769 unique patients were analyzed.

ROI images (1,024 × 1,024 pixels) were cropped at 20× objective magnification using OpenSlide and color-normalized to a gold standard H&E calibration image to improve consistency of color characteristics across slides. HPFs at 256 × 256 pixels were sampled from these regions and used for training and testing as described below.
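ROI cropping from .svs files can be sketched with the OpenSlide Python bindings. The file name and ROI coordinates below are hypothetical, the pyramid-level handling is simplified (level 0 is simply the highest resolution; selecting the level that corresponds to the 20× objective is omitted), and the color normalization to a gold-standard calibration image is left as a placeholder comment rather than reproduced.

```python
import numpy as np
import openslide

def extract_hpfs(svs_path, roi_xywh, hpf=256, n_fields=9, rng=None):
    """Crop a 1,024 x 1,024 ROI and sample 256 x 256 HPFs from it.

    roi_xywh: (x, y, width, height) of the ROI in level-0 pixel coordinates,
              assumed to have been marked ahead of time in a slide viewer.
    """
    rng = rng or np.random.default_rng()
    slide = openslide.OpenSlide(svs_path)
    x, y, w, h = roi_xywh
    # read_region returns an RGBA PIL image at the requested pyramid level.
    roi = np.array(slide.read_region((x, y), 0, (w, h)).convert("RGB"))
    # Placeholder: color normalization to a gold-standard H&E calibration image
    # (e.g., Reinhard normalization) would be applied to `roi` here.
    fields = []
    for _ in range(n_fields):
        r = rng.integers(0, h - hpf)
        c = rng.integers(0, w - hpf)
        fields.append(roi[r:r + hpf, c:c + hpf])
    return fields

# Hypothetical usage (path and coordinates are placeholders):
# hpfs = extract_hpfs("TCGA-XX-XXXX.svs", roi_xywh=(30_000, 12_000, 1024, 1024))
```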
Network Architecture and Training Procedures. The SCNN combines elements of the 19-layer Visual Geometry Group (VGG) convolutional network architecture with a Cox proportional hazards model to predict time-to-event data from images (Fig. S1) (44). Image feature extraction is achieved by four groups of convolutional layers. (i) The first group contains two convolutional layers with 64 3 × 3 kernels interleaved with local normalization layers and then followed with a single maximum pooling layer. (ii) The second group contains two convolutional layers (128 3 × 3 kernels) interleaved with two local normalization layers followed by a single maximum pooling layer. (iii) The third group interleaves four convolutional layers (256 3 × 3 kernels) with four local normalization layers followed by a single maximum pooling layer. (iv) The fourth group contains interleaves of eight convolutional (512 3 × 3 kernels) and eight local normalization layers, with an intermediate pooling layer and a terminal maximum pooling layer. These four groups are followed by a sequence of three fully connected layers containing 1,000, 1,000, and 256 nodes, respectively.

The terminal fully connected layer outputs a prediction of risk $R = \beta^T X$ associated with the input image, where $\beta \in \mathbb{R}^{256 \times 1}$ are the terminal layer weights and $X \in \mathbb{R}^{256 \times 1}$ are the inputs to this layer. To provide an error signal for backpropagation, these risks are input to a Cox proportional hazards layer to calculate the negative partial log likelihood

$L(\beta, X) = -\sum_{i \in U} \Big( \beta^T X_i - \log \sum_{j \in \Omega_i} e^{\beta^T X_j} \Big)$,   [1]

where $\beta^T X_i$ is the risk associated with HPF $i$, $U$ is the set of uncensored samples (samples with an observed event), and $\Omega_i$ is the set of "at-risk" samples with event or follow-up times $\Omega_i = \{j \mid Y_j \geq Y_i\}$ (where $Y_i$ is the event or last follow-up time of patient $i$).

The adagrad algorithm was used to minimize the negative partial log likelihood via backpropagation to optimize model weights, biases, and convolutional kernels (45). Parameters to adagrad include the initial accumulator value = 0.1, initial learning rate = 0.001, and an exponential learning rate decay factor = 0.1. Model weights were initialized using the variance scaling method (46), and a weight decay was applied to the fully connected layers during training (decay rate = 4e-4). Models were trained for 100 epochs (1 epoch is one complete cycle through all training samples) using minibatches consisting of 14 HPFs each. Each minibatch produces a model update, resulting in multiple updates per epoch. Calculation of the Cox partial likelihood requires access to the predicted risks of all samples, which are not available within any single minibatch, and therefore, Cox likelihood was calculated locally within each minibatch to perform updates ($U$ and $\Omega_i$ were restricted to samples within each minibatch). Local likelihood calculation can be very sensitive to how samples are assigned to minibatches, and therefore, we randomize the minibatch sample assignments at the beginning of each epoch to improve robustness. Mild regularization was applied by randomly dropping out 5% of weights in the last fully connected layer in each minibatch during training to mitigate overfitting.
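Within a minibatch, the negative partial log likelihood in Eq. 1 can be written in a few lines. This is a generic TensorFlow 2 sketch, not the authors' TensorFlow 0.12 implementation; the at-risk set is restricted to the batch as described above, and tied event times receive no special treatment.

```python
import tensorflow as tf

def cox_negative_log_likelihood(time, event, risk):
    """Negative Cox partial log likelihood over one minibatch (cf. Eq. 1).

    time:  (n,) follow-up or event times
    event: (n,) 1.0 for an observed death, 0.0 for right censoring
    risk:  (n,) predicted risks beta^T X from the network
    """
    risk = tf.reshape(risk, [-1])
    # at_risk[i, j] = 1 when sample j is still at risk at sample i's time (Y_j >= Y_i).
    at_risk = tf.cast(time[None, :] >= time[:, None], risk.dtype)
    # log sum_{j in Omega_i} exp(risk_j), via a masked log-sum-exp.
    masked = risk[None, :] + tf.math.log(at_risk + 1e-30)
    log_risk_set = tf.reduce_logsumexp(masked, axis=1)
    # Sum the partial likelihood terms over samples with an observed event only.
    partial_ll = event * (risk - log_risk_set)
    return -tf.reduce_sum(partial_ll)

# Example: three patients in a minibatch.
loss = cox_negative_log_likelihood(
    time=tf.constant([5.0, 8.0, 12.0]),
    event=tf.constant([1.0, 0.0, 1.0]),
    risk=tf.constant([0.8, 0.1, -0.4]))
print(float(loss))
```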
Training Sampling. Each patient has possibly multiple slides and multiple regions within each slide that can be used to sample HPFs. During training, a single HPF was sampled from each region, and these HPFs were treated as semiindependent training samples. Each HPF was paired with patient outcome for training, duplicating outcomes for patients containing multiple regions/HPFs. The HPFs are sampled at the beginning of each training epoch to generate an entirely new set of HPFs. Randomized transforms were also applied to these HPFs to improve robustness to tissue orientation and color variations. Since the visual patterns in tissues can often be anisotropic, we randomly apply a mirror transform to each HPF. We also generate random transformations of contrast and brightness using the "random_contrast" and "random_brightness" TensorFlow operations. The contrast factor was randomly selected in the interval [0.2, 1.8], and the brightness was randomly selected in the interval [−63, 63]. These sampling and transformation procedures, along with the use of multiple HPFs for each patient, have the effect of augmenting the effective size of the labeled training data. In tissues with pronounced anisotropy, including adenocarcinomas that exhibit prominent glandular structures, these mirror transformations are intended to improve the robustness of the network to tissue orientation. Similar approaches for training data augmentation have shown considerable improvements in general imaging applications (33).
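These transforms map onto current tf.image operations. The sketch below uses the modern TensorFlow API rather than the TensorFlow 0.12 operations named above, and it assumes HPFs are held as float tensors in [0, 255]; the final clipping step is our assumption.

```python
import tensorflow as tf

def augment_hpf(hpf):
    """Randomized mirror, contrast, and brightness transforms for one 256x256 HPF."""
    hpf = tf.image.random_flip_left_right(hpf)           # mirror transform
    hpf = tf.image.random_contrast(hpf, 0.2, 1.8)         # contrast factor in [0.2, 1.8]
    hpf = tf.image.random_brightness(hpf, max_delta=63)   # brightness shift in [-63, 63]
    return tf.clip_by_value(hpf, 0.0, 255.0)

# Example: augment a minibatch of 14 randomly generated fields.
batch = tf.random.uniform([14, 256, 256, 3], maxval=255.0)
augmented = tf.map_fn(augment_hpf, batch)
```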
Testing Sampling, Risk Filtering, and Model Averaging. Sampling was also performed to increase the robustness and stability of predictions. (i) Nine HPFs are first sampled from each region $j$ corresponding to patient $m$. (ii) The risk of the $k$th HPF in region $j$ for patient $m$, denoted $R_m^{j,k}$, is then calculated using the trained SCNN model. (iii) The median risk $R_m^j = \mathrm{median}_k \{R_m^{j,k}\}$ is calculated for region $j$ using the aforementioned HPFs to reject outlying risks. (iv) These median risks are then sorted from highest to lowest, $R_m^{(1)} > R_m^{(2)} > R_m^{(3)} > \cdots$, where the superscript index now corresponds to the risk rank. (v) The risk prediction for patient $m$ is then selected as the second highest risk, $R_m^* = R_m^{(2)}$. This filtering procedure was designed to emulate how a pathologist integrates information from multiple areas within a slide, determining prognosis based on the region associated with the worst prognosis. Selection of the second highest risk (as opposed to the highest risk) introduces robustness to outliers or high risks that may occur due to some imaging or tissue-processing artifact.

Since the accuracy of our models can vary significantly from one epoch to another, largely due to the training sampling and randomized minibatch assignments, a model-averaging technique was used to reduce prediction variance. To obtain final risk predictions for the testing patients that are stable, we perform model averaging using the models from epochs 96 to 100 to smooth variations across epochs and increase stability. Formally, the model-averaged risk for patient $m$ is calculated as

$\bar{R}_m^* = \frac{1}{5} \sum_{\gamma=96}^{100} R_{m(\gamma)}^*$,   [2]

where $R_{m(\gamma)}^*$ denotes the predicted risk for patient $m$ in training epoch $\gamma$.

Validation Procedures. Patients were randomly assigned to nonoverlapping training (80%) and test (20%) sets that were used to train models and evaluate their performance. If a patient was assigned to training, then all slides corresponding to that patient were assigned to the training set and likewise for the testing set. This ensures that no data from any one patient are represented in both training and testing sets to avoid overfitting and optimistic estimates of generalization accuracy. We repeated the randomized assignment of patients to training/testing sets 15 times and used each of these training/testing sets to train and evaluate a model. The same training/testing assignments were used for each model (SCNN, GSCNN, baseline) for comparability. Prediction accuracy was measured using Harrell's c index to measure the concordance between predicted risk and actual survival for testing samples (36).

Statistical Analyses. Comparisons of the c indices generated by Monte Carlo cross-validation were performed using the Wilcoxon signed rank test. This paired test was chosen because each method was evaluated using identical training/testing sets. Comparisons of SCNN risk values across grade were performed using the Wilcoxon rank sum test. Cox univariable and multivariable regression analyses were performed using predicted SCNN risk values for all training and testing samples in randomized training/testing set 1. Analyses of the correlation of grade, molecular subtype, and SCNN risk predictions were performed by pooling predicted risks for testing samples across all experiments. SCNN risks were normalized within each experiment by z score before pooling. Grade analysis was performed by determining "digital"-grade thresholds for SCNN risks in each subtype. Thresholds were objectively selected to match the proportions of samples in each histologic grade in each subtype. Statistical analysis of Kaplan–Meier plots was performed using the log rank test.
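Both tests are available in scipy; the sketch below uses placeholder values rather than the published per-split c indices or risk distributions.

```python
import numpy as np
from scipy.stats import wilcoxon, ranksums

# Placeholder c indices for two methods evaluated on the same 15 testing sets.
c_scnn     = np.array([0.76, 0.74, 0.75, 0.77, 0.73, 0.75, 0.76, 0.74,
                       0.75, 0.76, 0.73, 0.77, 0.74, 0.75, 0.76])
c_baseline = np.array([0.77, 0.76, 0.78, 0.78, 0.75, 0.77, 0.78, 0.76,
                       0.77, 0.78, 0.75, 0.79, 0.76, 0.77, 0.78])

# Paired signed rank test, because both methods share the training/testing splits.
stat, p = wilcoxon(c_scnn, c_baseline)
print("Wilcoxon signed rank p =", p)

# Unpaired rank sum test, as used for comparing risk values across groups.
print("rank sum p =", ranksums([0.2, 0.5, 0.9], [1.1, 1.4, 1.8]).pvalue)
```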
Hardware and Software. Prediction models were trained using TensorFlow (v0.12.0) on servers equipped with dual Intel Xeon E5-2630L v2 2.40-GHz CPUs, 128 GB of RAM, and dual NVIDIA K80 graphics cards. Image data were extracted from Aperio .svs whole-slide image formats using OpenSlide (openslide.org/). Basic image analysis operations were performed using HistomicsTK (https://github.com/DigitalSlideArchive/HistomicsTK), a Python package for histology image analysis.
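To make the extraction step concrete, the snippet below shows how a fixed-size region can be read from an Aperio .svs file with OpenSlide's Python bindings. The file name, coordinates, and pyramid level are illustrative placeholders rather than values from the study.

```python
import openslide

# Read a 1,024 x 1,024 region from a whole-slide image; level 0 is the
# highest-resolution level of the image pyramid. The path and coordinates
# below are placeholders.
slide = openslide.OpenSlide("example_slide.svs")
region = slide.read_region(location=(30000, 20000), level=0, size=(1024, 1024))
region = region.convert("RGB")  # read_region returns an RGBA PIL image
slide.close()
```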
Data Availability. This paper was produced using large volumes of publicly available genomic and imaging data. The authors have made every effort to make available links to these resources, as well as to make publicly available the software methods and information used to produce the datasets, analyses, and summary information.

ACKNOWLEDGMENTS. This work was supported by US NIH National Library of Medicine Career Development Award K22LM011576 and National Cancer Institute Grant U24CA194362 and by the National Brain Tumor Society.

1. Kong J, et al. (2008) Computer-assisted grading of neuroblastic differentiation. Arch Pathol Lab Med 132:903–904, author reply 904.
2. Niazi MKK, et al. (2017) Visually meaningful histopathological features for automatic grading of prostate cancer. IEEE J Biomed Health Inform 21:1027–1038.
3. Naik S, et al. (2008) Automated gland and nuclei segmentation for grading of prostate and breast cancer histopathology. Proceedings of the 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro (IEEE, Piscataway, NJ), pp 284–287.
4. Ren J, et al. (2015) Computer aided analysis of prostate histopathology images Gleason grading especially for Gleason score 7. Conf Proc IEEE Eng Med Biol Soc 2015:3013–3016.
5. Kothari S, Phan JH, Young AN, Wang MD (2013) Histological image classification using biologically interpretable shape-based features. BMC Med Imaging 13:9.
6. Sertel O, et al. (2009) Computer-aided prognosis of neuroblastoma on whole-slide images: Classification of stromal development. Pattern Recognit 42:1093–1103.
7. Fauzi MF, et al. (2015) Classification of follicular lymphoma: the effect of computer aid on pathologists grading. BMC Med Inform Decis Mak 15:115.
8. Dundar MM, et al. (2011) Computerized classification of intraductal breast lesions using histopathological images. IEEE Trans Biomed Eng 58:1977–1984.
9. Hou L, et al. (2016) Patch-based convolutional neural network for whole slide tissue image classification. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), pp 2424–2433.
10. Kong J, et al. (2013) Machine-based morphologic analysis of glioblastoma using whole-slide pathology images uncovers clinically relevant molecular correlates. PLoS One 8:e81049.
11. Wang D, Khosla A, Gargeya R, Irshad H, Beck AH (2016) Deep learning for identifying metastatic breast cancer. arXiv:1606.05718.
12. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444.
13. Greenspan H, van Ginneken B, Summers RM (2016) Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans Med Imaging 35:1153–1159.
14. Janowczyk A, Madabhushi A (2016) Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J Pathol Inform 7:29.
15. Litjens G, et al. (2016) Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep 6:26286.
16. Chen T, Chefd'hotel C (2014) Deep learning based automatic immune cell detection for immunohistochemistry images. Machine Learning in Medical Imaging (Springer, Berlin), pp 17–24.
17. Cruz-Roa A, et al. (2017) Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent. Sci Rep 7:46450.
18. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 35:1240–1251.
19. Sirinukunwattana K, et al. (2016) Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans Med Imaging 35:1196–1206.
20. Esteva A, et al. (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118.
21. Gulshan V, et al. (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316:2402–2410.
22. Havaei M, et al. (2017) Brain tumor segmentation with deep neural networks. Med Image Anal 35:18–31.
23. Huynh BQ, Li H, Giger ML (2016) Digital mammographic tumor classification using transfer learning from deep convolutional neural networks. J Med Imaging (Bellingham) 3:034501.
24. Kamnitsas K, et al. (2017) Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal 36:61–78.
25. Turkki R, Linder N, Kovanen PE, Pellinen T, Lundin J (2016) Antibody-supervised deep learning for quantification of tumor-infiltrating immune cells in hematoxylin and eosin stained breast cancer samples. J Pathol Inform 7:38.
26. Bychkov D, Turkki R, Haglund C, Linder N, Lundin J (2016) Deep learning for tissue microarray image-based outcome prediction in patients with colorectal cancer. SPIE Medical Imaging, eds Gurcan MN, Madabhushi A (International Society for Optics and Photonics, Bellingham, WA), p 6.
27. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI (2014) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8–17.
28. Xiang A, Lapuerta P, Ryutov A, Buckley J, Azen S (2000) Comparison of the performance of neural network methods and Cox regression for censored survival data. Comput Stat Data Anal 34:243–257.
29. Yousefi S, et al. (2017) Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci Rep 7:11707.
30. Yousefi S, Congzheng S, Nelson N, Cooper LAD (2016) Learning genomic representations to predict clinical outcomes in cancer. arXiv:1609.08663.
31. Katzman J, et al. (2016) DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. arXiv:1606.00931.
32. Zhu X, Yao J, Huang J (2016) Deep convolutional neural network for survival analysis with pathological images. Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (IEEE, Piscataway, NJ), pp 544–547.
33. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, eds Pereira F, Burges CJC, Bottou L, Weinberger KQ (Neural Information Processing Systems Foundation, Inc., La Jolla, CA), pp 1097–1105.
34. Gutman DA, et al. (2013) Cancer Digital Slide Archive: an informatics resource to support integrated in silico analysis of TCGA pathology data. J Am Med Inform Assoc 20:1091–1098.
35. Gutman DA, et al. (2017) The digital slide archive: A software platform for management, integration, and analysis of histology for cancer research. Cancer Res 77:e75–e78.
36. Harrell FE, Jr, Califf RM, Pryor DB, Lee KL, Rosati RA (1982) Evaluating the yield of medical tests. JAMA 247:2543–2546.
37. Brat DJ, et al.; Cancer Genome Atlas Research Network (2015) Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N Engl J Med 372:2481–2498.
38. Reuss DE, et al. (2015) IDH mutant diffuse and anaplastic astrocytomas have similar age at presentation and little difference in survival: a grading problem for WHO. Acta Neuropathol 129:867–873.
39. Leeper HE, et al. (2015) IDH mutation, 1p19q codeletion and ATRX loss in WHO grade II gliomas. Oncotarget 6:30295–30305.
40. Nguyen DN, et al. (2013) Molecular and morphologic correlates of the alternative lengthening of telomeres phenotype in high-grade astrocytomas. Brain Pathol 23:237–243.
41. Wijnenga MMJ, et al. (2018) The impact of surgery in molecularly defined low-grade glioma: an integrated clinical, radiological, and molecular analysis. Neuro-oncol 20:103–112.
42. van den Bent MJ (2010) Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician's perspective. Acta Neuropathol 120:297–304.
43. Pope WB, et al. (2005) MR imaging correlates of survival in patients with high-grade gliomas. AJNR Am J Neuroradiol 26:2466–2474.
44. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
45. Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159.
46. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. IEEE International Conference on Computer Vision (IEEE, Piscataway, NJ), pp 1026–1034.

Training Sampling. Each patient has possibly multiple slides and multiple regions within each slide that can be used to sample HPFs. During training, a single HPF was sampled from each region, and these HPFs were treated as semiindependent training samples. Each HPF was paired with patient outcome for training, duplicating outcomes for patients containing multiple regions/HPFs. The HPFs are sampled at the beginning of each training epoch to generate an entirely new set of HPFs. Randomized transforms were also applied to these HPFs to improve robustness to tissue orientation and color variations. Since the visual patterns in tissues can often be anisotropic, we randomly apply a mirror transform to each HPF. We also generate random transformations of contrast and brightness using the "random_contrast" and "random_brightness" TensorFlow operations. The contrast factor was randomly selected in the interval [0.2, 1.8], and the brightness was randomly selected in the interval [−63, 63]. These sampling and transformation procedures along with the use of multiple HPFs for each patient have the effect of augmenting the effective size of the labeled training data. In tissues with pronounced anisotropy, including adenocarcinomas that exhibit prominent glandular structures, these mirror transformations are intended to improve the robustness of the network to tissue orientation. Similar approaches for training data augmentation have shown considerable improvements in general imaging applications (33).
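The randomized transforms described above map onto current tf.image operations roughly as follows. The contrast and brightness ranges follow the text, while the mirror axes are an assumption (the original pipeline used the TensorFlow v0.12.0 equivalents of these operations).

import tensorflow as tf

def augment_hpf(hpf):
    """Random mirror, contrast, and brightness transforms for a single HPF.

    `hpf` is a float32 tensor of shape (256, 256, 3) with values in [0, 255].
    """
    hpf = tf.image.random_flip_left_right(hpf)             # mirror transform
    hpf = tf.image.random_flip_up_down(hpf)                # second mirror axis (assumed)
    hpf = tf.image.random_contrast(hpf, lower=0.2, upper=1.8)
    hpf = tf.image.random_brightness(hpf, max_delta=63)    # offset drawn from [-63, 63]
    return tf.clip_by_value(hpf, 0.0, 255.0)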
Testing Sampling, Risk Filtering, and Model Averaging. Sampling was also performed to increase the robustness and stability of predictions. (i) Nine HPFs are first sampled from each region j corresponding to patient m. (ii) The risk of the kth HPF in region j for patient m, denoted R_m^{j,k}, is then calculated using the trained SCNN model. (iii) The median risk R_m^j = median_k {R_m^{j,k}} is calculated for region j using the aforementioned HPFs to reject outlying risks. (iv) These median risks are then sorted from highest to lowest, R_m^{(1)} > R_m^{(2)} > R_m^{(3)} > …, where the superscript index now corresponds to the risk rank. (v) The risk prediction for patient m is then selected as the second highest risk, R_m^* = R_m^{(2)}. This filtering procedure was designed to emulate how a pathologist integrates information from multiple areas within a slide, determining prognosis based on the region associated with the worst prognosis. Selection of the second highest risk (as opposed to the highest risk) introduces robustness to outliers or high risks that may occur due to some imaging or tissue-processing artifact.

Since the accuracy of our models can vary significantly from one epoch to another, largely due to the training sampling and randomized minibatch assignments, a model-averaging technique was used to reduce prediction variance. To obtain final risk predictions for the testing patients that are stable, we perform model averaging using the models from epochs 96 to 100 to smooth variations across epochs and increase stability. Formally, the model-averaged risk for patient m is calculated as

R_m^* = (1/5) ∑_{γ=96}^{100} R_{m(γ)}^*,   [2]

where R_{m(γ)}^* denotes the predicted risk for patient m in training epoch γ.
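The test-time filtering (steps i–v) and the epoch averaging of Eq. 2 reduce to a few lines of NumPy; the sketch below is illustrative, and the fallback for patients with a single region is an assumption not specified in the text.

import numpy as np

def patient_risk(region_hpf_risks):
    """Second-highest median region risk for one patient (steps i-v above).

    region_hpf_risks : list of 1D NumPy arrays, one per ROI, each holding the
    SCNN risks of the nine HPFs sampled from that ROI.
    """
    medians = np.array([np.median(r) for r in region_hpf_risks])  # step iii
    ranked = np.sort(medians)[::-1]                               # step iv, highest first
    return ranked[1] if ranked.size > 1 else ranked[0]            # step v (single-ROI fallback assumed)

def model_averaged_risk(per_epoch_risks):
    """Average the filtered risks from the final training epochs (Eq. 2)."""
    return float(np.mean(per_epoch_risks))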
Validation Procedures. Patients were randomly assigned to nonoverlapping training (80%) and test (20%) sets that were used to train models and evaluate their performance. If a patient was assigned to training, then all slides corresponding to that patient were assigned to the training set, and likewise for the testing set. This ensures that no data from any one patient are represented in both training and testing sets to avoid overfitting and optimistic estimates of generalization accuracy. We repeated the randomized assignment of patients to training/testing sets 15 times and used each of these training/testing sets to train and evaluate a model. The same training/testing assignments were used in each model (SCNN, GSCNN, baseline) for comparability. Prediction accuracy was measured using Harrell's c index to measure the concordance between predicted risk and actual survival for testing samples (36).
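Harrell's c index can be computed with standard survival analysis packages, but a brief pairwise sketch makes the evaluation criterion explicit; tied risks and tied times are ignored here for simplicity, so this is an approximation rather than the exact estimator used in the paper.

import numpy as np

def harrells_c_index(risks, times, events):
    """Simplified Harrell's c index: the fraction of comparable pairs in which the
    sample with the higher predicted risk experienced the event earlier. A pair
    (i, j) is comparable when sample i had an observed event before time j."""
    concordant, comparable = 0, 0
    for i in range(len(risks)):
        for j in range(len(risks)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
    return concordant / comparable if comparable else float("nan")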
Statistical Analyses. The c indices generated by Monte Carlo cross-validation were compared using the Wilcoxon signed rank test. This paired test was chosen because each method was evaluated using identical training/testing sets. Comparisons of SCNN risk values across grade were performed using the Wilcoxon rank sum test. Cox univariable and multivariable regression analyses were performed using predicted SCNN risk values for all training and testing samples in randomized training/testing set 1. Analyses of the correlation of grade, molecular subtype, and SCNN risk predictions were performed by pooling predicted risks for testing samples across all experiments. SCNN risks were normalized within each experiment by z score before pooling. Grade analysis was performed by determining "digital"-grade thresholds for SCNN risks in each subtype. Thresholds were objectively selected to match the proportions of samples in each histologic grade in each subtype. Statistical analysis of Kaplan–Meier plots was performed using the log rank test.

Hardware and Software. Prediction models were trained using TensorFlow (v0.12.0) on servers equipped with dual Intel(R) Xeon(R) E5-2630L v2 CPUs @ 2.40 GHz, 128 GB RAM, and dual NVIDIA K80 graphics cards. Image data were extracted from Aperio .svs whole-slide image formats using OpenSlide (openslide.org/). Basic image analysis operations were performed using HistomicsTK (https://github.com/DigitalSlideArchive/HistomicsTK), a Python package for histology image analysis.

Data Availability. This paper was produced using large volumes of publicly available genomic and imaging data. The authors have made every effort to make available links to these resources as well as to make publicly available the software methods and information used to produce the datasets, analyses, and summary information.

ACKNOWLEDGMENTS. This work was supported by US NIH National Library of Medicine Career Development Award K22LM011576 and National Cancer Institute Grant U24CA194362 and by the National Brain Tumor Society.

1. Kong J, et al. (2008) Computer-assisted grading of neuroblastic differentiation. Arch Pathol Lab Med 132:903–904, author reply 904.
2. Niazi MKK, et al. (2017) Visually meaningful histopathological features for automatic grading of prostate cancer. IEEE J Biomed Health Inform 21:1027–1038.
3. Naik S, et al. (2008) Automated gland and nuclei segmentation for grading of prostate and breast cancer histopathology. Proceedings of the 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro (IEEE, Piscataway, NJ), pp 284–287.
4. Ren J, et al. (2015) Computer aided analysis of prostate histopathology images Gleason grading especially for Gleason score 7. Conf Proc IEEE Eng Med Biol Soc 2015:3013–3016.
5. Kothari S, Phan JH, Young AN, Wang MD (2013) Histological image classification using biologically interpretable shape-based features. BMC Med Imaging 13:9.
6. Sertel O, et al. (2009) Computer-aided prognosis of neuroblastoma on whole-slide images: Classification of stromal development. Pattern Recognit 42:1093–1103.
7. Fauzi MF, et al. (2015) Classification of follicular lymphoma: the effect of computer aid on pathologists grading. BMC Med Inform Decis Mak 15:115.
8. Dundar MM, et al. (2011) Computerized classification of intraductal breast lesions using histopathological images. IEEE Trans Biomed Eng 58:1977–1984.
9. Hou L, et al. (2016) Patch-based convolutional neural network for whole slide tissue image classification. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), pp 2424–2433.
10. Kong J, et al. (2013) Machine-based morphologic analysis of glioblastoma using whole-slide pathology images uncovers clinically relevant molecular correlates. PLoS One 8:e81049.
11. Wang D, Khosla A, Gargeya R, Irshad H, Beck AH (2016) Deep learning for identifying metastatic breast cancer. arXiv:1606.05718.
12. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444.
13. Greenspan H, van Ginneken B, Summers RM (2016) Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans Med Imaging 35:1153–1159.
14. Janowczyk A, Madabhushi A (2016) Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J Pathol Inform 7:29.
15. Litjens G, et al. (2016) Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep 6:26286.
16. Chen T, Chefd'hotel C (2014) Deep learning based automatic immune cell detection for immunohistochemistry images. Machine Learning in Medical Imaging (Springer, Berlin), pp 17–24.
17. Cruz-Roa A, et al. (2017) Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent. Sci Rep 7:46450.
18. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 35:1240–1251.
19. Sirinukunwattana K, et al. (2016) Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans Med Imaging 35:1196–1206.
20. Esteva A, et al. (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118.
21. Gulshan V, et al. (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316:2402–2410.
22. Havaei M, et al. (2017) Brain tumor segmentation with deep neural networks. Med Image Anal 35:18–31.
23. Huynh BQ, Li H, Giger ML (2016) Digital mammographic tumor classification using transfer learning from deep convolutional neural networks. J Med Imaging (Bellingham) 3:034501.
24. Kamnitsas K, et al. (2017) Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal 36:61–78.
25. Turkki R, Linder N, Kovanen PE, Pellinen T, Lundin J (2016) Antibody-supervised deep learning for quantification of tumor-infiltrating immune cells in hematoxylin and eosin stained breast cancer samples. J Pathol Inform 7:38.
26. Bychkov D, Turkki R, Haglund C, Linder N, Lundin J (2016) Deep learning for tissue microarray image-based outcome prediction in patients with colorectal cancer. SPIE Medical Imaging, eds Gurcan MN, Madabhushi A (International Society for Optics and Photonics, Bellingham, WA), p 6.
27. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI (2014) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8–17.
28. Xiang A, Lapuerta P, Ryutov A, Buckley J, Azen S (2000) Comparison of the performance of neural network methods and Cox regression for censored survival data. Comput Stat Data Anal 34:243–257.
29. Yousefi S, et al. (2017) Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci Rep 7:11707.
30. Yousefi S, Congzheng S, Nelson N, Cooper LAD (2016) Learning genomic representations to predict clinical outcomes in cancer. arXiv:1609.08663.
31. Katzman J, et al. (2016) DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. arXiv:1606.00931.
32. Zhu X, Yao J, Huang J (2016) Deep convolutional neural network for survival analysis with pathological images. Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (IEEE, Piscataway, NJ), pp 544–547.
33. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, eds Pereira F, Burges CJC, Bottou L, Weinberger KQ (Neural Information Processing Systems Foundation, Inc., La Jolla, CA), pp 1097–1105.
34. Gutman DA, et al. (2013) Cancer Digital Slide Archive: an informatics resource to support integrated in silico analysis of TCGA pathology data. J Am Med Inform Assoc 20:1091–1098.
35. Gutman DA, et al. (2017) The digital slide archive: A software platform for management, integration, and analysis of histology for cancer research. Cancer Res 77:e75–e78.
36. Harrell FE, Jr, Califf RM, Pryor DB, Lee KL, Rosati RA (1982) Evaluating the yield of medical tests. JAMA 247:2543–2546.
37. Brat DJ, et al.; Cancer Genome Atlas Research Network (2015) Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N Engl J Med 372:2481–2498.
38. Reuss DE, et al. (2015) IDH mutant diffuse and anaplastic astrocytomas have similar age at presentation and little difference in survival: a grading problem for WHO. Acta Neuropathol 129:867–873.
39. Leeper HE, et al. (2015) IDH mutation, 1p19q codeletion and ATRX loss in WHO grade II gliomas. Oncotarget 6:30295–30305.
40. Nguyen DN, et al. (2013) Molecular and morphologic correlates of the alternative lengthening of telomeres phenotype in high-grade astrocytomas. Brain Pathol 23:237–243.
41. Wijnenga MMJ, et al. (2018) The impact of surgery in molecularly defined low-grade glioma: an integrated clinical, radiological, and molecular analysis. Neuro-oncol 20:103–112.
42. van den Bent MJ (2010) Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician's perspective. Acta Neuropathol 120:297–304.
43. Pope WB, et al. (2005) MR imaging correlates of survival in patients with high-grade gliomas. AJNR Am J Neuroradiol 26:2466–2474.
44. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
45. Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159.
46. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. IEEE International Conference on Computer Vision (IEEE, Piscataway, NJ), pp 1026–1034.
