Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.

Author summary

The increasing application of high-throughput transcriptomics data to predict patient prognosis demands modern computational methods. With the renewed popularity of artificial neural networks, we asked if a refined neural network model could be used to predict patient survival, as an alternative to the conventional methods, such as Cox proportional hazards (Cox-PH) methods with LASSO or ridge penalization. To this end, we have developed a neural network extension of the Cox regression model, called Cox-nnet. It is optimized for survival prediction from high throughput gene expression data, with comparable or better performance than other conventional methods. More importantly, Cox-nnet reveals much richer biological information, at both the pathway and gene levels, by analyzing features represented in the hidden layer nodes in Cox-nnet. Additionally, we propose to use hidden node features as a new approach for dimension reduction during survival data analysis.

OPEN ACCESS

Citation: Ching T, Zhu X, Garmire LX (2018) Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput Biol 14(4): e1006076. https://doi.org/10.1371/journal.pcbi.1006076

Editor: Florian Markowetz, University of Cambridge, UNITED KINGDOM

Received: December 16, 2016. Accepted: March 7, 2018. Published: April 10, 2018.

Copyright: © 2018 Ching et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Data used in this publication come from The Cancer Genome Atlas, managed by NIH (https://cancergenome.nih.gov/). Readers can access the TCGA RNA-Seq data by visiting https://portal.gdc.cancer.gov/legacy-archive/search/f, selecting "gene expression data" and downloading using the downloading application. The data from TCGA is detailed in the methods section of the manuscript. Code used in this publication is freely available at https://github.com/lanagarmire/cox-nnet.

Funding: This research was supported by grants K01ES025434 awarded by NIEHS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov), P20 COBRE GM103457 awarded by NIH/NIGMS, R01 LM012373 awarded by NLM, R01 HD084633 awarded by NICHD, and Hawaii Community Foundation Medical Research Grant 14ADVC-64566 to LXG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi.1006076 | April 10, 2018

Introduction

With the wide application of genomics technologies, gene expression data of patients are often used as inputs to predict patients' survival. Computationally, survival prediction is usually framed as a regression problem to model patients' survival time (or other event time). The most common method is the Cox-PH model, a semi-parametric proportional hazards model, where the covariates of the model explain the relative risks of the patients, termed hazard ratios. Given the large number of input features in gene expression data, penalization methods such as LASSO (L1 norm), ridge (L2 norm) and MCP regularizations are often used to help select representative features in Cox-PH models. A modification of the Cox-PH model is CoxBoost. It is an iterative "gradient boosting" method, where the parameters are separated into individual partitions. The partition that leads to the largest improvement in the penalized partial log likelihood is selected, and in subsequent iterations the model selects another block and refits those parameters by maximizing the penalized partial log likelihood. Random Forests Survival (RF-S) is a tree-based, non-linear, ensemble method, rather than a proportional hazards model. For each tree in the forest, data is bootstrapped, and nodes are split by maximizing the log-rank statistic. The cumulative hazard function (CHF) is estimated in each tree and a patient's CHF is calculated as an average over all the trees in the ensemble.

Besides the methods above, artificial neural networks (ANNs), a type of model based on the idea of neurons processing information, could be trained to predict survival as well. Developed in 1943, ANNs were used to model non-linear behavior.
In an ANN, hidden units, termed neurons or nodes, may be activated or deactivated depending on the input signals, based on their own linear weight and bias parameters. The data are fed forward through the network, and for each hidden unit these weight and bias parameters are learned through back-propagation along the gradient of the loss function. In recent years, ANNs have attracted renewed attention for solving problems in the genomics field [6, 7], thanks to increased parallel computing power and the promise of deep learning. For example, Alipanahi et al. used deep learning to better predict the binding of RNA and DNA to proteins. Ciresan et al. used convolutional neural networks to detect cell mitosis in histological breast cancer images. However, relative to these new areas, survival prediction using ANNs has been lagging behind.

The first ANN model to predict survival was developed by Faraggi and Simon, who used four clinical input parameters to model prostate cancer survival. However, their simple model was not suitable for high throughput input data, where tens of thousands of features are present per patient. Subsequently, other authors attempted to implement ANN methods to predict patient survival. One study applied ANNs to high dimensional survival data by simplifying the regression as a binary classification problem [12, 13], and another study fit continuous variables of survival time to discrete variables through binning [12, 13]. These approaches potentially led to loss of accuracy in prediction. Another study used time as an additional input in order to predict patient survival or censoring status, which will overfit when the survival and censoring are correlated. Thus far, an ANN model based on proportional hazards to analyze high throughput data in the genomics era is lacking.

To address the issues of ANN-based prediction mentioned above, we have developed a new software package, named Cox-nnet.
We use a two-layer neural network: one hidden layer and the output layer. Rather than approximating survival as a classification problem, we used the output layer to perform Cox regression based on the activation levels of the hidden layer. Cox-nnet also computes feature importance scores, so that the relative importance of specific genes to prognosis outcome can be assessed. More importantly, the hidden layer node structure in the ANN can be analyzed to reveal more useful information regarding relevant genes and pathways, compared to other methods in the study. A similar idea for classification (rather than survival analysis) was recently explored in dimension reduction of single cell RNA-Seq data, in which a set of input genes with high weights to the hidden nodes of the neural network was analyzed using GO analysis. Overall, Cox-nnet is a desirable survival analysis method with both excellent predictive power and the ability to elucidate biological functions related to prognosis.

Results

Cox-nnet structure and optimization

The neural network model used in this paper is shown in Fig 1 and an overview of modules in the Cox-nnet package is shown in S1 Fig. The current ANN architecture is composed of the input, one fully connected hidden layer (143 nodes) and an output "proportional hazards" layer. Cox-nnet performs cross-validation (CV) to find the optimal regularization parameter. Due to the large number of parameters, overfitting is a potential problem in ANNs, particularly for small datasets. Thus for regularization, we experimented with a range of regularization methods, including ridge, dropout, and the combination of ridge and dropout (see details in Methods). We found that dropout regularization offered overall the best model (S2A and S2B Fig). Furthermore, we compared Cox-nnet structures with no hidden layer (a standard Cox-PH model), one hidden layer (143 nodes) and two hidden layers (143 nodes in both layers) under dropout regularization (S3 Fig). We found that a single hidden layer Cox-nnet performed slightly better than those with no hidden layer (standard Cox-PH) or two hidden layers (S3A and S3B Fig). Thus, we used the single hidden-layer Cox-nnet with dropout regularization (average dropout rate = 7.75 +/- 0.042) for comparison with other survival methods in all following analyses.

Fig 1. An overview of the optimized Cox-nnet neural network architecture used in this study. Cox-nnet is composed of one hidden layer and an output "Cox-regression" layer. It is optimized to work on high dimensional gene expression data. The model is trained to minimize the partial log likelihood using back-propagation. https://doi.org/10.1371/journal.pcbi.1006076.g001

Many other functions are implemented to improve the usability of the package (S1 Fig). Among them, the optimizers for adapting the learning rate include momentum gradient descent and Nesterov accelerated gradient. A comparison of these descent methods is shown in S4A Fig. We chose the Nesterov accelerated gradient search method for this report. Other parameterization details of Cox-nnet are described in the Methods section.

Performance comparison of survival prediction methods

We compared four methods, namely Cox-nnet, Cox-PH (with ridge, LASSO and MCP penalizations), CoxBoost and RF-S, on 10 datasets from The Cancer Genome Atlas (TCGA). These datasets were selected for having at least 50 death events (S1 Table). For each dataset, we trained the model on 80% of the randomly selected samples and determined the regularization parameter using 5-fold CV on the training set.
We evaluated the performance on the remaining 20% holdout test set. We replicated this evaluation 10 times in order to assess the average performance of each method.

We used four accuracy metrics to evaluate the performance of the model. The first is C-IPCW (inverse probability of censoring weighted). This metric aims to overcome the inaccuracy of the unweighted concordance index when censoring time is correlated with the patient's hazard score. The second metric is Harrell's concordance index (C-harrel), an unweighted concordance index that evaluates the relative ordering of the samples by comparing the prognostic index (i.e., log hazard ratio) of each patient with the survival times. The third metric is the log-rank p-value from Kaplan-Meier survival curves of two different survival risk groups. This is done by using the median Prognosis Index (PI), the output of Cox-nnet, to dichotomize the patients into high risk and low risk groups, similar to our earlier reports [21, 22]. A log-rank p-value is then computed to differentiate the Kaplan-Meier survival curves of these two groups. Note that dichotomizing patients ignores the differences within each group, and thus may be less accurate than the C-harrel and C-IPCW metrics. Finally, the Integrated Brier Score (Brier) was also calculated. This score calculates the squared error between the predicted survival probability and the actual survival of patients at each time point [22-24].

The comparison of C-IPCW among the four methods over the 10 TCGA datasets is shown in Fig 2. Based on the C-IPCW score, Cox-nnet has better overall rankings than the other methods (Fig 2B), but the improvement over Cox-PH lacks statistical significance in most cases (Fig 2A).
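As an illustration of the unweighted concordance index and the median dichotomization described above, the following is a minimal NumPy sketch. It is not the evaluation code used in the paper: the function names `harrell_c_index` and `median_split` are hypothetical, ties in risk are assumed to count as half-concordant, and a pair is treated as comparable only when the patient with the shorter time had an observed event.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Unweighted concordance index (illustrative O(n^2) version).

    time  : observed survival or censoring times
    event : 1 if the event (death) was observed, 0 if censored
    risk  : prognostic index (log hazard ratio); higher means worse prognosis
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # the earlier time of a comparable pair must be an observed event
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # shorter survival has higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # assumed convention for tied risks
    if comparable == 0:
        return float("nan")
    return concordant / comparable

def median_split(risk):
    """Return True for patients in the high-risk group (PI above the median)."""
    risk = np.asarray(risk, dtype=float)
    return risk > np.median(risk)
```

The boolean mask returned by `median_split` could then be passed, together with the survival times and event indicators, to any log-rank test implementation to compare the two Kaplan-Meier curves.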
Note that among the three penalization methods applied to Cox-PH, ridge penalization has the best overall accuracy (S5 Fig), and thus Cox-PH with ridge penalization was chosen for comparison with the other methods. However, when using the C-harrel (S6 Fig) and log-rank p-value (S7 Fig) metrics, Cox-nnet had significantly improved performance compared to all other methods. Based on the Brier score metric, Cox-nnet had significantly higher predictive accuracy compared to RF-S (S8 Fig). Overall, RF-S, the other non-linear, ensemble-based method, consistently ranks worse than Cox-nnet and Cox-PH (Fig 2, S6, S7 and S8 Figs).

Fig 2. A. Boxplot of the C-IPCW of the 10 TCGA datasets using four prognosis-predicting methods: Cox-nnet (dropout), CoxBoost, Cox-PH (ridge) and RF-S. The data were randomly split into 80% training and 20% testing sets, and repeated 10 times. Average C-IPCWs are presented as the metric. For the "overall" condition, all 10 TCGA cancer datasets are combined as one "cancer" dataset. Sign indicates statistical significance (p < 0.05). B. Heatmap of the performance rank of each dataset, based on the order of the average C-IPCW scores. Ranks 1, 2, 3, and 4 indicate the descending performance of each computational method. https://doi.org/10.1371/journal.pcbi.1006076.g002

Hidden layer nodes of Cox-nnet are surrogate prognostic features

To explore the biological relevance of the hidden nodes of Cox-nnet, we used the TCGA Kidney Renal Clear Cell Carcinoma (KIRC) dataset as an example. We first extracted the contribution of each hidden node to the PI score for each patient (Fig 3A). The contribution was calculated as the output value of each hidden node weighted by the corresponding coefficient at the Cox regression output layer. As expected, the values of the hidden nodes strongly correlated with the PI score.
However, there is still significant heterogeneity among the nodes, suggesting that individual nodes may reflect different biological processes.

Fig 3. A. Hidden node activation weighted by the corresponding Cox layer coefficients of the TCGA KIRC dataset. The columns represent individual patient scores, ordered by their Prognostic Index. The rows represent the node activations. B. t-SNE plot of the top 20 nodes (left) and t-SNE of differentially expressed genes between the two groups with low and high prognostic index, respectively (right). C. Gene Set Enrichment Analysis: significantly enriched KEGG pathways of the top 20 hidden nodes (adjusted p-value < 0.05). https://doi.org/10.1371/journal.pcbi.1006076.g003

We hypothesize that the top (most variable) nodes may serve as surrogate features to discriminate patient survival. To explore this idea, we selected the top 20 nodes with the highest variances and presented the patients' PI scores using t-SNE (Fig 3B). t-SNE is a non-linear dimensionality reduction method that embeds high-dimensional datasets into a low dimensional space (usually two or three dimensions). This method has been widely used to visualize data with a large number of features, by enhancing the separation among samples. The hidden nodes represent a dimension reduction of the original data and they clearly discriminate samples by their PI scores, as shown by the t-SNE plot (Fig 3B, left). As a comparison, we performed t-SNE using all differentially expressed genes between patients with low prognostic index and high prognostic index (Fig 3B, right). The t-SNE plots demonstrate that the nodes in Cox-nnet effectively capture the survival information. Therefore, the top node PI scores can be used as features for dimension reduction in survival analysis.
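The node contribution and top-node selection described in this section can be sketched as follows. This is a hedged illustration rather than the Cox-nnet package's own code: the function names are hypothetical, the hidden layer is assumed to use a tanh activation, and "most variable" is assumed to mean the variance, across patients, of each node's weighted contribution.

```python
import numpy as np

def node_contributions(X, W, b, beta):
    """Contribution of each hidden node to each patient's prognostic index.

    X    : (n_patients, n_genes) expression matrix
    W    : (n_hidden, n_genes) input-to-hidden weights
    b    : (n_hidden,) hidden-layer biases
    beta : (n_hidden,) coefficients of the Cox regression output layer
    """
    hidden = np.tanh(X @ W.T + b)   # hidden-node outputs, (n_patients, n_hidden)
    return hidden * beta            # each node's output weighted by its Cox coefficient

def top_variance_nodes(contrib, k=20):
    """Indices of the k nodes whose contributions vary most across patients."""
    variances = contrib.var(axis=0)
    return np.argsort(variances)[::-1][:k]
```

Summing a row of the matrix returned by `node_contributions` recovers that patient's PI. The columns selected by `top_variance_nodes` are the reduced features that could be passed to a t-SNE implementation for visualization.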
Hidden layer nodes of Cox-nnet are associated with biological functions

To further explore the biological relevance of the top 20 hidden nodes, we conducted Gene Set Enrichment Analysis (GSEA) using KEGG pathways, as described in the Methods section. Briefly, we calculated significantly enriched pathways using Pearson's correlation between the log transformed gene expression input and the output score of each node across all patients in the KIRC dataset (Fig 3C and S2 Table). We compared these enriched pathways to those from GSEA of the Cox-PH (ridge) model (S3 Table), the competing model with the second best prognosis prediction. A total of 110 (out of 187) significantly enriched pathways (S2 Table) were identified in at least one node, including seven pathways enriched in all 20 nodes that were not found by the Cox-PH method (Table 1). In contrast, Cox-PH only identified 30 significantly enriched pathways using the same significance threshold. We also used the gene values from CoxBoost and RF-S; however, they did not produce any significantly enriched pathways. Among the seven pathways enriched in all 20 nodes from Cox-nnet, the p53 signaling pathway stands out as an important biologically relevant pathway (S9 Fig), since it was shown to be highly prognostic of patient survival in kidney cancer.

Next, we estimated the predictive accuracies of the leading edge genes enriched in the GSEA from Cox-nnet vs. those enriched in the Cox-PH model. Leading edge genes are those genes in the pathway of interest that contribute positively to the enrichment score in GSEA. We used the C-IPCW of each leading edge gene, obtained from single-variable analysis (Fig 4).
Collectively, leading edge genes from Cox-nnet have significantly higher C-IPCW scores (p = 1.253e-05) than those from Cox-PH, suggesting that Cox-nnet has selected more informative features. In order to visualize these gene level and pathway level differences between Cox-nnet and Cox-PH, we reconstructed a bipartite graph between leading edge genes (for Cox-nnet) or feature genes (for Cox-PH) and their corresponding enriched pathways (Fig 5). Besides the p53 pathway mentioned earlier, which is specific to Cox-nnet, several other pathways, such as the insulin signaling pathway, endocytosis and adherens junction, also have many more genes enriched in Cox-nnet. Among the genes specific to Cox-nnet, many have been previously reported to be relevant to renal carcinoma development and prognosis, such as CASP9, TGFBR2 and KDR (VEGFR). These results suggest that the Cox-nnet model reveals richer biological information than Cox-PH.

Additionally, we compared the partial derivative of the hidden nodes (rather than the Cox-nnet output) with respect to the input genes. We first calculated the gradient for each patient, averaged the partial derivatives across patients, and then repeated the GSEA as in the previous analysis. However, we found that with this approach fewer pathways are significant, and they are less relevant to cancer.

Table 1. Cox-nnet node-associated pathways. Significantly enriched pathways common to all 20 hidden nodes that are not found in the Cox-PH Gene Set Enrichment Analysis (Adjusted P < 0.05).
Pathway                          P.value  P.adjusted  Nodes
KEGG adherens junction           0.000    0.001       1-20
KEGG endocytosis                 0.000    0.001       1-20
KEGG insulin signaling pathway   0.000    0.001       1-20
KEGG lysine degradation          0.000    0.003       1-20
KEGG p53 signaling pathway       0.000    0.003       1-20
KEGG pyruvate metabolism         0.000    0.001       1-20
KEGG sphingolipid metabolism     0.001    0.005       1-20

https://doi.org/10.1371/journal.pcbi.1006076.t001

Fig 4. Single variable C-IPCW scores of the leading edge genes from Cox-nnet and Cox-PH. The leading edge genes are obtained using Gene-Set Enrichment Analysis, and they are genes contributing positively to the maximum value of the pathway enrichment score. Cox-nnet has significantly higher C-IPCW scores (p = 1.253e-05). https://doi.org/10.1371/journal.pcbi.1006076.g004

Evaluation of gene input relative to survival in Cox-nnet

To further examine the importance of each gene relative to the survival outcome, we calculated the average partial derivative of the output of the model (i.e., the log hazard ratio) with respect to each input gene value across all patients. As demonstrated by the leading edge genes in seven common pathways of all nodes in Cox-nnet, the feature importance scores from Cox-nnet appear to be more biologically insightful compared to the feature importance values from the Cox-PH model (S9 Fig). For example, the feature importance for the BAI1 gene in the p53 pathway is much higher in the Cox-nnet model compared to the Cox-PH model. Corresponding to our finding, the BAI gene family was found to be involved in several types of cancers including renal cancer. BAI1 acts as an inhibitor to angiogenesis and is transcriptionally regulated by p53 [33-36]. Its expression level was significantly decreased in tumor vs. normal kidney tissue, and was even lower in advanced stage renal carcinoma.
Mouse kidney cancer models treated with BAI1 showed slower tumor growth and proliferation. Additionally, the MAPK1 gene (also known as ERK2), annotated in two pathways identified by Cox-nnet (the adherens junction and insulin signaling pathways), has a much higher feature importance score in Cox-nnet compared to Cox-PH. MAPK1 is one of the key kinases in intra-cellular transduction, and was found constitutively activated in renal cell carcinoma. Drugs inhibiting the MAPK cascade have been targeted for development. We list the top 20 genes from each method in S4 Table.

Fig 5. Enriched pathway-gene bipartite network from the leading edge genes and significantly enriched pathways. Significantly enriched pathways common to all 20 hidden nodes are labeled in green. Leading edge genes found uniquely in Cox-nnet are labeled in orange, and genes found in both Cox-nnet and Cox-PH are labeled in blue. https://doi.org/10.1371/journal.pcbi.1006076.g005

Discussion

We have implemented Cox-nnet, a new ANN method, to predict patient survival from high throughput omics data. Cox-nnet is an alternative to the standard Cox-PH regression, enabling automatic discovery of biological features at both the pathway and gene levels. The hidden nodes in the Cox-nnet model have distinct activation patterns, and can serve as surrogate features for survival-sensitive dimension reduction. More significantly enriched KEGG pathways are identified from the top nodes in Cox-nnet than from the Cox-PH model, suggesting that Cox-nnet reveals more relevant biological information.
We show how a critical pathway for renal cancer development, the p53 pathway, is identified only by Cox-nnet and not by the Cox-PH model in the TCGA KIRC dataset. Other pathways, including the insulin signaling pathway, endocytosis and adherens junction, have many more genes enriched by Cox-nnet. Moreover, leading edge genes obtained from the KEGG pathways identified as enriched by Cox-nnet (which are a fraction of the gene features considered by the model) have collectively higher associations with survival. Enrichment analysis on the top genes from Random Forest and CoxBoost did not produce any significant pathways.

As a promising new predictive method for prognosis, the current Cox-nnet implementation has some limitations. Its architecture is relatively simple, including one or two hidden layers and an output Cox regression layer. It is possible to incorporate more sophisticated architectures into the model, such as more layers of neurons or more sophisticated hidden layers. However, a deeper ANN is not necessarily more beneficial (S3 Fig), when compared to the regularization methods. This suggests that ANNs may overfit genomics datasets of the small size tested. Newer variations of neural networks, such as convolutional or recurrent neural networks, have shown good performance in processing imaging or other types of positional data, and they could be used as input to a proportional hazards output layer. Additionally, it is possible to embed a priori biological pathway information into the network architecture, e.g., by connecting genes in a pathway to a common node in the next hidden layer of neurons. In the future, we plan to further analyze how different neural network architectures affect the performance of Cox-nnet and compare the biological insights from the various models.

Methods

Datasets

We analyzed 10 TCGA datasets which were combined into a pan-cancer dataset.
The TCGA datasets included the following cancer types: Bladder Urothelial Carcinoma (BLCA), Breast invasive carcinoma (BRCA), Head and Neck squamous cell carcinoma (HNSC), Kidney renal clear cell carcinoma (KIRC), Brain Lower Grade Glioma (LGG), Liver hepatocellular carcinoma (LIHC), Lung adenocarcinoma (LUAD), Lung squamous cell carcinoma (LUSC), Ovarian serous cystadenocarcinoma (OV) and Stomach adenocarcinoma (STAD). RNA-Seq expression and clinical data were downloaded from the Broad Institute GDAC. Overall survival time and censoring information were extracted from the clinical follow-up data. Raw count data were normalized using the DESeq2 R package and then log-transformed. Datasets were selected from TCGA based on the following criteria: > 300 samples with both RNA-Seq and survival data and > 50 survival events. In total, 5031 patient samples were used (see S1 Table for a patient tabulation by individual dataset).

Cox-PH, CoxBoost and Random Forest Survival (RF-S) models

Cox-nnet is an extension to the Cox-PH model. Individual hazard, an instantaneous measure of the likelihood of an event, is estimated based on a set of features. The hazard function is:

$$h(t \mid x_i) = h_0(t) \exp(\theta_i) \qquad (1)$$

$$\theta_i = x_i^{T} \beta \qquad (2)$$

where $\theta_i$ is the log hazard ratio for patient $i$. This model uses the partial log-likelihood as the cost function:

$$pl(\beta) = \sum_{C(i)=1} \left( \theta_i - \log \sum_{t_j \ge t_i} \exp(\theta_j) \right) \qquad (3)$$

Since gene expression data have tens of thousands of initial features, penalization methods are usually implemented along with Cox-PH. We experimented with 3 penalization methods, namely LASSO (L1 norm), ridge (L2 norm), and minimax concave penalty (MCP). MCP attempts to moderate the biased large penalty for large coefficients in LASSO (S5 Fig). MCP reduces the regularization for large coefficients and plateaus at a value selected through cross-validation.
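To make the partial log-likelihood of Eq (3) concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the Cox-nnet implementation: the function name is hypothetical, `event[i] = 1` is assumed to correspond to C(i) = 1 (an observed event), the risk set for patient i is taken to be all patients with t_j >= t_i, and tied event times receive no special handling.

```python
import numpy as np

def cox_partial_log_likelihood(theta, time, event):
    """Partial log-likelihood as in Eq (3), for given log hazard ratios theta.

    theta : (n,) log hazard ratio of each patient
    time  : (n,) observed survival or censoring time
    event : (n,) 1 if the event was observed (assumed to mean C(i) = 1), else 0
    """
    theta = np.asarray(theta, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when patient j is still at risk at time t_i (t_j >= t_i)
    at_risk = time[None, :] >= time[:, None]
    # log of the summed exp(theta_j) over the risk set of each patient i
    log_risk = np.log((at_risk * np.exp(theta)[None, :]).sum(axis=1))
    # only patients with an observed event contribute a term
    return float(np.sum((theta - log_risk)[event]))
```

A training procedure would maximize this quantity (or minimize its negative) plus a penalty term. For large values of `theta`, a numerically stable log-sum-exp would be preferable to the direct `np.exp` used in this sketch.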
LASSO and ridge regularization were performed using the Glmnet R package and the MCP regularization was performed with the Ncvreg R package.

CoxBoost is an iterative "gradient boosting" method modified from the Cox-PH model. In CoxBoost, parameters are separated into individual partitions, and the partition that leads to the largest improvement in the penalized partial log likelihood is selected for that iteration. In subsequent boosting iterations, the model selects another block and refits those parameters by maximizing the penalized likelihood function. The number of boosting iterations is used as the complexity parameter in CoxBoost and is optimized via cross-validation (CV).

Random Forests Survival (RF-S) is a tree-based, non-linear, ensemble method, rather than a proportional hazards model. For each tree in the forest, samples are bootstrapped, and at each node in a tree, features are bootstrapped, and the node is split by selecting the feature that maximizes the log-rank statistic. At the leaf nodes, the cumulative hazard function (CHF) is estimated and a patient's CHF is calculated as an average over all the trees in the ensemble.

Cox-nnet

Cox-nnet is a neural network whose output layer is a Cox regression. In a Cox-nnet model, $x_i$ in Eq (2) is replaced by the output of the hidden layer, and the linear predictor is:

$$\theta_i = G(W x_i + b)^{T} \beta \qquad (4)$$

where $W$ is the coefficient weight matrix between the input and hidden layer with the size $H \times J$, $b$ is the bias term for each hidden node and $G$ is the activation function (applied element-wise on a vector). In this manuscript, the tanh activation function is used: exp z