pROC: an open-source package for R and S+ to analyze and compare ROC curves

Abstract

Background: Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curve analysis we developed pROC, a package for R and S+ that contains a set of tools displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface.

Results: With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC.

Conclusions: pROC is a package for R and S+ specifically dedicated to ROC analysis. It proposes multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation.

Background

A ROC plot displays the performance of a binary classification method with continuous or discrete ordinal output. It shows the sensitivity (the proportion of correctly classified positive observations) and specificity (the proportion of correctly classified negative observations) as the output threshold is moved over the range of all possible values. ROC curves do not depend on class probabilities, facilitating their interpretation and comparison across different data sets. Originally invented for the detection of radar signals, they were soon applied to psychology [1] and medical fields such as radiology [2]. They are now commonly used in medical decision making, bioinformatics [3], data mining and machine learning, evaluating biomarker performances or comparing scoring methods [2,4].
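To make this concrete, here is a minimal sketch of how an empirical ROC curve is traced by sweeping the threshold. The scores and labels are made up for illustration; pROC performs the equivalent computation internally when a curve is built with roc():

    # Hypothetical classifier output and true classes (1 = positive):
    scores <- c(0.2, 0.45, 0.35, 0.7, 0.8, 0.5)
    labels <- c(0, 0, 0, 1, 1, 1)
    # Sweep the threshold over all possible values:
    thresholds <- sort(unique(c(-Inf, scores, Inf)))
    sensitivity <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
    specificity <- sapply(thresholds, function(t) mean(scores[labels == 0] < t))
    # Plot in the sensitivity/specificity space, with specificity decreasing:
    plot(specificity, sensitivity, xlim = c(1, 0), type = "b")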
In the ROC context, the area under the curve (AUC) measures the performance of a classifier and is frequently applied for method comparison. A higher AUC means a better classification. However, comparison between AUCs is often performed without a proper statistical analysis, partially due to the lack of relevant, accessible and easy-to-use tools providing such tests. Small differences in AUCs can be significant if ROC curves are strongly correlated, and without statistical testing two AUCs can be incorrectly labelled as similar. In contrast, a larger difference can be non-significant in small samples, as shown by Hanczar et al. [5], who also provide an analytical expression for the variance of AUCs as a function of the sample size. We recently identified this lack of proper statistical comparison as a potential cause for the poor acceptance of biomarkers as diagnostic tools in medical applications [6]. Evaluating a classifier by means of the total AUC is not suitable when the performance assessment only takes place in high specificity or high sensitivity regions [6]. To account for these cases, the partial AUC (pAUC) was introduced as a local comparative approach that focuses only on a portion of the ROC curve [7-9].

Software for ROC analysis already exists. A previous review [10] compared eight ROC programs and found that there is a need for a tool performing valid and standardized statistical tests with good data import and plot functions.

The R [11] and S+ (TIBCO Spotfire S+ 8.2, 2010, Palo Alto, CA) statistical environments provide an extensible framework upon which software can be built. No ROC tool is implemented in S+ yet, while four R packages computing ROC curves are available:

1) ROCR [12] provides tools computing the performance of predictions by means of precision/recall plots, lift charts, cost curves as well as ROC plots and AUCs. Confidence intervals (CI) are supported for ROC analysis but the user must supply the bootstrapped curves.

2) The verification package [13] is not specifically aimed at ROC analysis; nonetheless it can plot ROC curves, compute the AUC and smooth a ROC curve with the binormal model. A Wilcoxon test for a single ROC curve is also implemented, but no test comparing two ROC curves is included.
3) Bioconductor includes the ROC package [14], which can only compute the AUC and plot the ROC curve.

4) Pcvsuite [15] is an advanced package for ROC curves which features advanced functions such as covariate adjustment and ROC regression. It was originally designed for Stata and ported to R. It is not available on the CRAN (comprehensive R archive network), but can be downloaded for Windows and MacOS from http://labs.fhcrc.org/pepe/dabs/rocbasic.html.

Table 1 summarizes the differences between these packages. Only pcvsuite enables the statistical comparison between two ROC curves. Pcvsuite, ROCR and ROC can compute the AUC or pAUC, but the pAUC can only be defined as a portion of specificity.

Table 1. Features of the R packages for ROC analysis

    Feature                 ROCR         verification      ROC (Bioconductor)      pcvsuite                          pROC
    Smoothing               No           Yes               No                      Yes                               Yes
    Partial AUC             Only SP (1)  No                Only SP (1)             Only SP (1)                       SP and SE
    Confidence intervals    Partial (2)  Partial (3)       No                      Partial (4)                       Yes
    Plotting CIs            Yes          Yes               No                      Yes                               Yes
    Statistical tests       No           AUC (one sample)  No                      AUC, pAUC, SP                     AUC, pAUC, SP, SE, ROC
    Available on CRAN       Yes          Yes               No, http://www.bioconductor.org/  No, http://labs.fhcrc.org/pepe/dabs/  Yes

(1) Partial AUC only between 100% and a specified cutoff of specificity. (2) Bootstrapped ROC curves must be computed by the user. (3) Only threshold averaging. (4) Only at a given specificity or inverse ROC.

The pROC package was designed in order to facilitate ROC curve analysis and apply proper statistical tests for their comparison. It provides a consistent and user-friendly set of functions building and plotting a ROC curve, several methods smoothing the curve, computing the full or partial AUC over any range of specificity or sensitivity, as well as computing and visualizing various CIs. It includes tests for the statistical comparison of two ROC curves as well as their AUCs and pAUCs. The software comes with extensive documentation and relies on the underlying R and S+ systems for data input and plots. Finally, a graphical user interface (GUI) was developed for S+ for users unfamiliar with programming.

Implementation

AUC and pAUC

In pROC, the ROC curves are empirical curves in the sensitivity and specificity space. AUCs are computed with trapezoids [4]. The method is extended for pAUCs by ignoring trapezoids outside the partial range and adding partial trapezoids with linear interpolation when necessary. The pAUC region can be defined either as a portion of specificity, as originally described by McClish [7], or as a portion of sensitivity, as proposed later by Jiang et al. [8]. Any section pAUC(t0, t1) of the curve can be analyzed, and not only portions anchored at 100% specificity or 100% sensitivity. Optionally, the pAUC can be standardized with the formula by McClish [7]:

    1/2 (1 + (pAUC − min) / (max − min))    (1)

where min is the pAUC over the same region of the diagonal ROC curve, and max is the pAUC over the same region of the perfect ROC curve. The result is a standardized pAUC which is always 1 for a perfect ROC curve and 0.5 for a non-discriminant ROC curve, whatever the partial region defined.
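As an illustration of the trapezoidal rule and of Equation 1, the following sketch uses helper functions of our own (trap_auc and standardize_pauc are illustrative names, not pROC internals) and reproduces the WFNS value from the case study below, assuming the 90-100% specificity region:

    # Trapezoidal area under a curve given as x/y coordinates:
    trap_auc <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

    # McClish standardization (Equation 1):
    standardize_pauc <- function(pauc, min_pauc, max_pauc)
      0.5 * (1 + (pauc - min_pauc) / (max_pauc - min_pauc))

    sp <- seq(0.9, 1, by = 0.01)           # the 90-100% specificity region
    min_pauc <- trap_auc(sp, 1 - sp)       # diagonal curve: 0.005
    max_pauc <- trap_auc(sp, rep(1, length(sp)))  # perfect curve: 0.1
    standardize_pauc(0.031, min_pauc, max_pauc)   # WFNS pAUC of 3.1% -> ~0.637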
Comparison

Two ROC curves are "paired" (or sometimes termed "correlated" in the literature) if they derive from multiple measurements on the same sample. Several tests exist to compare paired [16-22] or unpaired [23] ROC curves. The comparison can be based on the AUC [16-19,21], the ROC shape [20,22,23], a given specificity [15] or confidence bands [3,24]. Several tests are implemented in pROC. Three of them are implemented without modification from the literature [17,20,23], and the others are based on the bootstrap percentile method.

The bootstrap test to compare the AUC or pAUC in pROC implements the method originally described by Hanley and McNeil [16]. They define Z as

    Z = (θ1 − θ2) / sd(θ1 − θ2)    (2)

where θ1 and θ2 are the two (partial) AUCs. Unlike Hanley and McNeil, we compute sd(θ1 − θ2) with N (defaults to 2000) bootstrap replicates. In each replicate r, the original measurements are resampled with replacement; both new ROC curves corresponding to this new sample are built, and the resampled AUCs θ1,r and θ2,r and their difference Dr = θ1,r − θ2,r are computed. Finally, we compute sd(θ1 − θ2) = sd(D). As Z approximately follows a normal distribution, one- or two-tailed p-values are calculated accordingly. This bootstrap test is very flexible and can be applied to the AUC, pAUC and smoothed ROC curves.

The bootstrap is stratified by default; in this case the same number of case and control observations as in the original sample will be selected in each bootstrap replicate. Stratification can be disabled, and observations will then be resampled regardless of their class labels. Repeats for the bootstrap and progress bars are handled by the plyr package [25].

The second method to compare AUCs implemented in pROC was developed by DeLong et al. [17] based on U-statistics theory and asymptotic normality. As this test does not require bootstrapping, it runs significantly faster, but it cannot handle pAUCs or smoothed ROC curves. For both tests, since the variance depends on the covariance of the ROC curves (Equation 3), strongly correlated ROC curves can have similar AUC values and still be significantly different.

    var(θ1 − θ2) = var(θ1) + var(θ2) − 2 cov(θ1, θ2)    (3)

Venkatraman and Begg [20] and Venkatraman [23] introduced tests to compare two actual ROC curves as opposed to their respective AUCs. Their method evaluates the integrated absolute difference between the two ROC curves, and a permutation distribution is generated to compute the statistical significance of this difference. As the measurements leading to the two ROC curves may be performed on different scales, they are not generally exchangeable between two samples. Therefore, the permutations are based on ranks, and ranks are recomputed as described in [20] to break the ties generated by the permutation.

Finally, a test based on the bootstrap is implemented to compare the ROC curve at a given level of specificity or sensitivity, as proposed by Pepe et al. [15]. It works similarly to the (p)AUC test, but instead of computing the (p)AUC at each iteration, the sensitivity (or specificity) corresponding to the given specificity (or respectively sensitivity) is computed. This test is equivalent to a pAUC test with a very small pAUC range.
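The following sketch illustrates the bootstrap Z test (Equations 2 and 3) with stratified resampling. It is a simplified illustration under our own assumptions, not the pROC implementation: the AUC is computed with the rank-based (Mann-Whitney) statistic and only a two-tailed p-value is returned. In pROC itself the same comparison is a single call to roc.test():

    boot_auc_test <- function(marker1, marker2, labels, N = 2000) {
      # Empirical AUC via the rank (Mann-Whitney) statistic:
      auc <- function(score, lab) {
        r <- rank(score)
        n1 <- sum(lab == 1)
        n0 <- sum(lab == 0)
        (sum(r[lab == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
      }
      auc_diff <- function(idx)
        auc(marker1[idx], labels[idx]) - auc(marker2[idx], labels[idx])
      # Stratified resampling: cases and controls are drawn separately.
      cases <- which(labels == 1)
      controls <- which(labels == 0)
      D <- replicate(N, auc_diff(c(sample(cases, replace = TRUE),
                                   sample(controls, replace = TRUE))))
      Z <- auc_diff(seq_along(labels)) / sd(D)  # Equation 2, with sd(D) as the denominator
      2 * pnorm(-abs(Z))                        # two-tailed p-value
    }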
Confidence intervals

CIs are computed with DeLong's method [17] for AUCs and with the bootstrap for pAUCs [26]. The CIs of the thresholds or of the sensitivity and specificity values are computed with bootstrap resampling and the averaging methods described by Fawcett [4]. In all bootstrap CIs, patients are resampled and the modified curve is built before the statistic of interest is computed. As in the bootstrap comparison test, the resampling is done in a stratified manner by default.

Smoothing

Several methods to smooth a ROC curve are also implemented. Binormal smoothing relies on the assumption that there exists a monotone transformation making both case and control values normally distributed [2]. Under this condition, a simple linear relationship (Equation 4) holds between the normal quantile function (φ⁻¹) values of the sensitivities and specificities. In our implementation, a linear regression between all quantile values defines a and b, which then define the smoothed curve.

    φ⁻¹(SE) = a + b φ⁻¹(SP)    (4)

This is different from the method described by Metz et al. [27], who use maximum likelihood estimation of a and b. Binormal smoothing was previously shown to be robust and to provide good fits in many situations, even when the deviation from the basic assumptions is quite strong [28]. For continuous data we also include methods for kernel (density) smoothing [29], or to fit various known distributions to the class densities with fitdistr in the MASS package [30]. If a user would like to run a custom smoothing algorithm that is optimized for the analysed data, then pROC also accepts class densities or the customized smoothing function as input. CIs and statistical tests of smoothed AUCs are done with the bootstrap.
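A compact sketch of Equation 4: regress the normal quantiles of the sensitivities on those of the specificities, then map the fitted line back through the normal cumulative distribution function. This follows the description above with our own simplifications (in particular the handling of the 0 and 1 endpoints) and is not the exact pROC code:

    binormal_smooth <- function(se, sp, n = 512) {
      keep <- se > 0 & se < 1 & sp > 0 & sp < 1   # qnorm is infinite at 0 and 1
      fit <- lm(qnorm(se[keep]) ~ qnorm(sp[keep]))  # Equation 4: phi^-1(SE) = a + b phi^-1(SP)
      sp_grid <- seq(0.001, 0.999, length.out = n)
      data.frame(sp = sp_grid,
                 se = pnorm(coef(fit)[1] + coef(fit)[2] * qnorm(sp_grid)))
    }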
Results and Discussion

We first evaluate the accuracy of the ROC comparison tests. Results in Additional File 1 show that all unpaired tests give uniform p-values under the null hypothesis (Additional Files 1 and 2) and that there is a very good correlation between DeLong's and the bootstrap tests (Additional Files 1 and 3). The relation between Venkatraman's and the other tests is also investigated (Additional Files 1 and 4).

We now present how to perform a typical ROC analysis with pROC. In a recent study [31], we analyzed the level of several biomarkers in the blood of patients at hospital admission after aneurysmal subarachnoid haemorrhage (aSAH) to predict the 6-month outcome. The 141 patients collected were classified according to their outcome with a standard neurological scale, the Glasgow outcome scale (GOS). The biomarker performances were compared with the well-established neurological scale of the World Federation of Neurological Surgeons (WFNS), also obtained at admission.

Figure 1. ROC curves of WFNS and S100b. ROC curves of WFNS (blue) and S100b (green). The black bars are the confidence intervals of WFNS for the threshold 4.5 and the light green area is the confidence interval shape of S100b. The vertical light grey shape corresponds to the pAUC region. The pAUC of both empirical curves is printed in the middle of the plot, with the p-value of the difference computed by a bootstrap test on the right.

Case study on clinical aSAH data

The purpose of the case presented here is to identify patients at risk of poor post-aSAH outcome, as they require specific healthcare management; therefore the clinical test must be highly specific. Detailed results of the study are reported in [31]. We only outline the features relevant to the ROC analysis.

ROC curves were generated in pROC for five biomarkers (H-FABP, S100b, Troponin I, NKDA and UFD-1) and three clinical factors (WFNS, Modified Fisher score and age).

AUC and pAUC

Since we are interested in a clinical test with a high specificity, we focused on the partial AUC between 90% and 100% specificity.

The best pAUC is obtained by WFNS, with 3.1%, closely followed by S100b with 3.0% (Figure 1). A perfect clinical test within the same region corresponds to a pAUC of 10%, while a ROC curve without any discrimination power would yield only 0.5%. In the case of WFNS, we computed a standardized pAUC of 63.7% with McClish's formula (Equation 1). Of these 63.7%, 50% are due to the small portion (0.5% non-standardized) of the ROC curve below the identity line, and the remaining 13.7% are made of the larger part (2.6% non-standardized) above the curve. In the R version of pROC, the standardized pAUC of WFNS can be computed with:

roc(response = aSAH$outcome, predictor = aSAH$wfns, partial.auc = c(100, 90), partial.auc.correct = TRUE, percent = TRUE)

In the rest of this paper, we report only non-standardized pAUCs.

CI

Given the pAUC of WFNS, it makes sense to compute a 95% CI of the pAUC to assess the variability of the measure. In this case, we performed 10000 bootstrap replicates and obtained the 1.6-5.0% interval. In our experience, 10000 replicates give a fair estimate of the second significant digit. A lower number of replicates (for example 2000, the default) gives a good estimate of the first significant digit only.

Other confidence intervals can be computed. The threshold with the point farthest from the diagonal line in the specified region was determined with the coords function of pROC to be 4.5. A rectangular confidence interval can be computed, and the bounds are 89.0-98.9 in specificity and 26.0-54.0 in sensitivity (Figure 1). If the variability of sensitivity at 90% specificity is considered more relevant than at a specific threshold, the interval of sensitivity is computed as 32.8-68.8. As shown in Figure 1 for S100b, a CI shape can be obtained by simply computing the CIs of the sensitivities over several constantly spaced levels of specificity; these CI bounds are then joined to generate the shape. The following R code calculates the confidence shape:

plot(x = roc(response = aSAH$outcome, predictor = aSAH$s100, percent = TRUE, ci = TRUE, of = "se", sp = seq(0, 100, 5)), ci.type = "shape")

The confidence intervals of a threshold or of a predefined level of sensitivity or specificity answer different questions. For instance, it would be wrong to compute the CI of the threshold 4.5 and report only the CI bound of sensitivity without reporting the CI bound of specificity as well. Similarly, determining the sensitivity and specificity of the cut-off 4.5 and then computing both CIs separately would also be inaccurate.
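The intervals above can also be reproduced with the dedicated ci.* functions. The sketch below follows the documented pROC interface; argument defaults may differ between package versions, so treat it as a starting point:

    library(pROC)
    data(aSAH)  # the case-study data shipped with the package
    rocobj <- roc(aSAH$outcome, aSAH$wfns, percent = TRUE,
                  partial.auc = c(100, 90))
    ci.auc(rocobj, boot.n = 10000)           # bootstrap 95% CI of the pAUC
    ci.se(rocobj, specificities = 90)        # CI of sensitivity at 90% specificity
    ci.thresholds(rocobj, thresholds = 4.5)  # rectangular CI around the 4.5 cut-off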
Statistical comparison

The second best pAUC is that of S100b with 3.0%. The difference to WFNS is very small and the bootstrap test of pROC indicates that it is not significant (p = 0.8, Figure 1). Surprisingly, Venkatraman's test (over the total ROC curve) indicates a difference in the shape of the ROC curves (p = 0.004), and indeed a test evaluating pAUCs in the high sensitivity region (90-100% sensitivity) would highlight a significant difference (p = 0.005, pAUC = 4.3 and 1.4 for WFNS and S100b respectively). However, since we are not interested in the high sensitivity region of the AUC, there is no significant difference between WFNS and S100b.

In pROC, pairwise comparison of ROC curves is implemented. Multiple testing is not accounted for, and in the event of running several tests the user is reminded that, as with any statistical test, multiple tests should be performed with care and, if necessary, appropriate corrections should be applied [32].

The bootstrap test can be performed with the following code in R:

roc.test(response = aSAH$outcome, predictor1 = aSAH$wfns, predictor2 = aSAH$s100, partial.auc = c(100, 90), percent = TRUE)

Smoothing

Whether or not to smooth a ROC curve is a difficult choice. It can be useful in ROC curves with only few points, in which the trapezoidal rule consistently underestimates the true AUC [17]. This is the case with most clinical scores, such as the WFNS shown in Figure 2, where three smoothing methods available in pROC are plotted: (i) normal distribution fitting, (ii) density and (iii) binormal. In our case study:

(i) The normal fitting (red) gives a significantly lower AUC estimate (Δ = -5.1, p = 0.0006, bootstrap test). This difference is due to the non-normality of WFNS. Distribution fitting can be very powerful when there is a clear knowledge of the underlying distributions, but should be avoided in other contexts.

(ii) The density (green) smoothing also produces a lower AUC (Δ = -1.5, p = 6×10⁻⁷). It is interesting to note that even with a smaller difference in AUCs, the p-value can be more significant due to a higher covariance.

(iii) The binormal smoothing (blue) gives a slightly but not significantly higher AUC than the empirical ROC curve (Δ = +2.4, p = 0.3). It is probably the best of the three smoothing estimates in this case (as mentioned earlier, we were expecting a higher AUC because the empirical AUC of WFNS was underestimated). For comparison, Additional File 5 displays our implementation of binormal smoothing together with the one implemented in pcvsuite [15].

Figure 2. ROC curve of WFNS and smoothing. The empirical ROC curve of WFNS is shown in grey with three smoothing methods: binormal (blue), density (green) and normal distribution fit (red).
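The three smoothed variants of Figure 2 can be generated with the smooth function; the method names below follow the pROC documentation, but this is a sketch to be checked against the installed version:

    library(pROC)
    data(aSAH)
    rocobj <- roc(aSAH$outcome, aSAH$wfns, percent = TRUE)
    auc(smooth(rocobj, method = "binormal"))  # blue curve in Figure 2
    auc(smooth(rocobj, method = "density"))   # green curve
    auc(smooth(rocobj, method = "fitdistr"))  # red curve: normal distribution fit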
Figure 3 shows how to create a plot with multiple smoothed curves with pROC in S+. One loads the pROC library within S+, selects the new ROC curve item in the Statistics menu, selects the data on which the analysis is to be performed, and then moves to the Smoothing tab to set the parameters for smoothing.

Conclusion

In this case study we showed how pROC can be run for ROC analysis. The main conclusion drawn from this analysis is that none of the measured biomarkers can predict the patient outcome better than the neurological score (WFNS).

Installation and usage

pROC can be installed in R by issuing the following command in the prompt:

install.packages("pROC")

Loading the package:

library(pROC)

Getting help:

?pROC

Figure 3. Screenshot of pROC in S+ for smoothing the WFNS ROC curve. Top left: the General tab, where data is entered. Top right: the details about smoothing. Bottom left: the details for the plot. Checking the box "Add to existing plot" allows drawing several curves on one plot. Bottom right: the result in the standard S+ plot device.

In S+, pROC is available from the File menu, item Find Packages.... It can be loaded from the File menu, item Load Library.... In addition to the command line functions, a GUI is then available in the Statistics menu. It features one window for univariate ROC curves (which contains options for smoothing, pAUC, CIs and plotting) and two windows for paired and unpaired tests of two ROC curves. In addition, a specific help file for the GUI is available from the same menu.

Functions and methods

A summary of the functions available to the user in the command line version of pROC is shown in Table 2. Table 3 shows the list of the methods provided for plotting and printing.

Table 2. Functions provided in pROC

    are.paired     Determines if two ROC curves are possibly paired
    auc            Computes the area under the ROC curve
    ci             Computes the confidence interval of a ROC curve
    ci.auc         Computes the confidence interval of the AUC
    ci.se          Computes the confidence interval of sensitivities at given specificities
    ci.sp          Computes the confidence interval of specificities at given sensitivities
    ci.thresholds  Computes the confidence interval of thresholds
    coords         Returns the coordinates (sensitivities, specificities, thresholds) of a ROC curve
    roc            Builds a ROC curve
    roc.test       Compares the AUC of two correlated ROC curves
    smooth         Smooths a ROC curve

Table 3. Methods provided by pROC for standard functions

    lines  ROC curves (roc) and smoothed ROC curves (smooth.roc)
    plot   ROC curves (roc), smoothed ROC curves (smooth.roc) and confidence intervals (ci.se, ci.sp, ci.thresholds)
    print  All pROC objects (auc, ci.auc, ci.se, ci.sp, ci.thresholds, roc, smooth.roc)

Conclusions

The pROC package is a powerful set of tools analyzing and comparing ROC curves in R and S+. Unlike existing packages such as ROCR or verification, it is solely dedicated to ROC analysis, but provides, to our knowledge, the most complete set of statistical tests and plots for ROC curves. As shown in the case study reported here, pROC features the computation of the AUC and pAUC, various kinds of confidence intervals, several smoothing methods, and the comparison of two paired or unpaired ROC curves. We believe that pROC should provide researchers, especially in the biomarker community, with the necessary tools to better interpret their results in biomarker classification studies.

pROC is available in two versions for R and S+. A thorough documentation with numerous examples is provided in the standard R format. For users unfamiliar with programming, a graphical user interface is provided for S+.
Availability and requirements

- Project name: pROC
- Project home page: http://expasy.org/tools/pROC/
- Operating system(s): Platform independent
- Programming language: R and S+
- Other requirements: R ≥ 2.10.0 or S+ ≥ 8.1.1
- License: GNU GPL
- Any restrictions to use by non-academics: none

Additional material

Additional file 1: Assessment of the ROC comparison tests. We evaluate the uniformity of the tests under the null hypothesis (ROC curves are not different), and the correlation between the different tests.

Additional file 2: Histograms of the frequency of 600 test p-values under the null hypothesis (ROC curves are not different). A: DeLong's paired test, B: DeLong's unpaired test, C: bootstrap paired test (with 10000 replicates), D: bootstrap unpaired test (with 10000 replicates) and E: Venkatraman's test (with 10000 permutations).

Additional file 3: Correlations between DeLong and bootstrap paired tests. X axis: DeLong's test; Y axis: bootstrap test with the number of bootstrap replicates. A: 10, B: 100, C: 1000 and D: 10000.

Additional file 4: Correlation between DeLong and Venkatraman's test. X axis: DeLong's test; Y axis: Venkatraman's test with 10000 permutations.

Additional file 5: Binormal smoothing. Binormal smoothing with pcvsuite (green, solid) and pROC (black, dashed).

List of abbreviations

aSAH: aneurysmal subarachnoid haemorrhage; AUC: area under the curve; CI: confidence interval; CRAN: comprehensive R archive network; CSAN: comprehensive S-PLUS archive network; pAUC: partial area under the curve; ROC: receiver operating characteristic.

Acknowledgements

The authors would like to thank E. S. Venkatraman and Colin B. Begg for their support in the implementation of their test. This work was supported by Proteome Science Plc.

Author details

Correspondence: Xavier.Robin@unige.ch; markus.mueller@isb-sib.ch. Biomedical Proteomics Research Group, Department of Structural Biology and Bioinformatics, Medical University Centre, Geneva, Switzerland. Swiss Institute of Bioinformatics, Medical University Centre, Geneva, Switzerland.

Authors' contributions

XR carried out the programming and software design and drafted the manuscript. NTu, AH, NTi provided data and biological knowledge, tested and critically reviewed the software and the manuscript. FL helped to draft and to critically improve the manuscript. JCS conceived the biomarker study, participated in its design and coordination, and helped to draft the manuscript. MM participated in the design and coordination of the bioinformatics part of the study, participated in the programming and software design and helped to draft the manuscript. All authors read and approved the final manuscript.

Received: 10 September 2010. Accepted: 17 March 2011. Published: 17 March 2011.

References

1. Swets JA: The Relative Operating Characteristic in Psychology. Science 1973, 182:990-1000.
2. Pepe MS: The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press; 2003.
3. Sonego P, Kocsor A, Pongor S: ROC analysis: applications to the classification of biological sequences and 3D structures. Brief Bioinform 2008, 9:198-209.
4. Fawcett T: An introduction to ROC analysis. Pattern Recogn Lett 2006, 27:861-874.
5. Hanczar B, Hua J, Sima C, Weinstein J, Bittner M, Dougherty ER: Small-sample precision of ROC-related estimates. Bioinformatics 2010, 26:822-830.
6. Robin X, Turck N, Hainard A, Lisacek F, Sanchez JC, Müller M: Bioinformatics for protein biomarker panel classification: What is needed to bring biomarker panels into in vitro diagnostics? Expert Rev Proteomics 2009, 6:675-689.
7. McClish DK: Analyzing a Portion of the ROC Curve. Med Decis Making 1989, 9:190-195.
8. Jiang Y, Metz CE, Nishikawa RM: A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996, 201:745-750.
9. Streiner DL, Cairney J: What's under the ROC? An introduction to receiver operating characteristics curves. Canadian Journal of Psychiatry / Revue Canadienne de Psychiatrie 2007, 52:121-128.
10. Stephan C, Wesseling S, Schink T, Jung K: Comparison of Eight Computer Programs for Receiver-Operating Characteristic Analysis. Clin Chem 2003, 49:433-439.
11. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2010.
12. Sing T, Sander O, Beerenwinkel N, Lengauer T: ROCR: visualizing classifier performance in R. Bioinformatics 2005, 21:3940-3941.
13. NCAR: verification: Forecast verification utilities v. 1.31. [http://CRAN.R-project.org/package=verification].
14. Carey V, Redestig H: ROC: utilities for ROC, with microarray focus, v. 1.24.0. [http://www.bioconductor.org].
15. Pepe M, Longton G, Janes H: Estimation and Comparison of Receiver Operating Characteristic Curves. The Stata Journal 2009, 9:1.
16. Hanley JA, McNeil BJ: A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983, 148:839-843.
17. DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44:837-845.
18. Bandos AI, Rockette HE, Gur D: A permutation test sensitive to differences in areas for comparing ROC curves from a paired design. Stat Med 2005, 24:2873-2893.
19. Braun TM, Alonzo TA: A modified sign test for comparing paired ROC curves. Biostatistics 2008, 9:364-372.
20. Venkatraman ES, Begg CB: A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 1996, 83:835-848.
21. Bandos AI, Rockette HE, Gur D: A Permutation Test for Comparing ROC Curves in Multireader Studies: A Multi-reader ROC, Permutation Test. Acad Radiol 2006, 13:414-420.
22. Moise A, Clement B, Raissis M: A test for crossing receiver operating characteristic (ROC) curves. Communications in Statistics - Theory and Methods 1988, 17:1985-2003.
23. Venkatraman ES: A Permutation Test to Compare Receiver Operating Characteristic Curves. Biometrics 2000, 56:1134-1138.
24. Campbell G: Advances in statistical methodology for the evaluation of diagnostic and laboratory tests. Stat Med 1994, 13:499-508.
25. Wickham H: plyr: Tools for splitting, applying and combining data v. 1.4. [http://CRAN.R-project.org/package=plyr].
26. Carpenter J, Bithell J: Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med 2000, 19:1141-1164.
27. Metz CE, Herman BA, Shen JH: Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 1998, 17:1033-1053.
28. Hanley JA: The robustness of the "binormal" assumptions used in fitting ROC curves. Med Decis Making 1988, 8:197-203.
29. Zou KH, Hall WJ, Shapiro DE: Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med 1997, 16:2143-2156.
30. Venables WN, Ripley BD: Modern Applied Statistics with S. Fourth edition. New York: Springer; 2002.
31. Turck N, Vutskits L, Sanchez-Pena P, Robin X, Hainard A, Gex-Fabry M, Fouda C, Bassem H, Mueller M, Lisacek F, et al: A multiparameter panel method for outcome prediction following aneurysmal subarachnoid hemorrhage. Intensive Care Med 2010, 36:107-115.
32. Ewens WJ, Grant GR: Statistics (i): An Introduction to Statistical Inference. In Statistical methods in bioinformatics. New York: Springer-Verlag; 2005.

doi:10.1186/1471-2105-12-77

Cite this article as: Robin et al.: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011, 12:77.

© 2011 Robin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

pROC: an open-source package for R and S+ to analyze and compare ROC curves

Loading next page...
 
/lp/springer-journals/proc-an-open-source-package-for-r-and-s-to-analyze-and-compare-roc-Rxr3r56sUD

References (67)

Publisher
Springer Journals
Copyright
Copyright © 2011 by Robin et al; licensee BioMed Central Ltd.
Subject
Life Sciences; Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
eISSN
1471-2105
DOI
10.1186/1471-2105-12-77
pmid
21414208
Publisher site
See Article on Publisher Site

Abstract

Background: Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curves analysis we developed pROC, a package for R and S+ that contains a set of tools displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object- oriented and flexible interface. Results: With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC. Conclusions: pROC is a package for R and S+ specifically dedicated to ROC analysis. It proposes multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation. Background learning, evaluating biomarker performances or compar- A ROC plot displays the performance of a binary classi- ing scoring methods [2,4]. fication method with continuous or discrete ordinal out- In the ROC context, the area under the curve (AUC) put. It shows the sensitivity (the proportion of correctly measures the performance of a classifier and is fre- classified positive observations) and specificity (the pro- quently applied for method comparison. A higher AUC portion of correctly classified negative observations) as means a better classification. However, comparison the output threshold is moved over the range of all pos- between AUCs is often performed without a proper sta- sible values. ROC curves do not depend on class prob- tistical analysis partially due to the lack of relevant, abilities, facilitating their interpretation and comparison accessible and easy-to-use tools providing such tests. across different data sets. Originally invented for the Small differences in AUCs can be significant if ROC detection of radar signals, they were soon applied to curves are strongly correlated, and without statistical testing two AUCs can be incorrectly labelled as similar. psychology [1] and medical fields such as radiology [2]. They are now commonly used in medical decision mak- In contrast a larger difference can be non significant in ing, bioinformatics [3], data mining and machine small samples, as shown by Hanczar et al. [5], who also provide an analytical expression for the variance of AUC’sasa function ofthe samplesize. 
We recently * Correspondence: Xavier.Robin@unige.ch; markus.mueller@isb-sib.ch identified this lack of proper statistical comparison as a Biomedical Proteomics Research Group, Department of Structural Biology potential cause for the poor acceptance of biomarkers as and Bioinformatics, Medical University Centre, Geneva, Switzerland Swiss Institute of Bioinformatics, Medical University Centre, Geneva, diagnostic tools in medical applications [6]. Evaluating a Switzerland classifier by means of total AUC is not suitable when Full list of author information is available at the end of the article © 2011 Robin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Robin et al. BMC Bioinformatics 2011, 12:77 Page 2 of 8 http://www.biomedcentral.com/1471-2105/12/77 the performance assessment only takes place in high between two ROC curves. Pcvsuite, ROCR and ROC can specificity or high sensitivity regions [6]. To account for compute AUC or pAUC, but the pAUC can only be these cases, the partial AUC (pAUC) was introduced as defined as a portion of specificity. a local comparative approach that focuses only on a The pROC package was designed in order to facilitate portion of the ROC curve [7-9]. ROC curve analysis and apply proper statistical tests for Software for ROC analysis already exists. A previous their comparison. It provides a consistent and user- review [10] compared eight ROC programs and found friendly set of functions building and plotting a ROC curve, several methods smoothing the curve, computing that there is a need for a tool performing valid and stan- the full or partial AUC over any range of specificity or dardized statistical tests with good data import and plot functions. sensitivity, as well as computing and visualizing various The R [11] and S+ (TIBCO Spotfire S+ 8.2, 2010, Palo CIs. It includes tests for the statistical comparison of two Alto, CA) statistical environments provide an extensible ROC curves as well as their AUCs and pAUCs. The soft- framework upon which software can be built. No ROC ware comes with an extensive documentation and relies tool is implemented in S+ yet while four R packages on the underlying R and S+ systems for data input and computing ROC curves are available: plots. Finally, a graphical user interface (GUI) was devel- 1) ROCR [12] provides tools computing the perfor- oped for S+ for users unfamiliar with programming. mance of predictions by means of precision/recall plots, lift charts, cost curves as well as ROC plots and AUCs. Implementation Confidence intervals (CI) are supported for ROC analy- AUC and pAUC sis but the user must supply the bootstrapped curves. In pROC, the ROC curves are empirical curves in the 2) The verification package [13] is not specifically sensitivity and specificity space. AUCs are computed aimed at ROC analysis; nonetheless it can plot ROC with trapezoids [4]. The method is extended for pAUCs curves, compute the AUC and smooth a ROC curve by ignoring trapezoids outside the partial range and with the binomial model. A Wilcoxon test for a single adding partial trapezoids with linear interpolation when ROC curve is also implemented, but no test comparing necessary. The pAUC region can be defined either as a two ROC curves is included. 
portion of specificity, as originally described by McClish 3) Bioconductor includes the ROC package [14] which [7], or as a portion of sensitivity, as proposed later by can only compute the AUC and plot the ROC curve. Jiang et al. [8]. Any section of the curve pAUC(t ,t ) 0 1 4) Pcvsuite [15] is an advanced package for ROC can be analyzed, and not only portions anchored at curves which features advanced functions such as cov- 100% specificity or 100% sensitivity. Optionally, pAUC ariate adjustment and ROC regression. It was originally can be standardized with the formula by McClish [7]: designed for Stata and ported to R. It is not available on the CRAN (comprehensive R archive network), but can 1 pAUC − min 1+ , (1) be downloaded for Windows and MacOS from http:// 2 max− min labs.fhcrc.org/pepe/dabs/rocbasic.html. Table 1 summarizes the differences between these where min is the pAUC over the same region of the packages. Only pcvsuite enables the statistical comparison diagonal ROC curve, and max is thepAUCoverthe Table 1 Features of the R packages for ROC anaylsis Package name ROCR Verification ROC (Bioconductor) pcvsuite pROC Smoothing No Yes No Yes Yes Partial AUC Only No Only SP Only SP SP and SE SP 2 3 4 Confidence intervals Partial Partial No Partial Yes Plotting Confidence Yes Yes No Yes Yes Intervals Statistical tests No AUC (one No AUC, pAUC, SP AUC, pAUC, SP, SE, sample) ROC Available on CRAN Yes Yes No, http://www.bioconductor. No, http://labs.fhcrc.org/pepe/ Yes org/ dabs/ Partial AUC only between 100% and a specified cutoff of specificity. Bootstrapped ROC curves must be computed by the user. Only threshold averaging. Only at a given specificity or inverse ROC. Robin et al. BMC Bioinformatics 2011, 12:77 Page 3 of 8 http://www.biomedcentral.com/1471-2105/12/77 same region of the perfect ROC curve. The result is a opposed to their respective AUCs. Their method evalu- standardized pAUC which is always 1 for a perfect ROC ates the integrated absolutedifferencebetweenthetwo curve and 0.5 for a non-discriminant ROC curve, what- ROC curves, and a permutation distribution is generated ever the partial region defined. to compute the statistical significance of this difference. As the measurements leading to the two ROC curves may be performed on different scales, they are not gen- Comparison erally exchangeable between two samples. Therefore, the Two ROC curves are “paired” (or sometimes termed permutations are based on ranks, and ranks are recom- “correlated” in the literature) if they derive from multi- puted as described in [20] to break the ties generated by ple measurements on the same sample. Several tests the permutation. existto compare paired[16-22]orunpaired[23]ROC Finally a test based on bootstrap is implemented to curves. The comparison can be based on AUC compare the ROC curve at a given level of specificity or [16-19,21], ROC shape [20,22,23], a given specificity [15] sensitivity as proposed by Pepe et al. [15]. It works or confidence bands [3,24]. Several tests are implemen- similar to the (p)AUC test, but instead of computing the ted in pROC. Three of them are implemented without (p)AUC at each iteration, the sensitivity (or specificity) modification from the literature [17,20,23], and the corresponding to the given specificity (or respectively others are based on the bootstrap percentile method. sensitivity) is computed. This test is equivalent to a The bootstrap test to compare AUC or pAUC in pAUC test with a very small pAUC range. 
pROC implements the method originally described by Hanley and McNeil [16]. They define Z as Confidence intervals θ − θ 1 2 CIs are computed with Delong’smethod[17]for AUCs Z = , (2) sd (θ − θ ) 1 2 and with bootstrap for pAUCs [26]. The CIs of the thresholds or the sensitivity and specificity values are where θ and θ are the two (partial) AUCs. Unlike Han- 1 2 computed with bootstrap resampling and the averaging ley and McNeil, we compute sd(θ - θ ) with N (defaults to 1 2 methods described by Fawcett [4]. In all bootstrap CIs, 2000) bootstrap replicates. In each replicate r, the original patients are resampled and the modified curve is built measurements are resampled with replacement; both new before the statistics of interest is computed. As in the ROC curves corresponding to this new sample are built, bootstrap comparison test, the resampling is done in a the resampled AUCs θ and θ and their difference D = 1,r 2,r r stratified manner by default. θ - θ are computed. Finally, we compute sd(θ - θ )= 1,r 2,r 1 2 sd(D). As Z approximately follows a normal distribution, Smoothing one or two-tailed p-values are calculated accordingly. This Several methods to smooth a ROC curve are also imple- bootstrap test is very flexible and can be applied to AUC, mented. Binormal smoothing relies on the assumption pAUC and smoothed ROC curves. that there exists a monotone transformation to make Bootstrap is stratified by default; in this case the same both case and control values normally distributed [2]. number of case and control observations than in the original Under this condition a simple linear relationship (Equa- sample will be selected in each bootstrap replicate. Stratifica- tion 4) holds between the normal quantile function () tion can be disabled and observations will be resampled values of sensitivities and specificities. In our implemen- regardless of their class labels. Repeats for the bootstrap and tation, a linear regression between all quantile values progress bars are handled by the plyr package [25]. defines a and b, which then define the smoothed curve. The second method to compare AUCs implemented in pROC was developed by DeLong et al. [17] based on −1 −1 (4) φ (SE)= a + bφ (SP) U-statistics theory and asymptotic normality. As this test does not require bootstrapping, it runs significantly This is different from the method described by Metz faster, but it cannot handle pAUC or smoothed ROC et al. [27] who use maximum likelihood estimation of a curves. For both tests, since the variance depends on the and b. Binormal smoothing was previously shown to be covariance of the ROC curves (Equation 3), strongly robust and to provide good fits in many situations even correlated ROC curves can have similar AUC values and when the deviation from basic assumptions is quite still be significantly different. strong [28]. For continuous data we also include meth- ods for kernel (density) smoothing [29], or to fit various var (θ − θ ) = var (θ ) + var (θ ) − 2 cov (θ , θ ) (3) 1 2 1 2 1 2 known distributions to the class densities with fitdistr in Venkatraman and Begg [20] and Venkatraman [23] the MASS package [30]. If a user would like to run a introduced tests to compare two actual ROC curves as custom smoothing algorithm that is optimized for the Robin et al. BMC Bioinformatics 2011, 12:77 Page 4 of 8 http://www.biomedcentral.com/1471-2105/12/77 analysed data, then pROC also accepts class densities or the customized smoothing function as input. 
CI and sta- tistical tests of smoothed AUCs are done with bootstrap. Results and Discussion We first evaluate the accuracy of the ROC comparison tests. Results in Additional File 1 show that all unpaired tests give uniform p-values under a null hypothesis (Addi- tional Files 1 and 2) and that there is a very good correla- tion between DeLong’s and bootstrap tests (Additional Files 1 and 3). The relation between Venkatraman’sand the other tests is also investigated (Additional Files 1 and 4). We now present how to perform a typical ROC analy- sis with pROC. In a recent study [31], we analyzed the level of several biomarkers in the blood of patients at hospital admission after aneurysmal subarachnoid hae- morrhage (aSAH) to predict the 6-month outcome. The 141 patients collected were classified according to their outcome with a standard neurological scale, the Glasgow outcome scale (GOS). The biomarker performances Figure 1 ROC curves of WFNS and S100b.ROC curves of WFNS were compared with the well established neurological (blue) and S100b (green). The black bars are the confidence scale of the World Federation of Neurological Surgeons intervals of WFNS for the threshold 4.5 and the light green area is (WFNS), also obtained at admission. the confidence interval shape of S100b. The vertical light grey shape corresponds to the pAUC region. The pAUC of both empirical Case study on clinical aSAH data curves is printed in the middle of the plot, with the p-value of the difference computed by a bootstrap test on the right. Thepurposeof thecasepresented hereis to identify patients at risk of poor post-aSAH outcome, as they require specific healthcare management; therefore the In the rest of this paper, we report only not standar- clinical test must be highly specific. Detailed results of dized pAUCs. the study are reported in [31]. We only outline the fea- CI tures relevant to the ROC analysis. Given the pAUC of WFNS, it makes sense to compute a ROC curves were generated in pROC for five biomar- 95% CI of the pAUC to assess the variability of the mea- kers (H-FABP, S100b, Troponin I, NKDA and UFD-1) sure. In this case, we performed 10000 bootstrap repli- and three clinical factors (WFNS, Modified Fisher score cates and obtained the 1.6-5.0% interval. In our and age). experience, 10000 replicates give a fair estimate of the AUC and pAUC second significant digit. A lower number of replicates Since we are interested in a clinical test with a high spe- (for example 2000, the default) gives a good estimate of cificity, we focused on partial AUC between 90% and the first significant digit only. Other confidence intervals 100% specificity. can be computed. The threshold with the point farthest The best pAUC is obtained by WFNS, with 3.1%, clo- to the diagonal line in the specified region was deter- sely followed by S100b with 3.0% (Figure 1). A perfect mined with pROC to be 4.5 with the coords function. A clinical test within the same region corresponds to a rectangular confidence interval can be computed and pAUC of 10%, while a ROC curve without any discrimi- the bounds are 89.0-98.9 in specificity and 26.0-54.0 in nation power would yield only 0.5%. In the case of sensitivity (Figure 1). If the variability of sensitivity at WFNS, we computed a standardized pAUC of 63.7% 90% specificity is considered more relevant than at a with McClish’s formula (Equation 1). 
Of these 63.9%, specific threshold, the interval of sensitivity is computed 50% are due to the small portion (0.5% non-standardized) as 32.8-68.8. As shown in Figure 1 for S100b,a CI of the ROC curve below the identity line, and the remain- shape can be obtained by simply computing the CI’sof ing 13.9% are made of the larger part (2.6% non-standar- the sensitivities over several constantly spaced levels of dized) above the curve. In the R version of pROC,the specificity, and these CI bounds are then joined to gen- standardized pAUC of WFNS can be computed with: erate the shape. The following R code calculates the roc(response = aSAH$outcome, predictor = confidence shape: aSAH$wfns, partial.auc = c(100, 90), par- plot(x = roc(response = aSAH$outcome, tial.auc.correct = TRUE, percent = TRUE) predictor = aSAH$s100, percent = TRUE, ci = Robin et al. BMC Bioinformatics 2011, 12:77 Page 5 of 8 http://www.biomedcentral.com/1471-2105/12/77 TRUE, of = “se”, sp = seq(0, 100, 5)), ci. type="shape”) The confidence intervals of a threshold or of a prede- fined level of sensitivity or specificity answer different questions. For instance, it would be wrong to compute the CI of the threshold 4.5 and report only the CI bound of sensitivity without reporting the CI bound of specificity as well. Similarly, determining the sensitivity and specificity of the cut-off 4.5 and then computing both CIs separately would also be inaccurate. Statistical comparison The second best pAUC is that of S100b with 3.0%. The difference to WFNS is very small and the bootstrap test of pROC indicates that it is not significant (p = 0.8, Fig- ure 1). Surprisingly, a Venkatraman’s test (over the total ROC curve) indicates a difference in the shape of the ROCcurves(p= 0.004),andindeedatestevaluating pAUCs in the high sensitivity region (90-100% sensitiv- ity) would highlight a significant difference (p = 0.005, pAUC = 4.3 and 1.4 for WFNS and S100b respectively). Figure 2 ROC curve of WFNS and smoothing. Empirical ROC However, since we are not interested in the high sensi- curve of WFNS is shown in grey with three smoothing methods: tivity region of the AUC there is no significant differ- binormal (blue), density (green) and normal distribution fit (red). ence between WFNS and S100b. In pROC pairwise comparison of ROC curves is implemented. Multiple testing is not accounted for and (iii) The binormal smoothing (blue) gives a slightly in the event of running several tests, the user is but not significantly higher AUC than the empirical reminded that as with any statistical test, multiple tests ROC curve (Δ = +2.4, p = 0.3). It is probably the best should be performed with care, and if necessary appro- of the 3 smoothing estimates in this case (as mentioned priate corrections should be applied [32]. earlier we were expecting a higher AUC as the empiri- The bootstrap test can be performed with the follow- cal AUC of WFNS was underestimated). For compari- ing code in R: son, Additional File 5 displays both our implementation roc.test(response = aSAH$outcome, predic- of binormal smoothing with the one implemented in tor1 = aSAH$wfns, predictor2 = aSAH$s100, pcvsuite [15]. partial.auc = c(100, 90), percent = TRUE) Figure 3 shows how to create a plot with multiple Smoothing smoothed curves with pROC in S+. One loads the Whether or not to smooth a ROC curve is a difficult pROC library within S+, selects the new ROC curve choice. 
It can be useful in ROC curves with only few item in the Statistics menu, selects the data on which points, in which the trapezoidal rule consistently under- the analysis is to be performed, and then moves to the estimates the true AUC [17]. This is the case with most Smoothing tab to set parameters for smoothing. clinical scores, such as the WFNS shown in Figure 2 Conclusion where three smoothing methods available in pROC are In this case studyweshowedhow pROC could be run plotted: (i) normal distribution fitting, (ii) density and for ROC analysis. The main conclusion drawn from this (iii) binormal. In our case study: analysis is that none of the measured biomarkers can (i) The normal fitting (red) gives a significantly lower predict the patient outcome better than the neurological AUC estimate (Δ = -5.1, p = 0.0006, Bootstrap test). score (WFNS). This difference is due to the non-normality of WFNS. Distribution fitting can be very powerful when there is a Installation and usage clear knowledge of the underlying distributions, but should be avoided in other contexts. pROC can be installed in R by issuing the following (ii) The density (green) smoothing also produces a command in the prompt: -7 lower (Δ = -1.5, p = 6*10 ) AUC. It is interesting to note install.packages("pROC”) that even with a smaller difference in AUCs, the p-value Loading the package: can be more significant due to a higher covariance. library(pROC) Robin et al. BMC Bioinformatics 2011, 12:77 Page 6 of 8 http://www.biomedcentral.com/1471-2105/12/77 Figure 3 Screenshot of pROC in S+ for smoothing WFNS ROC curve. Top left: the General tab, where data is entered. Top right: the details about smoothing. Bottom left: the details for the plot. Checking the box “Add to existing plot” allows drawing several curves on a plot. Bottom right: the result in the standard S+ plot device. Getting help: Functions and methods ?pROC A summary of the functions available to the user in the S+ command line version of pROC is shown in Table 2. pROC is available from the File menu, item Find Table 3 shows the list of the methods provided for plot- Packages.... It can be loaded from the File menu, item ting and printing. Load Library.... Conclusions In addition to the command line functions, a GUI is The pROC package is a powerful set of tools analyzing then available in the Statistics menu. It features one and comparing ROC curves in R and S+. Unlike existing window for univariate ROC curves (which contains packages such as ROCR or verification, it is solely dedi- options for smoothing, pAUC, CIs and plotting) and two windows for paired and unpaired tests of two ROC cated to ROC analysis, but provides in our knowledge curves. In addition a specific help file for the GUI is the most complete set of statistical tests and plots for available from the same menu. ROC curves. As shown in the case study reported here, Robin et al. BMC Bioinformatics 2011, 12:77 Page 7 of 8 http://www.biomedcentral.com/1471-2105/12/77 Table 2 Functions provided in pROC DeLong’s paired test, B: DeLong’s unpaired test, C: bootstrap paired test are. Determines if two ROC curves are possibly paired (with 10000 replicates), D: bootstrap unpaired test (with 10000 replicates) paired and E: Venkatraman’s test (with 10000 permutations). Additional file 3: Correlations between DeLong and bootstrap auc Computes the area under the ROC curve paired tests. 
Conclusion
In this case study we showed how pROC could be run for ROC analysis. The main conclusion drawn from this analysis is that none of the measured biomarkers can predict the patient outcome better than the neurological score (WFNS).

Installation and usage
pROC can be installed in R by issuing the following command in the prompt:
install.packages("pROC")
Loading the package:
library(pROC)
Getting help:
?pROC

In S+, pROC is available from the File menu, item Find Packages.... It can be loaded from the File menu, item Load Library.... In addition to the command line functions, a GUI is then available in the Statistics menu. It features one window for univariate ROC curves (which contains options for smoothing, pAUC, CIs and plotting) and two windows for paired and unpaired tests of two ROC curves. In addition, a specific help file for the GUI is available from the same menu.

Functions and methods
A summary of the functions available to the user in the command line version of pROC is shown in Table 2. Table 3 shows the list of the methods provided for plotting and printing.

Table 2 Functions provided in pROC
are.paired: Determines if two ROC curves are possibly paired
auc: Computes the area under the ROC curve
ci: Computes the confidence interval of a ROC curve
ci.auc: Computes the confidence interval of the AUC
ci.se: Computes the confidence interval of sensitivities at given specificities
ci.sp: Computes the confidence interval of specificities at given sensitivities
ci.thresholds: Computes the confidence interval of thresholds
coords: Returns the coordinates (sensitivities, specificities, thresholds) of a ROC curve
roc: Builds a ROC curve
roc.test: Compares the AUC of two correlated ROC curves
smooth: Smoothes a ROC curve

Table 3 Methods provided by pROC for standard functions
lines: ROC curves (roc) and smoothed ROC curves (smooth.roc)
plot: ROC curves (roc), smoothed ROC curves (smooth.roc) and confidence intervals (ci.se, ci.sp, ci.thresholds)
print: All pROC objects (auc, ci.auc, ci.se, ci.sp, ci.thresholds, roc, smooth.roc)
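As an illustration of how the functions of Table 2 combine in practice, a short sketch (again assuming the aSAH data from the case study; the argument values are illustrative only):

library(pROC)

rocobj <- roc(response = aSAH$outcome, predictor = aSAH$s100, percent = TRUE)
auc(rocobj)                                     # total AUC
auc(rocobj, partial.auc = c(100, 90))           # pAUC over the 90-100% specificity range
ci.auc(rocobj)                                  # confidence interval of the AUC
coords(rocobj, x = "best")                      # threshold with the best sum of sensitivity and specificity
ci.se(rocobj, specificities = seq(0, 100, 10))  # CI of sensitivities at given specificities

The returned objects can be stored and passed to the print and plot methods of Table 3.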
Conclusions
The pROC package is a powerful set of tools for analyzing and comparing ROC curves in R and S+. Unlike existing packages such as ROCR or verification, it is solely dedicated to ROC analysis, but it provides, to our knowledge, the most complete set of statistical tests and plots for ROC curves. As shown in the case study reported here, pROC features the computation of AUC and pAUC, various kinds of confidence intervals, several smoothing methods, and the comparison of two paired or unpaired ROC curves. We believe that pROC should provide researchers, especially in the biomarker community, with the necessary tools to better interpret their results in biomarker classification studies.

pROC is available in two versions for R and S+. A thorough documentation with numerous examples is provided in the standard R format. For users unfamiliar with programming, a graphical user interface is provided for S+.

Availability and requirements
• Project name: pROC
• Project home page: http://expasy.org/tools/pROC/
• Operating system(s): Platform independent
• Programming language: R and S+
• Other requirements: R ≥ 2.10.0 or S+ ≥ 8.1.1
• License: GNU GPL
• Any restrictions to use by non-academics: none

Additional material
Additional file 1: Assessment of the ROC comparison tests. We evaluate the uniformity of the tests under the null hypothesis (ROC curves are not different), and the correlation between the different tests.
Additional file 2: Histograms of the frequency of 600 test p-values under the null hypothesis (ROC curves are not different). A: DeLong's paired test, B: DeLong's unpaired test, C: bootstrap paired test (with 10000 replicates), D: bootstrap unpaired test (with 10000 replicates) and E: Venkatraman's test (with 10000 permutations).
Additional file 3: Correlations between DeLong and bootstrap paired tests. X axis: DeLong's test; Y axis: bootstrap test with the number of bootstrap replicates. A: 10, B: 100, C: 1000 and D: 10000.
Additional file 4: Correlation between DeLong and Venkatraman's test. X axis: DeLong's test; Y axis: Venkatraman's test with 10000 permutations.
Additional file 5: Binormal smoothing. Binormal smoothing with pcvsuite (green, solid) and pROC (black, dashed).

List of abbreviations
aSAH: aneurysmal subarachnoid haemorrhage; AUC: area under the curve; CI: confidence interval; CRAN: comprehensive R archive network; CSAN: comprehensive S-PLUS archive network; pAUC: partial area under the curve; ROC: receiver operating characteristic.

Acknowledgements
The authors would like to thank E. S. Venkatraman and Colin B. Begg for their support in the implementation of their test. This work was supported by Proteome Science Plc.

Author details
Biomedical Proteomics Research Group, Department of Structural Biology and Bioinformatics, Medical University Centre, Geneva, Switzerland. Swiss Institute of Bioinformatics, Medical University Centre, Geneva, Switzerland.

Authors' contributions
XR carried out the programming and software design and drafted the manuscript. NTu, AH and NTi provided data and biological knowledge, tested and critically reviewed the software and the manuscript. FL helped to draft and to critically improve the manuscript. JCS conceived the biomarker study, participated in its design and coordination, and helped to draft the manuscript. MM participated in the design and coordination of the bioinformatics part of the study, participated in the programming and software design and helped to draft the manuscript. All authors read and approved the final manuscript.

Received: 10 September 2010 Accepted: 17 March 2011 Published: 17 March 2011

References
1. Swets JA: The Relative Operating Characteristic in Psychology. Science 1973, 182:990-1000.
2. Pepe MS: The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press; 2003.
3. Sonego P, Kocsor A, Pongor S: ROC analysis: applications to the classification of biological sequences and 3D structures. Brief Bioinform 2008, 9:198-209.
4. Fawcett T: An introduction to ROC analysis. Pattern Recogn Lett 2006, 27:861-874.
5. Hanczar B, Hua J, Sima C, Weinstein J, Bittner M, Dougherty ER: Small-sample precision of ROC-related estimates. Bioinformatics 2010, 26:822-830.
6. Robin X, Turck N, Hainard A, Lisacek F, Sanchez JC, Müller M: Bioinformatics for protein biomarker panel classification: What is needed to bring biomarker panels into in vitro diagnostics? Expert Rev Proteomics 2009, 6:675-689.
7. McClish DK: Analyzing a Portion of the ROC Curve. Med Decis Making 1989, 9:190-195.
8. Jiang Y, Metz CE, Nishikawa RM: A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996, 201:745-750.
9. Streiner DL, Cairney J: What's under the ROC? An introduction to receiver operating characteristics curves. Canadian Journal of Psychiatry Revue Canadienne De Psychiatrie 2007, 52:121-128.
10. Stephan C, Wesseling S, Schink T, Jung K: Comparison of Eight Computer Programs for Receiver-Operating Characteristic Analysis. Clin Chem 2003, 49:433-439.
11. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2010.
12. Sing T, Sander O, Beerenwinkel N, Lengauer T: ROCR: visualizing classifier performance in R. Bioinformatics 2005, 21:3940-3941.
13. NCAR: verification: Forecast verification utilities v. 1.31. [http://CRAN.R-project.org/package=verification].
14. Carey V, Redestig H: ROC: utilities for ROC, with microarray focus, v. 1.24.0. [http://www.bioconductor.org].
15. Pepe M, Longton G, Janes H: Estimation and Comparison of Receiver Operating Characteristic Curves. The Stata Journal 2009, 9:1.
16. Hanley JA, McNeil BJ: A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983, 148:839-843.
17. DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44:837-845.
18. Bandos AI, Rockette HE, Gur D: A permutation test sensitive to differences in areas for comparing ROC curves from a paired design. Stat Med 2005, 24:2873-2893.
19. Braun TM, Alonzo TA: A modified sign test for comparing paired ROC curves. Biostat 2008, 9:364-372.
20. Venkatraman ES, Begg CB: A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 1996, 83:835-848.
21. Bandos AI, Rockette HE, Gur D: A Permutation Test for Comparing ROC Curves in Multireader Studies: A Multi-reader ROC, Permutation Test. Acad Radiol 2006, 13:414-420.
22. Moise A, Clement B, Raissis M: A test for crossing receiver operating characteristic (ROC) curves. Communications in Statistics - Theory and Methods 1988, 17:1985-2003.
23. Venkatraman ES: A Permutation Test to Compare Receiver Operating Characteristic Curves. Biometrics 2000, 56:1134-1138.
24. Campbell G: Advances in statistical methodology for the evaluation of diagnostic and laboratory tests. Stat Med 1994, 13:499-508.
25. Wickham H: plyr: Tools for splitting, applying and combining data v. 1.4. [http://CRAN.R-project.org/package=plyr].
26. Carpenter J, Bithell J: Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med 2000, 19:1141-1164.
27. Metz CE, Herman BA, Shen JH: Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 1998, 17:1033-1053.
28. Hanley JA: The robustness of the "binormal" assumptions used in fitting ROC curves. Med Decis Making 1988, 8:197-203.
29. Zou KH, Hall WJ, Shapiro DE: Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med 1997, 16:2143-2156.
30. Venables WN, Ripley BD: Modern Applied Statistics with S. Fourth edition. New York: Springer; 2002.
31. Turck N, Vutskits L, Sanchez-Pena P, Robin X, Hainard A, Gex-Fabry M, Fouda C, Bassem H, Mueller M, Lisacek F, et al: A multiparameter panel method for outcome prediction following aneurysmal subarachnoid hemorrhage. Intensive Care Med 2010, 36:107-115.
32. Ewens WJ, Grant GR: Statistics (i): An Introduction to Statistical Inference. Statistical methods in bioinformatics. New York: Springer-Verlag; 2005.

doi:10.1186/1471-2105-12-77
Cite this article as: Robin et al.: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011, 12:77.
