ROCR: visualizing classifier performance in R

Vol. 21 no. 20 2005, pages 3940–3941
doi:10.1093/bioinformatics/bti623

BIOINFORMATICS APPLICATIONS NOTE
Data and text mining

ROCR: visualizing classifier performance in R

Tobias Sing (1,*), Oliver Sander (1), Niko Beerenwinkel (2) and Thomas Lengauer (1)

(1) Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany
(2) Department of Mathematics, University of California, Berkeley, CA 94720-3840, USA

*To whom correspondence should be addressed.

Received on March 10, 2005; revised on June 1, 2005; accepted on August 9, 2005
Advance Access publication August 11, 2005

ABSTRACT

Summary: ROCR is a package for evaluating and visualizing the performance of scoring classifiers in the statistical language R. It features over 25 performance measures that can be freely combined to create two-dimensional performance curves. Standard methods for investigating trade-offs between specific performance measures are available within a uniform framework, including receiver operating characteristic (ROC) graphs, precision/recall plots, lift charts and cost curves. ROCR integrates tightly with R's powerful graphics capabilities, thus allowing for highly adjustable plots. Being equipped with only three commands and reasonable default values for optional parameters, ROCR combines flexibility with ease of use.

Availability: http://rocr.bioinf.mpi-sb.mpg.de. ROCR can be used under the terms of the GNU General Public License. Running within R, it is platform-independent.

Contact: tobias.sing@mpi-sb.mpg.de

Pattern classification has become a central tool in bioinformatics, offering rapid insights into large data sets (Baldi and Brunak, 2001). While one area of our work involves predicting phenotypic properties of HIV-1 from genotypic information (Beerenwinkel et al., 2002, 2003; Sing et al., 2004), scoring or ranking predictors are also vital in a wide range of other biological problems. Examples include microarray analysis (e.g. prediction of tissue condition based on gene expression), protein structural and functional characterization (remote homology detection, prediction of post-translational modifications and molecular function annotation based on sequence or structural motifs), genome annotation (gene finding and splice site identification), protein–ligand interactions (virtual screening and molecular docking) and structure–activity relationships (predicting bioavailability or toxicity of drug compounds). In many of these cases, considerable class skew, class-specific misclassification costs and extensive noise due to variability in experimental assays complicate predictive modelling. Thus, careful predictor validation is compulsory.

The real-valued output of scoring classifiers is turned into a binary class decision by choosing a cutoff. As no cutoff is optimal according to all possible performance criteria, cutoff choice involves a trade-off among different measures. Typically, a trade-off between a pair of criteria (e.g. sensitivity versus specificity) is visualized as a cutoff-parametrized curve in the plane spanned by the two measures. Popular examples of such trade-off visualizations include receiver operating characteristic (ROC) graphs, sensitivity/specificity curves, lift charts and precision/recall plots. Fawcett (2004) provides a general introduction to evaluating scoring classifiers, with a focus on ROC graphs.

Although functions for drawing ROC graphs are provided by the Bioconductor project (http://www.bioconductor.org) or by the machine learning package Weka (http://www.cs.waikato.ac.nz/ml), for example, no comprehensive evaluation suite is available to date. ROCR is a flexible evaluation package for R (http://www.r-project.org), a statistical language that is widely used in biomedical data analysis. Our tool allows for creating cutoff-parametrized performance curves by freely combining two out of more than 25 performance measures (Table 1). Curves from different cross-validation or bootstrapping runs can be averaged by various methods. Standard deviations, standard errors and box plots are available to summarize the variability across the runs. The parametrization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to the cutoff. All components of a performance plot are adjustable using a flexible mechanism for dispatching optional arguments. Despite this flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.

In the example below, we briefly introduce ROCR's three commands (prediction, performance and plot), applied to a 10-fold cross-validation set of predictions and corresponding class labels from a study on predicting HIV coreceptor usage from the sequence of the viral envelope protein. After loading the dataset, a prediction object is created from the raw predictions and class labels.

    data(ROCR.hiv)
    pred <- prediction(
        ROCR.hiv$hiv.svm$predictions,
        ROCR.hiv$hiv.svm$labels)

Performance measures or combinations thereof are computed by invoking the performance method on this prediction object. The resulting performance object can be visualized using the method plot. For example, an ROC curve that trades off the rate of true positives against the rate of false positives is obtained as follows:

    perf <- performance(pred, "tpr", "fpr")
    plot(perf, avg="threshold", spread.estimate="boxplot")

The optional parameter avg selects a particular form of performance curve averaging across the validation runs; the visualization of curve variability is determined with the parameter spread.estimate.

Issuing demo(ROCR) starts a demonstration of further graphical capabilities of ROCR. The command help(package=ROCR) points to the available help pages. In particular, a complete list of available performance measures can be obtained via help(performance). A reference manual can be downloaded from the ROCR website.

In conclusion, ROCR is a comprehensive tool for evaluating scoring classifiers and producing publication-quality figures. It allows for studying the intricacies inherent to many biological datasets and their implications on classifier performance.

Fig. 1. Visualizations of classifier performance (HIV coreceptor usage data): (a) receiver operating characteristic (ROC) curve; (b) peak accuracy across a range of cutoffs; (c) absolute difference between empirical and predicted rate of positives for windowed cutoff ranges, in order to evaluate how well the scores are calibrated as probability estimates. Owing to the probabilistic interpretation, cutoffs need to be in the interval [0,1], in contrast to other performance plots. (d) Score density estimates for the negative (solid) and positive (dotted) class.

Table 1. Performance measures in the ROCR package

    Contingency ratios: error rate, accuracy, sensitivity, specificity, true/false positive rate, fallout, miss, precision, recall, negative predictive value, prediction-conditioned fallout/miss
    Discrete covariation measures: Phi/Matthews correlation coefficient, mutual information, chi-square test statistic, odds ratio
    Information retrieval measures: F-measure, lift, precision/recall break-even point
    Performance in ROC space: ROC convex hull, area under the ROC curve
    Absolute scoring performance: calibration error, mean cross-entropy, root mean-squared error
    Cost measures: expected cost, explicit cost

ACKNOWLEDGEMENT

Work at MPI supported by EU NoE BioSapiens (LSHG-CT-2003-503265).

Conflict of Interest: none declared.

REFERENCES

Baldi,P. and Brunak,S. (2001) Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, MA.
Beerenwinkel,N. et al. (2003) Geno2pheno: estimating phenotypic drug resistance from HIV-1 genotypes. Nucleic Acids Res., 31, 3850–3855.
Beerenwinkel,N. et al. (2002) Diversity and complexity of HIV-1 drug resistance: a bioinformatics approach to predicting phenotype from genotype. Proc. Natl Acad. Sci. USA, 99, 8271–8276.
Fawcett,T. (2004) ROC graphs: notes and practical considerations for researchers. Technical Report HPL-2003-4. HP Labs, Palo Alto, CA.
Sing,T., Beerenwinkel,N. and Lengauer,T. (2004) Learning mixtures of localized rules by maximizing the area under the ROC curve. In Proceedings of the 1st International Workshop on ROC Analysis in Artificial Intelligence, Valencia, Spain, pp. 89–96.
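The three-command workflow described in the note generalizes to any pair of measures from Table 1. A minimal sketch, assuming the ROCR package and its bundled ROCR.hiv dataset are installed; the function and measure names are ROCR's own, while the particular measure pairs and plot parameters chosen here are illustrative:

```r
## Sketch: other measure combinations with the same three commands.
## Assumes install.packages("ROCR") has been run.
library(ROCR)
data(ROCR.hiv)

## A prediction object from the 10 cross-validation runs, as in the note
pred <- prediction(ROCR.hiv$hiv.svm$predictions,
                   ROCR.hiv$hiv.svm$labels)

## Precision/recall curve, vertically averaged across the runs
perf.pr <- performance(pred, "prec", "rec")
plot(perf.pr, avg = "vertical")

## Scalar measures work the same way: area under the ROC curve,
## one value per run, stored in the y.values slot
auc <- performance(pred, "auc")
unlist(auc@y.values)

## Cutoff parametrization shown by coloring the ROC curve
perf.roc <- performance(pred, "tpr", "fpr")
plot(perf.roc, colorize = TRUE)
```

Because every measure is addressed by a string identifier, swapping "prec"/"rec" for, say, "lift"/"rpp" changes the plot without touching the rest of the code.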



Publisher
Oxford University Press
Copyright
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bti623
pmid
16096348


Journal

Bioinformatics, Oxford University Press

Published: Aug 11, 2005
