Modifying boosted trees to improve performance on task 1 of the 2006 KDD challenge cup

Publisher
Association for Computing Machinery
Copyright
Copyright © 2006 by ACM Inc.
ISSN
1931-0145
DOI
10.1145/1233321.1233327

Abstract

Task 1 of the 2006 KDD Challenge Cup required classification of pulmonary embolisms (PEs) using variables derived from computed tomography angiography. We present our approach to the challenge and justification for our choices. We used boosted trees to perform the main classification task, but modified the algorithm to address idiosyncrasies of the scoring criteria. The two main modifications were: 1) changing the dependent variable in the training set to account for multiple PEs per patient, and 2) incorporating neighborhood information through augmentation of the set of predictor variables. Both of these resulted in measurable predictive improvement. In addition, we discuss a statistically based method for setting the classification threshold.
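The second modification, adding neighborhood information to the predictor set, can be illustrated with a minimal sketch. The abstract does not say how the neighborhood variables were built, so everything here is an assumption made for illustration: the dictionary keys `patient`, `pos`, and `features`, the function name `augment_with_neighbors`, the choice of the `k` nearest candidates within the same patient, and the use of a mean plus nearest-neighbor distance as the added predictors.

```python
import math
from collections import defaultdict


def augment_with_neighbors(candidates, k=2):
    """Return one augmented feature row per candidate, in input order.

    Each candidate is a dict with hypothetical keys (not from the paper):
      "patient":  patient identifier
      "pos":      (x, y, z) location of the candidate
      "features": list of numeric predictors
    """
    # Group candidate indices by patient so neighbors never cross patients.
    by_patient = defaultdict(list)
    for idx, cand in enumerate(candidates):
        by_patient[cand["patient"]].append(idx)

    rows = []
    for idx, cand in enumerate(candidates):
        # The k closest other candidates from the same patient.
        others = sorted(
            (math.dist(cand["pos"], candidates[j]["pos"]), j)
            for j in by_patient[cand["patient"]]
            if j != idx
        )[:k]
        n_feat = len(cand["features"])
        if others:
            neighbor_mean = [
                sum(candidates[j]["features"][f] for _, j in others) / len(others)
                for f in range(n_feat)
            ]
            nearest = others[0][0]
        else:
            # Assumed convention for a lone candidate: reuse its own
            # features and flag the missing neighbor with distance -1.
            neighbor_mean = list(cand["features"])
            nearest = -1.0
        # Original predictors, then neighbor means, then nearest distance.
        rows.append(list(cand["features"]) + neighbor_mean + [nearest])
    return rows


candidates = [
    {"patient": "A", "pos": (0.0, 0.0, 0.0), "features": [1.0, 10.0]},
    {"patient": "A", "pos": (3.0, 4.0, 0.0), "features": [3.0, 30.0]},
    {"patient": "B", "pos": (1.0, 1.0, 1.0), "features": [5.0, 50.0]},
]
augmented = augment_with_neighbors(candidates, k=2)
```

The augmented rows could then be passed to any boosted-tree learner in place of the original predictors. Whether the authors used nearest-neighbor summaries of this kind is not stated in the abstract.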

Journal

ACM SIGKDD Explorations Newsletter, Association for Computing Machinery

Published: Dec 1, 2006
