Detecting large-scale system problems by mining console logs

Association for Computing Machinery — Oct 11, 2009

References (50)

Datasource
Association for Computing Machinery
Copyright
The ACM Portal is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.
ISBN
978-1-60558-752-3
doi
10.1145/1629575.1629587

Abstract

Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael I. Jordan
{xuw,fox,pattrsn,jordan}@cs.berkeley.edu

Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We first parse console logs by combining source code analysis with information retrieval to create composite features. We then analyze these features using machine learning to detect operational problems. We show that our method enables analyses that are impossible with previous methods because of its superior ability to create sophisticated features. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. We validate our approach using the Darkstar online game server and the Hadoop File System, where we detect numerous real problems with high accuracy and few false positives. In the Hadoop case, we are able to analyze 24 million lines of console logs in 3 minutes. Our
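The parse-then-analyze pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the regex templates stand in for templates that would be recovered from source code, the message wording and the `blk_` identifier format are hypothetical, and the majority-vote check is a deliberately simple placeholder for the machine learning step, which the abstract does not specify.

```python
import re
from collections import Counter, defaultdict

# Hypothetical message templates. In the approach the abstract describes,
# templates would come from source code analysis; these are hand-written
# stand-ins with invented message wording.
TEMPLATES = {
    "receiving": re.compile(r"Receiving block (?P<id>blk_\d+)"),
    "received": re.compile(r"Received block (?P<id>blk_\d+)"),
    "deleting": re.compile(r"Deleting block (?P<id>blk_\d+)"),
}


def parse_line(line):
    """Return (message_type, identifier), or None if no template matches."""
    for name, pattern in TEMPLATES.items():
        match = pattern.search(line)
        if match:
            return name, match.group("id")
    return None


def count_vectors(lines):
    """Group parsed messages by identifier and count each message type."""
    counts = defaultdict(Counter)
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            name, ident = parsed
            counts[ident][name] += 1
    order = list(TEMPLATES)  # fixed column order for every vector
    return {ident: [c[name] for name in order] for ident, c in counts.items()}


def flag_unusual(vectors):
    """Flag identifiers whose count vector differs from the most common one.

    A placeholder baseline only, not the detection method from the paper.
    """
    most_common, _ = Counter(tuple(v) for v in vectors.values()).most_common(1)[0]
    return sorted(i for i, v in vectors.items() if tuple(v) != most_common)
```

Grouping messages by a shared identifier is one way to build the kind of composite feature the abstract mentions: each identifier gets a vector of message-type counts, and vectors that deviate from the usual pattern become candidates for closer inspection.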
