Automatic expansion of domain-specific lexicons by term categorization

Publisher: Association for Computing Machinery
Copyright: Copyright © 2006 by ACM Inc.
ISSN: 1550-4875
DOI: 10.1145/1138379.1138380

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each $c_i$ in a predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an initial lexicon $L_i^0$ into a larger lexicon $L_i^1$. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks, in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
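The core idea described in the abstract can be illustrated in a few lines. Each term is represented by the documents it occurs in, and one classifier per domain $c_i$ is trained with the seed lexicon $L_i^0$ as positives. Terms the classifier accepts are then added to form $L_i^1$. This is a minimal sketch under stated assumptions, not the authors' implementation. The four-document corpus, the term lists, and every function name (`term_vectors`, `train_adaboost`, `score`, `expand_lexicon`) are hypothetical. Term vectors are assumed to be binary occurrence indicators. Plain binary AdaBoost over one-document decision stumps stands in for the paper's boosting-based learner. Negatives for a domain are assumed to be the seed terms of the other domains.

```python
import math
from collections import defaultdict


def term_vectors(docs):
    """Dual representation (assumed binary): each term becomes a vector over
    documents, with vec[t][j] == 1 iff term t occurs in document j."""
    n = len(docs)
    vecs = defaultdict(lambda: [0] * n)
    for j, doc in enumerate(docs):
        for t in set(doc.lower().split()):
            vecs[t][j] = 1
    return dict(vecs)


def train_adaboost(X, y, rounds=10):
    """Plain binary AdaBoost (a stand-in for the paper's learner) over stumps
    on one document feature: stump (f, pol) predicts pol if x[f] == 1, else -pol.
    Returns a list of (alpha, f, pol) triples."""
    m, n = len(X), len(X[0])
    w = [1.0 / m] * m                      # uniform initial example weights
    model = []
    for _ in range(rounds):
        best = None
        for f in range(n):
            for pol in (1, -1):
                # weighted error of this stump under the current weights
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, pol)
        err, f, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, pol))
        # up-weight misclassified examples, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * (pol if xi[f] else -pol))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def score(model, x):
    """Signed ensemble score; > 0 means the term is assigned to the domain."""
    return sum(a * (pol if x[f] else -pol) for a, f, pol in model)


def expand_lexicon(seed, others, candidates, vecs, rounds=10):
    """Hypothetical helper: build L_i^1 from L_i^0. Seed terms are positives,
    other domains' seed terms are negatives; candidates scoring > 0 are added."""
    train = [(t, 1) for t in seed if t in vecs] + \
            [(t, -1) for t in others if t in vecs]
    X = [vecs[t] for t, _ in train]
    y = [label for _, label in train]
    model = train_adaboost(X, y, rounds)
    added = {t for t in candidates if t in vecs and score(model, vecs[t]) > 0}
    return set(seed) | added


# Hypothetical toy corpus and seed lexicons (not data from the paper).
docs = ["striker referee match", "goalkeeper penalty match",
        "bank shares stock", "investors interest rate"]
vecs = term_vectors(docs)
candidates = ["referee", "penalty", "shares", "interest"]
sports = expand_lexicon(["striker", "goalkeeper"], ["bank", "investors"],
                        candidates, vecs)
finance = expand_lexicon(["bank", "investors"], ["striker", "goalkeeper"],
                         candidates, vecs)
print(sorted(sports))   # ['goalkeeper', 'penalty', 'referee', 'striker']
print(sorted(finance))  # ['bank', 'interest', 'investors', 'shares']
```

In this toy setting, each candidate occurs only in documents that also contain seed terms of a single domain, so the document-space vectors carry the domain signal the classifier needs. With a corpus on the scale of Reuters Corpus Volume 1, the term-by-document matrix would be large and sparse. A sparse representation would then likely be preferable to the dense lists used here.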

Journal: ACM Transactions on Speech and Language Processing (TSLP), Association for Computing Machinery

Published: May 1, 2006

There are no references for this article.