
Finding approximate matches in large lexicons



References (27)

Publisher
Wiley
Copyright
Copyright © 1995 Wiley Subscription Services, Inc., A Wiley Company
ISSN
0038-0644
eISSN
1097-024X
DOI
10.1002/spe.4380250307

Abstract

Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n‐grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.
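The combination of a lexicon index with a string distance measure can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `|` padding character, the choice of bigrams, the `min_shared` threshold, and the toy lexicon are all assumptions made for illustration. An n-gram index first selects candidate words that share n-grams with the query (coarse search), and Levenshtein edit distance, used here as one example of a string distance measure, then ranks those candidates (fine search).

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with '|' (assumed marker)."""
    padded = "|" + word + "|"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Map each n-gram to the set of lexicon words that contain it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def approximate_matches(query, index, n=2, min_shared=2, max_results=5):
    """Coarse search by shared n-grams, then fine ranking by edit distance."""
    shared = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            shared[word] += 1
    # Keep only words sharing at least min_shared n-grams with the query.
    candidates = [w for w, count in shared.items() if count >= min_shared]
    # Rank by edit distance; break ties alphabetically for a stable order.
    ranked = sorted(candidates, key=lambda w: (edit_distance(query, w), w))
    return ranked[:max_results]


# Hypothetical toy lexicon; a real lexicon would hold many thousands of words.
lexicon = ["lexicon", "lexical", "mexican", "matching", "indexing", "phonetic"]
index = build_index(lexicon)
print(approximate_matches("lexicom", index))
```

The coarse step keeps the expensive distance computation away from most of the lexicon, which is the kind of trade-off between retrieval time and effectiveness that the abstract refers to. The threshold and n-gram length would need tuning on real data.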

Journal

Software: Practice and Experience (Wiley)

Published: Mar 1, 1995

