String similarity search and join: a survey

References (91)

Publisher: Springer Journals
Copyright: 2015 Higher Education Press and Springer-Verlag Berlin Heidelberg
ISSN: 2095-2228
eISSN: 2095-2236
DOI: 10.1007/s11704-015-5900-5
Abstract

String similarity search and join are two important operations in data cleaning and integration. They extend the traditional exact search and exact join operations in databases by tolerating errors and inconsistencies in the data. They have many real-world applications, such as spell checking, duplicate detection, entity resolution, and webpage clustering. Although these two problems have been studied extensively over the past decade, there is no thorough survey. In this paper, we present a comprehensive survey on string similarity search and join. We first give the problem definitions and introduce widely used similarity functions to quantify the similarity. We then present an extensive set of algorithms for string similarity search and join. We also discuss their variants, including approximate entity extraction, type-ahead search, and approximate substring matching. Finally, we provide some open datasets and summarize research challenges and open problems.
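
To make the two operations concrete, the following is a minimal sketch, not one of the algorithms covered by the survey. It assumes edit distance as the similarity function and a threshold `tau`; the function names `edit_distance`, `similarity_search`, and `similarity_join` are hypothetical and chosen only for illustration. Both operations are written as naive scans with a simple length filter, which is the baseline that indexed and filter-based algorithms aim to improve on.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_search(query: str, strings: List[str], tau: int) -> List[str]:
    """Return every string within edit distance tau of the query (naive scan)."""
    return [
        s for s in strings
        # Length filter: a length gap larger than tau already exceeds the threshold.
        if abs(len(s) - len(query)) <= tau and edit_distance(query, s) <= tau
    ]


def similarity_join(left: List[str], right: List[str], tau: int) -> List[Tuple[str, str]]:
    """Return every pair (r, s) within edit distance tau (naive nested loop)."""
    return [
        (r, s)
        for r in left
        for s in right
        if abs(len(r) - len(s)) <= tau and edit_distance(r, s) <= tau
    ]
```

Calling `similarity_search("survey", ["survey", "surveys", "servey", "join"], 1)` keeps the three near-matches and drops `"join"`, which shows how the threshold tolerates small errors where an exact search would return only the identical string. The nested-loop join compares every pair, so its cost grows with the product of the two collection sizes.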

Journal: Frontiers of Computer Science, Springer Journals

Published: Jun 1, 2016