Time Expression and Named Entity Recognition: Data Analysis


Publisher
Springer International Publishing
Copyright
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
ISBN
978-3-030-78960-2
Pages
35–46
DOI
10.1007/978-3-030-78961-9_3

Abstract

We analyze four diverse datasets of time expressions for their intrinsic characteristics and find five common characteristics; similarly, we analyze two well-known benchmark datasets of named entities and find three common characteristics. For time expressions: first, most are very short, consisting of about two words on average; second, most contain at least one time-related word that distinguishes them from common text; third, only a small group of words is used to express time information; fourth, words in time expressions demonstrate similar syntactic behaviour; and finally, time expressions are loosely structured, with more than 53.5% of time tokens appearing in different positions within time expressions. For named entities: first, most contain uncommon words, which appear mainly in named entities and rarely in common text; second, named entities are made up mainly of proper nouns, with more than 84.8% of the proper nouns in the whole text appearing in named entities and more than 80.1% of the words within named entities being proper nouns; third, named entities are loosely structured, with more than 53.77% of distinct words appearing in different positions within named entities.
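To make two of the statistics in the abstract concrete, the sketch below shows one way the average expression length and a "loose structure" ratio might be computed. It is an illustration under stated assumptions, not the chapter's method: the abstract does not define "position", so the begin/middle/end scheme, the function names `average_length` and `loose_structure_ratio`, and the toy data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each expression is a list of tokens.
# These are NOT taken from the datasets analyzed in the chapter.
toy_expressions = [
    ["last", "week"],
    ["week", "of", "June"],
    ["June", "2021"],
    ["two", "weeks", "ago"],
]


def average_length(expressions):
    """Return the mean number of tokens per expression."""
    return sum(len(expr) for expr in expressions) / len(expressions)


def loose_structure_ratio(expressions):
    """Return the share of distinct tokens seen in more than one position.

    Assumption: "position" means begin, middle, or end of an expression;
    a single-token expression counts as "begin". The chapter may use a
    different definition.
    """
    positions = defaultdict(set)
    for expr in expressions:
        last = len(expr) - 1
        for i, token in enumerate(expr):
            if i == 0:
                slot = "begin"
            elif i == last:
                slot = "end"
            else:
                slot = "middle"
            # Lowercase so that "June" and "june" count as one token.
            positions[token.lower()].add(slot)
    multi_position = sum(1 for slots in positions.values() if len(slots) > 1)
    return multi_position / len(positions)


if __name__ == "__main__":
    print("average length:", average_length(toy_expressions))
    print("loose-structure ratio:", loose_structure_ratio(toy_expressions))
```

On real data, the same two functions could be applied to the token lists of annotated time expressions or named entities; a high ratio would indicate that tokens are not tied to fixed slots within expressions.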

Published: Jun 7, 2021

Keywords: Short time expressions; Time token; Loose structure; Proper nouns; Small group of time words
