Speech Processing in Mobile Environments: Spotting and Recognition of Consonant–Vowel Units from Continuous Speech

Publisher: Springer International Publishing
Copyright: © Springer International Publishing Switzerland 2014
ISBN: 978-3-319-03115-6
Pages: 65–76
DOI: 10.1007/978-3-319-03116-3_5

Abstract

Automatic speech recognition is the process of converting speech into text. It is carried out by transforming the speech signal into a sequence of symbols using acoustic models, and then converting this sequence of symbols into text using a language model. Two approaches are commonly used for speech recognition. The first approach is based on word-level matching using word models, followed by a language model. Its major drawback is the need to develop word models for all words of a language, and the number of words in a language is generally of the order of 10^5 to 10^6. The second approach is based on segmenting speech into subword units and labeling them using a subword unit recognizer. The limitation of this approach lies in accurately segmenting speech into subword units of varying durations. An approach to continuous speech recognition by spotting consonant–vowel (CV) units has been presented in the literature in the context of Indian languages. This approach is based on detecting vowel onset points (VOPs) and labeling the segments around the VOPs using a CV recognizer. The major issues in this approach are the accurate detection of VOPs and the labeling of the regions around these VOPs. In the literature, autoassociative neural network (AANN) models are used for the detection of VOPs, with missed and spurious rates of 30% and 6%, respectively. The performance of CV spotting and recognition using AANN models is low because the VOPs are detected inaccurately. In this chapter, we propose an approach for spotting and recognition of CV units from continuous speech using accurate VOPs. Here, VOPs are determined using a two-stage approach. In stage 1, VOPs are determined using evidence from the excitation source, spectral energy, and modulation spectrum of the speech segments. In stage 2, the VOPs determined in stage 1 are verified, and the genuine VOPs are positioned accurately using the deviation between successive epoch intervals.
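
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not say how the three evidence contours are computed or combined, nor how the epoch-interval deviation is thresholded. The sketch assumes the contours are already available per frame, combines them by averaging, and treats a hypothesis as genuine when a run of nearly equal epoch intervals (regular voicing) begins near it. All function names, the 10 ms frame shift, and the threshold values are hypothetical.

```python
def normalize(contour):
    """Scale a contour to [0, 1]; a flat contour maps to all zeros."""
    lo, hi = min(contour), max(contour)
    if hi == lo:
        return [0.0] * len(contour)
    return [(v - lo) / (hi - lo) for v in contour]


def stage1_hypotheses(source_ev, spectral_ev, modulation_ev, threshold=0.5):
    """Stage 1 (assumed form): average the three normalized evidence
    contours and return the frame indices of local peaks above threshold."""
    contours = [normalize(c) for c in (source_ev, spectral_ev, modulation_ev)]
    combined = [sum(vals) / 3.0 for vals in zip(*contours)]
    peaks = []
    for i in range(1, len(combined) - 1):
        is_peak = combined[i - 1] < combined[i] >= combined[i + 1]
        if is_peak and combined[i] >= threshold:
            peaks.append(i)
    return peaks


def stage2_refine(hypothesis, epochs, search=0.04, max_dev=0.001, run=3):
    """Stage 2 (assumed form): times are in seconds, epochs are sorted.
    Return the first epoch within `search` seconds of the hypothesis that
    starts `run` consecutive small deviations between successive epoch
    intervals; return None if no such epoch exists (hypothesis rejected)."""
    intervals = [b - a for a, b in zip(epochs, epochs[1:])]
    deviations = [abs(b - a) for a, b in zip(intervals, intervals[1:])]
    for k in range(len(deviations) - run + 1):
        if abs(epochs[k] - hypothesis) > search:
            continue
        if all(d <= max_dev for d in deviations[k:k + run]):
            return epochs[k]
    return None


def detect_vops(source_ev, spectral_ev, modulation_ev, epochs, frame_shift=0.01):
    """Run both stages; frame_shift (seconds per frame) is an assumption."""
    vops = []
    for frame in stage1_hypotheses(source_ev, spectral_ev, modulation_ev):
        refined = stage2_refine(frame * frame_shift, epochs)
        if refined is not None:
            vops.append(refined)
    return vops


# Toy data: one evidence peak at frame 3 (0.03 s); epochs become regular
# (8 ms apart) from 0.021 s onward, so the hypothesis is moved to 0.021 s.
evidence = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
epochs = [0.000, 0.007, 0.011, 0.021, 0.029, 0.037, 0.045, 0.053, 0.061]
print(detect_vops(evidence, evidence, evidence, epochs))
```

In this toy setup, a hypothesis with no nearby run of regular epoch intervals is dropped, which is one plausible way that a verification stage could reduce spurious detections.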

Published: Dec 20, 2013

Keywords: Speech Signal; Recognition Performance; Continuous Speech; Noisy Speech; TIMIT Database
