Behavioral and Brain Functions 2007, 3:26
http://www.behavioralandbrainfunctions.com/content/3/1/26

Abstract
A central issue in speech recognition is which basic units of speech are extracted by the auditory system and used for lexical access. One suggestion is that complex acoustic-phonetic information is mapped onto abstract phonological representations of speech and that a finite set of phonological features is used to guide speech perception. Previous studies analyzing the N1m component of the auditory evoked field have shown that this holds for the acoustically simple feature place of articulation. Brain magnetic correlates indexing the extraction of acoustically more complex features, such as lip rounding (ROUND) in vowels, have not been unraveled yet. The present study uses magnetoencephalography (MEG) to describe the spatial-temporal neural dynamics underlying the extraction of phonological features. We examined the induced electromagnetic brain response to German vowels and found the event-related desynchronization in the upper beta-band to be prolonged for those vowels that exhibit the lip rounding feature (ROUND). It was the presence of that feature, rather than circumscribed single acoustic parameters such as formant frequencies, that explained the differences between the experimental conditions. We conclude that the prolonged event-related desynchronization in the upper beta-band correlates with the computational effort required to extract acoustically complex phonological features from the speech signal. The results provide an additional biomagnetic parameter for studying mechanisms of speech perception.

Background
It is still unresolved which basic units of speech are extracted by the auditory system and used for lexical access (see [1-3] for an overview of models of speech comprehension). To address this problem in the initial steps of speech processing, the concept of distinctive features was proposed many years ago by Jakobson, Fant & Halle [4]: a set of abstract, phonological features allows the identification of acoustically quite variable exemplars of classes of speech sounds. These distinctive or phonological features can be described in speech production by means of articulatory properties [5] and in speech perception by various acoustic/phonetic parameters in speech sounds [6,7]. From a linguistic point of view, such a set of 12 to 14 abstract and hierarchically organized features seems sufficient to explain the robustness and efficiency of speech recognition under varying conditions such as different speakers, contexts or levels of environmental noise [8]. However, whether the human brain uses a similar approach in speech recognition is still a matter of debate. Supporting evidence comes from studies of phonological feature discrimination, e.g. of voice-onset time differences [9], which have neurophysiological support from monkey studies [10], or of differences in the place of articulation in vowels [11-13]. In the latter study, it was shown that around 100 ms post stimulus onset (i.e., in the N100m component of the auditory evoked magnetic field) slightly different patches in the auditory cortex were activated during the extraction of the mutually exclusive place-of-articulation features dorsal and coronal. However, other phonological features in which the vowels differed as well, such as lip rounding (ROUND), did not systematically influence the N100m.

The present study reanalyzes the raw data of the Obleser et al. [12] study with the aim of describing further parameters indexing the extraction of other phonological features. For this purpose, and in contrast to the original strategy of data analysis, we studied the so-called induced brain activity, which is time- but not phase-locked to the event-related brain activity (for review see [14-16]). This data analysis technique visualizes the non-phase-locked brain activity which is largely averaged out by conventional averaging techniques such as event-related potentials or fields. In sum, the evoked and the induced brain activity deliver complementary information about perceptual processes. The rationale for this alternative way of data analysis is the following: the more complex the stimuli, the more acoustic variance there is in the stimuli belonging to one particular experimental condition, and the higher the level of information processing under investigation, the higher we expect the variance in the timing of a particular cognitive process to be. In the present study, with six tokens for each of the seven spoken German vowels, it is reasonable to assume such variability, especially for abstract phonological features where several acoustic low-level features have to be integrated.

Phonological features seem to differ with respect to the required amount of acoustic feature integration. For instance, tongue height can presumably be defined on the basis of a cut-off criterion for the first formant's frequency. Other phonological features, like ROUND, are more difficult to handle. It is widely accepted that the third formant's frequency is important for perceiving ROUND, but this formant alone is not enough for the perfect detection of ROUND in speech signals (for illustration see also Fig. 1). To our knowledge, no acoustic/phonetic parameter has been identified that allows a near-perfect detection of ROUND in speech. Recent automatic speech recognition systems conceptually based on phonological features [8] are practically blind to ROUND, because it cannot be described as a linear combination of well-defined acoustic/phonetic parameters. Consequently, ROUND is a candidate feature that requires higher-level information processing, the effects of which might be averaged out by phase-locked averaging techniques. The analysis of the induced brain activity might therefore unravel processes related to the extraction of this complex feature.

When analyzing the induced brain activity, the outcome can be a reduction or an enhancement of the spectral power relative to the baseline interval in defined time-frequency bins. An enhancement is usually interpreted as an event-related synchronization (ERS) of neural activity, which might index feature binding processes [14,17-19], whereas a reduction in spectral power is usually interpreted as an event-related desynchronization (ERD) of neural activity. Based on the observation that increased cellular excitability in thalamo-cortical networks leads to desynchronization in the EEG [20], it has been proposed that the ERD is correlated with the amount of cortical activity [15], i.e. the more information processing is going on, the larger the ERD. Factors like task complexity, attention or mental effort modulate the ERD in the expected way (see [15] for review). The interpretation of ERD depends on the frequency band, where alpha ERD is usually interpreted as increased cortical activation and gamma ERD as an index of cortical deactivation. A recent overview of changes in the induced brain activity during language processing is given by Bastiaansen and Hagoort [21].

Based on these considerations, we analyzed the induced brain responses during active listening to natural speech sounds and detection of target vowels. In order to accomplish this task, extraction and categorization of all crucial features is inevitable. Consequently, we would expect a more intense ERD when the extraction of acoustically complex features such as ROUND is required, as compared to the processing of unrounded speech sounds. A more intense ERD can be reflected in a larger amplitude and/or a longer duration of the ERD [22-24].

Methods
The subjects, stimulus material, experimental design and data collection were reported in detail in [12]. From the analysis of the MEG raw data on, the present study differs from the previously reported one. We will first briefly summarize the basic methodological information and then describe the analyses of the induced brain activity.
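The contrast drawn above, a simple F1 cut-off for tongue height versus no single-formant criterion for ROUND, can be checked directly against the formant ranges later reported in Table 1. The sketch below is illustrative only; the high/non-high grouping of [i], [y], [u] is our assumption for the demonstration and not part of the study's analysis.

```python
# Formant ranges in Hz (min, max) per vowel category, taken from Table 1;
# the boolean marks the phonological feature ROUND (+/-).
VOWELS = {
    "a": (False, [(552, 747), (1188, 1224), (2663, 3171)]),
    "i": (False, [(267, 287), (2048, 2120), (2838, 3028)]),
    "e": (False, [(302, 322), (2055, 2143), (2711, 2890)]),
    "y": (True,  [(238, 248), (1516, 1769), (1978, 2097)]),
    "ø": (True,  [(301, 325), (1293, 1447), (1945, 2079)]),
    "u": (True,  [(231, 256), (522, 645), (2117, 2292)]),
    "o": (True,  [(293, 346), (471, 609), (2481, 2688)]),
}

def round_separable_by(formant):
    """True if a single threshold on formant index 0/1/2 (F1/F2/F3)
    puts every rounded range on one side and every unrounded range
    on the other."""
    rounded = [f[formant] for is_round, f in VOWELS.values() if is_round]
    unrounded = [f[formant] for is_round, f in VOWELS.values() if not is_round]
    below = max(hi for _, hi in rounded) < min(lo for lo, _ in unrounded)
    above = min(lo for lo, _ in rounded) > max(hi for _, hi in unrounded)
    return below or above

def high_separable_by_f1(high=frozenset({"i", "y", "u"})):
    """Assumed high-vowel grouping: a single F1 cut-off does separate it."""
    hi_max = max(f[0][1] for v, (_, f) in VOWELS.items() if v in high)
    lo_min = min(f[0][0] for v, (_, f) in VOWELS.items() if v not in high)
    return hi_max < lo_min

print([round_separable_by(i) for i in range(3)])  # [False, False, False]
print(high_separable_by_f1())                     # True
```

No single formant range separates rounded from unrounded categories, whereas F1 alone does separate the assumed high vowels, which is exactly the asymmetry the argument above relies on.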
Subjects
Data of 13 subjects (seven females) with a mean age of 24.8 ± 4.6 years (M ± SD) were analyzed. As a good signal-to-noise ratio is crucial for time-frequency analyses, we used only those subjects who had formerly passed the criteria for the source analyses. Unfortunately, one of those subjects could not be analyzed because of technical problems in de-archiving the raw data of one of the recording blocks. None of the participants reported a history of neurological, psychiatric, or otological illness. All subjects were monolingual native speakers of German. Only right-handers were included, as ascertained by a right-handedness score > 90 in the Edinburgh Handedness Questionnaire [25]. Subjects gave written informed consent and were paid €20 for their participation.

Stimulus material and experimental design
We investigated brain responses to seven naturally spoken German vowels. Some of them were unrounded, [a], [e], [i] (as in "father", "bay", "bee", respectively), and others were rounded, like [o], [u] (as in "doe" and "do"), or [ø], [y] (as in "Goethe" and "Dürer"). The latter two vowels are the rounded counterparts of the front vowels [e] and [i] and do not occur in English. The classification of the vowels in terms of the phonological feature ROUND as well as their pitches and formant frequencies are given in Table 1 and Figure 1. For every vowel category, we selected six tokens, resulting in 42 different stimuli. Vowels were cut out of words spoken by a male speaker. To prepare the stimuli, lists of words (several for each vowel) were articulated and recorded with artificially long-lasting vowels (~300 ms), so that segments free of any coarticulation could be extracted. In a pilot rating study, the 10 best exemplars for each vowel category, with a typical variation of fundamental and formant frequencies, were selected. From the 10 kHz-digitized speech signal, 350 ms portions containing only the steady-state vowel were used. Stimuli were first equalized for root mean square (rms) amplitude, then normalized for peak amplitude and finally ramped with 50 ms Gaussian on- and offsets. Pitch frequency (119 ± 10 Hz, M ± SD) and formant frequencies varied within vowel categories (cf. Fig. 1 and Table 1), thus introducing considerable acoustic diversity.

Figure 1. Upper panel: A vowel space plotting first formant frequency (y axis, logarithmic display) against second formant frequency (x axis, logarithmic display) is shown for all vowel exemplars used. Lower panel: Third formant frequency (y axis, logarithmic display) is shown against second formant frequency (x axis, logarithmic display). Note the considerable acoustic variance within vowel categories, and that no single formant dimension alone predicts the roundedness.

Vowels were presented binaurally at 50 dB SL via a non-magnetic echo-free stimulus delivery system with an almost linear frequency characteristic in the critical range of 0.2–4 kHz. Vowels were aligned in pseudo-randomized sequences of 572 stimuli with a stimulus onset asynchrony ranging randomly between 1.6 and 2.0 s. Every subject listened to three such sequences. To sustain subjects' attention to the stimuli, a target detection task was employed: in every sequence, two given vowels had a low probability of occurrence (10% together) and served as targets. Subjects had to press a button with their right index finger when they detected a target. As all vowel categories exhibited acoustic variance, subjects had to map stimuli onto vowel categories to decide whether a given stimulus was a target or not, i.e. subjects had to maintain a phonological processing mode throughout the experiment. The reasons for using two vowels as targets in one block and different vowels across blocks are outlined below:

(1) We wanted, at best, all vowels to serve as possible targets across the whole experiment, to avoid that a certain anchor point in the vowel space would transform the perceptual space.

(2) We wanted more than one vowel as a target in one block of measurement (i) to reduce the possibility of carrying out the task by a simplistic auditory pattern matching strategy, which would make the phonetic processing mode less salient; and (ii) to avoid that a strategy of attending to just one featural dimension in one block of measurement (i.e. tongue height or place of articulation) would be sufficient to solve the task – this was important to reduce the possible confound of attention to certain feature dimensions.

(3) We wanted all target vowels to be a target with the same probability.

Given those constraints and the odd number of vowels in the study (chosen to cover a substantial part of the German vowel space and to have a parametric variation along the featural dimensions), we had to accept that [ø] could not serve as a target. We hoped that the resulting complex target definition procedure would hide from the subjects the fact that [ø] had a relative probability of only 98.3% and was never one of the two targets across the multiple blocks of measurement. Indeed, none of the subjects reported after the experiment that [ø] was never a target.

Table 1: Overview of the assignment of the phonological feature ROUND and of the pitch and formant frequency variability in the vowel categories used. F0 refers to the pitch and F1, F2, F3 to the corresponding formant frequencies (all values in Hz, min–max).

Vowel category  ROUND  F0        F1        F2         F3
[a]             -      103–113   552–747   1188–1224  2663–3171
[i]             -      127–132   267–287   2048–2120  2838–3028
[e]             -      109–125   302–322   2055–2143  2711–2890
[y]             +      115–144   238–248   1516–1769  1978–2097
[ø]             +      108–125   301–325   1293–1447  1945–2079
[u]             +      112–118   231–256   522–645    2117–2292
[o]             +      109–125   293–346   471–609    2481–2688

Figure 2. Time-frequency plots for the left and the right hemisphere (left and right column; spectral power was collapsed across 34 channels over each hemisphere) are shown separately for unrounded vowels (upper panels) and rounded vowels (lower panels). The grayscale intensity codes the standardized change of power in the respective frequency band compared to baseline. Note that the relative suppression in the 16–32 Hz band is sustained longer for rounded vowels, and that the relative changes are more pronounced over the left hemisphere.
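The stimulus conditioning chain described above (rms equalization, then peak normalization, then 50 ms Gaussian on- and offset ramps on 350 ms portions of 10 kHz audio) can be sketched as follows. The target rms and peak levels are illustrative values of our own, not taken from the paper, and a pure tone stands in for a vowel token.

```python
import numpy as np

FS = 10_000       # sampling rate of the digitized speech, Hz
DUR = 0.350       # steady-state vowel portion, s
RAMP = 0.050      # Gaussian on-/offset duration, s

def condition_stimulus(x, target_rms=0.05, target_peak=0.9):
    """RMS-equalize, peak-normalize, then apply Gaussian ramps,
    in the order described in the Methods."""
    x = x * (target_rms / np.sqrt(np.mean(x ** 2)))  # equalize rms
    x = x * (target_peak / np.max(np.abs(x)))        # normalize peak
    n = int(RAMP * FS)
    # half-Gaussian rising from ~0 to 1 over the ramp duration
    g = np.exp(-0.5 * np.linspace(-3.0, 0.0, n) ** 2)
    env = np.ones(x.size)
    env[:n] *= g          # onset ramp
    env[-n:] *= g[::-1]   # offset ramp
    return x * env

# Stand-in "vowel": a 220 Hz tone of the right length.
t = np.arange(int(DUR * FS)) / FS
token = condition_stimulus(np.sin(2 * np.pi * 220 * t))
```

Note that peak normalization follows rms equalization, so the final peaks are matched across tokens while the rms values may again differ slightly; this mirrors the order stated in the text.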
Prior to the recordings, subjects repeated the vowel stimuli aloud and recognized all stimuli as typical German vowels. Binaural loudness was slightly readjusted to ensure auditory perception in the midline. Subjects watched silent videos in order to maintain constant alertness and to reduce excessive eye movements.

Data acquisition and reduction
Neuromagnetic fields were recorded using a whole-head system (MAGNES 2500, 4D Neuroimaging, San Diego) in a magnetically shielded room (Vacuumschmelze, Hanau, Germany). The measuring surface of the sensor is helmet-shaped and covers the entire cranium. Within the sensor, 148 magnetometer signal detectors are arranged in a uniformly distributed array spaced by 28 mm. Subjects were measured in a lying position. They were instructed to avoid eye blinks and head movements, and to carry out the monitoring task carefully. Continuous data were recorded in 20-minute blocks at a sampling rate of 508.6 Hz within a pass band of 0.1 to 100 Hz.

For analyses of the induced brain activity, a method similar to event-related perturbation analyses [26] was applied using the avg_q software [27]. In artifact-free epochs of MEG raw data (epochs with signal deviations of more than 3.5 pT in the MEG or with erroneous button presses on non-target vowels, as well as target vowels, were excluded), power spectral estimates were derived from Fourier transforms on pairs (overlapping by one half) of 188.75 ms Welch-windowed segments. Under these constraints, a frequency resolution of 7.94 Hz was obtained. Power estimates were selectively averaged for each segment around stimulus onset and for each stimulus class. Nine time segments were situated equidistantly within a 600-ms interval before stimulus onset (baseline), and 21 segments after stimulus onset within the total interval of 1700 ms. The mean power spectra were transformed with respect to baseline-related changes: normalized mean power spectra were calculated by dividing each single mean spectral power estimate within one time-frequency bin by the mean spectral power estimate across all corresponding baseline segments.

To further reduce the data and to obtain the most relevant time/frequency bins based on estimates that are not affected by small alterations in the location or orientation of the generating brain regions, and that show little dependency on individual variations of brain anatomy, the normalized spectral power was collapsed across 34 recording channels centered over the left and 34 over the right hemisphere, respectively [12]. For the time/frequency bin of interest (which for the present study was in the upper beta-band), 6 pairs of channels located along an anterior-posterior line capturing the maximum spectral power changes were used for more detailed topographical analyses (see also Figs. 3 and 4). In order to account for small alterations in the individual topography and to enhance the signal-to-noise ratio, the inferior and superior recording channel of each pair were collapsed (see upper panels in Fig. 4 for illustration).

Statistical analysis
Data from the upper beta-band (the frequency band around 24 Hz, which showed the maximum induced spectral power changes in higher frequency bands across all experimental conditions) at two different time bins (covering the maximum induced spectral power changes) were initially compared in a 2 × 2 × 7 repeated measures analysis of variance with factors TIME BIN (450 ms, 550 ms), HEMISPHERE (left channels, right channels) and VOWEL CATEGORY (all seven vowels). The first time bin was chosen to cover the maximum ERD across all channels and subjects in all conditions. The second time bin was then shifted by 100 ms to reduce the overlap of time windows for the spectral power estimate while still covering the cognitive process of interest. An amplitude difference between the experimental conditions in the second but not the first time window (as reflected in an interaction) would index a prolonged cognitive process in a subset of conditions.

A specific test for the influence of ROUND was then performed in a reduced 2 × 2 × 2 design with factors TIME BIN (450 ms, 550 ms), HEMISPHERE (left channel group, right channel group) and ROUND (mean of unrounded vowels [a, e, i] vs. mean of rounded vowels [o, u, ø, y]). Additionally, a topographical change over time was tested by introducing an additional six-fold factor TOPOGRAPHY (six pairs of channels ranging from posterior-temporal to anterior-temporal sites).

We also tested whether the selective beta-band desynchronization was systematically different from the N100m-related topography. Therefore, the ERD topography, as represented by the ERD change along the anterior-posterior line, was compared with the enhancement of normalized spectral power in the time/frequency bin centered around 100 ms and 8 Hz in the same pairs of channels. For comparability, we standardized the percentage of power change in both bands according to McCarthy and Wood's transformation [28] and then computed a 2 × 2 × 6 × 7 repeated measures analysis of variance with factors TIME-FREQUENCY BIN (24 Hz around 450 ms, 8 Hz around 100 ms), HEMISPHERE (left channels, right channels), TOPOGRAPHY (six pairs of channels ranging from posterior-temporal to anterior-temporal sites) and VOWEL CATEGORY (all seven vowels). Where necessary (i.e. where a violation of the homogeneity of variances assumption was evident by Mauchly's criterion), Greenhouse-Geisser-adjusted p-values are reported.
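The induced-power computation described above (Welch-tapered segment spectra, power averaged across trials per segment, then divided by the mean baseline power) can be sketched as follows. This is a generic reconstruction, not the avg_q implementation; segment length, hop and trial counts here are arbitrary stand-ins for the study's parameters.

```python
import numpy as np

def welch_taper(n):
    # Welch (parabolic) window: 1 - ((k - (n-1)/2) / ((n-1)/2))^2
    k = np.arange(n)
    return 1.0 - ((k - (n - 1) / 2) / ((n - 1) / 2)) ** 2

def induced_power(trials, seg_len, hop):
    """Induced (non-phase-locked) power: taper each segment of each
    trial, square the spectrum, THEN average across trials, so that
    activity with random phase across trials survives the averaging.
    trials: (n_trials, n_samples) -> (n_segments, n_freqs)."""
    w = welch_taper(seg_len)
    out = []
    for start in range(0, trials.shape[1] - seg_len + 1, hop):
        seg = trials[:, start:start + seg_len] * w
        out.append((np.abs(np.fft.rfft(seg, axis=1)) ** 2).mean(axis=0))
    return np.array(out)

def erd(power, n_baseline):
    """Normalize each time-frequency bin by the mean power across the
    baseline segments; values < 1 indicate ERD, > 1 indicate ERS."""
    return power / power[:n_baseline].mean(axis=0)

# Synthetic demo: noise whose amplitude halves after "stimulus onset"
rng = np.random.default_rng(0)
trials = rng.standard_normal((50, 1000))
trials[:, 500:] *= 0.5
norm = erd(induced_power(trials, seg_len=100, hop=50), n_baseline=9)
```

In the demo, normalized power sits near 1 in the baseline segments and drops toward 0.25 in the fully post-onset segments, which is the signature read off as ERD in the analyses above.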
Figure 3. Top views (projection onto a plane; nose on top, right side on the right) of 16–32 Hz band topographies from ~100 to 650 ms post stimulus onset are shown for unrounded (upper row) and rounded vowels (lower row). The event-related desynchronization (ERD, blue) is markedly sustained for rounded vowels and most prominent over left anterior sites (scaling 110–90% relative power change, color steps indicating 2% change; ERS, event-related synchronization). The grey background highlights the time range with the most pronounced ERD.

Results
As Figure 2 shows, the event-related desynchronization (ERD) changed as a function of time bin [F(1,12) = 7.0, p < .03], in that the beta-band power regained amplitude in the 550 ms time bin. However, not all vowel categories behaved similarly over time [vowel category × time bin interaction; F(6,72) = 2.3, p < .05]. This permitted a more specific testing of the influence of ROUND. The time bin × round interaction attained significance [F(1,12) = 11.3, p < .01]: it was only for rounded vowels that the ERD was sustained (Figs. 2, 3; lower panels) across both time bins (normalized 24 Hz band power 0.949 at 450 ms; 0.950 at 550 ms), whereas no ERD was present in the later time bin for unrounded vowels (0.947 at 450 ms; 0.981 at 550 ms).

Figure 4. Time courses of relative changes in 24 Hz band power from left- and right-hemispheric anterior sites are displayed for all seven vowel categories. As best seen in the highlighted box, unrounded vowels (thick lines) do not show the sustained suppression that is evident for rounded vowels (dashed lines). Head shapes in the upper panels illustrate the location of the displayed channels in the sensor array.

As demonstrated in Figure 3, the ROUND-related ERD was most pronounced over left anterior sites. To further specify the topographical aspects of the ERD, the power change over hemispheres, time bins and ROUND was tested by introducing the additional repeated measures factor TOPOGRAPHY (6 pairs of channels ranging from posterior-temporal to anterior-temporal sites). This analysis yielded a ROUND × topography interaction [F(5,60) = 5.6, p < .001]: the ERD topography gradient, with stronger ERD over more anterior sites, was manifest for rounded vowels [F(5,60) = 6.6, ε = .34, p < .01] but not for unrounded vowels [F(5,60) = 1.7, ε = .38, p > .20].

Left-hemispheric ERD effects were generally stronger [F(1,12) = 38.0, p < .0001], but a topography × hemisphere interaction was also evident [F(5,60) = 13.2, ε = .42, p < .0001]: channel sites in the right hemisphere showed no topographic effects whatsoever (topography effect n.s.; cf. Fig. 3). Figure 4 shows the power change in the 24 Hz band over time for the same pairs of channels over the left and the right hemisphere and illustrates the hemispheric asymmetry of this ROUND-related ERD process.

To ensure that the selective beta-band ERD was systematically different from the topography of the N1m-P2m complex, which exhibits the highest spectral power in the 5–15 Hz range, the ERD power in the 24 Hz/400 ms-centered time-frequency bin was compared to the enhancement of normalized spectral power in the time-frequency bin centered around 100 ms and 8 Hz across the same pairs of channels. A topography × time-frequency bin interaction proved that there were topographical differences between the N100m-related activity and the ERD response [F(5,60) = 16.1, ε = .28, p < .001]. Signal change was also largest over anterior-temporal sites in the 100 ms/8 Hz bin, but exhibited a second peak over posterior-temporal sites, whereas the ERD response showed only the described anterior-temporal peak in the left hemisphere (Figs. 3, 4).

Discussion
The present study showed a prolonged event-related desynchronization in the upper beta-band of the induced brain activity whenever the phonological feature ROUND was present in naturally spoken vowels. Due to the complex distribution of the acoustic features in the speech sounds, this effect is best explained by the presence of that feature rather than by circumscribed single acoustic parameters, such as formant frequencies (Fig. 1). The topography of this effect is different from that induced by the N1m-P2m complex, which is reflected as an enhanced spectral power in the alpha-band. The topographical difference suggests that the ERD in the upper beta-band is generated in different brain areas. The prolonged ERD in the upper beta-band during the extraction of the feature ROUND was evident in the general comparison amongst all vowels (cf. Fig. 4).

Event-related desynchronization has been proposed to be correlated with the amount of cortical activity, and factors like task complexity, attention or mental effort do modulate the ERD [15]. In general, enhanced mental effort, for various reasons, leads to a more pronounced ERD. Thus, the prolonged ERD in the upper beta-band during the extraction of the feature ROUND as found in the present study can be interpreted most suitably as an index of the enhanced computational effort for the extraction of an acoustically complex phonological feature. Interestingly, in automatic speech recognition, researchers are faced with a similar problem, and to our knowledge phoneticians have no reliable algorithm for the detection of ROUND in speech. Simpler phonological features, such as place of articulation and tongue height, are highly correlated with changes in the formant frequencies F1 and F2 [29] and can accordingly be handled by automatic speech recognition [8]. Accordingly, those features were also investigated more successfully in neuroimaging studies [12,13,30,31]. However, lip rounding appears to require more complex operations along the auditory pathway, resulting in more subtle effects that require alternative methods of data analysis. Analysis of the induced brain activity seems to be a feasible tool to study these processes.

The study supports the notion of beta-band ERD as a correlate of mental effort and provides an additional biomagnetic parameter for studies of speech perception. This is, to our knowledge, the first study reporting a brain signature for the detection of ROUND in speech: all parameters reported so far, including those of our own analyses of the evoked brain activity in the same set of raw data [12], were not sensitive to the phonological feature ROUND.

Intriguingly, a previous ERD mapping study reported an enhanced beta-band suppression in response to words over fronto-temporal electrodes roughly similar to the anterior channels showing the most vigorous responses in the present MEG data [24]. Roughly the same brain region has been involved in tasks requiring auditory working memory [32,33]. In that task, Kaiser and colleagues reported, however, enhanced gamma-band synchronization. In sum, the conclusion that beta-band ERD mirrors enhanced processing load appears justified for neural networks dedicated to the auditory input-to-meaning mapping.
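The McCarthy and Wood step in the topography comparison above amounts to rescaling each condition's spatial profile before the ANOVA, so that overall amplitude and scalp distribution are not confounded. A minimal sketch of the range-scaling variant follows; the paper does not state which variant of the transformation [28] was applied, so this choice, and the 6-site example profiles, are assumptions for illustration.

```python
import numpy as np

def range_scale(profile):
    """McCarthy-Wood style range scaling: map a spatial profile (one
    value per channel pair) to [0, 1] by removing its minimum and
    dividing by its range. Conditions that differ only in overall
    gain become identical, so a remaining condition x topography
    interaction reflects a true difference in scalp distribution."""
    p = np.asarray(profile, dtype=float)
    return (p - p.min()) / (p.max() - p.min())

# Hypothetical 6-site profiles along the posterior-to-anterior line:
base = np.array([1.0, 1.5, 2.0, 3.0, 4.0, 5.0])
same_shape = 2.5 * base                      # same shape, larger gain
other_shape = np.array([5.0, 4.0, 3.0, 2.0, 1.5, 1.0])  # reversed gradient

print(np.allclose(range_scale(same_shape), range_scale(base)))   # True
print(np.allclose(range_scale(other_shape), range_scale(base)))  # False
```

After scaling, the gain difference between the first two profiles disappears while the genuinely different gradient remains distinct, which is the property the 2 × 2 × 6 × 7 comparison depends on.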
The effect of lip rounding we see here occurs comparably late, especially in comparison to the place of articulation feature, which has been suggested to affect the N100m [12] and to a certain extent even the P50m [34] component of the evoked field. Why should a more subtle vowel feature exhibit its influence much later in the speech sound decoding process? Besides any potential temporally blurring effects of the comparably broad Welch window chosen for increased accuracy in frequency estimation (189 ms), it is worth remembering that listening to isolated vowels in order to categorize them and to react accordingly might involve a cascade of evaluation and re-evaluation processes while disambiguating the vowel percept. These processes might or might not be necessary in real-life speech recognition (depending on whether or not such a minute feature is needed to disambiguate a lexical item), and this should be put to the test using minimal word pairs rather than isolated vowels. In general, however, it seems fruitful to distinguish speech perception tasks such as the one employed here (which require attention to subphonemic detail and may strongly involve phonological working memory and other supporting cognitive processes, depending on the nature of the task; see [3] for further reference) from speech recognition tasks, which inevitably involve access to the mental lexicon. Hence, bringing these levels of speech processing together and utilizing subtle featural differences such as lip rounding in experiments that do require natural speech recognition will help to disambiguate the role of the beta ERD seen here.

This experiment cannot provide automatic speech recognition systems with an algorithm for the detection of ROUND in speech, but it has become clear that such an algorithm will have to be more complex than those for the detection of the other phonological features in the vowels. Since telling an [y] from an [i] is nevertheless accomplished with great ease by all speakers of languages with umlauted vowels, the results reported here are encouraging for the search for new and more derivative parameters reflecting the speech perception process: understanding the brain's effortless decoding of these acoustically more complex features in speech will ultimately lead to a more general model of the human speech perception faculty.

Acknowledgements
The authors gratefully acknowledge the help of Barbara Awiszus in collecting data. Henning Reetz and Michaela Schlichtling helped record and edit the stimulus material, and Bernd Feige provided recent versions of the avg_q data analysis software. Research was supported by grants of the German Science Foundation to C.E. (SFB 471, FOR 348), the Leibniz-Prize awarded to Aditi Lahiri and a post-doctoral grant to J.O. (Landesstiftung Baden-Württemberg).

References
1. McClelland JL, Mirman D, Holt LL: Are there interactive processes in speech perception? Trends Cogn Sci 2006, 10:363-369.
2. Norris D, McQueen JM, Cutler A: Merging information in speech recognition: feedback is never necessary. Behav Brain Sci 2000, 23:299-325.
3. Hickok G, Poeppel D: The cortical organization of speech processing. Nat Rev Neurosci 2007.
4. Jakobson R, Fant G, Halle M: Preliminaries to speech analysis: the distinctive features and their correlates. Cambridge, Mass: MIT Press; 1976.
5. Chomsky N, Halle M: The Sound Pattern of English. New York: Harper and Row; 1968.
6. Fant G: Speech sounds and features. Cambridge, Mass: MIT Press.
7. Stevens KN: Toward a model for lexical access based on acoustic landmarks and distinctive features. J Acoust Soc Am 2002, 111:1872-1891.
8. Lahiri A, Reetz H: Underspecified recognition. In Laboratory Phonology VII. Edited by Gussenhoven C, Warner N. Berlin: Mouton; 2002:637-675.
9. Phillips C, Pellathy T, Marantz A, Yellin E, Wexler K, Poeppel D, McGinnis M, Roberts T: Auditory cortex accesses phonological categories: an MEG mismatch study. J Cogn Neurosci 2000, 12:1038-1055.
10. Steinschneider M, Volkov IO, Noh MD, Garell PC, Howard MA III: Temporal encoding of the voice onset time phonetic parameter by field potentials recorded directly from human auditory cortex. J Neurophysiol 1999, 82:2346-2357.
11. Obleser J, Lahiri A, Eulitz C: Auditory-evoked magnetic field codes place of articulation in timing and topography around 100 milliseconds post syllable onset. Neuroimage 2003, 20:1839-1847.
12. Obleser J, Lahiri A, Eulitz C: Magnetic brain response mirrors extraction of phonological features from spoken vowels. J Cogn Neurosci 2004, 16:31-39.
13. Eulitz C, Lahiri A: Neurobiological evidence for abstract phonological representations in the mental lexicon during speech recognition. J Cogn Neurosci 2004, 16:577-583.
14. Tallon-Baudry C, Bertrand O: Oscillatory gamma activity in humans and its role in object representation. Trends Cogn Sci 1999, 3:151-162.
15. Pfurtscheller G, Lopes da Silva FH: Event-related EEG/MEG synchronization and desynchronization: basic principles. Clin Neurophysiol 1999, 110:1842-1857.
16. Makeig S, Debener S, Onton J, Delorme A: Mining event-related brain dynamics. Trends Cogn Sci 2004, 8:204-210.
17. Pulvermuller F: Words in the brain's language. Behav Brain Sci 1999, 22:253-279.
18. Engel AK, Fries P, Singer W: Dynamic predictions: oscillations and synchrony in top-down processing. Nat Rev Neurosci 2001, 2:704-716.
19. Varela F, Lachaux JP, Rodriguez E, Martinerie J: The brainweb: phase synchronization and large-scale integration. Nat Rev Neurosci 2001, 2:229-239.
20. Steriade M, Llinas RR: The functional states of the thalamus and the associated neuronal interplay. Physiol Rev 1988, 68:649-742.
21. Bastiaansen M, Hagoort P: Oscillatory neuronal dynamics during language comprehension. Prog Brain Res 2006, 159:179-196.
22. Pfurtscheller G, Klimesch W: Topographical display and interpretation of event-related desynchronization during a visual-verbal task. Brain Topogr 1990, 3:85-93.
23. Eulitz C, Maess B, Pantev C, Friederici AD, Feige B, Elbert T: Oscillatory neuromagnetic activity induced by language and non-language stimuli. Brain Res Cogn Brain Res 1996, 4:121-132.
24. Lebrun N, Clochon P, Etevenon P, Lambert J, Baron JC, Eustache F: An ERD mapping study of the neurocognitive processes involved in the perceptual and semantic analysis of environmental sounds and words. Brain Res Cogn Brain Res 2001, 11:235-248.
25. Oldfield RC: The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 1971, 9:97-113.
26. Makeig S: Auditory event-related dynamics of the EEG spectrum and effects of exposure to tones. Electroencephalogr Clin Neurophysiol 1993, 86:283-293.
27. Feige B: Oscillatory brain activity and its analysis on the basis of MEG and EEG. Münster: Waxmann; 1999.
28. McCarthy G, Wood CC: Scalp distributions of event-related potentials: an ambiguity associated with analysis of variance models. Electroencephalogr Clin Neurophysiol 1985, 62:203-208.
29. Peterson G, Barney H: Control Methods Used in a Study of the Vowels.
J Acoust Soc Am 1952, 24:175-184. 30. Obleser J, Boecker H, Drzezga A, Haslinger B, Hennenlotter A, Roet- tinger M, Eulitz C, Rauschecker JP: Vowel sound extraction in anterior superior temporal cortex. Hum Brain Mapp 2006, 27:562-571. 31. Obleser J, Scott SK, Eulitz C: Now You Hear It, Now You Don't: Transient Traces of Consonants and their Nonspeech Ana- logues in the Human Brain. Cereb Cortex 2006, 16:1069-1076. 32. Kaiser J, Ripper B, Birbaumer N, Lutzenberger W: Dynamics of gamma-band activity in human magnetoencephalogram during auditory pattern working memory. Neuroimage 2003, 20:816-827. 33. Kaiser J, Lutzenberger W: Frontal gamma-band activity in mag- netoencephalogram during auditory oddball processing. Neuroreport 2004, 15:2185-2188. 34. Tavabi K, Obleser J, Dobel C, Pantev C: Auditory evoked fields differentially encode speech features: An MEG investigation of the P50m and N100m time courses during syllable processing. Eur J Neurosci 2007 in press. Publish with Bio Med Central and every scientist can read your work free of charge "BioMed Central will be the most significant development for disseminating the results of biomedical researc h in our lifetime." Sir Paul Nurse, Cancer Research UK Your research papers will be: available free of charge to the entire biomedical community peer reviewed and published immediately upon acceptance cited in PubMed and archived on PubMed Central yours — you keep the copyright BioMedcentral Submit your manuscript here: http://www.biomedcentral.com/info/publishing_adv.asp Page 9 of 9 (page number not for citation purposes)
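For readers unfamiliar with how event-related desynchronization is quantified, the following is a minimal sketch in the spirit of the classic approach of Pfurtscheller and Lopes da Silva (reference 15): band-pass filter each trial into the band of interest, square to obtain instantaneous power, average across trials, and express power as a percentage change relative to a pre-stimulus baseline. A Welch (parabolic) taper, as mentioned in the discussion of frequency estimation, is included for illustration. All function names, filter settings, and the toy data here are illustrative assumptions, not the authors' actual analysis pipeline (which used the avg_q software).

```python
import numpy as np
from scipy.signal import butter, filtfilt

def welch_window(n):
    """Parabolic (Welch) taper: w[k] = 1 - ((k - (n-1)/2) / ((n-1)/2))**2."""
    k = np.arange(n)
    half = (n - 1) / 2.0
    return 1.0 - ((k - half) / half) ** 2

def erd_percent(trials, fs, band=(20.0, 30.0), baseline=(0.0, 0.2)):
    """ERD/ERS time course: band-pass each trial, square to get
    instantaneous power, average over trials, and express as a
    percentage change relative to a baseline interval (in seconds).
    trials: array (n_trials, n_samples); negative values indicate ERD."""
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    power = filtfilt(b, a, trials, axis=1) ** 2   # instantaneous band power
    mean_power = power.mean(axis=0)               # average over trials
    i0, i1 = int(baseline[0] * fs), int(baseline[1] * fs)
    ref = mean_power[i0:i1].mean()                # baseline (reference) power
    return 100.0 * (mean_power - ref) / ref

# Toy demonstration: a 25 Hz (upper-beta) oscillation whose amplitude
# drops at t = 0.5 s, mimicking a desynchronization after stimulus onset.
fs = 500
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)
amp = np.where(t < 0.5, 1.0, 0.3)
trials = np.stack([amp * np.sin(2 * np.pi * 25 * t + rng.uniform(0, 2 * np.pi))
                   + 0.05 * rng.standard_normal(t.size)
                   for _ in range(40)])
erd = erd_percent(trials, fs)
print(f"post-stimulus beta ERD: {erd[int(0.7 * fs):int(0.9 * fs)].mean():.0f}%")
```

Because phases are randomized across trials, the oscillation averages out of the evoked (phase-locked) response but survives in the induced power measure, which is the distinction the ERD analysis exploits.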
Behavioral and Brain Functions – Springer Journals
Published: Jun 1, 2007