Recent Forum Posts

Zhang, H., Zhang, J., Peng, G., Ding, H., & Zhang, Y. (2020). Bimodal benefits revealed by categorical perception of lexical tones in Mandarin-speaking children with a cochlear implant and a contralateral hearing aid. Journal of Speech, Language, and Hearing Research.


Purpose: Pitch reception poses challenges for individuals with cochlear implants (CIs), and adding a hearing aid (HA) in the nonimplanted ear is potentially beneficial. The current study used fine-scale synthetic speech stimuli to investigate the bimodal benefit for lexical tone categorization in Mandarin-speaking kindergarteners using a CI and an HA in opposite ears.

Method: The data were collected from 16 participants who were required to complete two classical tasks for speech categorical perception (CP) with CI + HA device condition and CI alone condition. Linear mixed-effects models were constructed to evaluate the identification and discrimination scores across different device conditions.

Results: The bimodal kindergarteners showed CP for the continuum varying from Mandarin Tone 1 to Tone 2. Moreover, the additional acoustic information from the contralateral HA contributed to improved lexical tone categorization, with a steeper identification slope, a higher discrimination score for the between-category stimulus pair, and an improved peakedness score (i.e., an increased benefit magnitude for discrimination of between-category over within-category pairs) in the CI + HA condition than in the CI-alone condition. The bimodal kindergarteners with better residual hearing thresholds at 250 Hz in the nonimplanted ear perceived lexical tones more categorically.

Conclusion: The enhanced CP results with bimodal listening provide clear evidence supporting the clinical practice of fitting a contralateral HA in the nonimplanted ear of kindergarteners with unilateral CIs, with direct benefits from low-frequency acoustic hearing.
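The categorical perception measures reported here (identification slope, between- vs. within-category discrimination, and peakedness) can be illustrated with a small sketch. This is not the study's analysis pipeline or data; the continuum steps, proportions, and accuracies below are invented for illustration:

```python
import numpy as np

# Hypothetical identification proportions ("Tone 2" responses) along a
# 7-step Tone 1-to-Tone 2 continuum -- illustrative values, not study data.
steps = np.arange(1, 8, dtype=float)
p_tone2 = np.array([0.02, 0.05, 0.10, 0.45, 0.90, 0.96, 0.98])

# Fit a logistic identification function via a logit transform:
# logit(p) = k * (step - x0), so a straight-line fit recovers the
# slope k (boundary sharpness) and the boundary location x0.
logit = np.log(p_tone2 / (1.0 - p_tone2))
k, b = np.polyfit(steps, logit, 1)
x0 = -b / k

# Peakedness: discrimination advantage for the pair straddling the
# boundary over the mean of within-category pairs (hypothetical scores).
between_pair = 0.92
within_pairs = np.array([0.58, 0.62])
peakedness = between_pair - within_pairs.mean()

print(f"slope k = {k:.2f}, boundary = step {x0:.2f}, peakedness = {peakedness:.2f}")
```

A steeper fitted slope k and a larger peakedness score both indicate more categorical perception, which is the pattern the abstract reports for the CI + HA condition relative to CI alone.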

Zhang, L., Li, Y., Zhou, H., Zhang, Y., & Shu, H. (2020). Sentence context differentially modulates contributions of fundamental frequency contours to word recognition in Chinese-speaking children with and without dyslexia. Frontiers in Psychology, 11, 598658.


Previous work has shown that children with dyslexia are impaired in speech recognition in adverse listening conditions. Our study further examined how semantic context and fundamental frequency (F0) contours contribute to word recognition against interfering speech in dyslexic and non-dyslexic children. Thirty-two children with dyslexia and 35 chronological-age-matched control children were tested on the recognition of words in normal sentences versus wordlist sentences with natural versus flat F0 contours against single-talker interference. The dyslexic children showed overall poorer recognition performance than the non-dyslexic children. Furthermore, semantic context differentially modulated the effect of F0 contours on the recognition performance of the two groups. Specifically, compared with flat F0 contours, natural F0 contours increased the recognition accuracy of dyslexic children less than that of non-dyslexic children in the wordlist condition. By contrast, natural F0 contours increased the recognition accuracy of both groups to a similar extent in the sentence condition. These results indicate that access to semantic context strengthens the effect of natural F0 contours on word recognition in adverse listening conditions for dyslexic children, who are more impaired in the use of natural F0 contours during isolated and unrelated word recognition. Our findings have practical implications for communication with dyslexic children when listening conditions are unfavorable.

Pang, W., Xing, H., Zhang, L., Shu, H., & Zhang, Y. (2020). Superiority of blind over sighted listeners in voice recognition. Journal of the Acoustical Society of America, 148(2), EL208-EL213.


The current study examined whether the blind are superior to sighted listeners in voice recognition. Three subject groups, comprising 17 congenitally blind, 18 late blind, and 18 sighted listeners, showed no significant differences in the immediate voice recognition test. In the delayed test conducted two weeks later, however, both the congenitally blind and late blind groups performed better than the sighted, with no significant difference between the two blind groups. These results partly confirm the anecdotal observation of the blind's superiority in voice recognition, which resides mainly in the delayed memory phase rather than in the immediate recall and generalization phase.

Rao, A., Koerner, T.K., Madsen, B., & Zhang, Y. (2020). Investigating influences of medial olivocochlear efferent system on central auditory processing and listening in noise: A behavioral and event-related potential study. Brain Sciences, 10(7), 428.


This electrophysiological study investigated the role of the medial olivocochlear (MOC) efferents in listening in noise. Both ears of eleven normal-hearing adult participants were tested. The physiological tests consisted of transient-evoked otoacoustic emission (TEOAE) inhibition and the measurement of cortical event-related potentials (ERPs). The mismatch negativity (MMN) and P300 responses were obtained in passive and active listening tasks, respectively. Behavioral responses for the word recognition in noise test were also analyzed. Consistent with previous findings, the TEOAE data showed significant inhibition in the presence of contralateral acoustic stimulation. However, performance in the word recognition in noise test was comparable for the two conditions (i.e., without contralateral stimulation and with contralateral stimulation). Peak latencies and peak amplitudes of MMN and P300 did not show changes with contralateral stimulation. Behavioral performance was also maintained in the P300 task. Together, the results show that the peripheral auditory efferent effects captured via otoacoustic emission (OAE) inhibition might not necessarily be reflected in measures of central cortical processing and behavioral performance. As the MOC effects may not play a role in all listening situations in adults, the functional significance of the cochlear effects of the medial olivocochlear efferents and the optimal conditions conducive to corresponding effects in behavioral and cortical responses remain to be elucidated.
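As background for the MMN measure above, the MMN is conventionally derived as a deviant-minus-standard difference wave whose peak amplitude and latency are read off within a search window. The sketch below uses synthetic waveforms and an assumed 100-250 ms window, not the study's recordings:

```python
import numpy as np

# Synthetic averaged ERP waveforms (microvolts) at 1 kHz sampling --
# illustrative stand-ins for real standard/deviant grand averages.
t = np.arange(0, 0.5, 0.001)          # 0-500 ms epoch
standard = 1.5 * np.sin(2 * np.pi * 4 * t)
deviant = standard - 2.0 * np.exp(-((t - 0.18) ** 2) / (2 * 0.02 ** 2))

# MMN = deviant-minus-standard difference wave; its peak is the most
# negative point within a typical 100-250 ms search window.
diff_wave = deviant - standard
window = (t >= 0.100) & (t <= 0.250)
idx = np.argmin(diff_wave[window])
peak_amp = diff_wave[window][idx]       # peak amplitude (uV)
peak_lat_ms = t[window][idx] * 1000.0   # peak latency (ms)

print(f"MMN peak: {peak_amp:.2f} uV at {peak_lat_ms:.0f} ms")
```

In a real analysis the standard and deviant waveforms would be trial averages per electrode, and the peak measures would then enter the statistical comparison across conditions, as in the study's latency and amplitude analyses.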

Keywords: event-related potential (ERP); medial olivocochlear (MOC) efferents; otoacoustic emissions inhibition; contralateral acoustic stimulation (CAS); MMN; P300

Lin, Y., Ding, H., & Zhang, Y. (2020). Multisensory Integration of Emotion in Schizophrenic Patients. Multisensory Research.


Multisensory integration (MSI) of emotion has been increasingly recognized as an essential element of schizophrenic patients’ impairments, leading to the breakdown of their interpersonal functioning. The present review provides an updated synopsis of schizophrenics’ MSI abilities in emotion processing by examining relevant behavioral and neurological research. Existing behavioral studies have adopted well-established experimental paradigms to investigate how participants understand multisensory emotion stimuli, and interpret their reciprocal interactions. Yet it remains controversial with regard to congruence-induced facilitation effects, modality dominance effects, and generalized vs. specific impairment hypotheses. Such inconsistencies are likely due to differences and variations in experimental manipulations, participants’ clinical symptomatology, and cognitive abilities. Recent electrophysiological and neuroimaging research has revealed aberrant indices in event-related potential (ERP) and brain activation patterns, further suggesting impaired temporal processing and dysfunctional brain regions, connectivity, and circuitries at different stages of MSI in emotion processing. The limitations of existing studies and implications for future MSI work are discussed in light of research designs and techniques, study samples and stimuli, and clinical applications.

Miller, S. & Zhang, Y. (2020). Neural coding of syllable-final fricatives with and without hearing aid amplification. Journal of the American Academy of Audiology.


Background Cortical auditory event-related potentials are a potentially useful clinical tool to objectively assess speech outcomes with rehabilitative devices. Whether hearing aids reliably encode the spectrotemporal characteristics of fricative stimuli in different phonological contexts and whether these differences result in distinct neural responses with and without hearing aid amplification remain unclear.

Purpose To determine whether the neural coding of the voiceless fricatives /s/ and /ʃ/ in the syllable-final context reliably differed without hearing aid amplification and whether hearing aid amplification altered neural coding of the fricative contrast.

Research Design A repeated-measures, within subject design was used to compare the neural coding of a fricative contrast with and without hearing aid amplification.

Study Sample Ten adult listeners with normal hearing participated in the study.

Data Collection and Analysis Cortical auditory event-related potentials were elicited to an /ɑs/–/ɑʃ/ vowel-fricative contrast in unaided and aided listening conditions. Neural responses to the speech contrast were recorded at 64-electrode sites. Peak latencies and amplitudes of the cortical response waveforms to the fricatives were analyzed using repeated-measures analysis of variance.

Results The P2' component of the acoustic change complex to the syllable-final fricative contrast significantly differed with and without hearing aid amplification. Hearing aid amplification differentially altered the neural coding of the contrast across frontal, temporal, and parietal electrode regions.

Conclusions Hearing aid amplification altered the neural coding of syllable-final fricatives. However, the contrast remained acoustically distinct in the aided and unaided conditions, and cortical responses to the fricative significantly differed with and without the hearing aid.

Keywords: hearing aid amplification; cortical auditory event-related potentials; fricatives

Zhang, L., Xie, S., Xing, H., Shu, H., & Zhang, Y. (2020). Perception of musical melody and rhythm as influenced by native language experience. Journal of the Acoustical Society of America, 147 (5), EL385-EL390.

Abstract: This study used the Musical Ear Test [Wallentin, Nielsen, Friis-Olivarius, Vuust, and Vuust (2010). Learn. Indiv. Diff. 20, 188–196] to compare the musical aptitude of native Japanese and Chinese speakers. Although the two groups had similar overall accuracy, they showed significant differences in subtest performance. Specifically, the Chinese speakers outperformed their Japanese counterparts on the melody subtest, but the reverse was observed on the rhythm subtest. Within-group comparisons revealed that the Chinese speakers performed better on the melody subtest than the rhythm subtest, while the Japanese speakers showed the opposite trend. These results indicate that the pitch and durational patterns of a listener's native language can have a profound effect on the perception of musical melody and rhythm, respectively, reflecting language-to-music transfer of learning.

There are a number of resources for d-prime (d′) calculation.

Here are two open-source packages with documentation and examples.

R package:

Matlab package:
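Whichever package you use, the underlying computation is straightforward. As a minimal illustration (the function name and trial counts below are made up for the example), the standard yes/no d′ is the z-transformed hit rate minus the z-transformed false-alarm rate, with a small correction so perfect rates do not produce infinite z-scores:

```python
from statistics import NormalDist

def dprime(hits, misses, false_alarms, correct_rejections):
    """Compute d' = z(hit rate) - z(false-alarm rate).

    Uses the log-linear correction (add 0.5 to each cell) so that
    hit or false-alarm rates of 0 or 1 stay finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Example with made-up counts: 45 hits / 5 misses on signal trials,
# 10 false alarms / 40 correct rejections on noise trials.
print(round(dprime(45, 5, 10, 40), 2))
```

The dedicated R and Matlab packages add conveniences such as confidence intervals, bias measures (e.g., criterion c), and variants for other designs (same-different, forced choice), which is why they are worth using beyond this bare formula.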

Zhang, H., Zhang, J., Ding, H., & Zhang, Y. (2020). Bimodal benefits for lexical tone recognition: An investigation on Mandarin-speaking preschoolers with a cochlear implant and a contralateral hearing aid. Brain Sciences, 10, 238.

Pitch perception is known to be difficult for individuals with a cochlear implant (CI), and adding a hearing aid (HA) in the non-implanted ear is potentially beneficial. The current study aimed to investigate the bimodal benefit for lexical tone recognition in Mandarin-speaking preschoolers using a CI and an HA in opposite ears. The child participants were required to complete tone identification in quiet and in noise with CI + HA in comparison with CI alone. While the bimodal listeners showed confusion between Tone 2 and Tone 3 in recognition, the additional acoustic information from the contralateral HA alleviated the confusion between these two tones in quiet. Moreover, significant improvement was demonstrated in the CI + HA condition over the CI-alone condition in noise. The bimodal benefit for individual subjects could be predicted by the low-frequency hearing threshold of the non-implanted ear and the duration of bimodal use. The findings support the clinical practice of fitting a contralateral HA in the non-implanted ear for the potential benefit in Mandarin tone recognition in children with CIs. The limitations call for further studies on auditory plasticity on an individual basis to gain insights into the factors contributing to the bimodal benefit or its absence.

Keywords: lexical tones; bimodal benefit; speech learning; cochlear implant (CI); hearing aid (HA)

Lin, Y., Ding, H., & Zhang, Y. (Accepted). Prosody dominates over semantics in emotion word processing: Evidence from cross-channel and cross-modal Stroop effects. Journal of Speech, Language and Hearing Research.

Purpose: Emotional speech communication involves multisensory integration of linguistic (e.g., semantic content) and paralinguistic (e.g., prosody and facial expressions) messages. Previous studies on linguistic versus paralinguistic salience effects in emotional speech processing have produced inconsistent findings. In the present study, we investigated the relative perceptual saliency of emotion cues in a cross-channel auditory-alone task (i.e., a semantics-prosody Stroop task) and a cross-modal audiovisual task (i.e., a semantics-prosody-face Stroop task).

Method: Thirty normal Chinese adults participated in two Stroop experiments with spoken emotion adjectives in Mandarin Chinese. Experiment 1 manipulated the auditory pairing of emotional prosody (happy or sad) and lexical semantic content in congruent and incongruent conditions. Experiment 2 extended the protocol to cross-modal integration by introducing visual facial expressions during auditory stimulus presentation. Participants were asked to judge the emotional information of each test trial according to selective-attention instructions.

Results: Accuracy and reaction time data indicated that despite an increase in cognitive demand and task complexity in Experiment 2, prosody was consistently more salient than semantic content for emotion word processing and did not take precedence over facial expression. While congruent stimuli enhanced performance in both experiments, the facilitatory effect was smaller in Experiment 2.

Conclusion: Together, the results demonstrate the salient role of paralinguistic prosodic cues in emotion word processing and congruence facilitation effect in multisensory integration. Our study contributes tonal language data on how linguistic and paralinguistic messages converge in multisensory speech processing, and lays a foundation for further exploring the brain mechanisms of cross-channel/modal emotion integration with potential clinical applications.

Keywords: Multimodality, Stroop, Emotion word processing, Paralinguistic cues, Prosody, Facial expression

Miller, S. & Zhang, Y. (Accepted). Neural coding of syllable-final fricatives with and without hearing aid amplification. Journal of the American Academy of Audiology.

The results suggest that hearing aid amplification alters neural representations of syllable-final fricatives in a complex manner. Consistent with results for syllable-initial fricative sounds (Miller & Zhang, 2014), normal-hearing listeners were able to discriminate the contrast with ease, and their aided and unaided ACC components did significantly differ for /s/ versus /ʃ/, suggesting a differentiation of underlying cortical processing of the speech contrast that is sensitive to the use of hearing aids. Together, the ERP results revealed that hearing aids altered the cortical processing of fricative contrasts across the scalp in both onset and coda positions.

Acclimatization to hearing aid use when measured using longitudinal speech recognition scores has a long time course (Gatehouse 1992). Our results indicate that hearing aid signal processing altered the spectral and temporal properties of fricatives that corresponded with neural response changes to the contrast. Therefore, even though behavioral responses to the fricatives were unaffected by amplification, the brain would need to accommodate these acoustic changes from hearing aid signal processing to recognize the respective sound categories.

Fei Chen, from the Department of Chinese and Bilingual Studies at the Hong Kong Polytechnic University, is a visiting scholar at the Zhang Lab for the Fall 2019 semester.
Research profile:
Google Scholar page:

Hong, T., Wang, J., Zhang, L., Zhang, Y., Shu, H., & Li, P. (2019). Age-sensitive associations of segmental and suprasegmental perception with sentence-level language skills in Mandarin-speaking children with cochlear implants. Research in Developmental Disabilities.


Background and aim: It remains unclear how recognition of segmental and suprasegmental phonemes contributes to sentence-level language processing skills in Mandarin-speaking children with cochlear implants (CIs). Our study examined the influence of implantation age on the recognition of consonants, lexical tones and sentences respectively, and more importantly, the contribution of phonological skills to sentence repetition accuracy in Mandarin-speaking children with CIs.

Methods: The participants were three groups of prelingually deaf children who received cochlear implants at various ages and their age-matched controls with normal hearing. Three tasks were administered to assess their consonant perception, lexical tone recognition and language skills in open-set sentence repetition.

Results: Children with CIs lagged behind their normal-hearing (NH) peers in all three tests, and performances on segmental, suprasegmental and sentence-level processing were differentially modulated by implantation age. Furthermore, performances on recognition of consonants and lexical tones were significant predictors of sentence repetition accuracy in the children with CIs.

Conclusion: Overall, segmental and suprasegmental perception as well as sentence-level processing is impaired in Mandarin-speaking children with CIs compared with age-matched children with NH. In children with CIs, recognition of segmental and suprasegmental phonemes at the lower level predicts sentence repetition accuracy at the higher level. More importantly, implantation age plays an important role in the development of phonological skills and higher-order language skills, suggesting that age-appropriate aural rehabilitation and speech intervention programs need to be developed to help CI users who receive CIs at different ages.

What this paper adds

Findings of this study contribute to a better understanding of speech perception and language processing in Mandarin-speaking children with cochlear implants (CIs). Specifically, our results demonstrate that performances of Mandarin-speaking children with CIs on segmental, suprasegmental, and sentence-level processing were differentially modulated by implantation age. Furthermore, recognition of both consonants and lexical tones contributes to sentence repetition accuracy in children with CIs. These findings have prognostic implications for developing post-implant rehabilitation and intervention programs.

Kao, C., & Zhang, Y. (2019). Magnetic source imaging and infant MEG: Current trends and technical advances. Brain Sciences, 9, 181.

Abstract: Magnetoencephalography (MEG) is known for its temporal precision and good spatial resolution in cognitive brain research. Nonetheless, it is still rarely used in developmental research, and its role in developmental cognitive neuroscience is not adequately addressed. The current review focuses on the source analysis of MEG measurement and its potential to answer critical questions on neural activation origins and patterns underlying infants’ early cognitive experience. The advantages of MEG source localization are discussed in comparison with functional Magnetic Resonance Imaging (fMRI) and functional near-infrared spectroscopy (fNIRS), two leading imaging tools for studying cognition across age. Challenges of the current MEG experimental protocols are highlighted, including measurement and data processing, which could potentially be resolved by developing and improving both software and hardware. A selection of infant MEG research in auditory, speech, vision, motor, sleep, cross-modality, and clinical application is then summarized and discussed with a focus on the source localization analyses. Based on the literature review and the advancements of the infant MEG systems and source analysis software, typical practices of infant MEG data collection and analysis are summarized as the basis for future developmental cognitive research.

Keywords: Magnetoencephalography (MEG); infant; cognitive development; source localization; equivalent current dipole (ECD); minimum norm estimation (MNE)

Praat tools
Zhang Lab, 16 Jul 2019 08:17

Here is a list of plugins and tools for Praat users to do speech (and nonspeech) analysis and synthesis.

Easy to learn and use

More advanced

More plugins with CPrAN manager


Yu, K., Li, L., Chen, Y., Zhou, Y., Wang, R., Zhang, Y., & Li, P. (2019). Effects of native language experience on Mandarin lexical tone processing in proficient second language learners. Psychophysiology.

Abstract: Learning the acoustic and phonological information in lexical tones is significant for learners of tonal languages. Although there is a wealth of knowledge from studies of second language (L2) tone learning, it remains unclear how L2 learners process acoustic versus phonological information differently depending on whether their first language (L1) is a tonal language. In the present study, we first examined proficient L2 learners of Mandarin with tonal and non-tonal L1s in a behavioral experiment (identifying a Mandarin tonal continuum) to construct tonal contrasts that could differentiate the phonological from the acoustic information in Mandarin lexical tones for the L2 learners. We then conducted an event-related potential (ERP) experiment to investigate these learners’ automatic processing of acoustic and phonological information in Mandarin lexical tones via the mismatch negativity (MMN). Although both groups of L2 learners showed behavioral identification of the Mandarin tonal continuum similar to that of native speakers, L2 learners with a non-tonal L1, as compared with both native speakers and L2 learners with a tonal L1, showed longer reaction times to the tokens of the Mandarin tonal continuum. More importantly, the MMN data further revealed distinct roles of acoustic and phonological information in the automatic processing of L2 lexical tones between the two groups of L2 learners. Taken together, the results indicate that the processing of acoustic and phonological information in L2 lexical tones may be modulated by L1 experience with a tonal language. The theoretical implications of the current study are discussed in light of L2 speech learning.

Key words: Mandarin Chinese, L2 lexical tones, acoustic information, phonological information, mismatch negativity (MMN), L1 tonal experience

Cheng, B., Zhang, X., Fan, S., & Zhang, Y. (2019). The role of temporal acoustic exaggeration in high variability phonetic training: A behavioral and ERP study. Frontiers in Psychology (Auditory Cognitive Neuroscience). doi: 10.3389/fpsyg.2019.01178

Abstract: High variability phonetic training (HVPT) has been found to be effective in helping adult learners acquire nonnative phonetic contrasts. The present study investigated the role of temporal acoustic exaggeration by comparing the canonical HVPT paradigm without acoustic exaggeration with a modified adaptive HVPT paradigm that integrated key temporal exaggerations of infant-directed speech (IDS). Sixty native Chinese adults participated in training on the English /i/ and /ɪ/ vowel contrast and were randomly assigned to three subject groups. Twenty were trained with the typical HVPT (the HVPT group), twenty were trained under the modified adaptive approach with acoustic exaggeration (the HVPT-E group), and twenty were in the control group. Behavioral tasks for the pre- and post-tests used natural word identification, synthetic stimuli identification, and synthetic stimuli discrimination. Mismatch negativity (MMN) responses from the HVPT-E group were also obtained to assess the training effects on within- and across-category discrimination without requiring focused attention. As in previous studies, significant generalization effects to new talkers were found in both the HVPT group and the HVPT-E group. The HVPT-E group, by contrast, showed greater improvement, as reflected in larger gains in natural word identification performance. Furthermore, the HVPT-E group exhibited more native-like categorical perception based on spectral cues after training, together with corresponding training-induced changes in the MMN responses to within- and across-category differences. These data provide initial evidence supporting the important role of temporal acoustic exaggeration with adaptive training in facilitating phonetic learning and promoting brain plasticity at the perceptual and pre-attentive neural levels.

Keywords: High variability phonetic training, categorical perception, mismatch negativity, second language learning, acoustic exaggeration

Funding: The research was supported in part by grants from the National Social Science Foundation of China (15BYY005). Yang Zhang additionally received support from the University of Minnesota's Brain Imaging Grant to work on the manuscript.

Zhang, L., Jiang, W., Shu, H., & Zhang, Y. (In press). Congenital blindness enhances perception of musical rhythm more than melody in Mandarin speakers. Journal of the Acoustical Society of America.

Abstract: This study adopted the Musical Ear Test (Wallentin et al., 2010) to compare the musical competence of sighted and congenitally blind Mandarin speakers. On the rhythm subtest, the blind participants outperformed the sighted. On the melody subtest, however, the two groups performed equally well. Furthermore, compared with sighted speakers of non-tonal languages (i.e., Dutch and French) reported in previous studies (Wallentin et al., 2010; Bhatara et al., 2015), the sighted Mandarin speakers performed better only on the melody subtest. These results indicate that tonal language experience and congenital blindness exert differential influences on musical aptitudes, with rhythm perception reflecting a cross-modal compensation effect and melody perception dominated by a cross-domain language-to-music transfer effect.

Keywords: congenital blindness; Mandarin speakers; musical aptitudes; rhythm; melody 

Khosravani, S., Mahnan, A., Yeh, I., Watson, P. J., Zhang, Y., Goding, G., & Konczak, J. (Accepted). Atypical somatosensory-motor cortical response during vowel vocalization in spasmodic dysphonia. Clinical Neurophysiology.


Objective: Spasmodic dysphonia (SD) is a debilitating voice/speech disorder without an effective cure. To obtain a better understanding of the underlying cortical neural mechanisms of the disease, we analyzed electroencephalographic (EEG) signals of people with SD during voice production.

Method: Ten individuals with SD and 10 healthy volunteers produced 50 vowel vocalization epochs of 2500 ms duration. Two EEG features were derived: 1) the event-related change in spectral power during vocalization relative to rest, and 2) inter-regional spectral coherence.
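The first EEG feature, event-related change in spectral power relative to rest, can be sketched as a band-power ratio in decibels; inter-regional coherence would be derived analogously from cross-spectra (e.g., with scipy.signal.coherence). The sampling rate, signals, and effect size below are synthetic assumptions, not patient data:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250  # Hz, assumed EEG sampling rate

def band_power(x, fs, lo, hi):
    """Mean spectral power of signal x within [lo, hi] Hz via the FFT."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].mean()

# Synthetic single-channel EEG for 2.5 s rest vs. vocalization epochs.
# The vocalization epoch here retains most of its 10 Hz (alpha) power,
# loosely mimicking reduced alpha desynchronization during the task.
t = np.arange(0, 2.5, 1.0 / fs)
rest = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)
vocal = 0.9 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

# Event-related change in alpha-band power, in dB relative to rest:
# negative values = desynchronization (power drop during vocalization).
erd_db = 10 * np.log10(band_power(vocal, fs, 8, 12) / band_power(rest, fs, 8, 12))
print(f"alpha-band change vs. rest: {erd_db:.2f} dB")
```

In the study's terms, a smaller (less negative) power change over motor cortex during vocalization would correspond to the reduced movement-related desynchronization reported for the SD group.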

Results: During early vocalization (500-1000 ms), the SD group showed significantly larger alpha-band spectral power over the left motor cortex. During late vocalization (1000-2500 ms), SD patients showed significantly larger gamma-band coherence between left somatosensory and premotor cortical areas.

Conclusions: Two atypical patterns of cortical activity characterize the pathophysiology of spasmodic dysphonia during voice production: 1) a reduced movement-related desynchronization of motor cortical networks, 2) an excessively large synchronization between left somatosensory and premotor cortical areas.

Significance: The pathophysiology of SD is characterized by an abnormally high synchronous activity within and across cortical neural networks involved in voice production that is mainly lateralized in the left hemisphere.

Funding: NIH 1 R01 DC016315-01A1 (PI: JK; co-Investigators: PW, YZ, GG)

Chieh Kao has been selected by the Graduate Fellowship Office to be the recipient of the Interdisciplinary Doctoral Fellowship for the 2019-20 academic year. Congratulations! This prestigious award is a tribute to Chieh's excellent academic record and professional promise. The Fellowship is a non-service award that carries an academic year stipend of $25,000, plus tuition for up to 14 credits per semester at the regular Graduate School rate (the IDF fellowship does not cover collegiate fees or student services fees). Subsidized health insurance will also be included during the academic year and through summer 2020.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License