USC researchers from the Signal Analysis and Interpretation Laboratory (SAIL), directed by Professor Shrikanth Narayanan of the Ming Hsieh Department of Electrical Engineering, have won the INTERSPEECH 2015 Computational Paralinguistics Challenge Award. The team was led by Dr. Matthew Black, a member of SAIL and a research computer scientist at USC/ISI; Black is also the CEO and co-founder of Behavioral Informatix, a start-up focused on behavioral signal processing technology solutions. The team included EE PhD students Daniel Bone, Zisis Skordilis, Rahul Gupta, Wei Xia, Pavlos Papadopoulos, Sandeep Nallan Chakravarthula, Bo Xiao, and Jangwon Kim, along with SAIL researchers Maarten Van Segbroeck, Panayiotis Georgiou, and Shrikanth Narayanan. This is a record sixth such award for USC: SAIL teams also won the INTERSPEECH challenge awards in 2009, 2011, 2012, 2013, and 2014. INTERSPEECH is the world's largest interdisciplinary conference focused on speech and language science and technology.
The award recognizes the research paper describing an innovative approach that achieved the best performance in an international competition for automated estimation of the degree of nativeness in speech. The paper was presented at the 16th Annual Conference of the International Speech Communication Association, held in September 2015 in Dresden, Germany. The competitors' signal processing and pattern recognition algorithms were tested on a database supplied by the organizers, containing realistic recordings from a highly diverse set of speakers. The winning USC SAIL paper is entitled “Automated Evaluation of Non-Native English Pronunciation Quality: Combining Knowledge- and Data-Driven Features at Multiple Time Scales”.
Automatic assessment of the pronunciation quality of speech has significant applications in language learning as well as in other domains such as health. The winning paper tackled the problem in a challenging setting, using English speech data drawn from multiple speakers with a variety of native-language backgrounds, and combined knowledge-inspired and data-driven approaches. Signal features that capture the degree of nativeness, derived from pauses in speech, speaking rate, rhythm/stress, and goodness of phone pronunciation, were automatically computed. A key finding of the paper is that highly accurate automated assessment can be attained using a small, diverse set of intuitive and interpretable features.
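As a rough illustration of what such knowledge-inspired features can look like in practice, the minimal Python sketch below computes two simple quantities from a raw waveform: a pause ratio and a crude speaking-rate proxy, both based on a naive energy-based voicing decision. The function name, frame settings, and thresholds here are illustrative assumptions, not the features or parameters used in the paper, which relies on considerably more sophisticated measures (e.g., phone-level goodness-of-pronunciation scores).

```python
import numpy as np

def pause_and_rate_features(samples, sr, frame_ms=25, hop_ms=10,
                            energy_floor_db=-35.0):
    """Toy sketch of knowledge-inspired pause/rate features.

    `samples` is a mono waveform as a float NumPy array, `sr` its sample
    rate. All names and thresholds are illustrative assumptions, not the
    paper's actual feature definitions.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(samples) - frame) // hop)
    # Short-time log energy per frame (small epsilon avoids log of zero).
    energies = np.array([
        10.0 * np.log10(np.mean(samples[i * hop:i * hop + frame] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    # Frames more than |energy_floor_db| below the loudest frame are
    # treated as pauses; everything else counts as speech.
    speech = energies > (energies.max() + energy_floor_db)
    pause_ratio = 1.0 - speech.mean()  # fraction of frames that are silent
    # Crude speaking-rate proxy: silence-to-speech transitions per second.
    transitions = np.count_nonzero(np.diff(speech.astype(int)) == 1)
    duration_s = len(samples) / sr
    rate_proxy = transitions / duration_s
    return {"pause_ratio": float(pause_ratio), "rate_proxy": rate_proxy}
```

For example, `pause_and_rate_features(waveform, 16000)` would return the two features for a 16 kHz mono recording; in a real system these would be combined with many other descriptors at multiple time scales before being fed to a classifier or regressor.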
Details of this research, along with SAIL's ongoing efforts in speech, language, and multimodal signal processing, as well as human behavioral signal processing and behavioral informatics, with applications ranging from commerce and security to health realms such as autism, depression, family studies, and addiction, can be found on the SAIL website.