

Distinguished Lecturer Series

Modeling Spoken Language

Prof. Mari Ostendorf
Associate Chair for Research, University of Washington

When: September 15, 2004 @ 3:00pm
Where: Gerontology Auditorium (GER 124)

Abstract: As storage costs plummet and speech recognition technology progressively improves, it becomes feasible to think of archiving and publishing "spoken documents" that can be accessed as easily as we do online text documents. The range of potentially interesting spoken documents is vast, including records of meetings, committee hearings, news broadcasts, and call center data, as well as multi-media documents that include speech recordings. Language processing technology for spoken documents is even more critical than for text, since it is much more cumbersome to mine audio recordings than text for useful information. A key component of both speech recognition technology and many subsequent language processing technologies is statistical language modeling. Language models are used to characterize word sequences as an information source (a discrete stochastic process) that is to be decoded from noisy observations, such as acoustic features in speech recognition or words in another language in machine translation. Despite the fact that language is known to have long-distance structure, the most widely used language model is a simple n-gram or (n-1)-order Markov process, estimated from word sequence counts in data representative of the target task. In addition, performance gains in language modeling in recent years have been driven as much by data collection as by advances in representation of linguistic structure. As vast text resources are increasingly available via the web, one might argue that this trend will continue. However, spoken language can be quite different from written language, particularly for informal conversational speech, transcripts of which are not as readily available as written text. Human language can vary substantially depending on topic and register, such that the addition of mismatched text to the training set can actually hurt language modeling performance when using simple n-gram models. 
These observations argue for a decomposition of language at several levels, in terms of factors related to speaking style, topic, syntax and even morphology. This talk will show that leveraging larger data resources in learning models is synergistic with and not simply an alternative to representing structure in language, with examples of success stories in different languages and speech recognition tasks.
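The n-gram model the abstract describes, an (n-1)-order Markov process estimated from word-sequence counts, can be sketched minimally as follows. This is a toy illustration with a hypothetical two-sentence corpus, using unsmoothed maximum-likelihood bigram estimates; real systems add smoothing to handle unseen word pairs.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate a bigram (1st-order Markov) model from raw counts.

    Maximum-likelihood estimate: P(w2 | w1) = count(w1, w2) / count(w1).
    No smoothing is applied, so unseen bigrams get probability zero.
    """
    unigram = Counter()
    bigram = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]  # sentence-boundary markers
        unigram.update(tokens[:-1])          # history counts
        bigram.update(zip(tokens[:-1], tokens[1:]))
    return {pair: c / unigram[pair[0]] for pair, c in bigram.items()}

def sentence_prob(model, words):
    """Probability of a word sequence under the bigram model."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for pair in zip(tokens[:-1], tokens[1:]):
        p *= model.get(pair, 0.0)  # unseen bigram -> zero probability
    return p

# Hypothetical toy corpus (stands in for task-representative training data)
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = train_bigram_lm(corpus)
print(lm[("the", "cat")])                      # 0.5: "the" precedes "cat" in 1 of 2 cases
print(sentence_prob(lm, ["the", "cat", "sat"]))  # 0.5
```

The zero probability assigned to any unseen bigram is exactly why adding mismatched text to the training counts can hurt a simple n-gram model: the estimates track whatever word co-occurrence statistics dominate the counts.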

Mari Ostendorf joined the Speech Signal Processing Group at BBN Laboratories in 1985, where she worked on low-rate coding and acoustic modeling for continuous speech recognition. Two years later, she moved to the Department of Electrical and Computer Engineering at Boston University, where she taught undergraduate and graduate courses in signal processing and pattern recognition and ran a large speech research lab. She joined the University of Washington in 1999. Her early work was in speech coding; more recently she has been involved in projects on both continuous speech recognition and speech synthesis, as well as other types of signals. Current efforts include segment-based acoustic modeling for spontaneous speech recognition, dependence modeling for adaptation, the use of out-of-domain data in language modeling, and stochastic models of prosody for both recognition and synthesis. She has published over 100 papers on various problems in speech and language processing. Dr. Ostendorf has served on the Speech Processing and DSP Education Committees of the IEEE Signal Processing Society and on numerous workshop committees.
