Universidad de Zaragoza

Zaragoza, 8-10 November 2006

Speech 3.0
Spoken Dialog Systems of the Third Generation

Roberto Pieraccini
SpeechCycle, Inc.


About ten years ago we witnessed the birth of the first commercial informational spoken dialog applications over the telephone, such as package tracking and flight status information. Those systems, based on proprietary platforms, implemented simple dialogs that lasted only a few user-machine turns. As the technology progressed and application builders moved to standard platforms, we saw more complex applications of the transactional type, such as stock trading and travel reservations, typically spanning a dozen or so turns.

Today, as complexity shifts from voice processing to the application space, we are moving into the third generation of dialog systems (Speech 3.0), exemplified by problem-solving applications. These systems are characterized by highly complex interactions that last for dozens of turns and tens of minutes, with a sophisticated interplay between dialog logic, natural language speech understanding, backend functionality, advanced VUI features, and integration with live agents.

After a brief historical perspective on the evolution of the spoken dialog industry, I will describe the issues and approaches related to the development, maintenance, tuning, and business of the third generation of dialog systems. I will show how the level of complexity required by this type of application calls for sophisticated authoring, reporting, and tuning tools, a new breed of speech technology, and the ability to manage and account for continuous changes in the environment. I will then conclude by outlining the future trends of spoken dialog systems, from both the research and the industrial points of view.

Roberto Pieraccini:
Roberto Pieraccini graduated in electrical engineering from the “Università degli Studi di Pisa,” Pisa, Italy, in 1980. From 1981 to 1990 he was a researcher at CSELT (Torino, Italy). In June 1990 he joined AT&T Bell Laboratories (Murray Hill, NJ) as a Member of Technical Staff, and from 1995 to 1999 he was with AT&T Shannon Laboratories (Florham Park, NJ). From November 1999 to August 2003 he was director of R&D for dialog technology at SpeechWorks International. In 2003 he joined the Human Language Technology department of IBM T.J. Watson Research in Yorktown Heights, where he managed the Conversational Interaction Technology department. In 2005 he accepted the position of Chief Technology Officer at SpeechCycle (formerly Tell-Eureka Corporation), a company specializing in spoken dialog systems for technical support and customer care. During his career he has been actively involved in research on speech recognition, language modeling, spoken dialog systems, and machine learning, and he has authored more than 100 publications on these subjects.