Recent advances in speech synthesis will enable the development of high-quality natural voice systems with broad application in education, business, entertainment, and medicine. Text to Speech Synthesis is the first book to comprehensively document these new research trends and paradigms, balancing coverage of research and applications. It brings together seminal research by leaders in the field, drawn from both academic and industrial laboratories worldwide.
The authors and editors offer broad coverage of several key areas, including new unit selection approaches, speech representations and modeling, data-driven synthesis schemes, and expressive speech synthesis. Coverage includes:
- Unit Selection Methods: Reducing discontinuities at synthesis time in corpus-based speech processing, voice quality variation, and join costs
- Hidden Markov Model (HMM)-Based Synthesis: Advanced uses of speech recognition technology, HMM-based multilingual speech synthesis, and new prosody control techniques
- Expressive Speech Synthesis: Challenges, questions, and avenues of research, including diphone transplantation and minimization of pitch modification
- Speech Representation and Models: A new articulatory modeling paradigm for controlling synthesis quality
This is an essential resource for all researchers working in speech synthesis and related areas such as multimedia signal processing, linguistics, and spoken user interfaces. It will also be valuable to any engineer, developer, or manager who must evaluate the latest speech technologies or integrate them into practical applications.
Dr. Shrikanth Narayanan is an associate professor at the Signal and Image Processing Institute of USC's Electrical Engineering Department. He founded and directs USC's Speech Analysis and Interpretation Laboratory, and serves as research area director of the Integrated Media Systems Center, an NSF Engineering Research Center. He is an associate editor of IEEE Transactions on Speech and Audio Processing, serves on the speech communication technical committee of the Acoustical Society of America, and was a Principal Member of Technical Staff at AT&T Laboratories.
Dr. Abeer Alwan, a professor of electrical engineering at UCLA, established and directs the Speech Processing and Auditory Perception Laboratory there. Her research interests include modeling human speech production and perception mechanisms and applying these models to speech-processing applications such as noise-robust automatic speech recognition, compression, and synthesis. She is a Fellow of the Acoustical Society of America and recently served as editor-in-chief of the journal Speech Communication.
Speech synthesis research has attracted renewed interest worldwide. There have been several recent conferences on speech synthesis where current approaches and advances have been highlighted. The goal of this book is to provide an in-depth exposition of some of the recent trends and novel directions in the field. This book was inspired largely by an IEEE-sponsored workshop held in September 2002 in Santa Monica, CA, and was dedicated to the memory of Mike Macon, a speech synthesis researcher who tragically passed away at a young age. The Foreword of this book by Jan van Santen highlights some of Mike’s important contributions.
The chapters in this book attempt to cover a wide range of topics in speech synthesis. They include unit selection approaches, speech representations and models for synthesis, data-driven synthesis schemes, and expressive speech synthesis.
One of the major challenges for corpus-based speech synthesis is reducing discontinuities at synthesis time. The chapter by Bozkurt et al. introduces signal processing schemes for concatenating and smoothing speech units. Another challenge for unit selection synthesis systems is dealing with voice quality variation in the unit inventory. Kawai and Tsuzaki address this issue by examining a single-speaker corpus recorded over a long period. The authors derive acoustic measures that correlate with perceptual measures of voice quality variation, which in turn could be used for optimal unit selection. The third chapter in this section, by Vepa, King, and Taylor, focuses on defining and calculating the join (or concatenation) cost that predicts the perceived discontinuity at concatenation points. They also examine underlying representations that can be used both to compute the concatenation cost and to smooth acoustic coefficients.
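To make the idea of a join cost concrete, here is a minimal Python sketch under stated assumptions: each candidate unit carries a target cost plus feature vectors (for example, cepstral coefficients) for its first and last frames, the join cost is plain Euclidean distance between the frames on either side of a concatenation point, and a Viterbi search picks one unit per target position to minimize the summed target and join costs. The names `Unit`, `join_cost`, and `select_units` are hypothetical, and Euclidean distance is only a common baseline, not the measure proposed by Vepa, King, and Taylor.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Unit:
    """A hypothetical candidate unit: an id, boundary-frame features, and a target cost."""
    unit_id: str
    first_frame: Tuple[float, ...]  # features of the unit's first frame
    last_frame: Tuple[float, ...]   # features of the unit's last frame
    target_cost: float              # mismatch between this unit and the target specification


def join_cost(left_end: Sequence[float], right_start: Sequence[float]) -> float:
    """Euclidean distance across a concatenation point (a simple baseline join cost)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left_end, right_start)))


def select_units(candidates: List[List[Unit]], join_weight: float = 1.0) -> Tuple[List[str], float]:
    """Viterbi search choosing one unit per position to minimize target + weighted join costs."""
    # best[i][j]: lowest total cost of any path ending in candidate j at position i
    best = [[u.target_cost for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]  # back-pointers for the traceback
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for unit in candidates[i]:
            # Cost of reaching this unit from each candidate at the previous position
            costs = [
                best[i - 1][k] + join_weight * join_cost(prev.last_frame, unit.first_frame)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + unit.target_cost)
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j].unit_id)
        j = back[i][j]
    path.reverse()
    return path, total


if __name__ == "__main__":
    cands = [
        [Unit("a1", (0.0, 0.0), (1.0, 1.0), 0.5), Unit("a2", (0.0, 0.0), (3.0, 3.0), 0.1)],
        [Unit("b1", (1.0, 1.0), (0.0, 0.0), 0.2), Unit("b2", (3.0, 3.0), (0.0, 0.0), 0.4)],
    ]
    path, total = select_units(cands)
    print(path, round(total, 3))  # prints ['a2', 'b2'] 0.5
```

In the toy example, "a2" and "b2" join seamlessly (zero join cost), so the search prefers them even though "b2" has a higher target cost than "b1". The `join_weight` parameter is an assumed knob for trading smoothness at the joins against target fit.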
An underlying technical challenge in synthesizing natural-sounding speech by data-driven means is the ability to control synthesis quality. The chapter by Sondhi and Sinder presents an alternative paradigm for achieving better parametric control in speech synthesizers by relying on an articulatory representation of the speech signal. Notably, they explore the notion of using articulatory units in a corpus-based concatenative speech synthesis setup.
Prosody control, an important element for achieving natural expressive speech quality, is the topic addressed by Prudon, d’Alessandro, and Boula de Mareüil. In their chapter, they focus on prosody synthesis and evaluation using both rule-based and data-driven approaches for diphone synthesis in French. The chapter by Klabbers, van Santen, and Wouters considers the problem of prosody control in unit selection systems in a different light. They describe and evaluate an approach of prosodic factorization, to be used while designing a unit selection system, that can help minimize the amount of pitch modification required.
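One way to quantify "the amount of pitch modification required" is the size of the F0 shift, in semitones, between a stored unit and the prosodic target. The sketch below assumes each candidate unit is summarized by a single F0 value in Hz and simply picks the candidate needing the smallest shift. It illustrates the quantity being minimized, not the prosodic factorization approach of Klabbers, van Santen, and Wouters; the function names are hypothetical.

```python
import math
from typing import Dict


def semitone_shift(f0_unit_hz: float, f0_target_hz: float) -> float:
    """Pitch modification, in semitones, needed to move a unit's F0 to the target F0."""
    return 12.0 * math.log2(f0_target_hz / f0_unit_hz)


def least_modified(candidates: Dict[str, float], f0_target_hz: float) -> str:
    """Return the id of the candidate whose stored F0 needs the smallest absolute shift."""
    return min(candidates, key=lambda uid: abs(semitone_shift(candidates[uid], f0_target_hz)))


if __name__ == "__main__":
    units = {"u_low": 100.0, "u_mid": 180.0, "u_high": 250.0}  # hypothetical unit F0 values
    print(least_modified(units, 200.0))  # prints u_mid
```

Measuring the shift on a logarithmic (semitone) scale is an assumption here; it treats a doubling of F0 as the same amount of modification regardless of the speaker's register.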
An emerging and promising approach to speech synthesis is based on hidden Markov models (HMMs), which have been used successfully in automatic speech recognition. This paradigm shift in speech synthesis is highlighted in the chapter by Ostendorf and Bulyko, where the parallels, potential pitfalls, and missing links between synthesis and recognition using HMMs are discussed. The following chapter by Tokuda, Zen, and Black describes how the HMM framework can be used to develop an end-to-end multilingual synthesis system, highlighting its benefits and open research challenges. One such challenge relates to prosody control. The chapter by Iwano, Yamada, Togawa, and Furui addresses this issue so that the rate of the synthesized speech can be modified continuously and effectively. The naturalness of their system was assessed through subjective evaluation.
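In HMM-based synthesis, speaking rate is commonly controlled through the state durations. The sketch below uses a standard duration-control formula from the HMM-based synthesis literature, swapped in for illustration and not necessarily the method of Iwano et al.: if state k has a Gaussian duration model with mean mu_k and variance sigma_k^2, then for a desired total of T frames each state gets d_k = mu_k + rho * sigma_k^2, where rho = (T - sum of mu_k) / (sum of sigma_k^2). The function names and the example numbers are hypothetical.

```python
from typing import List, Sequence


def state_durations(means: Sequence[float], variances: Sequence[float], total_frames: float) -> List[float]:
    """Spread total_frames over HMM states via d_k = mu_k + rho * sigma_k^2."""
    var_sum = sum(variances)
    if var_sum <= 0:
        raise ValueError("state duration variances must sum to a positive value")
    # rho is chosen so that the durations sum exactly to total_frames
    rho = (total_frames - sum(means)) / var_sum
    return [m + rho * v for m, v in zip(means, variances)]


def durations_for_rate(means: Sequence[float], variances: Sequence[float], rate: float) -> List[float]:
    """rate > 1 speeds speech up and rate < 1 slows it down, relative to the mean durations."""
    return state_durations(means, variances, sum(means) / rate)


if __name__ == "__main__":
    # Three states; the middle one has the largest variance, so it absorbs most of the change
    print([round(d, 2) for d in durations_for_rate([10, 20, 10], [4, 16, 4], 1.25)])
    # prints [8.67, 14.67, 8.67]
```

Because rho scales each state's variance, states whose durations vary more in the training data stretch or shrink more, which is the usual motivation for this formula over uniform scaling. A practical system would also round to whole frames and keep every state at least one frame long; the sketch omits both steps.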
A critical aspect of natural speech is its expressive quality. Recent trends in speech synthesis aim at achieving and improving the expressive nature of synthesized speech by manipulating segmental and suprasegmental properties of the speech signal. An overview of synthesizing expressive speech is provided by Bulut, Narayanan, and Johnson in Chapter 9. They review both rule-based and data-driven methods, data collection, and evaluation approaches, and provide a summary of open questions in emotional speech synthesis. Eide, Bakis, Hamza, and Pitrelli discuss and compare methods for generating expressive speech in unlimited-resource and limited-resource scenarios. In both cases, they show significant differences between expressive and neutral synthetic speech.
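Rule-based expressive synthesis typically adjusts suprasegmental properties such as F0 level and range. As an illustration only, the following sketch shifts the mean and scales the range of a frame-level F0 contour in the log domain, leaving unvoiced frames (marked with F0 = 0) untouched. The function `restyle_f0` and its parameter values are hypothetical; choosing settings for any particular emotion is left open here.

```python
import math
from typing import List, Sequence


def restyle_f0(f0_hz: Sequence[float], mean_shift_semitones: float, range_scale: float) -> List[float]:
    """Shift the mean and scale the range of voiced F0 values in the log domain."""
    voiced = [math.log2(f) for f in f0_hz if f > 0]
    if not voiced:
        return [float(f) for f in f0_hz]  # nothing voiced to modify
    center = sum(voiced) / len(voiced)  # mean log2-F0 of the voiced frames
    out = []
    for f in f0_hz:
        if f <= 0:
            out.append(0.0)  # keep unvoiced frames unvoiced
            continue
        x = math.log2(f)
        # Expand or compress around the mean, then shift by whole octaves / 12 per semitone
        y = center + range_scale * (x - center) + mean_shift_semitones / 12.0
        out.append(2.0 ** y)
    return out


if __name__ == "__main__":
    # Raise the contour by one octave (12 semitones) without changing its range
    print([round(f, 1) for f in restyle_f0([100.0, 0.0, 400.0], 12.0, 1.0)])
    # prints [200.0, 0.0, 800.0]
```

Working in the log domain means a range_scale below 1 compresses the contour toward its geometric mean, so equal musical intervals are scaled equally at high and low pitches.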
The editors would like to thank the many authors who contributed to this book. We would also like to thank Andrew Tescher, Editor in Chief of the IMSC Press Series, and Bernard Goodwin of Prentice Hall for their excellent support in helping with the publication of this book. We are grateful to Gloria Halfacre, Panayiotis Georgiou, Seth Scafani, and Allan Weber for their help with the production of this book.
Book Description Prentice Hall PTR, 2004. Condition: New. book. Seller Inventory # M013145661X