Text to Speech Synthesis: New Paradigms and Advances - Hardcover

9780131456617: Text to Speech Synthesis: New Paradigms and Advances

Recent advances in speech synthesis will enable the development of high-quality natural voice systems with broad application in education, business, entertainment, and medicine. Text to Speech Synthesis is the first book to comprehensively document these new research trends and paradigms, balancing coverage of research and applications. It brings together seminal research by leaders in the field, drawn from both academic and industrial laboratories worldwide.

The authors and editors offer broad coverage of several key areas, including new unit selection approaches, speech representations and modeling, data-driven synthesis schemes, and expressive speech synthesis.

Coverage includes:

  • Unit Selection Methods: Reducing discontinuities at synthesis time in corpus-based speech processing, voice quality variation, and join costs
  • Hidden Markov Model (HMM)-Based Synthesis: Advanced uses of speech recognition technology, HMM-based multilingual speech synthesis, and new prosody control techniques
  • Expressive Speech Synthesis: Challenges, questions, and avenues of research, including diphone transplantation and minimization of pitch modification
  • Speech Representation and Models: A new articulatory modeling paradigm for controlling synthesis quality

This is an essential resource for all researchers working in speech synthesis and related areas such as multimedia signal processing, linguistics, and spoken user interfaces. It will also be valuable to any engineer, developer, or manager who must evaluate the latest speech technologies or integrate them into practical applications.

About the Authors:

Dr. Shrikanth Narayanan is an associate professor at the Signal and Image Processing Institute of USC's Electrical Engineering Department. He founded and directs USC's Speech Analysis and Interpretation Laboratory and serves as research area director of the Integrated Media Systems Center, an NSF Engineering Research Center. He is an associate editor of IEEE Transactions on Speech and Audio Processing, serves on the Speech Communication Technical Committee of the Acoustical Society of America, and was previously a Principal Member of Technical Staff at AT&T Laboratories.

Dr. Abeer Alwan, a professor of electrical engineering at UCLA, established and directs the Speech Processing and Auditory Perception Laboratory there. Her research interests include modeling human speech production and perception mechanisms and applying these models to speech-processing applications such as noise-robust automatic speech recognition, compression, and synthesis. She is a Fellow of the Acoustical Society of America and recently served as editor-in-chief of the journal Speech Communication.

Excerpt. © Reprinted by permission. All rights reserved.

Speech synthesis research has attracted renewed interest worldwide, and several recent conferences on speech synthesis have highlighted current approaches and advances. The goal of this book is to provide an in-depth exposition of some of the recent trends and novel directions in the field. This book was inspired largely by an IEEE-sponsored workshop held in September 2002 in Santa Monica, CA, and was dedicated to the memory of Mike Macon, a speech synthesis researcher who tragically passed away at a young age. The foreword to this book by Jan van Santen highlights some of Mike’s important contributions.

The chapters in this book attempt to cover a wide range of topics in speech synthesis. They include unit selection approaches, speech representations and models for synthesis, data-driven synthesis schemes, and expressive speech synthesis.

One of the major challenges for corpus-based speech synthesis approaches is reducing discontinuities at synthesis time. The chapter by Bozkurt et al. introduces signal processing schemes for the concatenation and smoothing of speech units. Another challenge for unit selection synthesis systems is dealing with voice quality variations in the unit inventory. Kawai and Tsuzaki address this issue by analyzing a single-speaker corpus recorded over a long period. The authors derive acoustic measures that correlate with perceptual measures of voice quality variation, which in turn could be used for optimal unit selection. The third chapter in this section, by Vepa, King, and Taylor, focuses on defining and calculating the join (or concatenation) cost to predict the perceived discontinuity at concatenation points. They also examine underlying representations that can be used both to compute the concatenation cost and to smooth the acoustic coefficients.

An underlying technical challenge in synthesizing natural-sounding speech by data-driven means is the ability to exercise control over synthesis quality. The chapter by Sondhi and Sinder presents an alternative paradigm for achieving better parametric control in speech synthesizers by relying on an articulatory representation of the speech signal. Notably, they explore the notion of using articulatory units in a corpus-based concatenative speech synthesis setup.

Prosody control, an important element in achieving natural, expressive speech quality, is the topic addressed by Prudon, d’Alessandro, and Boula de Mareüil. In their chapter, they focus on prosody synthesis and evaluation using both rule-based and data-driven approaches for diphone synthesis of French. The chapter by Klabbers, van Santen, and Wouters considers the problem of prosody control in unit selection systems in a different light. They describe and evaluate an approach to prosodic factorization, for use in designing a unit selection system, that can help minimize the amount of pitch modification required.

An emerging and promising approach to speech synthesis is based on hidden Markov models (HMMs), which have been used successfully in automatic speech recognition. This paradigm shift in speech synthesis is highlighted in the chapter by Ostendorf and Bulyko, which discusses the parallels, potential pitfalls, and missing links between synthesis and recognition using HMMs. The following chapter, by Tokuda, Zen, and Black, describes how the HMM framework can be used to develop an end-to-end multilingual synthesis system, highlighting its benefits and open research challenges. One such challenge relates to prosody control. The chapter by Iwano, Yamada, Togawa, and Furui addresses this challenge so that the speaking rate of the synthesized speech can be modified continuously and effectively. Their system was evaluated subjectively to assess naturalness.

A critical aspect of natural speech is its expressive quality. Recent trends in speech synthesis aim at achieving and improving the expressiveness of synthesized speech by manipulating segmental and suprasegmental properties of the speech signal. An overview of expressive speech synthesis is provided by Bulut, Narayanan, and Johnson in Chapter 9. They review rule-based and data-driven methods, data collection, and evaluation approaches, and summarize open questions in emotional speech synthesis. Eide, Bakis, Hamza, and Pitrelli discuss and compare methods for generating expressive speech in both unlimited-resource and limited-resource scenarios. In both cases they show significant differences between expressive and neutral synthetic speech.

The editors would like to thank the many authors who contributed to this book. We would also like to thank Andrew Tescher, Editor-in-Chief of the IMSC Press Series, and Bernard Goodwin of Prentice Hall for their excellent support with the publication of this book. We are grateful to Gloria Halfacre, Panayiotis Georgiou, Seth Scafani, and Allan Weber for their help with the production of this book.


Other Popular Editions of the Same Title

9788129710789: Text To Speech Synthesis

Featured Edition

ISBN 10: 8129710781 ISBN 13: 9788129710789
Publisher: Dorling Kindersley Pearson Education
Softcover

Top Search Results from the AbeBooks Marketplace

Abeer Alwan and Shrikanth Narayanan
Published by Prentice Hall (2004)
ISBN 10: 013145661X ISBN 13: 9780131456617
Used Hardcover Quantity: 1
Seller:
Ammareal
(Morangis, France)

Book Description Hardcover. Condition: Used, Very good. Former library book. Edition 2004. Ammareal gives back up to 15% of this item's net price to charity organizations. Seller Inventory # D-486-141

Buy Used
US$ 149.47

Shipping: US$ 8.60
From France to U.S.A.

Published by Prentice Hall (2004)
ISBN 10: 013145661X ISBN 13: 9780131456617
Used Hardcover Quantity: 1
Seller:
Iridium_Books
(DH, SE, Spain)

Book Description Condition: Used - Good. Seller Inventory # 9780131456617

Buy Used
US$ 423.66

Shipping: US$ 34.40
From Spain to U.S.A.

NARAYANAN
Published by PEARSON EDUCACION (2004)
ISBN 10: 013145661X ISBN 13: 9780131456617
Used Hardcover Quantity: 1
Seller:
Iridium_Books
(DH, SE, Spain)

Book Description Condition: Very Good. Seller Inventory # 100000000849191

Buy Used
US$ 1,168.47

Shipping: US$ 34.40
From Spain to U.S.A.