Structural KLD for Cross-Variety Speaker Adaptation in HMM-based Speech Synthesis

Markus E. Toman; Michael Pucher

doi:10.2316/P.2013.798-069

Keywords

Speech processing, algorithms and techniques, speech synthesis, speaker adaptation, variety modeling

Abstract

While natural-sounding, neutral-style speech can be synthesized with today’s technology, quickly adapting speech synthesis to different contexts and situations remains a challenge. In variety modeling (dialects, sociolects), no standardized orthographic form is available and existing speech resources for these varieties are scarce. We present recent approaches to cross-lingual speaker transformation for HMM-based speech synthesis and propose a method for transforming an arbitrary speaker’s voice from one variety to another. We apply Kullback-Leibler divergence (KLD) to map data between HMM states, transfer probability density functions to the decision tree of the other variety, and perform speaker adaptation. A method for integrating structural information into the mapping is also presented and analyzed. Subjective listening tests show that the proposed method produces speech of significantly higher quality than standard speaker adaptation techniques.
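The KLD-based state mapping named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the details, so everything below is an assumption: each HMM state is modeled as a single Gaussian with diagonal covariance, the divergence is symmetrized by summing both directions, and each state of one variety is mapped to the closest state of the other. The function names and the toy state labels are hypothetical, and the structural variant of the mapping is not covered.

```python
import math

def kld_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (assumed state model)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kld(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p). Each state is (mean, variance)."""
    return kld_diag_gauss(*p, *q) + kld_diag_gauss(*q, *p)

def map_states(states_a, states_b):
    """Map every state in states_b to the state in states_a with minimal symmetric KLD."""
    mapping = {}
    for name_b, state_b in states_b.items():
        mapping[name_b] = min(
            states_a, key=lambda name_a: symmetric_kld(states_a[name_a], state_b)
        )
    return mapping

# Toy data: hypothetical state labels, two-dimensional features.
standard = {
    "a_s2": ([0.0, 0.0], [1.0, 1.0]),
    "e_s2": ([3.0, 3.0], [1.0, 1.0]),
}
dialect = {
    "A_s2": ([0.2, -0.1], [1.0, 1.2]),
    "E_s2": ([2.8, 3.1], [0.9, 1.0]),
}

mapping = map_states(standard, dialect)
```

Here `mapping` pairs each toy dialect state with its nearest standard-variety state. In a real system the states would be the leaf-node distributions of the trained decision trees rather than hand-written values.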

From Proceeding (797) Computer Graphics and Imaging / 798: Signal Processing, Pattern Recognition and Applications - 2013
