Text normalization and diphone preparation for Bangla speech synthesis

Muhammad Masud Rashid, Akter Hussain, M. Shahidur Rahman

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

This paper presents the methodologies involved in text normalization and diphone preparation for Bangla Text-to-Speech (TTS) synthesis. A concatenation-based TTS system basically comprises two modules: natural language processing and Digital Signal Processing (DSP). The natural language processing module converts text to its pronounceable form, a step called text normalization, and selects diphones based on the normalized text, a step called Grapheme-to-Phoneme (G2P) conversion. The text normalization issues addressed in this paper include tokenization, conjuncts, null modified characters, numerical words, abbreviations and acronyms. The issues related to diphone preparation include diphone categorization, corpus preparation, diphone labeling and diphone selection. Appropriate rules and algorithms are proposed to tackle all of the above-mentioned issues. We developed a speech synthesizer for Bangla using a diphone-based concatenative approach, which is demonstrated to produce natural-sounding synthetic speech.
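The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' rule set. The abbreviation table, the romanized digit readings, the digit-by-digit reading of numbers, and the function names `tokenize`, `normalize` and `to_diphones` are all assumptions made for illustration. The only Bangla-specific fact relied on is that Bengali digits occupy the Unicode range U+09E6 to U+09EF.

```python
# Bengali digits occupy U+09E6..U+09EF. The romanized readings below are
# rough illustrations, not the transcription used in the paper.
DIGIT_READINGS = ["shunno", "ek", "dui", "tin", "char",
                  "panch", "chhoy", "sat", "at", "noy"]
DIGIT_WORDS = {chr(0x09E6 + i): word for i, word in enumerate(DIGIT_READINGS)}

# Hypothetical abbreviation table (written token -> spoken form).
ABBREVIATIONS = {"Dr.": "doctor"}


def tokenize(text):
    """Split on whitespace; a real tokenizer would also handle punctuation."""
    return text.split()


def normalize(text):
    """Expand abbreviations and read digit strings one digit at a time.

    Digit-by-digit reading is a simplification; proper numerical-word
    expansion needs place-value rules.
    """
    words = []
    for token in tokenize(text):
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif all(ch in DIGIT_WORDS for ch in token):
            words.extend(DIGIT_WORDS[ch] for ch in token)
        else:
            words.append(token)
    return words


def to_diphones(phonemes, silence="_"):
    """Pair each phoneme with its successor, padding both ends with silence."""
    padded = [silence, *phonemes, silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]
```

For a phoneme sequence such as `["k", "a", "l"]`, `to_diphones` yields the unit names `_-k`, `k-a`, `a-l` and `l-_`. A concatenative synthesizer would look these up in its labeled diphone inventory and join the corresponding waveform segments.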

Original language: English
Pages (from-to): 551-559
Number of pages: 9
Journal: Journal of Multimedia
Volume: 5
Issue number: 6
State: Published - 2010
Externally published: Yes

Keywords

  • Diphone
  • Grapheme-to-phoneme
  • Sentence analysis
  • Speech synthesis
  • Text normalization
