Robust fundamental frequency estimation in sustained vowels: Detailed algorithmic comparisons and information fusion with adaptive Kalman filtering

Athanasios Tsanas, Matías Zañartu, Max A. Little, Cynthia Fox, Lorraine O. Ramig, Gari D. Clifford

Research output: Contribution to journal › Article › peer-review

51 Scopus citations

Abstract

There has been consistent interest among speech signal processing researchers in the accurate estimation of the fundamental frequency (F0) of speech signals. This study examines ten F0 estimation algorithms (some well-established and some proposed more recently) to determine which of these algorithms is, on average, better able to estimate F0 in the sustained vowel /a/. Moreover, a robust method for adaptively weighting the estimates of individual F0 estimation algorithms based on quality and performance measures is proposed, using an adaptive Kalman filter (KF) framework. The accuracy of the algorithms is validated using (a) a database of 117 synthetic realistic phonations obtained using a sophisticated physiological model of speech production and (b) a database of 65 recordings of human phonations where the glottal cycles are calculated from electroglottograph signals. On average, the sawtooth waveform inspired pitch estimator and the nearly defect-free algorithms provided the best individual F0 estimates, and the proposed KF approach resulted in a ∼16% improvement in accuracy over the best single F0 estimation algorithm. These findings may be useful in speech signal processing applications where sustained vowels are used to assess vocal quality, when very accurate F0 estimation is required.
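The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a scalar random-walk model for F0, and it stands in for the paper's quality and performance measures (which the abstract does not specify) with a simple heuristic that inflates each algorithm's measurement variance by its squared deviation from the predicted F0. The names `fuse_f0`, `base_var`, `process_var`, and `adapt_gain` are hypothetical.

```python
import numpy as np


def fuse_f0(estimates, base_var, process_var=1.0, adapt_gain=1.0):
    """Fuse per-frame F0 estimates (Hz) from several algorithms.

    estimates: array of shape (n_frames, n_algorithms)
    base_var:  nominal measurement noise variance per algorithm (Hz^2)
    Returns the fused F0 track, shape (n_frames,).
    """
    z = np.asarray(estimates, dtype=float)
    base_var = np.asarray(base_var, dtype=float)
    n_frames = z.shape[0]

    # Assumed initialisation: median of the first frame, pessimistic variance.
    x = float(np.median(z[0]))
    p = float(np.max(base_var))

    fused = np.empty(n_frames)
    for t in range(n_frames):
        # Predict step: random-walk state model (an assumption of this sketch).
        p = p + process_var

        # Adaptive step (hypothetical heuristic): an estimate far from the
        # prediction gets a larger measurement variance, hence less weight.
        innovation = z[t] - x
        r = base_var + adapt_gain * innovation ** 2

        # Update step in information form, treating the algorithms as
        # independent measurements of the same scalar state.
        info = 1.0 / p + np.sum(1.0 / r)
        x = (x / p + np.sum(z[t] / r)) / info
        p = 1.0 / info

        fused[t] = x
    return fused
```

In this sketch an octave error from one algorithm (for example 240 Hz against a true 120 Hz) receives a very small weight, so the fused track stays close to the estimates that agree with the prediction. A plain per-frame average would not have this property.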

Original language: English
Pages (from-to): 2885-2901
Number of pages: 17
Journal: Journal of the Acoustical Society of America
Volume: 135
Issue number: 5
State: Published - May 2014
Externally published: Yes
