AUTHORS: Md. Arifour Rahman, Yosuke Sugiura, Tetsuya Shimamura
ABSTRACT: This paper proposes a technique for improving the performance of linear prediction (LP) by using a prediction error filter (PEF) as a pre-processor. Estimating the power spectrum of speech with LP is often problematic because the large spectral dynamic range of speech makes the autocorrelation matrix ill-conditioned. In the proposed method, the LP-based power spectrum estimate is compensated by the spectral characteristics of the designed PEF. The accuracy of formant frequency estimation is verified on synthetic speech, and the validity of the method is further illustrated on real air-conducted and bone-conducted speech. The experiments show that the proposed method estimates the power spectrum more accurately than the conventional direct and pre-emphasis LP methods.
KEYWORDS: Linear prediction, prediction error filter, formant frequency estimation, spectrum compensation, air conducted speech, bone conducted speech
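The abstract does not specify how the PEF is designed or how the compensation is applied, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a first-order PEF as the pre-processor, a third-order LP analysis of the PEF output, and a compensated spectrum formed as the gain divided by the product of the two squared filter responses. All function names, orders, and the synthetic test signal are hypothetical choices for illustration, not the authors' settings.

```python
import cmath
import math
import random

def convolve(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def autocorr(x, order):
    """Unnormalized autocorrelation for lags 0..order."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: returns PEF coefficients [1, a1..ap] and error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def fir_filter(b, x):
    """Apply the FIR filter b to x (zero initial state)."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def power_response(a, omega):
    """|A(e^{j omega})|^2 for a polynomial A in z^-1."""
    return abs(sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(a))) ** 2

def synth_ar(a_true, n, seed=0):
    """White Gaussian noise through the all-pole filter 1/A_true(z)."""
    rng = random.Random(seed)
    x = []
    for i in range(n):
        acc = rng.gauss(0.0, 1.0)
        for k in range(1, len(a_true)):
            if i - k >= 0:
                acc -= a_true[k] * x[i - k]
        x.append(acc)
    return x

# Assumed test signal: one resonance at 0.3*pi plus a real pole giving spectral tilt.
theta = 0.3 * math.pi
a_true = convolve([1.0, -2 * 0.97 * math.cos(theta), 0.97 ** 2], [1.0, -0.95])
x = synth_ar(a_true, 2000)

# Stage 1 (assumed order 1): design the PEF and flatten the signal with it.
a1, _ = levinson(autocorr(x, 1), 1)
e = fir_filter(a1, x)

# Stage 2 (assumed order 3): LP analysis of the PEF output.
a2, err2 = levinson(autocorr(e, 3), 3)
gain = err2 / len(x)

def compensated_spectrum(omega):
    """LP spectrum of the PEF output, compensated by the PEF response."""
    return gain / (power_response(a1, omega) * power_response(a2, omega))

# Locate the spectral peak in a band around the resonance.
grid = [math.pi * k / 512 for k in range(513)]
band = [w for w in grid if 0.15 * math.pi <= w <= 0.5 * math.pi]
peak = max(band, key=compensated_spectrum)
print("estimated resonance (rad/sample):", peak, "true:", theta)
```

In this sketch the first stage reduces the spectral dynamic range before the main LP analysis, and dividing by the PEF response restores the tilt that the pre-processor removed. A pre-emphasis LP baseline would correspond to fixing `a1` to a constant such as `[1.0, -0.97]` instead of estimating it from the signal.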