Plenary Lecture
Enhancement and Restoration

Professor Hector Perez-Meana
National Polytechnic Institute of Mexico
MEXICO
Email: hmpm@prodigy.net.mx
Abstract: Patients suffering from diseases such as throat cancer may require
the surgical removal of the larynx and vocal cords, after which they need
rehabilitation in order to reintegrate into their individual, social, family,
and work activities. To accomplish this, different methods have been used, such
as esophageal speech, tracheoesophageal prostheses, and the Artificial Larynx
Transducer (ALT), also known as the “electronic larynx”.
The ALT, a handheld device, introduces an excitation into the vocal tract by
applying a vibration against the external wall of the neck. This excitation is
then modulated by the movements of the oral cavity to produce the speech sound.
The transducer is applied to the speaker’s neck or, in some cases, to the
speaker’s cheeks. The ALT is very easy to use, even for new patients, although
the voice it produces is unnatural and of low quality, and it is further
distorted by the background noise that the ALT itself generates. Esophageal
speech is produced by compressing the air contained in the vocal tract with the
tongue. This air is swallowed and, as it passes through the pharyngoesophageal
segment, it makes the upper esophageal muscle vibrate, producing speech. The
generated sound is similar to a burp; its pitch is commonly very low and its
timbre generally harsh. In both ALT and esophageal speech, the voiced segments
are the most affected part of the speech signal.
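The excitation-plus-modulation mechanism described above is the classic source-filter model of speech. The sketch below illustrates that textbook model only; it is not the speaker's method. A fixed-rate impulse train stands in for the ALT vibration, and a cascade of two-pole resonators stands in for the vocal tract as shaped by the oral cavity. The function names, the 100 Hz excitation rate, the 8 kHz sampling rate, and the formant values are hypothetical illustration choices.

```python
import math


def pulse_train(f0_hz: float, fs_hz: int, n: int) -> list[float]:
    """Impulse train with a constant period: a crude stand-in for the
    vibration an ALT injects through the neck (fixed pitch assumed)."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x: list[float], fc_hz: float, bw_hz: float, fs_hz: int) -> list[float]:
    """Two-pole all-pole filter modelling one vocal-tract resonance (formant).

    Poles at r*exp(+/- j*theta) give y[n] = g*x[n] - a1*y[n-1] - a2*y[n-2].
    """
    r = math.exp(-math.pi * bw_hz / fs_hz)   # pole radius set by the bandwidth
    theta = 2.0 * math.pi * fc_hz / fs_hz    # pole angle set by the centre frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    g = 1.0 - r                              # rough gain scaling only
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        out = g * v - a1 * y1 - a2 * y2
        y.append(out)
        y1, y2 = out, y1                     # shift the output delay line
    return y


def synthesize(f0_hz: float, formants: list[tuple[float, float]],
               fs_hz: int = 8000, n: int = 800) -> list[float]:
    """Excitation (source) filtered by a cascade of formant resonators (filter)."""
    signal = pulse_train(f0_hz, fs_hz, n)
    for fc_hz, bw_hz in formants:
        signal = resonator(signal, fc_hz, bw_hz, fs_hz)
    return signal


if __name__ == "__main__":
    # Hypothetical formant set, roughly in the range of an open vowel.
    vowel = synthesize(100.0, [(700.0, 130.0), (1200.0, 70.0), (2600.0, 160.0)])
    print(len(vowel), max(abs(v) for v in vowel))
```

Because the excitation period never changes in this sketch, the synthetic output has a flat pitch contour; it also omits the noise radiated directly by the device, which real ALT recordings contain.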
Several approaches have been proposed to improve the quality and
intelligibility of ALT-produced and esophageal speech signals. Some of them
reduce the background noise produced by the ALT by using cepstral root
subtraction or adaptive filtering; however, the speech quality these approaches
achieve is still poor. Another approach attempts to improve speech quality by
estimating the frequency band from 4 kHz to 8 kHz from the band between 300 Hz
and 4 kHz. Although this approach may be an attractive alternative, it still
needs to be improved. A promising approach is based on speech conversion
techniques that carry out a spectral conversion using vector quantization. A
similar approach, based on pattern recognition, has also been proposed: first,
the voiced segments are detected and identified; then they are replaced by their
equivalent voiced segments of normal speech, while the unvoiced segments are
kept unchanged; finally, the voiced, unvoiced, and silence segments are
concatenated to produce the restored speech. These approaches perform fairly
well, but they still present some problems, because the spectral conversion
reduces a continuous spectral space to a discrete codebook, which may introduce
a distortion that must still be reduced.
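Of the noise-reduction options named above, adaptive filtering is the simplest to sketch. The code below is a textbook Widrow-Hoff LMS adaptive noise canceller, not the specific system reviewed in the lecture. It assumes that a reference signal correlated with the ALT noise, but not with the speech, is available (for instance, from a hypothetical second pickup near the device). The step size `mu`, the tap count, the acoustic path, and the synthetic test signals are illustration choices.

```python
import math
import random


def lms_cancel(primary: list[float], reference: list[float],
               taps: int = 4, mu: float = 0.01) -> list[float]:
    """Adaptive noise canceller using the LMS (Widrow-Hoff) update.

    primary:   microphone signal = speech + ALT noise
    reference: signal correlated with the noise only (assumed available)
    Returns e[n] = primary[n] - y[n], the running estimate of the clean speech.
    """
    w = [0.0] * taps      # adaptive FIR weights
    buf = [0.0] * taps    # last `taps` reference samples, newest first
    out = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # noise estimate
        e = d - y                                   # error = speech estimate
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, buf)]
        out.append(e)
    return out


def demo(n: int = 8000, fs: int = 8000, seed: int = 0) -> tuple[float, float]:
    """Synthetic check: returns (noise power, residual noise power after LMS),
    both measured over the last quarter of the signal, after convergence."""
    rng = random.Random(seed)
    speech = [0.5 * math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]
    ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical acoustic path from the noise source to the microphone.
    noise = [0.8 * ref[i] + 0.4 * (ref[i - 1] if i > 0 else 0.0) for i in range(n)]
    primary = [s + v for s, v in zip(speech, noise)]
    cleaned = lms_cancel(primary, ref)
    tail = range(3 * n // 4, n)
    noise_pow = sum(noise[i] ** 2 for i in tail) / len(tail)
    resid_pow = sum((cleaned[i] - speech[i]) ** 2 for i in tail) / len(tail)
    return noise_pow, resid_pow


if __name__ == "__main__":
    print(demo())
```

The canceller only removes the part of the noise that is predictable from the reference; if no clean noise reference exists, this scheme does not apply directly.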
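The vector-quantization mapping, and the distortion it introduces, can be seen in a few lines. This is a generic codebook-mapping sketch, not the converter reviewed in the lecture: it assumes that a pair of index-aligned codebooks (alaryngeal and normal-speech spectral vectors) has already been trained offline, for example with k-means on time-aligned data. The two-dimensional codebook values are hypothetical placeholders; real systems use higher-dimensional spectral features.

```python
def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: list[list[float]], v: list[float]) -> int:
    """Index of the codeword closest to v (the quantization step)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def convert(v: list[float], src_cb: list[list[float]],
            tgt_cb: list[list[float]]) -> tuple[list[float], float]:
    """Map a source spectral vector to the target codeword with the same index.

    Returns (converted vector, quantization distortion of the source vector).
    """
    k = nearest(src_cb, v)
    return tgt_cb[k], sq_dist(src_cb[k], v)


# Hypothetical index-aligned codebooks (source = alaryngeal, target = normal).
SRC_CB = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
TGT_CB = [[0.1, 0.1], [1.2, 0.2], [0.2, 1.1]]

if __name__ == "__main__":
    for v in ([0.9, 0.1], [1.1, -0.1]):
        print(v, convert(v, SRC_CB, TGT_CB))
```

The two inputs in the usage loop are different, yet both map to the same target vector, so whatever distinguished them is lost. This many-to-one collapse of a continuous space onto a finite codebook is the kind of distortion the abstract refers to; a larger codebook shrinks it but cannot remove it.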
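The segment-labelling and concatenation steps of the pattern-recognition pipeline can be sketched with two classic frame features, short-time energy and zero-crossing rate. This is an assumption-laden illustration, not the detector used in the reviewed work: the thresholds `e_sil` and `z_unv`, the 160-sample frame length, and the `replace_voiced` callback (a hypothetical stand-in for the identification-and-replacement step, which the abstract does not detail) are all invented for the example.

```python
import math
import random
from typing import Callable


def split_frames(x: list[float], size: int) -> list[list[float]]:
    """Non-overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


def energy(frame: list[float]) -> float:
    """Short-time energy (mean square)."""
    return sum(v * v for v in frame) / len(frame)


def zcr(frame: list[float]) -> float:
    """Zero-crossing rate: fraction of adjacent sample pairs that change sign."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)


def label(frame: list[float], e_sil: float = 1e-4, z_unv: float = 0.3) -> str:
    """'S' = silence (low energy), 'U' = unvoiced (noise-like), 'V' = voiced."""
    if energy(frame) < e_sil:
        return "S"
    return "U" if zcr(frame) > z_unv else "V"


def restore(frames: list[list[float]],
            replace_voiced: Callable[[list[float]], list[float]]) -> list[float]:
    """Replace voiced frames, keep unvoiced and silent ones, then concatenate."""
    out: list[float] = []
    for f in frames:
        out.extend(replace_voiced(f) if label(f) == "V" else f)
    return out


def make_demo_signal(fs: int = 8000, seed: int = 1) -> list[float]:
    """Silence, a 200 Hz tone (voiced-like), white noise (unvoiced-like), silence."""
    rng = random.Random(seed)
    tone = [0.5 * math.sin(2 * math.pi * 200 * i / fs) for i in range(320)]
    hiss = [rng.uniform(-0.3, 0.3) for _ in range(320)]
    return [0.0] * 160 + tone + hiss + [0.0] * 160


if __name__ == "__main__":
    frames = split_frames(make_demo_signal(), 160)
    print([label(f) for f in frames])
```

Frames are kept non-overlapping so that concatenation reduces to list extension; a practical detector would typically use overlapping windows, smoothing across frames, and thresholds tuned on real recordings.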
This lecture presents a review of alaryngeal speech enhancement systems and
also provides evaluation results showing the improvement in the quality and
intelligibility of the produced speech.
Brief Biography of the Speaker:
Hector Perez-Meana received his M.S. degree in Electrical Engineering from the
Electro-Communications University of Tokyo, Japan, in 1986 and his Ph.D. degree
in Electrical Engineering from the Tokyo Institute of Technology, Tokyo, Japan,
in 1989. From March 1989 to September 1991, he was a visiting researcher at
Fujitsu Laboratories Ltd., Kawasaki, Japan. From September 1991 to February
1997, he was a Professor in the Electrical Engineering Department of the
Metropolitan University of Mexico City. In February 1997, he joined the
Graduate Studies and Research Section of the Mechanical and Electrical
Engineering School, Culhuacan Campus, of the National Polytechnic Institute of
Mexico, where he is now the Dean. In 1991 he received the IEICE Excellent Paper
Award, and in 2000 the IPN Research Award and the IPN Research Diploma. In 1998
he was Co-Chair of ISITA’98, and in 2009 he will be the General Chair of the
IEEE Midwest Symposium on Circuits and Systems (MWSCAS). Prof. Perez-Meana has
published more than 100 papers and two books. He has also directed 15 PhD
theses and more than 30 Master's theses. He is a Senior Member of the IEEE and
a member of the IEICE, the Mexican Researcher System, and the Mexican Academy of
Science. His principal research interests are adaptive systems, image
processing, pattern recognition, watermarking, and related fields.