Audiovisual Speech Processing for Polish Applicable to Human-Computer Interface

The goal of improving the Human-Computer Interface (HCI) is to make working with a computer more “natural” by providing completely unobtrusive sensors as computer controllers. The standard devices that computer users must currently deal with, such as keyboards, mice and monitors, are being supplemented with sensors such as microphones, camcorders and touchscreens. The aim is to provide an interface in which a human can use all of their basic senses, e.g. sight, hearing and touch, as well as speech, the basic form of human communication. Since human perception is multimodal in nature, communicating with a computer through voice commands, especially in one’s native language, appears to be a comfortable interface. The next step in designing useful interfaces requires the development of efficient automatic speech recognition systems that are robust to environmental noise. Supporting speech processing with recognition and analysis of the speaker’s face and with processing of the visual data stream is expected to enhance the capabilities, performance and noise robustness of speech recognition systems.

The main concept of this project is to select and develop algorithms for audiovisual speech processing applicable to Polish, especially in the following areas: feature extraction algorithms and the selection of an appropriate Region of Interest (ROI) covering the most informative areas of the user’s face; parametrisation methods; multistream data classification algorithms; data dimensionality reduction algorithms; and audiovisual fusion strategies. Additionally, algorithms for non-uniform stream segmentation, which are currently based on the analysis of energy changes in the audio stream only, will be supported with visual data to better align segment boundaries with phonemes.
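
As a rough illustration of the audio-only starting point mentioned above, the following minimal sketch marks a segment boundary wherever the short-time log energy jumps between consecutive frames. It is not the project's algorithm: the function names, the frame length, the 20 dB threshold and the toy signal are all assumptions chosen for illustration, and no visual data is used.

```python
import math


def frame_energies(samples, frame_len):
    """Short-time log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # The small constant avoids log(0) for silent frames.
        energies.append(10.0 * math.log10(energy + 1e-12))
    return energies


def energy_boundaries(energies, threshold_db):
    """Indices of frames whose log energy differs from the previous
    frame by more than threshold_db (hypothetical boundary rule)."""
    return [i for i in range(1, len(energies))
            if abs(energies[i] - energies[i - 1]) > threshold_db]


# Toy signal (assumed 8 kHz sampling): silence, a 200 Hz tone, silence.
silence = [0.0] * 400
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
signal = silence + tone + silence

boundaries = energy_boundaries(frame_energies(signal, 200), threshold_db=20.0)
print(boundaries)  # frame indices where the tone starts and ends
```

The resulting segments have unequal lengths because boundaries are placed only where the energy changes. A visually supported variant would presumably add cues from the face ROI to the same boundary decision, but how that is done is not specified here.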

One of the important results of this project will be an audiovisual corpus of Polish speech with detailed time annotations and descriptions of content and speakers. Algorithms for semi-supervised and unsupervised production of this type of corpus will also be proposed. The most suitable algorithms for audiovisual processing of Polish will also be implemented for parallel platforms, especially General-Purpose computing on Graphics Processing Units (GPGPU) on CUDA devices, in order to support real-time processing of audiovisual data.

Copyright © Zespół Przetwarzania Sygnałów AGH 2011-2014