Audiovisual Polish speech corpus

The corpus is designed to support research on speech recognition systems that analyse the movement of the speaker's face. It contains audiovisual recordings of Polish speech: good-quality frontal recordings of the faces of 20 different speakers (men and women), together with transcriptions of their speech. The semantic content of the recordings is the same for every speaker.

The total duration of the recordings is 200 minutes, plus about 40 additional minutes of test recordings of varying quality for four other speakers.

The recordings were made mostly in natural lighting. The speakers were positioned against a bright, uniform background.

Sound was recorded with a Zoom H4N recorder and two microphones: a condenser AKG C5 Vocal and a dynamic AKG Shotgun C568.

The recordings are in the .wav file format with the following parameters:

  • Sampling frequency: 44 100 Hz
  • Resolution: 16 bits
  • SNR: an average of about 40 dB
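
As an illustration, the audio parameters listed above can be checked with Python's standard `wave` module. This is only a minimal sketch: the function name `check_wav` and the file path passed to it are hypothetical, and the number of channels is simply reported because the description does not specify it.

```python
import wave

# Expected values taken from the parameter list above.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def check_wav(path):
    """Return (matches, info) for the WAV file at `path` (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        info = {
            "rate_hz": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),  # not specified in the description
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    matches = (
        info["rate_hz"] == EXPECTED_RATE_HZ
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
    )
    return matches, info
```

Calling `check_wav("speaker01.wav")` (an assumed file name) would return a flag indicating whether the file matches the stated sampling frequency and resolution, along with the values actually found.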

Video was recorded with a JVC Everio GZ-HD500 camera.

The recordings are in the .mts (AVCHD) file format, encoded in the H.264/MPEG-4 AVC standard, with the following parameters:

  • HD resolution: 1920×1080
  • Bit rate: >14 Mbps
  • Frame Rate: 25/50 fps
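
For audiovisual processing it is often useful to know which audio samples correspond to a given video frame. The sketch below derives this from the stated 44 100 Hz sampling frequency and the 25 or 50 fps frame rate. It assumes that the audio and video streams start at the same instant and run at constant rates, which the description does not confirm; the function name `samples_for_frame` is hypothetical.

```python
AUDIO_RATE_HZ = 44100  # sampling frequency stated for the .wav files


def samples_for_frame(frame_index, fps=25):
    """Return the half-open [start, end) audio sample range for one video frame.

    Assumes audio and video share a common start time (not stated in the source).
    """
    start = frame_index * AUDIO_RATE_HZ // fps
    end = (frame_index + 1) * AUDIO_RATE_HZ // fps
    return start, end


print(samples_for_frame(0))           # (0, 1764)
print(samples_for_frame(10, fps=50))  # (8820, 9702)
```

At 25 fps each frame spans 1764 audio samples, and at 50 fps it spans 882, so both frame rates divide the sampling frequency evenly.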

Recommended playback parameters:

  • Monitor: preferably one supporting 1920×1080 resolution
  • Memory: min. 2 GB
  • Processor: min. 3 GHz
  • Graphics card:
    • ATI: HD series models
    • NVIDIA: models with PureVideo HD technology
  • Codecs: K Lite Codec Pack v. 7.9.2 (32bit) / 5.4.0 (64bit)

The authors invite anyone interested in audio processing technologies to contact the spin-off company.

Copyright © Zespół Przetwarzania Sygnałów AGH 2011-2014