Speech Codec Wav Samples


Overview

Below are a variety of "before and after" .wav file samples for different low bit rate (LBR) speech (voice) codecs, including MELP, GSM, and G.729A/B, with bit rates ranging from 600 bps to 13000 bps. Click on the .wav file links in the tables below to listen to the samples (all .wav files are mono, sampled at 8 kHz).
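Since every sample is stated to be mono at 8 kHz, a downloaded file can be checked against that with Python's standard `wave` module. The sketch below assumes 16-bit PCM samples (which is what the 128000 bps figure given for the original samples implies); the function name `check_sample_format` is hypothetical.

```python
import wave


def check_sample_format(path, expected_rate=8000):
    """Return (channels, sample_rate, bits_per_sample, bitrate_bps) for a .wav file.

    Raises ValueError if the file is not mono at the expected sample rate.
    """
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        bits = 8 * w.getsampwidth()  # sample width is reported in bytes
    bitrate = channels * rate * bits  # e.g. 1 x 8000 x 16 = 128000 bps
    if channels != 1 or rate != expected_rate:
        raise ValueError(
            f"{path}: expected mono {expected_rate} Hz, got {channels} ch at {rate} Hz"
        )
    return channels, rate, bits, bitrate
```

For a 16-bit original sample this returns a bitrate of 128000 bps, matching the figure shown for the "Original Sample" columns.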

Original speech samples are in the leftmost column, and each remaining column is a different codec type. Each row is a different language or sample type, such as speech with added background noise. In several cases, a language sample may include both male and female speakers.

Each sample is accompanied by its PESQ score, an objective measure computed by an algorithm that compares the original sample with the processed sample and is designed to closely approximate a MOS (mean opinion score). A PESQ score of 4.5 is perfect, meaning the processed sample shows no degradation relative to the original. Unless otherwise indicated, PESQ scores are referenced to the "Original" sample in the leftmost column. More information on PESQ is given below.

Speech Recognition

Signalogic uses these .wav files in speech recognition training, testing, and analysis work, for example to compare the noise reduction and silence detection algorithms used by state-of-the-art codecs with those used by popular open source speech recognition software such as Kaldi. The most advanced codec in use today is the 3GPP-sponsored EVS codec, which accurately classifies and models voice, music, other sounds, and background noise, applies precise silence detection, and offers a wide variety of sampling rates and bitrates.

For Kaldi online decoding, the Kaldi ASR Offloading project offers a production alternative to GStreamer. SigSRF software, deployed worldwide by telecom, LEA, and analytics users, includes an inference library that interfaces to the Kaldi run-time libraries. This allows RTP packet audio streams using wideband codecs to be processed with full RFC and jitter buffer support, including multiple stream groups, and then passed to Kaldi's online decoding raw audio interface.
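A jitter buffer's core job is to absorb network delay variation and restore RTP packets to sequence order before decoding. The sketch below is a generic, hypothetical illustration of only that reordering step: it is not SigSRF code, and it omits timestamp-based playout, packet loss concealment, and multiple stream groups. It relies only on the 16-bit RTP sequence number defined in RFC 3550; the class name and the `depth` parameter are assumptions.

```python
# Hypothetical sketch: a minimal RTP reordering buffer (not SigSRF's implementation).


def seq_diff(a: int, b: int) -> int:
    """Signed distance from RTP sequence number a to b, handling 16-bit wraparound."""
    d = (b - a) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d


class JitterBuffer:
    """Holds out-of-order packets and releases payloads in sequence-number order."""

    def __init__(self, depth: int = 4):
        self.depth = depth      # max packets held while waiting for a missing one
        self.packets = {}       # seq -> payload
        self.next_seq = None    # next sequence number to release

    def push(self, seq: int, payload: bytes) -> None:
        if self.next_seq is None:
            self.next_seq = seq  # simplification: first arrival sets the start point
        if seq_diff(self.next_seq, seq) < 0:
            return               # too late: that slot was already released or declared lost
        self.packets[seq] = payload

    def pop(self):
        """Return ("ok", payload), ("lost", None), or ("wait", None)."""
        if self.next_seq is None:
            return ("wait", None)
        if self.next_seq in self.packets:
            payload = self.packets.pop(self.next_seq)
            self.next_seq = (self.next_seq + 1) & 0xFFFF
            return ("ok", payload)
        if len(self.packets) >= self.depth:
            # Buffer is full and the expected packet never arrived: declare it lost
            self.next_seq = (self.next_seq + 1) & 0xFFFF
            return ("lost", None)
        return ("wait", None)
```

Packets 65534, 0, 65535, 1 pushed in that order come back out as 65534, 65535, 0, 1, because the sequence comparison wraps modulo 2^16. A production buffer would then hand the ordered payloads to the codec decoder and, for ASR, on to the recognizer's audio input.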

More information on Signalogic speech recognition projects can be found at:

Speech Codec Samples

Numbers in parentheses below give the bitrate (in bps) for uncompressed .wav file samples, and the inherent algorithm frame size (in msec) for compressed (i.e., codec encoder output) .wav file samples. Typically, the frame size is also the codec delay, but in some cases there is additional "look-ahead" delay.
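As a worked check of these numbers, the bits produced per encoded frame are simply bitrate × frame size, and the compression ratio is the 128000 bps uncompressed rate divided by the codec bitrate. The sketch below uses the bitrate and frame-size pairs listed in the table; the dictionary and helper names are illustrative only.

```python
# Bitrate (bps) and frame size (ms) pairs as listed in the sample table
CODECS = {
    "MELPe 2400": (2400, 22.5),
    "MELPe 1200": (1200, 67.5),
    "MELPe-Plus 2700": (2700, 20.0),
    "MELPe-Plus 4000": (4000, 20.0),
    "MELPe-Plus 600": (600, 30.0),
    "G.729A": (8000, 10.0),
    "GSM": (13000, 20.0),
}
PCM_BPS = 8000 * 16  # uncompressed mono, 8 kHz, 16-bit = 128000 bps


def bits_per_frame(bitrate_bps, frame_ms):
    """Bits produced by the encoder for one frame."""
    return round(bitrate_bps * frame_ms / 1000)


for name, (bps, ms) in CODECS.items():
    print(f"{name:16s} {bits_per_frame(bps, ms):4d} bits/frame, "
          f"compression {PCM_BPS / bps:5.1f}:1, frame size {ms} ms")
```

For example, MELPe at 2400 bps packs each 22.5 msec frame into 54 bits, while GSM at 13000 bps uses 260 bits per 20 msec frame.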
  
        
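Columns (each column has a Male and a Female sample):
  Original Sample, Fs = 8 kHz (128000 bps)
  MELP² 2400 bps (22.5 msec)
  MELPe 2400 bps (22.5 msec)
  MELPe 1200 bps (67.5 msec)
  MELPe-Plus 2400 bps (22.5 msec)
  MELPe-Plus 2700 bps (20 msec)
  MELPe-Plus 4000 bps (20 msec)
  MELPe-Plus¹⁷ 600 bps (30 msec)
  G.729A 8000 bps (10 msec)
  GSM³ 13000 bps (20 msec)
  MELPe 2400 bps + AT&T NPP⁴
  G.729A 8000 bps + AT&T NPP
  GSM 13000 bps + AT&T NPP
  CVSD 13000 bps + AT&T NPP
  G726 13000 bps + AT&T NPP

Rows are listed by language or sample type. Each entry gives the Male and Female sample files followed by their (ITU) PESQ¹ scores; columns with no samples for a row are omitted.

English1
  Original Sample: eng2_m, eng2_f (PESQ 4.5, 4.5)
  MELPe-Plus 600 bps: male600, female600 (PESQ 2.673, 2.293)
  CVSD 13000 bps + AT&T NPP: eng_m, eng_f
  G726 13000 bps + AT&T NPP: eng_m, eng_f

English2
  Original Sample: eng_m, eng_f (PESQ 4.5, 4.5)
  MELP 2400 bps: eng_m1, eng_f1 (PESQ 2.666, 2.445)
  MELPe 2400 bps: eng_m2, eng_f2 (PESQ 2.86, 2.583)
  MELPe 1200 bps: eng_m3, eng_f3 (PESQ 2.413, 2.323)
  MELPe-Plus 2400 bps: eng_m4, eng_f4 (PESQ 2.923, 2.704)
  MELPe-Plus 2700 bps: eng_m5, eng_f5 (PESQ 2.958, 2.734)
  MELPe-Plus 4000 bps: eng_m6, eng_f6 (PESQ 3.266, 3.078)
  G.729A 8000 bps: eng_m7, eng_f7 (PESQ 3.570, 3.265)
  MELPe 2400 bps + AT&T NPP: eng_m9, eng_f9 (PESQ 2.583, 2.434)
  G.729A 8000 bps + AT&T NPP: eng_m10, eng_f10 (PESQ 3.254, 3.076)

French
  Original Sample: f_m, f_f (PESQ 4.5, 4.5)
  MELP 2400 bps: f_m1, f_f1 (PESQ 2.401, 2.549)
  MELPe 2400 bps: f_m2, f_f2 (PESQ 2.482, 2.575)
  MELPe 1200 bps: f_m3, f_f3 (PESQ 2.249, 2.365)
  MELPe-Plus 2400 bps: f_m4, f_f4 (PESQ 2.786, 2.756)
  MELPe-Plus 2700 bps: f_m5, f_f5 (PESQ 2.770, 2.829)
  MELPe-Plus 4000 bps: f_m6, f_f6 (PESQ 3.162, 3.243)
  G.729A 8000 bps: f_m7, f_f7 (PESQ 3.349, 3.352)
  MELPe 2400 bps + AT&T NPP: f_m9, f_f9 (PESQ 2.482, 2.599)
  G.729A 8000 bps + AT&T NPP: f_m10, f_f10 (PESQ 3.343, 3.315)
  CVSD 13000 bps + AT&T NPP: f_m, f_f
  G726 13000 bps + AT&T NPP: f_m, f_f

German: no samples

Japanese: no samples

Chinese
  Original Sample: ch_m, ch_f (PESQ 4.5, 4.5)
  MELP 2400 bps: ch_m1, ch_f1 (PESQ 2.781, 2.572)
  MELPe 2400 bps: ch_m2, ch_f2 (PESQ 3.080, 2.769)
  MELPe 1200 bps: ch_m3, ch_f3 (PESQ 2.738, 2.477)
  MELPe-Plus 2400 bps: ch_m4, ch_f4 (PESQ 3.120, 2.739)
  MELPe-Plus 2700 bps: ch_m5, ch_f5 (PESQ 3.124, 2.809)
  MELPe-Plus 4000 bps: ch_m6, ch_f6 (PESQ 3.490, 3.164)
  G.729A 8000 bps: ch_m7, ch_f7 (PESQ 3.730, 3.601)
  MELPe 2400 bps + AT&T NPP: ch_m9, ch_f9 (PESQ 2.969, 2.641)
  G.729A 8000 bps + AT&T NPP: ch_m10, ch_f10 (PESQ 3.548, 3.462)

NSA test vector⁵
  Original Sample: nsa_m, nsa_f (PESQ 4.5, 4.5)
  MELP 2400 bps: nsa_m1, nsa_f1 (PESQ 3.185, 2.963)
  MELPe 2400 bps: nsa_m2, nsa_f2 (PESQ 3.270, 3.063)
  MELPe 1200 bps: nsa_m3, nsa_f3 (PESQ 2.976, 2.761)
  MELPe-Plus 2400 bps: nsa_m4, nsa_f4 (PESQ 3.331, 3.029)
  MELPe-Plus 2700 bps: nsa_m5, nsa_f5 (PESQ 3.330, 2.963)
  MELPe-Plus 4000 bps: nsa_m6, nsa_f6 (PESQ 3.637, 3.451)
  MELPe-Plus 600 bps: nsa_m600, nsa_f600 (PESQ 2.694, 2.275)
  G.729A 8000 bps: nsa_m7, nsa_f7 (PESQ 3.882, 3.901)
  MELPe 2400 bps + AT&T NPP: nsa_m9, nsa_f9 (PESQ 3.197, 2.988)
  G.729A 8000 bps + AT&T NPP: nsa_m10, nsa_f10 (PESQ 3.865, 3.659)

GSM test vector: no samples

G.729A test vector: no samples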
 

Speech + Noise Codec Samples

        
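Columns (one sample per column):
  Original Sample w/o Noise, Fs = 8 kHz (128000 bps)
  Original Sample with Noise, Fs = 8 kHz (128000 bps)
  MELP² 2400 bps (22.5 msec)
  MELPe 2400 bps (22.5 msec)
  MELPe 1200 bps (67.5 msec)
  MELPe-Plus 2400 bps (22.5 msec)
  MELPe-Plus 2700 bps (20 msec)
  MELPe-Plus 4000 bps (20 msec)
  G.729A 8000 bps (10 msec)
  GSM³ 13000 bps (20 msec)
  RLS
  AT&T NPP⁴
  MELPe 2400 bps + AT&T NPP
  G.729A 8000 bps + AT&T NPP
  GSM + AT&T NPP
  MELPe + RLS⁶

Each entry gives the sample file followed by its PESQ score where one is available; columns with no sample for a row are omitted.

English + car background noise (1)
  Original Sample with Noise: car
  MELP 2400 bps: car1 (PESQ 2.458)
  MELPe 2400 bps: car2 (PESQ 2.561)
  MELPe 1200 bps: car3 (PESQ 2.37)
  MELPe-Plus 2400 bps: car4 (PESQ 2.740)
  MELPe-Plus 2700 bps: car5 (PESQ 2.723)
  MELPe-Plus 4000 bps: car6 (PESQ 2.968)
  G.729A 8000 bps: car7 (PESQ 3.515¹¹)
  AT&T NPP: car10
  MELPe 2400 bps + AT&T NPP: car9 (PESQ 2.836⁷)
  G.729A 8000 bps + AT&T NPP: car12 (PESQ 3.655¹²)

English + white noise (2)
  Original Sample w/o Noise: white noise original (PESQ 4.5)
  Original Sample with Noise: white noise (PESQ 1.615)
  MELP 2400 bps: white noise1 (PESQ 2.251)
  MELPe 2400 bps: white noise2 (PESQ 2.237)
  MELPe 1200 bps: white noise3 (PESQ 1.574)
  MELPe-Plus 2400 bps: white noise4 (PESQ 2.509)
  MELPe-Plus 2700 bps: white noise5 (PESQ 2.795)
  MELPe-Plus 4000 bps: white noise6 (PESQ 2.933)
  G.729A 8000 bps: white noise7
  RLS: white noise9 (PESQ 3.550⁸)
  AT&T NPP: white noise10 (PESQ 1.513⁹)
  MELPe 2400 bps + AT&T NPP: white noise9 (PESQ 2.946¹⁰)
  G.729A 8000 bps + AT&T NPP: white noise11
  MELPe + RLS: white noise13 (PESQ 3.121⁹)

English + wideband noise (3)
  Original Sample w/o Noise: wideband noise original (PESQ 4.5)
  Original Sample with Noise: wideband noise (PESQ 1.695)
  MELP 2400 bps: wideband noise1 (PESQ 2.643¹³)
  MELPe 2400 bps: wideband noise2 (PESQ 2.696¹³)
  MELPe 1200 bps: wideband noise3 (PESQ 2.453¹³)
  MELPe-Plus 2400 bps: wideband noise4 (PESQ 2.855¹³)
  MELPe-Plus 2700 bps: wideband noise5 (PESQ 2.788¹³)
  MELPe-Plus 4000 bps: wideband noise6 (PESQ 3.154¹³)
  G.729A 8000 bps: wideband noise7 (PESQ 3.418¹³)
  RLS: wideband noise9 (PESQ 3.913)
  AT&T NPP: wideband noise10 (PESQ 2.080¹⁴)
  MELPe 2400 bps + AT&T NPP: wideband noise11 (PESQ 1.989¹⁴)
  G.729A 8000 bps + AT&T NPP: wideband noise12 (PESQ 2.064¹⁴)
  MELPe + RLS: wideband noise14 (PESQ 3.253)

English + street background noise (4)
  Original Sample w/o Noise: street noise original (PESQ 4.5)
  Original Sample with Noise: street noise (PESQ 1.800)
  MELP 2400 bps: street noise1 (PESQ 2.283¹⁵)
  MELPe 2400 bps: street noise2 (PESQ 2.351¹⁵)
  MELPe 1200 bps: street noise3 (PESQ 2.197¹⁵)
  MELPe-Plus 2400 bps: street noise4 (PESQ 2.493¹⁵)
  MELPe-Plus 2700 bps: street noise5 (PESQ 2.422¹⁵)
  MELPe-Plus 4000 bps: street noise6 (PESQ 2.925¹⁵)
  G.729A 8000 bps: street noise7 (PESQ 3.359¹⁵)
  RLS: street noise9 (PESQ 3.799)
  AT&T NPP: street noise10 (PESQ 2.139¹⁶)
  MELPe 2400 bps + AT&T NPP: street noise11 (PESQ 1.989¹⁶)
  G.729A 8000 bps + AT&T NPP: street noise12 (PESQ 2.133¹⁶)
  MELPe + RLS: street noise14 (PESQ 3.239)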

External Links -- More Samples

The following web pages have more audio/voice codec samples:

http://www.kyastem.co.jp/english/e-sample.html

http://www.hawksoft.com/hawkvoice/codecs.shtml

SigSRF SDK (more .wav files are included in the demo download)

About PESQ

PESQ (Perceptual Evaluation of Speech Quality) is an ITU algorithm increasingly used to emulate MOS (mean opinion score), the result of human listening tests for speech codecs. The objective is to automate the measurement of degradation over telephony channels due to speech compression, line conditions, delay/echo, etc., to make that measurement reproducible, and to obtain results that closely correlate with MOS human listening tests.

More information on the PESQ approach and algorithm can be found at: http://www.pesq.org/

The current ITU PESQ recommendation is P.862E, and the current software version of ITU PESQ is v1.2, a significant improvement over earlier versions. PESQ developers are still working to address limitations in maximum input file size, worst-case delay variation (dynamic time alignment with the reference input), and noisy backgrounds.
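The scores on this page are raw PESQ values (4.5 = perfect). ITU-T P.862.1, a companion recommendation not otherwise covered here, defines a logistic mapping from raw P.862 scores to the MOS-LQO scale, which can make raw scores easier to compare with subjective MOS results. A minimal sketch of that mapping (the function name is hypothetical), applied to two English2 scores from the table above:

```python
import math


def pesq_to_mos_lqo(raw: float) -> float:
    """Map a raw ITU-T P.862 PESQ score to MOS-LQO using the P.862.1 logistic function."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw + 4.6607))


# Two raw PESQ scores taken from the English2 row
for raw in (2.413, 3.570):
    print(f"raw PESQ {raw} -> MOS-LQO {pesq_to_mos_lqo(raw):.2f}")
```

The mapping is monotonic, so it does not change how codecs rank against each other; it only rescales the values toward the 1 to 5 MOS range.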

AT&T Noise Preprocessor

The AT&T website contains no published information about the AT&T Noise Preprocessor, so the abstract of the AT&T patent is given below:

The system and method of the invention relates to voice detection technology for determining instants of time at which a snapshot of noise characteristics results in improved adaptation of noise floors used in voice detection. The approach is based on the "lower envelope" of the smoothed input signal power. Incorporation of this approach in a simple time domain VAD (Voice Activity Detector) results in an effective low-complexity system which, on the basis of simulations, gives good performance down to SNR values of about 0 dB. In the invention the lower envelope also provides the updated value of the noise threshold during the presence of speech. The invention can also be embedded in other, more complex (e.g., frequency domain) VADs at low computational cost.
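The abstract describes deriving the noise threshold from the "lower envelope" of the smoothed input signal power. The sketch below illustrates that general idea in a simple energy VAD. It is not the patented AT&T algorithm: the smoothing factor `alpha`, the slow-rise factor `rise`, the 6 dB `margin_db`, and the assumption that per-frame power has already been computed are all illustrative choices.

```python
# Hypothetical sketch: energy VAD with a lower-envelope noise floor (not the patented algorithm).


def lower_envelope_vad(frame_power, alpha=0.9, rise=1.01, margin_db=6.0):
    """Classify frames as speech (True) or noise (False) from per-frame power values.

    alpha:     smoothing factor for the input power (closer to 1 = smoother)
    rise:      per-frame growth factor that lets the noise floor creep upward
    margin_db: how far above the noise floor the smoothed power must be to count as speech
    """
    margin = 10 ** (margin_db / 10)  # dB -> power ratio
    smoothed = None
    floor = None
    decisions = []
    for p in frame_power:
        smoothed = p if smoothed is None else alpha * smoothed + (1 - alpha) * p
        if floor is None or smoothed < floor:
            floor = smoothed         # envelope drops immediately to new minima
        else:
            floor *= rise            # and rises slowly, so it keeps adapting during speech
        decisions.append(smoothed > floor * margin)
    return decisions
```

Letting the floor rise slowly, even while speech is present, is what keeps it adapting in nonstationary noise: if it only tracked minima, a sudden increase in background noise would be classified as speech indefinitely.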

More information on the AT&T patent:

Patent Number: 5,991,718
Author: David Malah
Title: System and method for noise threshold adaptation for voice activity detection in nonstationary noise environments

More information about the AT&T Noise Preprocessor can be found in this 2001 ICASSP paper.

Send e-mail to: Joe Alfred [ALIPM] at AT&T

Notes

1 ITU P.862 v1.2. For more information, see About PESQ section above.
2 Mixed Excitation Linear Predictive.
3 Global System for Mobile Communications.
4 AT&T Noise Preprocessor algorithm. The NPP algorithm has some processing delay before its output stabilizes.
5 Mixed Excitation Linear Predictive. MELP = original MELP v1.2, MELPe = enhanced MELP (current standard).
6 MELPe + RLS based adaptive noise cancellation information is located at http://www.owlnet.rice.edu/~ryanking/elec431/compare.html
7 PESQ score referenced to car10 waveform.
8 PESQ score referenced to white noise waveform.
9 PESQ score referenced to white noise original waveform.
10 PESQ score referenced to white noise10 waveform.
11 PESQ score referenced to car waveform.
12 PESQ score referenced to car10 waveform.
13 PESQ score referenced to wideband noise waveform.
14 PESQ score referenced to wideband noise original waveform.
15 PESQ score referenced to street noise waveform.
16 PESQ score referenced to street noise original waveform.
17 Preliminary simulation of 600 bps MELPe-Plus codec.