Perceptual metric for speech quality evaluation (PMSQE):
Source code and audio examples
Juan M. Martín-Doñas, Angel M. Gomez, Jose A. Gonzalez, Antonio M. Peinado
A Deep Learning Loss Function based on the Perceptual Evaluation of
the Speech Quality
This paper proposes a perceptual metric for
speech quality evaluation which is suitable, as a loss function,
for training deep learning methods. This metric, derived from
the Perceptual Evaluation of Speech Quality (PESQ)
algorithm, is computed on a per-frame basis from the power
spectra of the reference and processed speech signals. Two
disturbance terms, which account for distortion once auditory
masking and threshold effects are factored in, amend the mean
square error (MSE) loss function by introducing perceptual
criteria based on human psychoacoustics. The proposed loss
function is evaluated for noisy speech enhancement with deep
neural networks. Experimental results show that our metric
achieves significant gains in speech quality (evaluated using an
objective metric and a listening test) when compared to using
MSE or other perceptual-based loss functions from the
literature.
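To illustrate the structure described in the abstract, the sketch below shows a loss that amends the MSE between power spectra with two PESQ-inspired disturbance terms, one symmetric and one asymmetric. This is not the authors' released code: the function name, the weights `alpha` and `beta`, and the simplified log-power disturbances are illustrative stand-ins for the masked-loudness computations used by the actual PMSQE.

```python
import numpy as np

def pmsqe_like_loss(ref_pow, proc_pow, alpha=0.1, beta=0.3, eps=1e-8):
    """Hedged sketch of a PMSQE-style loss.

    ref_pow, proc_pow: arrays of shape (frames, bins) holding the power
    spectra of the reference and processed speech signals.
    Returns a scalar loss value.
    """
    # Baseline mean square error between the two power spectra
    mse = np.mean((ref_pow - proc_pow) ** 2)
    # Per-bin log-power difference (placeholder for the masked
    # loudness difference computed by PESQ)
    d = np.log(proc_pow + eps) - np.log(ref_pow + eps)
    # Symmetric disturbance: penalizes any deviation from the reference
    d_sym = np.mean(np.abs(d))
    # Asymmetric disturbance: penalizes added (noise-like) energy more
    # heavily than removed energy, mirroring PESQ's asymmetry factor
    d_asym = np.mean(np.maximum(d, 0.0))
    # The two disturbance terms amend the MSE, as in the paper
    return mse + alpha * d_sym + beta * d_asym
```

Because every term is differentiable almost everywhere, a loss of this shape can be dropped into a deep learning framework as a training objective, which is the use case the paper targets.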
Clean and noisy speech signals, and enhanced speech signals from a DNN trained with
the MSE loss function, the wMSE-SVS loss function, and the proposed PMSQE approach.