M. Geravanchizadeh, S. Ghalami Osgouei
Volume 10, Issue 4 (December 2014)
Abstract
This paper presents new adaptive filtering techniques for speech enhancement systems. Adaptive filtering schemes are subject to trade-offs among steady-state misadjustment, speed of convergence, and tracking performance. Fractional Least-Mean-Square (FLMS) is a recent adaptive algorithm that performs better than the conventional LMS algorithm. Normalizing LMS also improves the performance of the adaptive filter. Furthermore, a convex combination of two adaptive filters improves overall performance. In this paper, new convex-combination adaptive filtering methods are proposed in the framework of a speech enhancement system. The proposed methods apply the ideas of normalization and the fractional derivative both to the design of different convex mixing strategies and to their component filters. To assess the proposed methods, different LMS-based algorithms are compared in simulation on their convergence behavior (i.e., MSE plots) and on several objective and subjective criteria. These evaluations comprise SNR improvement, the PESQ test, and listening tests for dual-channel speech enhancement. The main strengths of the proposed methods are their low complexity, as expected of all LMS-based methods, and their high convergence rate.
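As an illustration of the convex-combination idea named in this abstract, the following is a minimal Python sketch of a standard sigmoid-mixed combination of a fast and a slow normalized LMS (NLMS) filter. It is not the authors' proposed method: the fractional-derivative variants are not reproduced, and the function names, step sizes (`mu_fast`, `mu_slow`, `mu_a`), tap count, and the clipping range of the mixing state are all assumptions chosen for illustration.

```python
import math
import random


def nlms_step(w, x, d, mu, eps=1e-8):
    """One NLMS update of w in place; returns the a-priori output and error."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    norm = eps + sum(xi * xi for xi in x)  # input power used for normalization
    for i in range(len(w)):
        w[i] += mu * e * x[i] / norm
    return y, e


def convex_nlms(reference, primary, taps=8, mu_fast=0.5, mu_slow=0.05, mu_a=10.0):
    """Convex combination of a fast and a slow NLMS filter (illustrative sketch).

    reference: noise-only channel; primary: channel to be cleaned.
    Returns the combined error signal, which in a dual-channel noise
    canceller serves as the enhanced-signal estimate.
    """
    w_fast = [0.0] * taps
    w_slow = [0.0] * taps
    a = 0.0              # mixing state; mixing weight is sigmoid(a)
    x = [0.0] * taps     # tapped delay line of the reference channel
    out = []
    for r, d in zip(reference, primary):
        x = [r] + x[:-1]
        lam = 1.0 / (1.0 + math.exp(-a))
        y1, _ = nlms_step(w_fast, x, d, mu_fast)
        y2, _ = nlms_step(w_slow, x, d, mu_slow)
        y = lam * y1 + (1.0 - lam) * y2      # convex mix of the two outputs
        e = d - y
        # Gradient step on the mixing state, clipped (assumed range) so the
        # sigmoid never saturates and adaptation of the mix cannot stall.
        a += mu_a * e * (y1 - y2) * lam * (1.0 - lam)
        a = max(-4.0, min(4.0, a))
        out.append(e)
    return out
```

In a dual-channel setup, `reference` would hold the noise picked up by the second microphone and `primary` the noisy speech. The fast filter dominates the mix during convergence and the slow filter in steady state, which is how the combination eases the trade-off between convergence speed and misadjustment.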
M. Bashirpour, M. Geravanchizadeh
Volume 12, Issue 3 (September 2016)
Abstract
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in speech emotion recognition. This paper considers the recognition of emotional states from speech in real environments. For this task, we employ power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate their performance in emotion recognition using clean and noisy speech materials and compare it with that of the well-known MFCC, LPCC, and RASTA-PLP features, as well as TEMFCC features. Speech samples are taken from the Berlin emotional speech database (Emo DB) and the Persian emotional speech database (Persian ESD) and are corrupted with four different noise types at various SNR levels. The experiments are conducted in clean-train/noisy-test scenarios to simulate practical conditions with noise sources. Simulation results show that PNCC achieves higher recognition rates than the conventional features under noisy conditions.
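The noisy test material described in this abstract can be approximated by adding a noise recording to clean speech at a chosen SNR. The following Python sketch shows one common way to do this. The function name `mix_at_snr` and the use of whole-signal average power are assumptions for illustration; the abstract does not state how the authors scaled the noise.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the given SNR in dB.

    Power is measured over the whole signal (an assumption; segmental or
    speech-active power measures are also used in practice).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Choose gain so that p_clean / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, noise)]
```

In a clean-train/noisy-test experiment, the classifier would be trained on features from the unmodified utterances, and only the test utterances would be passed through a function like this for each noise type and SNR level.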