M. Geravanchizadeh, S. Ghalami Osgouei,
Volume 10, Issue 4 (12-2014)
Abstract
This paper presents new adaptive filtering techniques for use in speech enhancement systems. Adaptive filtering schemes are subject to different trade-offs among steady-state misadjustment, speed of convergence, and tracking performance. The Fractional Least-Mean-Square (FLMS) algorithm is a recent adaptive algorithm with better performance than the conventional LMS algorithm. Normalizing the LMS update further improves adaptive filter performance, and a convex combination of two adaptive filters improves performance beyond either component alone. In this paper, new convex combinational adaptive filtering methods are proposed within the framework of a speech enhancement system. The proposed methods utilize the ideas of normalization and fractional derivatives, both in the design of different convex mixing strategies and in their component filters. To assess the proposed methods, different LMS-based algorithms are compared in terms of convergence behavior (i.e., MSE plots) and various objective and subjective criteria. The objective and subjective evaluations include SNR improvement, the PESQ test, and listening tests for dual-channel speech enhancement. The strengths of the proposed methods are their low complexity, as expected of LMS-based methods, along with a high convergence rate.
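As a generic illustration of the convex-combination idea described above (not the paper's normalized or fractional variants), the sketch below mixes a fast and a slow LMS filter through a sigmoid-parameterized weight that adapts on the combined error; the filter length and all step sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def convex_lms(x, d, L=8, mu_fast=0.05, mu_slow=0.005, mu_a=10.0):
    """Convex combination of a fast and a slow LMS filter (generic sketch)."""
    w_fast = np.zeros(L)
    w_slow = np.zeros(L)
    a = 0.0                          # mixing parameter; lam = sigmoid(a)
    y_out = np.zeros(len(x))
    for n in range(L, len(x)):
        u = x[n - L + 1:n + 1][::-1]  # regressor, most recent sample first
        y1, y2 = w_fast @ u, w_slow @ u
        lam = sigmoid(a)
        y = lam * y1 + (1 - lam) * y2
        e = d[n] - y
        # each component filter adapts on its own error
        w_fast += mu_fast * (d[n] - y1) * u
        w_slow += mu_slow * (d[n] - y2) * u
        # the mixing parameter adapts on the combined error,
        # clipped so lam never saturates completely
        a = np.clip(a + mu_a * e * (y1 - y2) * lam * (1 - lam), -4.0, 4.0)
        y_out[n] = y
    return y_out
```

In this scheme the combination inherits the fast filter's convergence speed early on and the slow filter's low steady-state misadjustment later, which is the trade-off the abstract refers to.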
M. Bashirpour, M. Geravanchizadeh,
Volume 12, Issue 3 (9-2016)
Abstract
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in the emotional speech recognition area in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ power-normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate their performance in emotion recognition using clean and noisy speech material and compare them with the well-known MFCC, LPCC, RASTA-PLP, and TEMFCC features. Speech samples are taken from the Berlin emotional speech database (Emo-DB) and the Persian emotional speech database (Persian ESD) and corrupted with four different noise types at various SNR levels. The experiments are conducted in clean-train/noisy-test scenarios to simulate practical conditions with noise sources. Simulation results show that PNCC achieves higher recognition rates than the conventional features under noisy conditions.
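The clean-train/noisy-test protocol requires mixing a noise recording into each clean utterance at a prescribed SNR. A minimal sketch of that corruption step (a generic illustration, not tied to the specific databases or noise types used in the paper) could look like:

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture clean + noise has the requested SNR in dB."""
    # tile/truncate the noise to match the utterance length
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # scale chosen so that p_clean / (scale^2 * p_noise) = 10^(snr_db/10)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```

Sweeping `snr_db` over a grid (e.g., 0 to 20 dB) then yields the "various SNR levels" test sets while the training set stays clean.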
Mohammad Hasheminejad,
Volume 19, Issue 4 (12-2023)
Abstract
This study presents the Nonparametric Speech Kernel (NSK), a nonparametric kernel technique, as a novel way to improve speech emotion recognition (SER). The method aims to reduce the dimensionality of speech features effectively so as to improve recognition accuracy, addressing the need for efficient and compact low-dimensional features for SER. Acknowledging the intrinsic differences between speech and image data, we refine the Kernel Nonparametric Weighted Feature Extraction (KNWFE) formulation to obtain NSK, which is designed specifically for speech emotion recognition. The output of NSK can serve as input features for deep learning models such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or hybrid architectures. NSK can also be used as a kernel function in kernel-based methods such as kernelized support vector machines (SVMs) or kernelized neural networks. Our tests show that NSK outperforms existing techniques, achieving an average accuracy of 96.568% on the Persian speech emotion dataset and 82.56% on the Berlin speech emotion dataset, surpassing the best-tested approach by 5.02% and 3.05%, respectively.
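NSK's own formulation is not reproduced here. As a generic stand-in for the pipeline the abstract describes, kernel-based dimensionality reduction producing compact features for a downstream classifier, the following plain kernel-PCA sketch with an RBF kernel may help fix ideas (`gamma` and the component count are illustrative assumptions, and kernel PCA is a deliberately simpler substitute for NSK):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Pairwise RBF kernel matrix between row-sample matrices A and B."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kernel_pca_features(X, n_components=2, gamma=0.5):
    """Project samples onto the leading kernel principal components (sketch)."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest components
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                           # compact low-dimensional features
```

The returned matrix plays the role the abstract assigns to NSK's output: a small feature vector per utterance that can be fed to an SVM, CNN, or RNN.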
Pedram Yamini, Fatemeh Daneshfar, Abuzar Ghorbani,
Volume 20, Issue 4 (11-2024)
Abstract
With the exponential growth of unstructured data on the Web and social networks, extracting relevant information from multiple sources has become increasingly challenging, necessitating automated summarization systems. However, developing machine-learning-based summarization systems depends largely on datasets, which must be evaluated to determine their usefulness. In most cases, such datasets are summarized with human involvement, an approach that is inadequate for some low-resource languages, making summarization a daunting task. To address this, this paper proposes a method for developing the first abstractive text summarization corpus with human evaluation, together with an automated summarization model, for the Sorani Kurdish language. The researchers compiled various documents from information available on the Web (Rudaw), and the resulting corpus was released publicly. A customized and simplified version of the mT5-base transformer was then developed to evaluate the corpus. The model's performance was assessed using criteria such as ROUGE-1, ROUGE-2, ROUGE-L, n-gram novelty, and manual evaluation, and the results are close to the reference summaries on all criteria. This unique Sorani Kurdish corpus and automated summarization model have the potential to pave the way for future studies, facilitating the development of improved summarization systems for low-resource languages.
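The ROUGE-N scores used in the evaluation above reduce to recall-oriented n-gram overlap between a candidate summary and a reference. A minimal sketch follows (whitespace tokenization is an assumption; production toolkits additionally apply stemming and report precision/F-scores):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Recall-oriented ROUGE-N: clipped n-gram overlap / reference n-gram count."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & clips counts to the minimum
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

For example, `rouge_n("the cat sat on the mat", "the cat is on the mat", n=1)` counts five of the reference's six unigrams as matched. ROUGE-L, also reported above, instead scores the longest common subsequence rather than fixed-length n-grams.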