Novel approaches for robust speaker identification under noisy environments


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/35879

Title:	Novel approaches for robust speaker identification under noisy environments
Other Titles:	雜訊環境下強健性語者辨認的新方法
Authors:	陳萬城;Chen, Wan-chen
Contributors:	Tamkang University, Department of Electrical Engineering, Doctoral Program;謝景棠;Hsieh, Ching-tang
Keywords:	語者辨認;小波轉換;多層解析;特徵抽取;重要成份分析;高斯混和式模型;多階層向量量化;speaker identification;wavelet transform;multi-resolution;feature extraction;principal component analysis (PCA);Gaussian mixture model (GMM);multi-stage vector quantization (MSVQ)
Date:	2009
Issue Date:	2010-01-11 07:16:02 (UTC+8)
Abstract:	The performance of a speaker recognition system degrades seriously when the training and testing environments are mismatched. This dissertation proposes several techniques that improve the robustness of speaker identification under such mismatch.

For the front end, a multi-band linear predictive cepstral coefficient (MBLPCC) speech feature is presented. The discrete wavelet transform (DWT) decomposes the input speech signal into several frequency subbands, and the LPCC of the lower-frequency subband produced at each decomposition step are calculated. Cepstral-domain feature vector normalization is then applied to all computed features so that the parameter statistics stay similar across acoustic environments.

With MBLPCC as the front end, three approaches address robustness in a text-independent speaker identification system. First, feature recombination and likelihood recombination methods are applied in a Gaussian mixture model (GMM). Experimental results show that both methods outperform a GMM using full-band LPCC or mel-frequency cepstral coefficient (MFCC) features in noisy environments. Second, a multi-band two-stage vector quantization (VQ) recognition model is proposed. A two-stage VQ classifier is applied independently to each band, and the quantization errors of all bands are summed to yield a total error. This method is shown to be more effective and robust than conventional VQ and GMM models using full-band LPCC and MFCC features. Third, a modified multi-band VQ is proposed as the identifier. It uses a multi-layer concept to eliminate interference among the multi-band speech features and applies principal component analysis (PCA) to the codebooks in all bands, so that the codebooks capture the distribution of each speaker's phoneme characteristics in more detail. Experimental results show that this method outperforms the previously proposed recognition models in both clean and noisy environments, and that it achieves satisfactory performance in low signal-to-noise ratio (SNR) environments.
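The multi-band two-stage VQ scoring described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the dictionary layout, and the toy codebooks are all hypothetical, the codebooks are assumed to be already trained, and the per-band errors are combined by a plain sum, as the Chinese abstract states. Each band's feature vector is quantized by a first-stage codebook, the residual is quantized by a second-stage codebook, and the remaining squared error is accumulated over bands and frames. The speaker with the smallest total error is selected.

```python
# Illustrative sketch only: toy codebooks and hypothetical names,
# not the code or data used in the thesis.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vec):
    """Return the codeword in `codebook` closest to `vec`."""
    return min(codebook, key=lambda c: sq_dist(c, vec))

def two_stage_error(stage1, stage2, vec):
    """Quantize `vec` with stage 1, then quantize the residual with stage 2.
    Returns the squared error left after both stages."""
    c1 = nearest(stage1, vec)
    residual = [x - y for x, y in zip(vec, c1)]
    c2 = nearest(stage2, residual)
    return sq_dist(residual, c2)

def total_error(model, frames_by_band):
    """Sum the two-stage VQ errors over all bands and all frames.
    model:          {band: (stage1_codebook, stage2_codebook)}
    frames_by_band: {band: [feature_vector, ...]}
    """
    return sum(
        two_stage_error(*model[band], frame)
        for band, frames in frames_by_band.items()
        for frame in frames
    )

def identify(models, frames_by_band):
    """Pick the speaker whose model yields the smallest total error."""
    return min(models, key=lambda spk: total_error(models[spk], frames_by_band))

# Toy 2-D example with two bands and two speakers (made-up numbers).
models = {
    "spk_a": {
        "band0": ([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.1, 0.1]]),
        "band1": ([[2.0, 2.0]], [[0.0, 0.0]]),
    },
    "spk_b": {
        "band0": ([[5.0, 5.0]], [[0.0, 0.0]]),
        "band1": ([[-3.0, -3.0]], [[0.0, 0.0]]),
    },
}
test_frames = {"band0": [[1.1, 1.1], [0.0, 0.1]], "band1": [[2.0, 1.9]]}
best = identify(models, test_frames)
```

In this toy setup `best` is the speaker whose codebooks lie closest to the test frames in every band. In a real system the per-band feature vectors would be the normalized LPCC features of each wavelet subband rather than hand-written numbers.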
Appears in Collections:	[Graduate Institute & Department of Electrical Engineering] Thesis
