This paper proposes a two-stage speaker identification structure. The first stage takes the average value of the first formant (Vi) and the zero-crossing rate as parameters and preliminarily clusters the speech data with distributed fuzzy rules in order to eliminate unnecessary silence and consonants. To speed up operation toward a real-time system, we also use a genetic algorithm to screen out unnecessary fuzzy rules. The experimental results show that the distributed fuzzy rules cluster the data effectively and are adaptable to independent speakers. In screening the fuzzy rules, the genetic algorithm eliminates a large share of the unnecessary rules while changing the recognition rate by less than 1%. After this preliminary screening of the speech data, a back-propagation neural network serves as the final speaker recognition stage. Because the system has already eliminated silence and the less stable consonants, the experimental results show that the overall recognition rate also improves considerably. Furthermore, the proposed two-stage recognition structure makes speaker identification automatic.
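As an illustration of one of the two first-stage parameters, the zero-crossing rate of a frame can be computed as in the minimal sketch below. The function names, the frame length, the hop size, and the crisp threshold `max_zcr` are assumptions introduced here for illustration. The threshold is only a stand-in for the distributed fuzzy rules, which the abstract does not specify, and the first-formant parameter is omitted.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def split_frames(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames spaced hop samples apart."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def keep_low_zcr_frames(samples, frame_len=200, hop=100, max_zcr=0.1):
    """Hypothetical crisp screen (not the paper's fuzzy rules):
    keep only frames whose zero-crossing rate is at most max_zcr."""
    frames = split_frames(samples, frame_len, hop)
    return [f for f in frames if zero_crossing_rate(f) <= max_zcr]


if __name__ == "__main__":
    # Assumed test signal: a 100 Hz tone at 8 kHz (vowel-like, few crossings)
    # followed by a sign-alternating sequence (noise-like, many crossings).
    tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
    noisy = [1.0 if n % 2 == 0 else -1.0 for n in range(400)]
    kept = keep_low_zcr_frames(tone + noisy, frame_len=200, hop=200)
    print(len(kept), "frames kept")
```

A low zero-crossing rate is typical of voiced sounds such as vowels, while many consonants and background noise produce a high rate, which is why the parameter is useful for discarding less stable frames before the recognition stage.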
Proceedings of the 10th Conference on Computational Linguistics = Proceedings of ROCLING X International Conference 1997 Research on Computational Linguistics, pp. 300-315