University News

SYSU Makes Research Progress on Speaker Voiceprint Recognition in the Big Data Era

Source: SYSU-CMU Joint Institute of Engineering
Written by: SYSU-CMU Joint Institute of Engineering
Edited by: Wang Dongmei

Recently, Dr. Ming Li of the SYSU-CMU Joint Institute of Engineering (hereinafter referred to as JIE) and his team proposed an unsupervised learning framework for speaker verification. The framework iteratively refines clustering labels for unlabeled speech data, which is of great significance in the big data era.

Voice is one of the main channels through which people acquire information, and it is among the most convenient, effective and natural carriers of human communication. With the comprehensive informatization of society, and especially the rapid development of communications, multimedia and Internet technologies, intelligent voice technology is becoming increasingly important. One of the current research hotspots is therefore to find methods that verify a speaker's identity from the voice signal more accurately.

The research group led by Dr. Ming Li presented an unsupervised learning framework that addresses the speaker verification problem without any given data labels. To automatically recover speaker labels for the unlabeled training data, the team proposed using Affinity Propagation (AP), a clustering method that takes pairwise data similarities as input, to generate temporary class labels. These labels are then used to train a Probabilistic Linear Discriminant Analysis (PLDA) model, which produces a similarity score for each pair of speech samples. The group then feeds these similarity scores back into AP clustering, establishing an iterative framework that repeatedly updates the labels and the PLDA model. After several iterations, the final PLDA model is used to verify whether two speech samples come from the same speaker. The team also evaluated the performance of different PLDA scoring methods for the multiple-enrollment task. Experiments show that the proposed iterative, unsupervised PLDA learning approach outperformed the cosine similarity baseline by more than 20%.
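The loop described above can be illustrated with a small, self-contained sketch. It is not the team's implementation. Several parts are assumptions made purely for illustration: the synthetic 3-D points stand in for real speaker feature vectors, PLDA is swapped for a much simpler diagonal two-covariance model, and the median preference, damping factor and number of rounds are arbitrary choices. All function names are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity, used here only to start the loop."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def affinity_propagation(S, damping=0.7, iters=200):
    """Cluster items from a pairwise similarity matrix; returns one label per item."""
    n = len(S)
    off_diag = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
    pref = off_diag[len(off_diag) // 2]  # assumption: median similarity as preference
    S = [[pref if i == k else S[i][k] for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(R[i][k], 0.0) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from all points except k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Fallback to a single cluster if no exemplar emerges (illustrative choice).
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0] or [0]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]

def train_diag_plda(X, labels, floor=1e-3):
    """Stand-in for PLDA training: per-dimension between/within class variances."""
    dim = len(X[0])
    mu = [sum(x[d] for x in X) / len(X) for d in range(dim)]
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    b, w = [0.0] * dim, [0.0] * dim
    for g in groups.values():
        for d in range(dim):
            m = sum(x[d] for x in g) / len(g)
            b[d] += (m - mu[d]) ** 2 / len(groups)
            w[d] += sum((x[d] - m) ** 2 for x in g) / len(X)
    return mu, [max(v, floor) for v in b], [max(v, floor) for v in w]

def plda_llr(model, x1, x2):
    """Log-likelihood ratio: same speaker vs. different speakers."""
    mu, b, w = model
    llr = 0.0
    for m, bd, wd, a1, a2 in zip(mu, b, w, x1, x2):
        d1, d2, t = a1 - m, a2 - m, bd + wd
        det = t * t - bd * bd  # determinant of the 2x2 same-speaker covariance
        same = -0.5 * math.log(det) - 0.5 * (t * d1 * d1 - 2 * bd * d1 * d2 + t * d2 * d2) / det
        diff = -math.log(t) - 0.5 * (d1 * d1 + d2 * d2) / t
        llr += same - diff
    return llr

def iterative_plda(X, rounds=3):
    """Alternate AP clustering and model training, starting from cosine scores."""
    n = len(X)
    S = [[cosine(X[i], X[k]) for k in range(n)] for i in range(n)]
    for _ in range(rounds):
        labels = affinity_propagation(S)      # temporary class labels
        model = train_diag_plda(X, labels)    # retrain on the current labels
        S = [[plda_llr(model, X[i], X[k]) for k in range(n)] for i in range(n)]
    return model, labels

# Toy data: 3 hypothetical speakers, 6 samples each, never shown to the learner.
random.seed(0)
speakers = [s for s in range(3) for _ in range(6)]
centers = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
X = [[c + random.gauss(0, 0.3) for c in centers[s]] for s in speakers]
model, labels = iterative_plda(X)
```

After the loop, `plda_llr(model, a, b)` can be thresholded to decide whether samples `a` and `b` come from the same speaker. The true `speakers` list is kept only to check the clusters afterwards and plays no part in training.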

At the 9th International Symposium on Chinese Spoken Language Processing (ISCSLP 2014) and the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014), both held in Singapore, Dr. Ming Li presented three papers on speaker voiceprint recognition. Among them, the paper titled “An Iterative Framework for Unsupervised Learning in the PLDA based Speaker Verification”, co-authored by Wenbo Liu, Zhiding Yu and Dr. Ming Li, won the Best Student Paper award. Wenbo Liu is a first-year dual-degree Ph.D. student affiliated with the SYSU-CMU Joint Institute of Engineering and the Department of ECE, Carnegie Mellon University, advised by Dr. Ming Li. Zhiding Yu is a third-year Ph.D. candidate at the Department of ECE, Carnegie Mellon University.

The Ph.D. programs at JIE are committed to cultivating researchers who explore in depth the theory, methodology, techniques and instruments of electrical and computer engineering, so as to enrich and advance knowledge in the field. Students in the JIE dual-degree Ph.D. program study at Carnegie Mellon’s Pittsburgh campus for two years and receive two degrees upon graduation: one from Sun Yat-sen University and one from Carnegie Mellon University.