Division of Natural and Applied Sciences

Research offers improvements to voice conversion technology

Research produced by a student-professor team from Duke Kunshan University was presented at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022, the flagship event of the Institute of Electrical and Electronics Engineers’ Signal Processing Society. First authored by Haozhe Zhang, a Class of 2022 student who majored in data science, and supervised by Ming Li, associate professor of electrical and computer engineering, the research offers an improved approach to voice conversion, a technology that converts one person’s voice to sound like someone else’s while keeping the linguistic content unchanged.

Haozhe Zhang

Titled “SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System for Both Human Beings and Machines,” the six-month research project developed a new method of voice conversion that disentangles spoken linguistic content from the speaker’s voice better than traditional systems, reducing the trade-off between keeping linguistic content accurate and simulating the sound of the target speaker.

The study’s authors, who also included DKU research interns Zexin Cai and Xiaoyi Qin, were invited to present their work at this year’s International Conference on Acoustics, Speech and Signal Processing, which combined a virtual event from May 7 to 13 and an in-person gathering from May 22 to 27 in both Singapore and Shenzhen, China.

Dr. Ming Li’s lab held the 2019 Symposium on Speaker Recognition Research and Applications

“I really liked the working style in our lab,” said Zhang. “While there may exist potential competition in research, Dr. Li Ming always emphasizes the importance of cooperation. The graduate students in the team were very patient with me and respected my ideas even when I was yet to fully understand some research details, which allowed me to develop my independent thinking.”

On-campus science event co-hosted by Dr. Ming Li’s Lab and DKU Library on April 2, 2019

Li said the research was “meaningful” for the video and entertainment industries, where voice conversion systems are commonly used. He also praised Zhang for his laboratory work on applying voice conversion technology to electrolarynx devices that assist people whose biological larynxes have been removed, which he said had won recognition from the Department of Ophthalmology and Otorhinolaryngology at Kunshan First People’s Hospital, a collaborator in the research.

Zhang, who was a member of DKU’s inaugural Class of 2022, hopes to continue his research in the field of speech technology at Carnegie Mellon University, in Pittsburgh, Pennsylvania, where he has been accepted onto a master’s degree program.

Author: Ge Gao