Academic Lecture—Application of Spatial Information Deep Learning in Speech Enhancement

Author: Administrator Source: website Time: 2025-06-15 12:00:00

Presentation Time: 10:30 AM, June 15, 2025 (Sunday)

Venue: Conference Room B621, School of Electronic Information, Wuhan University (Yu Gang & Song Xiao Building)

Presentation Title: Application of Spatial Deep Learning in Speech Enhancement

Presenter: Dr. Li Xiaofei

Inviter: Professor Huang Gongping

Abstract:

When speech is recorded with microphone arrays, speech enhancement techniques aim to suppress interference such as noise, reverberation, and competing speakers. The key to speech enhancement lies in leveraging spectral and/or spatial information to distinguish speech from interfering signals. Spectral information refers to the characteristic patterns of a signal's spectrum and has been widely exploited in the era of deep learning. Spatial information, on the other hand, refers to information about the signal's propagation path and the sound field. Current research in this field focuses on combining spectral deep learning methods with traditional linear spatial filters (such as beamformers). This presentation will introduce our recent research on using neural networks to learn spatial information directly for end-to-end speech enhancement. We have designed a series of networks based on narrowband and cross-band modeling of spatial information. These networks can be interpreted, to some extent, as nonlinear spatial filters, and they significantly outperform traditional linear filtering methods.

About the presenter:

Dr. Xiaofei Li has been an Assistant Professor at Westlake University since March 2020. Prior to that, he was a Postdoctoral Fellow in the PERCEPTION team at INRIA Grenoble Rhône-Alpes Research Center in France from February 2014 to January 2016, and a Junior Research Scientist from February 2016 to December 2019. He received his Ph.D. in Electronics from Peking University in 2013. Dr. Li has a long-standing research interest in acoustics, audio, and speech signal processing. His research covers speech enhancement, sound source localization and tracking, and semi-supervised and self-supervised learning methods for audio and speech, with a focus on combining deep learning methods with multi-channel signal processing theory to enhance the perception capabilities of intelligent speech systems in complex sound environments.