Thomas Hueber

Computational model of speech learning, a focus on the acoustic-articulatory mapping

Speech production is a complex motor process involving several physiological phenomena, such as the neural, nervous and muscular activities that drive our respiratory, laryngeal and articulatory movements. Modeling speech production, in particular the relationship between articulatory gestures (tongue, lips, jaw, velum) and the acoustic realization of speech, is a challenging and still evolving research question. From an applied point of view, such models could be embedded into assistive devices that restore oral communication when part of the speech production chain is damaged (articulatory synthesis, silent speech interfaces). They could also help rehabilitate speech sound disorders through biofeedback-based therapy (and articulatory inversion). From a more fundamental research perspective, such models can also be used to investigate the cognitive mechanisms underlying speech learning, perception and motor control. In this talk, I will present three recent studies conducted in our group to address some of these fundamental questions. In the first one, we quantified the benefit of relying on lip movements when learning speech representations in a self-supervised manner using predictive coding techniques. In the second one, we integrated articulatory priors into the latent space of a variational auto-encoder, with potential application to speech enhancement. In the third one, I will describe a first attempt toward a computational model of speech learning, based on deep learning, which can be used to understand how a child learns the acoustic-to-articulatory inverse mapping in a self-supervised manner.


Thomas Hueber is a senior research scientist at CNRS (« Directeur de recherche ») working at GIPSA-lab in Grenoble, France. He is head of the CRISSP research team (cognitive robotics, interactive systems and speech processing). He received his Ph.D. in Computer Science from Pierre and Marie Curie University (Paris) in 2009. His research activities focus on automatic speech processing, with a particular interest in (1) the capture, analysis and modeling of the articulatory gestures and electrophysiological signals involved in speech production, (2) the development of speech technologies that exploit these different signals, for speech recognition and synthesis, for people with a spoken communication disorder, and (3) the study, through modeling and simulation, of the cognitive mechanisms underlying speech perception and production. In 2011, he received the 6th Christian Benoit Award (ISCA/AFCP/ACB), and in 2015 the ISCA Award for the best paper published in Speech Communication. In 2017, he co-edited a special issue on biosignal-based speech processing in IEEE/ACM Trans. Audio, Speech, and Language Processing. He is also an associate editor of the EURASIP Journal on Audio, Speech, and Music Processing.

Thursday, February 2, 2023 - 15:00