45 - Beyond the Patterns - Karen Livescu (TTI Chicago): Learning Speech Models from Multi-modal Data - A survey [ID:39111]

Part of a video series:

Beyond the Patterns

Presenters

Prof. Dr. Andreas Maier

Accessible via

Portal only

Duration

01:04:50

Recording date

2021-12-09

Uploaded on

2021-12-09 13:36:04

Language

en-US

We have the great honor to welcome Karen Livescu to our lab for an invited presentation!

Abstract: Speech is usually recorded as an acoustic signal, but it often appears in context with other signals. In addition to the acoustic signal, we may have available a corresponding visual scene, the video of the speaker, physiological signals such as the speaker’s movements or neural recordings, or other related signals. It is often possible to learn a better speech model or representation by considering the context provided by these additional signals, or to learn with less training data. Typical approaches to training from multi-modal data are based on the idea that models or representations of each modality should be in some sense predictive of the other modalities. Multi-modal approaches can also take advantage of the fact that the sources of noise or nuisance variables are different in different measurement modalities, so an additional (non-acoustic) modality can help learn a speech representation that suppresses such noise. This talk will survey several lines of work in this area, both older and newer. It will cover some basic techniques from machine learning and statistics, as well as specific models and applications for speech.

Short Bio: Karen Livescu is an Associate Professor at TTI-Chicago. She completed her PhD in electrical engineering and computer science at MIT. Her main research interests are in speech and language processing, as well as related problems in machine learning. Some specific interests include multi-view representation learning, visually grounded speech models, acoustic word embeddings, new models for speech recognition and understanding, unsupervised and weakly supervised models for speech and text, and sign language recognition from video. Her professional activities include serving as a program chair of ICLR 2019, ASRU 2015/2017/2019, and Interspeech 2022, and on the editorial boards of IEEE OJ-SP and IEEE TPAMI. She is an ISCA fellow and an IEEE SPS Distinguished Lecturer.

This video is released under CC BY 4.0. Please feel free to share and reuse.

Music Reference:
Damiano Baldoni - Thinking of You (Intro)
Damiano Baldoni - Poenia (Outro)