45 - Beyond the Patterns - Karen Livescu (TTI Chicago): Learning Speech Models from Multi-modal Data - A survey

ClipID: 39111

Recording date: 2021-12-09

Course: Beyond the Patterns

Language: English

Institution: Lehrstuhl für Informatik 5 (Mustererkennung)

Producer: Friedrich-Alexander-Universität Erlangen-Nürnberg

We have the great honor to welcome Karen Livescu to our lab for an invited presentation!

Abstract: Speech is usually recorded as an acoustic signal, but it often appears in context with other signals. In addition to the acoustic signal, we may have available a corresponding visual scene, the video of the speaker, physiological signals such as the speaker’s movements or neural recordings, or other related signals. It is often possible to learn a better speech model or representation by considering the context provided by these additional signals, or to learn with less training data. Typical approaches to training from multi-modal data are based on the idea that models or representations of each modality should be in some sense predictive of the other modalities. Multi-modal approaches can also take advantage of the fact that the sources of noise or nuisance variables are different in different measurement modalities, so an additional (non-acoustic) modality can help learn a speech representation that suppresses such noise. This talk will survey several lines of work in this area, both older and newer. It will cover some basic techniques from machine learning and statistics, as well as specific models and applications for speech.
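As a rough illustration of the idea that representations of one modality should be predictive of another, the sketch below uses linear canonical correlation analysis (CCA), a classical multi-view technique. The abstract does not name the methods covered in the talk, so CCA is an assumption here. The data are synthetic: a shared latent signal `z` is observed through two hypothetical "views" (think of acoustic and non-acoustic features), each with its own independent noise. All names, dimensions, and loadings are made up for the example.

```python
import numpy as np


def linear_cca(X, Y, k=1, reg=1e-6):
    """Linear CCA via whitening and SVD.

    Returns projection matrices A (for X) and B (for Y) and the
    top-k canonical correlations. `reg` is a small ridge term that
    keeps the covariance matrices invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return A, B, s[:k]


# Synthetic two-view data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)                      # shared latent signal
a = np.array([1.0, -0.5, 0.8, 0.3, -1.2])       # loadings for view 1
b = np.array([0.7, 1.1, -0.9, 0.4])             # loadings for view 2
X = np.outer(z, a) + 0.5 * rng.standard_normal((n, 5))  # view 1 + its own noise
Y = np.outer(z, b) + 0.5 * rng.standard_normal((n, 4))  # view 2 + independent noise

A, B, corr = linear_cca(X, Y, k=1)

# Project view 1 onto its first canonical direction and compare with z.
proj = (X - X.mean(axis=0)) @ A[:, 0]
r = abs(np.corrcoef(proj, z)[0, 1])

print("first canonical correlation:", corr[0])
print("|corr(projection of view 1, latent z)|:", r)
```

Because the noise in the two views is independent, the direction in view 1 that is most correlated with view 2 tends to align with the shared signal rather than with view-specific noise, which is the intuition behind the noise-suppression argument in the abstract.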

Short Bio: Karen Livescu is an Associate Professor at TTI-Chicago. She completed her PhD in electrical engineering and computer science at MIT. Her main research interests are in speech and language processing, as well as related problems in machine learning. Some specific interests include multi-view representation learning, visually grounded speech models, acoustic word embeddings, new models for speech recognition and understanding, unsupervised and weakly supervised models for speech and text, and sign language recognition from video. Her professional activities include serving as a program chair of ICLR 2019, ASRU 2015/2017/2019, and Interspeech 2022, and on the editorial boards of IEEE OJ-SP and IEEE TPAMI. She is an ISCA fellow and an IEEE SPS Distinguished Lecturer.

References

Karen's Talk at Interspeech

This video is released under CC BY 4.0. Please feel free to share and reuse.

Music Reference: 
Damiano Baldoni - Thinking of You (Intro)
Damiano Baldoni - Poenia (Outro)
