
Dialocalization: Acoustic speaker diarization and visual localization as joint optimization problem


Publisher
Association for Computing Machinery
Copyright
Copyright © 2010 by ACM Inc.
ISSN
1551-6857
DOI
10.1145/1865106.1865111

Abstract

Gerald Friedland, International Computer Science Institute; Chuohao Yeo, Institute for Infocomm Research; Hayley Hung, IDIAP Research Institute

The following article presents a novel audio-visual approach for unsupervised speaker localization in both time and space and systematically analyzes its unique properties. Using recordings from a single, low-resolution room overview camera and a single far-field microphone, a state-of-the-art audio-only speaker diarization system (speaker localization in time) is extended so that both acoustic and visual models are estimated as part of a joint unsupervised optimization problem. The speaker diarization system first automatically determines the speech regions and estimates "who spoke when," then, in a second step, the visual models are used to infer the location of the speakers in the video. We call this process "dialocalization." The experiments were performed on real-world meetings using 4.5 hours of the publicly available AMI meeting corpus. The proposed system is able to exploit audio-visual integration to not only improve the accuracy of a state-of-the-art (audio-only) speaker diarization, but also adds visual speaker localization at little incremental engineering and computation costs. The combined algorithm has different properties, such as increased robustness,

Journal

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), Association for Computing Machinery

Published: Nov 1, 2010
