Y. Denda, T. Nishiura, and Y. Yamashita (Japan)
Robust talker localization, Audio-visual data fusion, Weighted CSP analysis, CSP coefficient subtraction, Background subtraction, Skin color detection.
This paper proposes a robust, omnidirectional audio-visual talker localizer that relies primarily on audio feature parameters and uses visual feature parameters as a subordinate cue. To achieve omnidirectional audio-visual talker localization, we used a pair of omnidirectional microphones as the audio sensor and an omnidirectional camera as the visual sensor. For robustness, the audio feature parameters are extracted using weighted cross-power spectrum phase (CSP) analysis and CSP coefficient subtraction, and the visual feature parameters are extracted using background subtraction and skin color detection. The talker is finally located by fusing the weighted audio and visual feature parameters. No pre-training is required: the fusion weight is controlled automatically based on a reliability criterion of the audio feature parameters. Talker localization experiments in an actual room confirmed that the proposed audio-visual localizer outperforms conventional localizers that use only audio feature parameters or only visual feature parameters.
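For readers unfamiliar with CSP analysis, the following is a minimal sketch of the basic (unweighted) technique for a two-microphone pair, written in standard-library Python. It is not the authors' implementation: the weighting and the CSP coefficient subtraction described in the abstract are omitted, and the function names, the naive DFT, the speed of sound (343 m/s), and the example sampling rate and microphone spacing are all assumptions made for illustration.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def csp_coefficients(x1, x2, eps=1e-12):
    """Inverse DFT of the phase-normalized cross-power spectrum of x1 and x2."""
    X1, X2 = dft(x1), dft(x2)
    cross = [a * b.conjugate() for a, b in zip(X1, X2)]
    # Dividing by the magnitude keeps only the phase (the "whitening" step).
    whitened = [c / max(abs(c), eps) for c in cross]
    return [v.real for v in dft(whitened, inverse=True)]


def estimate_delay(x1, x2):
    """Lag in samples at the CSP peak; positive means x1 is delayed relative to x2."""
    csp = csp_coefficients(x1, x2)
    n = len(csp)
    peak = max(range(n), key=lambda i: csp[i])
    # Indices above n/2 correspond to negative lags (circular correlation).
    return peak if peak <= n // 2 else peak - n


def delay_to_angle_deg(delay_samples, fs, mic_distance, c=343.0):
    """Far-field direction of arrival in degrees (assumed geometry and sound speed)."""
    arg = c * (delay_samples / fs) / mic_distance
    arg = max(-1.0, min(1.0, arg))  # guard against delays beyond the physical limit
    return math.degrees(math.asin(arg))


# Hypothetical usage: white noise circularly delayed by 5 samples.
random.seed(0)
ref = [random.gauss(0.0, 1.0) for _ in range(64)]
delayed = [ref[(t - 5) % 64] for t in range(64)]
lag = estimate_delay(delayed, ref)
print(lag)  # 5
print(delay_to_angle_deg(lag, fs=16000, mic_distance=0.3))
```

Because the cross-power spectrum is normalized to unit magnitude, the peak position depends only on the inter-channel phase, which is what makes the CSP peak a usable delay estimate; a practical system would replace the naive DFT with an FFT and process the signal frame by frame.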