Fusion Framework for Video Event Recognition

Qiao Ma, Baptiste Fosty, Carlos F. Crispim-Junior, and François Brémond


Keywords: object recognition and motion, architecture and implementation, multisensor fusion


This paper presents a multisensor fusion framework for video activity recognition based on statistical reasoning and Dempster-Shafer (D-S) evidence theory. Specifically, the framework combines the computation of event uncertainty from a trained database with a fusion method based on conflict management between pieces of evidence. Our framework aims to build a multisensor fusion architecture for event recognition by combining sensors, handling conflicting recognitions, and improving overall performance. Within the hierarchy of complex events, the primitive state is chosen as the target event of the framework. An RGB camera and an RGB-D camera are used to recognise a person’s basic activities in the scene. The proposed framework has two main advantages: first, it allows new events to be added easily to the system within a complete structure for handling uncertainty; second, the inference of Dempster-Shafer theory resembles human perception and is well suited to managing uncertainty and conflict under incomplete information. A cross-validation on real-world data (10 persons) is carried out using the proposed framework, and the evaluation shows promising results: the fusion approach achieves an average sensitivity of 93.31% and an average precision of 86.7%. These results are better than those obtained with a single camera, encouraging further research on combining more sensors and more events, as well as on optimizing the framework’s parameters for further improvement.
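The core of the D-S fusion step described above is Dempster's rule of combination, which merges mass functions from two sources and renormalizes by the conflicting mass. The sketch below is a minimal, generic illustration of that rule; the event labels and mass values are hypothetical and not taken from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass)
    using Dempster's rule of combination."""
    combined = {}
    conflict = 0.0  # total mass K assigned to the empty intersection
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources cannot be combined")
    # Normalize by 1 - K, redistributing the conflicting mass
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

# Hypothetical example: two cameras assign belief masses over the
# frame of discernment {sitting, standing} for a person's state.
rgb = {frozenset({"sitting"}): 0.6,
       frozenset({"sitting", "standing"}): 0.4}
rgbd = {frozenset({"sitting"}): 0.7,
        frozenset({"standing"}): 0.2,
        frozenset({"sitting", "standing"}): 0.1}
fused = dempster_combine(rgb, rgbd)
```

Here the 0.12 of mass that the two cameras place on contradictory singletons (sitting vs. standing) is discarded and the remainder renormalized, so the fused belief in "sitting" rises above either camera's individual estimate.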
