Explainable activity recognition in videos: Lessons learned

TLDR

Activity recognition can be performed with deep neural networks or with conventional classifiers such as linear models and decision trees. Neural networks achieve higher predictive performance but are less interpretable and explainable. This study proposes a framework that feeds the output of a deep neural network into dynamic cutset networks, an interpretable and tractable probabilistic model, and reasons jointly over the two to narrow the accuracy-explainability gap. The polynomial-time probabilistic reasoning of dynamic cutset networks is what makes the resulting activity assignments more explainable. The framework is used to build three prototype systems that solve human-machine tasks of varying difficulty on cooking videos, and the study reports lessons learned from human-subject evaluations of these systems.

Abstract

We consider the following activity recognition task: given a video, infer the set of activities being performed in the video and assign each frame to an activity. This task can be solved using modern deep learning architectures based on neural networks or using conventional classifiers such as linear models and decision trees. While neural networks exhibit superior predictive performance compared with decision trees and linear models, they are also uninterpretable and less explainable. We address this accuracy-explainability gap using a novel framework that feeds the output of a deep neural network to an interpretable, tractable probabilistic model called dynamic cutset networks, and performs joint reasoning over the two to answer questions. The neural network helps achieve high accuracy, while dynamic cutset networks, because of their polynomial-time probabilistic reasoning capabilities, make the system more explainable. We demonstrate the efficacy of our approach by using it to build three prototype systems that solve human-machine tasks of varying difficulty, using cooking videos as an accessible domain. We describe high-level technical details and key lessons learned in our human-subjects evaluations of these systems.
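The pipeline described above (per-frame neural network outputs passed to a tractable temporal probabilistic model that assigns each frame to an activity) can be illustrated with a minimal sketch. This is not the authors' method: a dynamic cutset network is replaced here by a much simpler stand-in, a first-order Markov chain decoded with the Viterbi algorithm. The function name `viterbi_frame_labels`, the `stay_prob` parameter, and the use of network probabilities as emission scores under a uniform class prior are all assumptions made for illustration.

```python
import math


def viterbi_frame_labels(frame_probs, stay_prob=0.9):
    """Assign one activity index to each frame.

    frame_probs: list of per-frame class-probability lists, standing in
        for the softmax output of a neural network (hypothetical input).
    stay_prob: assumed probability that the activity does not change
        between consecutive frames (hypothetical parameter).

    Stand-in model: a first-order Markov chain, NOT a dynamic cutset
    network. Network probabilities are used directly as emission scores.
    """
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    if n_classes == 1:
        return [0] * n_frames

    eps = 1e-12  # avoids log(0) for zero-probability classes
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_classes - 1))

    def trans(i, j):
        return log_stay if i == j else log_switch

    # Best log-score of any label path ending in each class at frame 0.
    score = [math.log(p + eps) for p in frame_probs[0]]
    backptrs = []

    for probs in frame_probs[1:]:
        new_score, ptr = [], []
        for j in range(n_classes):
            # Best previous class from which to enter class j.
            best_i = max(range(n_classes), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j) + math.log(probs[j] + eps))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)

    # Trace the best path backwards from the best final class.
    last = max(range(n_classes), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptrs):
        last = ptr[last]
        path.append(last)
    path.reverse()
    return path
```

The point of the sketch is the division of labor: the network supplies per-frame evidence, and the probabilistic model supplies temporal structure that can be inspected directly. Here, a frame that disagrees with its neighbors is relabeled only when the transition cost of switching outweighs the network's preference, which is a reason a person can follow.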
