Joint Reasoning of Visual and Text Data for Safety Hazard Recognition

Abstract

Hazard recognition is the first step in mitigating safety incidents on a construction site. The unprecedented growth in jobsite visual data provides a unique opportunity to leverage computer vision systems to handle various aspects of safety inspections. Such systems can be widely deployed on jobsites for safety monitoring and safety performance measurement. They also provide a semantically rich, object-oriented way to represent safety hazards for as-built BIM and VR-based safety training. To achieve this goal, safety domain knowledge must be properly introduced into the design of computer vision systems. Toward this end, this paper presents the design of a language-image framework that aims at understanding and detecting the semantic roles of activities mentioned in safety rules. We define this task as visual safety checking. The framework, which is based on visual semantic role labeling, detects the visual grounding of ‘verbs’ described in safety rules, where such ‘verbs’ represent states or processes of visual activities. The framework includes (1) semantic parsing of safety rules, (2) training construction object detectors, and (3) training semantic role detectors. We present preliminary results on the semantic parsing of safety rules and on the construction of a new visual safety dataset based on the parsed construction objects. We also present preliminary results from implementing two state-of-the-art object detectors on this dataset and discuss their benefits and limitations in detail.