A Low Power, Fully Event-Based Gesture Recognition System

TLDR

The Dynamic Vision Sensor (DVS) transmits data only when a pixel detects a change, a sparse, asynchronous representation that lets it run at much lower power than frame-based cameras, but much of that advantage is lost when conventional synchronous processors interpret the event stream. This work presents the first gesture recognition system implemented end-to-end on event-based hardware: live DVS events are processed on TrueNorth, a natively event-based neurosynaptic processor with 1 million spiking neurons, configured as a convolutional neural network (CNN). The CNN identifies gesture onset with a latency of 105 ms while consuming less than 200 mW, and achieves 96.5% out-of-sample accuracy on the newly collected DvsGesture dataset of 11 hand gesture categories from 29 subjects under 3 illumination conditions.

Abstract

We present the first gesture recognition system implemented end-to-end on event-based hardware, using a TrueNorth neurosynaptic processor to recognize hand gestures in real-time at low power from events streamed live by a Dynamic Vision Sensor (DVS). The biologically inspired DVS transmits data only when a pixel detects a change, unlike traditional frame-based cameras which sample every pixel at a fixed frame rate. This sparse, asynchronous data representation lets event-based cameras operate at much lower power than frame-based cameras. However, much of the energy efficiency is lost if, as in previous work, the event stream is interpreted by conventional synchronous processors. Here, for the first time, we process a live DVS event stream using TrueNorth, a natively event-based processor with 1 million spiking neurons. Configured here as a convolutional neural network (CNN), the TrueNorth chip identifies the onset of a gesture with a latency of 105 ms while consuming less than 200 mW. The CNN achieves 96.5% out-of-sample accuracy on a newly collected DVS dataset (DvsGesture) comprising 11 hand gesture categories from 29 subjects under 3 illumination conditions.
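
To make the contrast between event-based and frame-based sensing concrete, the following is a minimal sketch of a toy change-detection model. It is not the DVS hardware behavior or the authors' pipeline: the function names (`moving_dot`, `dvs_events`), the linear intensity scale, the threshold value, and the one-event-per-pixel-per-step rule are all assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[List[float]]
Event = Tuple[int, int, int, int]  # (time step, x, y, polarity)


def moving_dot(n_frames: int, size: int) -> List[Frame]:
    """Hypothetical test input: one bright pixel moving along the top row."""
    frames = []
    for t in range(n_frames):
        frame = [[0.0] * size for _ in range(size)]
        frame[0][t % size] = 1.0  # frame[y][x]
        frames.append(frame)
    return frames


def dvs_events(frames: List[Frame], threshold: float = 0.2) -> List[Event]:
    """Toy event model (an assumption, not the real sensor circuit).

    A pixel emits one event when its value differs from its stored
    reference level by at least `threshold`; the reference is then updated.
    Pixels that do not change emit nothing.
    """
    ref = [row[:] for row in frames[0]]  # per-pixel reference levels
    events: List[Event] = []
    for t, frame in enumerate(frames[1:], start=1):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                delta = value - ref[y][x]
                if abs(delta) >= threshold:
                    polarity = 1 if delta > 0 else -1
                    events.append((t, x, y, polarity))
                    ref[y][x] = value
    return events


frames = moving_dot(n_frames=3, size=4)
events = dvs_events(frames)

# A frame-based camera would sample every pixel of every frame.
frame_samples = len(frames) * 4 * 4

print(f"events emitted: {len(events)}, frame-based samples: {frame_samples}")
```

In this toy scene only the pixels the dot leaves or enters produce events, so the event count stays small while the frame-based sample count grows with the full array size, which is the sparsity the abstract refers to.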
