μLayer - Concepedia

Abstract

Emerging mobile services heavily utilize Neural Networks (NNs) to improve user experiences. Such NN-assisted services depend on fast NN execution for high responsiveness, demanding mobile devices to minimize the NN execution latency by efficiently utilizing their underlying hardware resources. To better utilize the resources, existing mobile NN frameworks either employ various CPU-friendly optimizations (e.g., vectorization, quantization) or exploit data parallelism using heterogeneous processors such as GPUs and DSPs. However, their performance is still bounded by the performance of the single target processor, so that realtime services such as voice-driven search often fail to react to user requests in time. It is obvious that this problem will become more serious with the introduction of more demanding NN-assisted services.

References

Page 1

	Year	Citations

Page 1