Energy-Efficient Deep Learning


Presentation title

Energy-Efficient Deep Learning

Authors

Valentino Peluso, Daniele Jahier Pagliari, Andrea Calimera, Massimo Poncino and Enrico Macii

Institution(s)

Politecnico di Torino, Italy

Presentation type

Technical presentation

Abstract

Deep Neural Network (DNN) models have reached state-of-the-art performance in many machine learning tasks, from image classification to machine translation. However, due to their significant complexity, DNN-based inference tasks involved in mobile and Internet of Things (IoT) applications are typically offloaded to high-performance cloud-based systems. In many instances this approach is sub-optimal, as significant benefits in terms of latency, energy consumption and network bandwidth can be obtained if DNNs are evaluated directly on IoT edge nodes. This calls for implementations of deep learning models that can run in resource-limited environments with low energy footprints. Both academia and industry have recently investigated these aspects, coming up with specialized network models and hardware accelerators for energy-constrained DNN inference. Fixed-point arithmetic has been demonstrated to be one of the most promising techniques for achieving energy efficiency in DNN accelerators, while guaranteeing the same inference accuracy as floating-point models. The advantages of integer arithmetic are twofold: i) it reduces model size, and hence the memory footprint; ii) it improves latency. Thanks to the error resilience of neural network models, their performance degrades gracefully when internal operations are performed at reduced precision. In state-of-the-art approaches, bit-widths are set statically for a given network. We present a framework to implement adaptive DNNs whose bit-widths are changed dynamically at run-time. We resort to two different strategies: i) input-dependent dynamic bit-width reconfiguration; ii) energy-aware precision scaling. In the first technique, occasional hard-to-classify inputs are identified, for which a larger bit-width is used. Using data from a real deep learning accelerator chip, we show that a 50% energy reduction can be achieved with respect to a static bit-width selection, with less than 1% top-1 accuracy loss.
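The first strategy (input-dependent bit-width reconfiguration) can be sketched as follows. This is an illustrative toy, not the presented framework: the saturating fixed-point format, the two bit-widths, the softmax "network" and the confidence threshold are all assumptions made for the sake of the example; in the actual system the escalation decision gates a full DNN forward pass on the accelerator.

```python
import math

def quantize(x, bits):
    """Signed fixed-point quantization with 2 integer bits and
    (bits - 2) fractional bits, saturating at the representable
    range. (The format split is an illustrative assumption.)"""
    frac_bits = bits - 2
    scale = 1 << frac_bits
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(x * scale))) / scale

def softmax(scores):
    """Stand-in for a DNN's output layer over pre-activation scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_inference(scores, low_bits=4, high_bits=8, threshold=0.6):
    """Evaluate at low precision first; if the top-1 confidence falls
    below the threshold (a hard-to-classify input), re-evaluate the
    same input at the higher precision."""
    probs = softmax([quantize(s, low_bits) for s in scores])
    used = low_bits
    if max(probs) < threshold:  # occasional hard input: escalate
        probs = softmax([quantize(s, high_bits) for s in scores])
        used = high_bits
    return probs.index(max(probs)), used
```

Since most inputs are easy and stop after the low-precision pass, the average energy per inference approaches the low-bit-width cost, which is the intuition behind the reported savings.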
The second method is an energy-driven optimization that delivers per-layer precision scaling under a user-defined accuracy constraint. The tool is conceived for accelerators that dynamically adapt their energy and accuracy through multi-precision, variable-latency MAC instructions. Simulation results collected on different DNNs show substantial energy savings and improved energy-accuracy tradeoffs with respect to conventional fixed-point networks.
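The per-layer search can be illustrated with a minimal greedy sketch. The quadratic MAC energy model, the bit-width bounds and the greedy descent below are assumptions chosen for clarity; the abstract does not specify the tool's actual optimization algorithm or energy model.

```python
def energy(bitwidths, macs_per_layer):
    """Toy energy model: per-MAC energy grows with the square of the
    operand bit-width (a common first-order multiplier model)."""
    return sum(b * b * m for b, m in zip(bitwidths, macs_per_layer))

def precision_scaling(eval_accuracy, macs_per_layer, max_bits=8,
                      min_bits=2, acc_constraint=0.9):
    """Greedy per-layer scaling: start every layer at max precision,
    then repeatedly lower by one bit the layer whose reduction saves
    the most energy, accepting only moves that keep the network's
    accuracy above the user-defined constraint."""
    bits = [max_bits] * len(macs_per_layer)
    while True:
        best = None
        for i in range(len(bits)):
            if bits[i] <= min_bits:
                continue
            trial = list(bits)
            trial[i] -= 1
            if eval_accuracy(trial) >= acc_constraint:
                saving = (energy(bits, macs_per_layer)
                          - energy(trial, macs_per_layer))
                if best is None or saving > best[0]:
                    best = (saving, trial)
        if best is None:   # no feasible reduction left
            return bits
        bits = best[1]
```

`eval_accuracy` stands for an accuracy estimate of the quantized network (e.g. measured on a validation set); in this sketch it is a caller-supplied function, and layers with many MACs naturally end up with the smallest bit-widths.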


Additional material

  • Presentation slides: [pdf]
