The following is a special contribution to this blog by CCC Executive Council Member Mark D. Hill of the University of Wisconsin-Madison.
Even with the slowing of Moore’s Law and the end of Dennard scaling, computer chips can still get dramatically better performance—without dramatically more power—by using specialized “accelerator” blocks to perform key tasks much faster (> 100x) and/or at lower power. Classic accelerators include floating-point hardware (a separate chip back in the days of the Intel 8087), graphics processing units (GPUs), and field-programmable gate arrays (FPGAs).
The recent explosion in the progress and importance of deep learning makes artificial neural networks a promising target for hardware acceleration. To this end, at least NINE papers at the recent International Symposium on Computer Architecture (ISCA 2016) in Seoul, Korea, targeted neural network acceleration by reducing the cost of computation, storage, and/or communication. Ideas included eliminating zeros (as with sparse matrices), using low-precision or even analog values, exploiting emerging memory technologies, incorporating predication and pruning, minimizing data movement, leveraging 3D die stacking, and developing a new instruction set architecture. To find these papers, please see the Main Program and look for the creatively named sessions: Neural Networks 1, Neural Networks 2, and Neural Networks 3.
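Two of the cost-cutting ideas above—eliminating zeros and using low-precision values—can be illustrated in plain software. The sketch below is a simplified, hypothetical illustration (not code from any of the ISCA papers): a matrix-vector product that skips zero weights, the software analogue of a sparsity-aware accelerator, and a toy 8-bit quantizer that trades precision for cheaper arithmetic and storage.

```python
import random

def dense_matvec(w, x):
    # Baseline: every multiply-accumulate happens, even when w[r][c] == 0.
    return [sum(w[r][c] * x[c] for c in range(len(x))) for r in range(len(w))]

def sparse_matvec(w, x):
    # Zero-skipping: do work only for nonzero weights, the software
    # analogue of what a sparsity-aware accelerator does in hardware.
    y = [0.0] * len(w)
    for r, row in enumerate(w):
        for c, wv in enumerate(row):
            if wv != 0.0:
                y[r] += wv * x[c]
    return y

def quantize_int8(values):
    # Low-precision: map float weights to signed 8-bit integers plus one
    # shared scale factor; dequantization is q * scale.
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    return [round(v / scale) for v in values], scale

random.seed(0)
# A toy 8x8 weight matrix with ~70% of entries pruned to zero.
w = [[random.gauss(0, 1) if random.random() > 0.7 else 0.0
      for _ in range(8)] for _ in range(8)]
x = [random.gauss(0, 1) for _ in range(8)]

dense = dense_matvec(w, x)
sparse = sparse_matvec(w, x)
q, s = quantize_int8([v for row in w for v in row])
```

Here the two matvec routines compute identical results while the sparse one performs only ~30% of the multiplies; a hardware accelerator exploits the same property with zero-detection logic rather than an `if` statement.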