Behind the Paper

Analog AI: Training larger-scale DNNs for deployment on future analog in-memory computing hardware without accuracy loss

Published in Electrical & Electronic Engineering

Oct 03, 2023

Malte Rasch

Research Staff Member, IBM

The ever-increasing compute needed to run deep neural networks (DNNs) has made hardware latency and energy efficiency a growing concern. Conventional processor architectures (such as CPUs or GPUs) must constantly transfer data between memory and processing units through the ‘von Neumann bottleneck’, inducing time and energy overheads that significantly degrade latency and energy efficiency.

Analog in-memory computing (AIMC) using non-volatile memory (NVM) elements is a promising mixed-signal approach for DNN acceleration [1,2,3]. Here, the weights of a deep neural network are stored in crossbar arrays of tunable conductive elements. This enables approximate matrix-vector computation directly in memory: activation vectors are applied to the crossbar array (as voltages or pulse durations), and analog physical quantities (instantaneous current or accumulated charge) are read out [4,5,6]. As a ‘non-von Neumann’ architecture, AIMC performs matrix-vector multiplications (MVMs) at the location of the stored weights in a highly parallel, fast, and energy-efficient manner [5], but only approximately.
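
To make "only approximately" concrete, here is a minimal sketch of an exact MVM next to a noisy one. It is not the hardware model from our paper: the function names, the Gaussian noise on weights and outputs, and the parameters `weight_noise`, `out_noise`, and `out_bound` are illustrative assumptions only.

```python
import random

def ideal_mvm(weights, x):
    """Exact matrix-vector product y = W x, as computed digitally."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def analog_mvm(weights, x, weight_noise=0.02, out_noise=0.01,
               out_bound=10.0, rng=None):
    """Toy approximation of a crossbar MVM (illustrative assumptions only).

    Each stored weight is perturbed by Gaussian noise on every read,
    additive Gaussian noise is applied to each output, and the output
    is clipped to a limited range.
    """
    rng = rng or random.Random()
    y = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            w_noisy = w + rng.gauss(0.0, weight_noise)  # conductance fluctuation
            acc += w_noisy * xi                         # current summation
        acc += rng.gauss(0.0, out_noise)                # read-out noise
        acc = max(-out_bound, min(out_bound, acc))      # limited output range
        y.append(acc)
    return y

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]
print("ideal :", ideal_mvm(W, x))
print("analog:", analog_mvm(W, x, rng=random.Random(0)))
```

Running the analog version repeatedly with different seeds gives a different result each time, which is the stochastic behavior that a DNN deployed on such hardware has to tolerate.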

In general, this approximate computing means that when DNNs pre-trained on conventional digital hardware are programmed directly onto analog crossbar arrays, accuracy drops significantly. The situation is similar to other AI acceleration approaches, such as reduced-precision digital AI accelerators, where DNNs (and thus MVMs) are heavily quantized and therefore approximated. One solution is to re-train these DNNs in software with the approximations in mind, making DNN inference more robust to quantization. This approach has been very successful for quantized DNNs, where the approximations are well defined: they are the inaccuracies induced by quantizing formerly real-valued weights and activations. Typically, the original pre-quantization accuracy can be (almost) fully recovered.
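
As a small illustration of why quantization errors count as well defined, the sketch below rounds values onto a symmetric uniform grid. The function name `quantize` and the choices of `n_bits` and `max_abs` are assumptions made for this example, not settings from any particular accelerator.

```python
def quantize(values, n_bits=4, max_abs=1.0):
    """Round values to a symmetric uniform grid with 2**(n_bits-1) - 1
    positive levels, clipping anything outside [-max_abs, max_abs]."""
    levels = 2 ** (n_bits - 1) - 1   # e.g. 7 positive levels for 4 bits
    step = max_abs / levels          # spacing between neighboring levels
    quantized = []
    for v in values:
        v_clipped = max(-max_abs, min(max_abs, v))
        quantized.append(round(v_clipped / step) * step)
    return quantized

weights = [0.03, -0.41, 0.77, 1.30]
for w, q in zip(weights, quantize(weights)):
    print(f"{w:+.2f} -> {q:+.4f} (error {q - w:+.4f})")
```

For any value inside the clipping range, the error is deterministic and at most half a grid step. That predictability is what separates this case from analog hardware, where the errors are stochastic.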

In a recent paper [7], we investigated applying a similar approach to analog in-memory computing. The challenge is that, because AIMC computes MVMs with physical quantities such as the electric currents and conductances of real material substrates, the approximations are much more difficult to characterize and are also highly stochastic, both across chip replications and across repeated computations.

In our study, we developed a model that characterizes the approximations of typical AIMC hardware in an abstract manner, including temporal conductance fluctuations, material and electrical effects, quantization and range clipping, and other nonidealities. We then improved on earlier approaches to pre-training with AIMC hardware in mind, which we call hardware-aware (HWA) training. One key aspect is to inject into the pre-training process the different noise sources that are expected during deployment of the DNN on analog hardware, while also being mindful of the inherently limited input and output dynamic ranges and other design constraints.
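
The noise-injection idea can be sketched on a toy problem. This is a deliberately simplified illustration and not the training procedure from the paper: it fits a single linear neuron with plain SGD, the helper names (`noisy_forward`, `train_hwa`), the Gaussian weight noise, and the clipping bounds are all assumptions, and gradients are passed straight through the noise and clipping.

```python
import random

def noisy_forward(w, x, rng, noise_std, in_bound=1.0, out_bound=10.0):
    """Forward pass with assumed nonidealities: clipped inputs,
    freshly sampled weight noise, and a clipped output."""
    x_clip = [max(-in_bound, min(in_bound, xi)) for xi in x]
    w_noisy = [wi + rng.gauss(0.0, noise_std) for wi in w]
    y = sum(wi * xi for wi, xi in zip(w_noisy, x_clip))
    y = max(-out_bound, min(out_bound, y))
    return y, x_clip

def train_hwa(data, n_inputs, epochs=200, lr=0.05, noise_std=0.05, seed=0):
    """SGD on a squared error where every forward pass sees noisy weights.
    The update is applied to the clean (noise-free) weights."""
    rng = random.Random(seed)
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in data:
            y, x_clip = noisy_forward(w, x, rng, noise_std)
            err = y - target
            # Gradient of 0.5 * err**2 w.r.t. w, treating the noise and
            # the clipping as identity in the backward pass.
            w = [wi - lr * err * xi for wi, xi in zip(w, x_clip)]
    return w

# Toy regression task generated by target = 0.5 * x0 - 0.3 * x1.
data = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.3),
        ([1.0, 1.0], 0.2), ([-1.0, 1.0], -0.8)]
print(train_hwa(data, n_inputs=2))
```

Because the weights are perturbed anew on every forward pass, the optimizer is pushed toward solutions that keep working under the perturbations it will meet at deployment, rather than ones that only work for exact weights.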

While earlier approaches to hardware-aware training for AIMC exist, they were typically limited to one or a few small toy DNN examples. More critically, hardware assumptions vary widely among studies, making the effectiveness of different approaches to hardware-aware training difficult to compare.

In our study, through a large simulation effort, we greatly expanded the number and scale of the DNN models investigated, all under identical hardware assumptions. For the first time, this allows a quantitative comparison of the suitability of hardware-aware training for AIMC per se, as well as of the suitability of DNNs from different AI domains and topologies, such as recurrent DNNs for speech-to-text translation, convolutional networks for image classification, and transformers for natural language understanding.

Interestingly, we find that hardware-aware training is also very promising for larger DNNs, but that robustness to AIMC nonidealities differs depending on the DNN topology. For instance, recurrent DNNs (such as LSTMs and RNN-T) are easily trained to nearly the same accuracy as the original digital model and generally perform very well, while convolutional networks are typically the most challenging.

We further examined how much worse individual nonidealities could become while still supporting high accuracy, thereby measuring the impact of the different noise sources. These insights give valuable feedback to chip designers, who can target the nonidealities that are most detrimental to accuracy in future chip designs.
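
A sensitivity analysis of this kind can be sketched as a sweep over noise levels. The example below is a toy stand-in rather than the analysis from the paper: the two-input linear classifier, the Gaussian weight noise, the 99% accuracy threshold, and the function names are all assumptions chosen for illustration.

```python
import random

def accuracy_under_noise(weights, samples, noise_std, n_trials=200, seed=0):
    """Fraction of correct decisions of a linear classifier whose weights
    are perturbed by fresh Gaussian noise on every evaluation."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        for x, label in samples:
            y = sum((w + rng.gauss(0.0, noise_std)) * xi
                    for w, xi in zip(weights, x))
            correct += int((y > 0) == label)
    return correct / (n_trials * len(samples))

def max_tolerable_noise(weights, samples, noise_levels, min_accuracy=0.99):
    """Largest noise level in the sweep, scanning upward, for which the
    accuracy stays at or above min_accuracy (0.0 if none qualifies)."""
    tolerable = 0.0
    for std in sorted(noise_levels):
        if accuracy_under_noise(weights, samples, std) < min_accuracy:
            break
        tolerable = std
    return tolerable

weights = [1.0, -1.0]
samples = [([1.0, 0.2], True), ([0.2, 1.0], False)]
print(max_tolerable_noise(weights, samples, [0.0, 0.1, 0.5, 1.0, 5.0]))
```

Repeating such a sweep separately for each noise source shows which one has the smallest tolerable level and is therefore the most worthwhile to improve in hardware.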

We hope that our “standard” AIMC model, which we have released as part of the open-source IBM AI Hardware Acceleration Kit [8] (a toolkit based on PyTorch), will make it easier for the research community to compare and benchmark future algorithmic improvements for hardware-aware training, even without access to particular hardware chip prototypes, which are currently under development in only a few large labs and companies (such as IBM’s low-power AI chip prototypes).

References

[1] G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola, L. L. Sanches, I. Boybat, M. Le Gallo, K. Moon, J. Woo, H. Hwang, and Y. Leblebici, “Neuromorphic computing using non-volatile memory,” Advances in Physics: X, vol. 2, no. 1, pp. 89–124, 2017.

[2] A. Sebastian, M. Le Gallo, R. Khaddam-Aljameh, and E. Eleftheriou, “Memory devices and applications for in-memory computing,” Nature Nanotechnology, vol. 15, pp. 529–544, 2020.

[3] G. W. Burr, A. Sebastian, T. Ando, and W. Haensch, “Ohm’s law plus Kirchhoff’s current law equals better AI,” IEEE Spectrum, vol. 58, no. 12, pp. 44–49, 2021.

[4] F. Merrikh-Bayat, X. Guo, M. Klachko, M. Prezioso, K. K. Likharev, and D. B. Strukov, “High-performance mixed-signal neurocomputing with nanoscale floating-gate memory cell arrays,” IEEE Trans. Neur. Netw. Learn. Sys., 2017.

[5] H.-Y. Chang, P. Narayanan, S. C. Lewis, N. C. P. Farinha, K. Hosokawa, C. Mackin, H. Tsai, S. Ambrogio, A. Chen, and G. W. Burr, “AI hardware acceleration with analog memory: micro-architectures for low energy at high speed,” IBM Journal of Research and Development, vol. 63, no. 6, pp. 1–14, 2019.

[6] B. Murmann, “Mixed-signal computing for deep neural network inference,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 1, pp. 3–13, 2020.

[7] M. J. Rasch, C. Mackin, M. Le Gallo, A. Chen, A. Fasoli, F. Odermatt, N. Li, S. R. Nandakumar, P. Narayanan, H. Tsai, G. W. Burr, A. Sebastian, and V. Narayanan, “Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators,” Nature Communications, vol. 14, no. 1, p. 5282, 2023.

[8] M. J. Rasch, D. Moreda, T. Gokmen, M. Le Gallo, F. Carta, C. Goldberg, K. E. Maghraoui, A. Sebastian, and V. Narayanan, “A flexible and fast PyTorch toolkit for simulating training and inference on analog crossbar arrays,” IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 1–4, 2021.
