Behind the Paper

Analogue in-memory computing coming of age

IBM introduces the IBM HERMES project chip: a fully integrated in-memory compute chip comprising 64 analogue in-memory computing cores, digital processing units and an on-chip communication network.

Published in Electrical & Electronic Engineering

Aug 10, 2023

Athanasios Vasilopoulos and Julian Büchel


Deep neural networks (DNNs) are at the forefront of artificial intelligence (AI) due to their super-human capabilities in various cognitive tasks such as image recognition and text generation. However, the key challenge is running DNNs in a performant and energy-efficient manner. While there have been significant advancements in specialized AI hardware, the shuffling of data between memory and processing units remains the main hurdle towards that goal.

A radically different approach to addressing this challenge is to store the synaptic weights as the conductance values of nanoscale resistance-based (memristive) storage devices and to perform the synaptic computations in the memory arrays themselves. Like the human brain, this architecture, called analogue in-memory computing (AIMC), obviates data movement between memory and processing units and therefore consumes significantly less energy. Moreover, the synaptic computations are performed in the analogue domain, exploiting circuit laws, for fast and energy-efficient computation.
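To make the idea of computing with circuit laws concrete, here is a minimal, idealised sketch of a matrix-vector multiplication on a resistive crossbar. It is an illustration only, not the chip's actual circuit: the differential conductance mapping, the function name `crossbar_mvm`, and the values of `g_max` and `v_read` are assumptions chosen for the example, and device noise, drift and ADC quantization are ignored.

```python
import numpy as np

def crossbar_mvm(weights, x, g_max=25e-6, v_read=0.2):
    """Idealised analogue MVM; g_max (siemens) and v_read (volts) are hypothetical."""
    w_scale = np.max(np.abs(weights))
    x_scale = np.max(np.abs(x))
    # Assumed mapping: each signed weight becomes a pair of non-negative conductances.
    g_pos = np.clip(weights, 0, None) / w_scale * g_max
    g_neg = np.clip(-weights, 0, None) / w_scale * g_max
    # Inputs are applied as read voltages.
    v = x / x_scale * v_read
    # Ohm's law gives a current G * V per device; Kirchhoff's current law
    # sums those currents along each output line.
    i_out = (g_pos - g_neg) @ v
    # Convert the summed currents back to the numerical domain.
    return i_out * w_scale * x_scale / (g_max * v_read)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
print(np.allclose(crossbar_mvm(W, x), W @ x))
```

In this noise-free model the result equals the floating-point product; on real hardware the conductances and currents are imprecise, which is why compute precision is the central question below.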

However, there are still key questions to be answered. Can this approach provide sufficient compute precision such that application accuracies are not impacted? Is it possible to seamlessly integrate analogue compute with digital compute, as a pathway towards full end-to-end realization of DNNs on-chip?

In our article, featured in Nature Electronics, we answer these questions by unveiling the first-of-its-kind fully integrated mixed-signal in-memory compute chip based on back-end integrated phase-change memory (PCM) in a 14-nm complementary metal-oxide-semiconductor (CMOS) process. The chip comprises 64 AIMC cores (or tiles), each with a memory array of 256×256 unit cells. Each unit cell is constructed with four PCM devices, for a total of over 16M devices. In addition to the analogue memory array, each core contains a light digital processing unit that performs activation functions, accumulations, and scaling operations. A global digital processing unit is placed in the middle of the chip for the heavier digital computations associated with LSTM networks. Finally, an on-chip communication network moves partial results from core to core so that they can be accumulated.
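The device count follows directly from the figures above. A quick check, using only the numbers stated in this post (the variable names are ours):

```python
cores = 64                 # AIMC cores on the chip
rows, cols = 256, 256      # unit cells per core
devices_per_cell = 4       # PCM devices per unit cell

unit_cells = cores * rows * cols
pcm_devices = unit_cells * devices_per_cell

print(unit_cells)    # 4194304
print(pcm_devices)   # 16777216, i.e. "over 16M devices"
```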

Illustration of the IBM HERMES Project chip from a high-resolution optical micrograph.

The chip demonstrates massively parallel execution, with a maximal throughput of 63.1 tera-operations per second (TOPS), enabled by compact current-controlled-oscillator analogue-to-digital converters (ADCs) placed on each unit-cell row. In the high-precision mode, which uses a 4-phase read operation, the chip achieves 16.1 TOPS, and the measured matrix-vector multiplication (MVM) throughput per area of the chip, 0.4 TOPS/mm², is more than 15 times higher than that of comparable multi-core resistive-memory AIMC chips.

The challenge in using PCM devices, however, is achieving sufficient computational precision. We characterized all 64 cores of the chip and demonstrated that they achieve an effective MVM precision between 3 and 4 bits. We attribute this to the on-chip ADC calibration circuits, which remove the unwanted nonlinear response. To assess whether the state-of-the-art MVM accuracy translates to inference accuracy, we implemented a variety of networks, covering all usable features of the chip.

First, we implemented a convolutional neural network for CIFAR-10 image classification. The network’s weights were programmed on 40 AIMC cores and the auxiliary network operations, namely batch normalization and ReLU activation, were implemented in the light digital processing units of the cores. Furthermore, accumulations of partial MVM results were also executed in the digital processing units, using the on-chip links to move the data between cores. The CIFAR-10 test accuracy achieved by the chip, 92.81%, is the highest amongst currently reported chips using similar technology.
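The accumulation of partial MVM results can be pictured with a small software model. The sketch below is an assumption-laden illustration, not the chip's mapping scheme: it splits a weight matrix larger than one 256×256 array into tiles, computes one partial product per tile (the role played by an AIMC core), and sums the partial results digitally. The function name `tiled_mvm` and the example layer size are hypothetical.

```python
import numpy as np

TILE = 256  # side length of one memory array, as stated in the post

def tiled_mvm(weights, x, tile=TILE):
    """Compute weights @ x by accumulating per-tile partial products."""
    out_dim, in_dim = weights.shape
    y = np.zeros(out_dim)
    for r in range(0, out_dim, tile):
        for c in range(0, in_dim, tile):
            # One tile of weights, standing in for what a single core stores.
            block = weights[r:r + tile, c:c + tile]
            # Partial MVM over this tile's slice of the input vector.
            partial = block @ x[c:c + tile]
            # Digital accumulation of partial results from different tiles.
            y[r:r + tile] += partial
    return y

rng = np.random.default_rng(0)
W = rng.standard_normal((600, 700))   # hypothetical layer, larger than one tile
x = rng.standard_normal(700)
print(np.allclose(tiled_mvm(W, x), W @ x))
```

Only tiles that share output rows need their results combined, which is why moving partial results between cores is enough to reconstruct the full product.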

In the next network, we aimed to answer the question “What is the achievable accuracy on a workload that utilizes all features of the chip?” To answer that, we employed an LSTM network to generate captions for images in the Flickr8k dataset. The network utilizes all 64 cores, all digital processing units, and the on-chip communication network. In this case, the hardware accuracy, measured in BLEU scores, exactly matches the software floating-point baseline, despite the full utilization of the chip.

As most fields transition towards AI-based technologies, the need for fast, low-power, and accurate inference hardware will be paramount. With the IBM HERMES project chip, we demonstrated that Analog-AI is a viable alternative to the conventional digital accelerator approaches. While further advances are required to mature the technology and move towards chips that can execute end-to-end inference workloads, this work demonstrates many of the building blocks needed to realize the full potential of Analog-AI.
