ReRAM and In-memory Computing-Based Accelerator for Deep Neural Networks
Talha Bin Azmat (Team Leader)
Bilal Shabbir
We propose the design of a high-speed accelerator that uses Resistive Random Access Memory (ReRAM) cells to perform analog computation. The accelerator will implement convolutional and deep neural networks. The design is inspired by the reference paper [1] and has many advantages over conventional digital accelerators: the architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density, respectively, over the state-of-the-art design.
Many recent efforts have been made to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (MAC) operations, which make up the bulk of an accelerator's workload. The purpose of designing these accelerators is to speed up the MAC operations and thereby lower inference times.
Typically, MAC operations are performed using digital multipliers and adders. Recent advancements in technology, however, allow analog computation to be used to speed up these operations. Figure 1 shows how the addition and multiplication of vectors is made possible through analog computation: the crossbar structure performs the multiplication of a 4×1 input vector with a 4×4 matrix. The computation is effectively instantaneous, as there is no clocked delay element in this structure. V represents the input voltages, W represents the weights of the matrix, and I is the output current after multiplication; a software sketch of this computation follows the figure.
Figure 1: Crossbar architecture for matrix multiplication using analog in-memory computation
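The following is a minimal software model of the crossbar computation in Figure 1, assuming an idealized array with no wire resistance or device nonlinearity; the conductance and voltage values are purely illustrative. Each weight is stored as a cell conductance, the input vector is applied as row voltages, and each column wire sums the cell currents, so the column currents equal the matrix-vector product.

```python
import numpy as np

# Idealized crossbar MAC: Ohm's law per cell (I_cell = V_row * G_cell),
# Kirchhoff's current law per column (the wire sums the cell currents).

V = np.array([0.1, 0.2, 0.0, 0.3])           # 4x1 input voltages (volts)
G = np.array([[1.0, 0.5, 0.2, 0.0],          # 4x4 conductance matrix (siemens),
              [0.3, 0.7, 0.1, 0.4],          # one value per ReRAM cell
              [0.0, 0.2, 0.9, 0.6],
              [0.5, 0.1, 0.3, 0.8]])

# Column currents: I_j = sum_i V_i * G_ij, i.e. the 4x1 result of W^T · V.
I = G.T @ V
print(I)
```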
As seen in Figure 2, we first convert our digital data into an analog signal using a DAC. After the signal is processed in the analog domain, it reaches a sample-and-hold (S&H) circuit, which is used here to enable pipelining. The signal is then sent to an ADC, where it is converted back to the digital domain so that it can be read by the processor. A sketch of this signal chain follows the figure.
Figure 2: ReRAM crossbar with ADC and DAC (MAC unit)
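Below is a hedged sketch of the DAC → crossbar → S&H → ADC chain in Figure 2, assuming simple uniform quantizers; the bit widths, voltage reference, and normalization are illustrative assumptions, not values from [1].

```python
import numpy as np

def dac(codes, bits=8, v_ref=1.0):
    """Digital codes -> analog voltages (uniform DAC model)."""
    return codes / (2**bits - 1) * v_ref

def adc(signal, bits=8, full_scale=1.0):
    """Analog values -> digital codes (uniform quantization)."""
    return np.round(np.clip(signal, 0, full_scale) / full_scale * (2**bits - 1)).astype(int)

rng = np.random.default_rng(0)
codes_in = np.array([10, 200, 0, 128])          # digital inputs
G = rng.uniform(0.0, 1.0, size=(4, 4))          # illustrative cell conductances

v_in = dac(codes_in)        # DAC: digital -> analog
i_out = G.T @ v_in          # crossbar MAC in the analog domain
sampled = i_out.copy()      # S&H: latch the column outputs so the crossbar can
                            # process the next input while the ADC converts (pipelining)
codes_out = adc(sampled / sampled.max())        # ADC: normalize and return to digital
print(codes_out)
```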
To design our accelerator, we will use multiple matrix-vector multiplication cells and combine them to realize a single tile, as shown in Figure 3. We will then combine the tiles to realize the whole accelerator chip, shown in Figure 4. The accelerator will then be connected to the RISC-V processor, enabling us to load any popular algorithm onto our chip.
Figure 3: Block diagram for a single layer of the accelerator.
DRAM: Dynamic Random Access Memory, OR: Output Register, S + A: Shift and Add unit
Figure 4: Block diagram for the accelerator chip
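The Shift-and-Add (S+A) unit noted in the Figure 3 legend is needed because a single ReRAM cell can only hold a few bits, so a full-precision weight is spread across several cells and the per-slice crossbar results must be recombined. The sketch below illustrates this with assumed widths (2 bits per cell, 8-bit weights); these numbers are for illustration only and are not taken from [1].

```python
import numpy as np

BITS_PER_CELL = 2
WEIGHT_BITS = 8
SLICES = WEIGHT_BITS // BITS_PER_CELL            # 4 slices per weight

x = np.array([3, 1, 4, 2])                       # integer input vector
w = np.array([173, 44, 250, 7])                  # 8-bit weights for one output

# Split each weight into 2-bit slices, least significant slice first.
slices = [(w >> (BITS_PER_CELL * s)) & (2**BITS_PER_CELL - 1) for s in range(SLICES)]

# Each slice produces a small dot product (one crossbar column);
# the S+A unit shifts each partial result to its bit position and accumulates.
result = 0
for s, w_slice in enumerate(slices):
    partial = int(np.dot(x, w_slice))            # per-slice MAC result
    result += partial << (BITS_PER_CELL * s)     # shift by slice weight, then add

assert result == int(np.dot(x, w))               # matches the full-precision MAC
print(result)
```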
The project has a lot of room for novelty, as there are many ways to make this chip more area-efficient and error-tolerant. One technique that we will implement to realize energy-efficient, sign-aware, and robust deep neural network (DNN) processing is the two-transistor two-ReRAM (2T2R) technique, which uses two transistors and two memristors to realize a single crossbar cell, as shown in Figure 6 [7].
Figure 6: 2T2R cell [7]
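A minimal sketch of the sign-aware idea behind the 2T2R cell follows: a signed weight is split across two memristors, one on a positive line and one on a negative line, and the column output is the difference of the two currents. The mapping and values below are illustrative assumptions, not the exact programming scheme of [7].

```python
import numpy as np

def to_2t2r(w):
    """Map signed weights to a (G_pos, G_neg) conductance pair per cell."""
    g_pos = np.maximum(w, 0.0)     # positive part stored on the positive device
    g_neg = np.maximum(-w, 0.0)    # magnitude of the negative part on the other
    return g_pos, g_neg

V = np.array([0.2, 0.1, 0.3, 0.05])              # input voltages
W = np.array([[ 0.5, -0.3,  0.1,  0.0],          # signed weights (4x4)
              [-0.2,  0.4, -0.6,  0.3],
              [ 0.1,  0.0,  0.2, -0.4],
              [-0.5,  0.3,  0.1,  0.2]])

G_pos, G_neg = to_2t2r(W)
I_out = G_pos.T @ V - G_neg.T @ V                # differential column readout
assert np.allclose(I_out, W.T @ V)               # equals the signed MAC result
print(I_out)
```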
Table 1 lists the target performance parameters of the chip.
Table 1: Performance parameters of the chip

| Parameter | Target |
| --- | --- |
| Area | < 5 mm² |
| Power consumption | < 4 W |
| Computational efficiency | > 363.7 GOPS/W |
| Operating frequency | 1.2 GHz |
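As a back-of-the-envelope check of these targets (not a measurement, and assuming the common convention of counting one MAC as two operations), the throughput and MAC parallelism implied by the efficiency and power figures can be computed as follows.

```python
# Derived from the Table 1 targets; illustrative arithmetic only.
power_budget_w = 4.0            # watts
efficiency_target = 363.7       # GOPS/W
freq_ghz = 1.2                  # operating frequency in GHz

required_gops = efficiency_target * power_budget_w      # ~1454.8 GOPS
macs_per_cycle = required_gops / (2 * freq_ghz)          # parallel MACs needed

print(f"required throughput: {required_gops:.1f} GOPS")
print(f"parallel MACs per cycle at {freq_ghz} GHz: {macs_per_cycle:.0f}")
```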
References
[1] Shafiee, Ali, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. "ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars." ACM SIGARCH Computer Architecture News 44, no. 3 (2016): 14-26.
[2] Wang, Qiwen, Xinxin Wang, Seung Hwan Lee, Fan-Hsuan Meng, and Wei D. Lu. "A deep neural-network accelerator based on tiled RRAM architecture." In 2019 IEEE International Electron Devices Meeting (IEDM), pp. 14-4. IEEE, 2019.
[3] Mbarek, K., Rziga, F.O., Ghedira, S., et al. "On the design and analysis of a compact array with 1T1R RRAM memory element." Analog Integrated Circuits and Signal Processing 102, 27-37 (2020).
[4] Ni, Leibin, Yuhao Wang, Hao Yu, Wei Yang, Chuliang Weng, and Junfeng Zhao. "An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar." In 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 280-285. IEEE, 2016.
[5] Tang, Tianqi, Lixue Xia, Boxun Li, Yu Wang, and Huazhong Yang. "Binary convolutional neural network on RRAM." In 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 782-787. IEEE, 2017.
[6] Zhu, Zhenhua, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang, and Huazhong Yang. "A configurable multi-precision CNN computing framework based on single bit RRAM." In 2019 56th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. IEEE, 2019.
[7] Zhou, Z., P. Huang, Y. C. Xiang, W. S. Shen, Y. D. Zhao, Y. L. Feng, B. Gao et al. "A new hardware implementation approach of BNNs based on nonlinear 2T2R synaptic cell." In 2018 IEEE International Electron Devices Meeting (IEDM), pp. 20-7. IEEE, 2018.