Systolic...

Executive Summary:

We are developing and implementing a domain-specific hardware accelerator for the computation of deep neural networks.

Running AI inference on a systolic-array-based hardware accelerator should improve performance and energy efficiency compared to what general-purpose processors achieve.

 

Why we decided to make it:

Problem statement

Deep neural networks are penetrating a growing number of applications, including robotics, self-driving cars, IoT-based gadgets, and many more.

 

For edge and real-time deployments, these applications are limited by speed and energy requirements.

 

Trivial approach

These deep learning algorithms can be mapped to matrix multiplications, which are computed on CPUs and GPUs using libraries such as OpenBLAS and cuDNN.
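For instance (a minimal NumPy sketch, not project code), a fully connected layer is a single matrix multiply, and a 2-D convolution can be lowered to one via the standard im2col transformation:

```python
import numpy as np

# A fully connected layer is a single matrix multiply: y = W @ x + b.
W = np.random.randn(64, 128)   # weights (out_features x in_features)
x = np.random.randn(128)       # input activations
b = np.random.randn(64)        # bias
y = W @ x + b

# A 2-D convolution (as used in deep learning) lowers to a matmul via
# "im2col": each sliding window becomes one column of a patch matrix.
def im2col_conv(image, kernel):
    h, w = image.shape
    kh, kw = kernel.shape
    cols = np.stack([image[i:i + kh, j:j + kw].ravel()
                     for i in range(h - kh + 1)
                     for j in range(w - kw + 1)], axis=1)
    out = kernel.ravel() @ cols            # one row-vector matmul
    return out.reshape(h - kh + 1, w - kw + 1)
```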

Problem with Trivial Approach

These CPU- and GPU-based implementations require many memory accesses, which increase energy consumption and lower speed, both of which are undesirable.

 

Project Description:

Way to go:

Domain-specific hardware built around a systolic array can exhibit high parallelism and far fewer memory accesses while increasing throughput and speed. Simple, specialized, low-cost computational units communicate with one another in a fixed pattern, exploiting the spatial and temporal flow of processing to maximize data reuse.

Computational Processing Element

Each individual computation block performs a multiply-accumulate (MAC) operation using the following hardware configuration. These units are interconnected in a special arrangement (a systolic manner) to perform matrix multiplication together, as sketched below.
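As a rough behavioral sketch (Python, with hypothetical names; the actual design is hardware), one such weight-stationary MAC processing element could be modeled like this:

```python
class MacPE:
    """Behavioral model of one weight-stationary MAC processing element.

    The weight stays resident in the cell; each cycle the cell multiplies
    the incoming activation by that weight, adds the incoming partial sum,
    and forwards the activation unchanged to its neighbor.
    """

    def __init__(self, weight=0.0):
        self.weight = weight    # stationary operand, loaded once
        self.act_out = 0.0      # registered activation, flows east
        self.psum_out = 0.0     # registered partial sum, flows south

    def step(self, act_in, psum_in):
        # One clock edge: latch this cycle's MAC result and pass-through.
        self.psum_out = psum_in + self.weight * act_in
        self.act_out = act_in
```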

Computation Core

Matrix multiplication is performed using the following well-known configuration of a 2D interconnected systolic array. Input data is fed to the processing elements in a skewed pattern through shift registers along one side, and the matrix multiplication results are collected in shift registers placed below the array.
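A cycle-level simulation of that arrangement (a sketch under simplifying assumptions: weight-stationary dataflow, one-cycle registers, row inputs skewed by one cycle each; illustrative, not the RTL) looks like this:

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate C = A @ B on a K x N weight-stationary systolic array.

    PE[k][n] holds weight B[k][n]; activations flow east, partial sums
    flow south. Row k's input stream is skewed by k cycles, and result
    C[m][n] drops out of the bottom of column n at cycle m + K + n.
    """
    M, K = A.shape
    _, N = B.shape
    act = np.zeros((K, N))     # per-PE activation register
    psum = np.zeros((K, N))    # per-PE partial-sum register
    C = np.zeros((M, N))

    for t in range(M + K + N):            # pipeline fill + drain
        for n in range(N):                # harvest finished outputs
            m = t - K - n
            if 0 <= m < M:
                C[m, n] = psum[K - 1, n]
        new_act, new_psum = np.zeros_like(act), np.zeros_like(psum)
        for k in range(K):
            for n in range(N):
                if n == 0:                # skewed feed at the west edge
                    m = t - k
                    a_in = A[m, k] if 0 <= m < M else 0.0
                else:                     # activation from west neighbor
                    a_in = act[k, n - 1]
                p_in = psum[k - 1, n] if k > 0 else 0.0
                new_psum[k, n] = p_in + B[k, n] * a_in   # the MAC
                new_act[k, n] = a_in                     # pass east
        act, psum = new_act, new_psum
    return C

# Quick check against NumPy:
A = np.random.randn(5, 4); B = np.random.randn(4, 3)
assert np.allclose(systolic_matmul(A, B), A @ B)
```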

 

Auxiliary and Supporting Blocks

The diagram below shows the individual blocks used to implement matrix multiplication on the systolic-array-based architecture while mapping deep neural network computation onto matrix multiplication.

 

 

With the necessary software support, this type of implementation can perform matrix multiplication (the primary computational workload in deep neural networks) in considerably fewer cycles than a von Neumann general-purpose processor requires.
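As a rough, idealized first-order comparison (hypothetical round numbers, ignoring memory bandwidth and SIMD; not a measured result), the cycle counts differ by orders of magnitude:

```python
# Multiply an M x K matrix by a K x N matrix.
M, K, N = 256, 256, 256
scalar_cycles = M * K * N            # one MAC per cycle on a scalar core
systolic_cycles = M + K + N - 2      # fully pipelined K x N array:
                                     # fill + stream + drain latency
print(scalar_cycles // systolic_cycles)  # ~21,900x fewer cycles
```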

 

Owner
RM Danish
Description

The von Neumann architecture is slower than a systolic-array-based architecture for deep learning computations. This project develops an open-source, 2D systolic-array-based weight-stationary matrix multiplier with multiply-accumulate (MAC) processing elements and data-orchestrating controllers for memory access, which will increase the inference speed of deep-learning-based applications while reducing power consumption for edge applications.

Category

acc