Moore’s law is coming to an end. At least, that is what a large proportion of hardware designers believe. Yet the great open challenges of science and engineering demand ever more powerful processors. To meet these needs, tech giants like Google and Nvidia are turning to the development of ASICs (Application-Specific Integrated Circuits). Unlike general-purpose processors, these chips can perform only a reduced set of operations, but at unparalleled speed.
However, designing these chips is very complex and requires the collaboration of several teams iterating on their design for months. For this reason, a team at Google Brain has explored the use of artificial intelligence techniques for the design of its fifth generation of AI processors (TPU-v5). Using reinforcement learning techniques, they have reduced the duration of one of the design stages from months to just 6 hours.
Chip design process (simplified)
Chip design goes through several stages. First, the specifications that the chip must meet are gathered from the stakeholders. From these requirements, a logic design is created, describing the chip’s behavior abstractly. This design is then verified to ensure it works properly. Finally, the logic design is turned into a physical design, laying out the components that will make up the chip.
This physical design process also goes through several stages. First, the logic design of the chip must be translated into a Netlist. The Netlist is a description of the circuit components and their connections to other components.
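To make this concrete, a Netlist can be sketched as a small graph structure. The component names and fields below are purely illustrative, not an actual industry format:

```python
# Hypothetical minimal netlist: components are vertices, wires are edges.
# Names, types and dimensions here are invented for illustration.
netlist = {
    "components": {
        "macro_0": {"type": "SRAM", "width": 4.0, "height": 6.0},
        "macro_1": {"type": "SRAM", "width": 4.0, "height": 6.0},
        "cell_0": {"type": "NAND", "width": 0.2, "height": 0.1},
    },
    # Each edge connects two components by a wire.
    "edges": [("macro_0", "cell_0"), ("cell_0", "macro_1")],
}

def neighbors(netlist, name):
    """Components directly wired to `name`."""
    return [b if a == name else a
            for a, b in netlist["edges"] if name in (a, b)]
```

In this toy form, `neighbors(netlist, "cell_0")` walks the edge list to find everything wired to the standard cell, which is exactly the connectivity information the placement stage consumes.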
Then, the large blocks, called macro blocks, have to be placed on the chip’s canvas, so that a floorplan is generated. After that, the rest of the standard cells are placed on the chip in the spaces left by the macro blocks.
As you can imagine, there are a huge number of possible ways to place the components (approx. 10²⁵⁰⁰ possible placements), and, obviously, some are better than others. It is precisely this very complex problem that Google Brain researchers have managed to solve in just 6 hours using reinforcement learning techniques.
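A quick back-of-the-envelope calculation shows why the search space explodes. This toy model (my own simplification, not the paper's exact count) treats the canvas as a grid and places macros one at a time without reusing a cell:

```python
from math import perm

# Rough illustration (a toy model, not the paper's exact count):
# placing n macros one at a time on a g x g grid, never reusing a cell,
# allows g^2 * (g^2 - 1) * ... * (g^2 - n + 1) distinct placements.
def num_placements(grid_side, n_macros):
    cells = grid_side * grid_side
    return perm(cells, n_macros)  # falling factorial

# Even a toy 32x32 grid with 50 macros allows on the order of
# 10^150 placements; real chips are vastly beyond exhaustive search.
huge = num_placements(32, 50)
```

Exhaustively evaluating even this toy case is hopeless, which is why a learned search strategy is attractive.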
Macro block placement
The goal of this design phase is to find a placement of the components that minimizes the length and congestion of their wiring, as well as the density of their placement. To do this, the researchers define a target metric for the agent to maximize.
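The formula itself is not reproduced here, but in Google's paper the metric takes roughly the form of a negative weighted sum of those three quantities (the exact terms and weighting are a simplification on my part; $\lambda$ and $\gamma$ are hyperparameters chosen by the designers):

```latex
% Hedged reconstruction of the target metric (reward) to maximize:
R \;=\; -\,\mathrm{Wirelength} \;-\; \lambda \cdot \mathrm{Congestion} \;-\; \gamma \cdot \mathrm{Density}
```

Maximizing $R$ is therefore the same as jointly minimizing wirelength, congestion, and density.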
To facilitate the solution of this complex problem, the researchers divided it into two parts:
Part 1: Supervised learning
First, they focused on identifying the relationship that exists between the placement of the parts and the target metric, using supervised learning.
To do this, they took a set of different placements for which they already knew their corresponding value and trained a neural network to estimate it.
The network receives information about the chip (the Netlist) and the placement of the parts and generates an estimate. By comparing these estimates with the actual values, the neural network adjusts its connections to make better and better estimates. Once trained, it is able to estimate these values for chips it has never seen before.
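The training loop described above can be sketched in miniature. This is a deliberate stand-in (a linear model with hand-made features and plain SGD, not the graph network the researchers used), but the mechanic is the same: compare estimates to known values and adjust the connections:

```python
import random

# Toy stand-in for the supervised step: learn to predict a placement's
# quality score from two hand-made features by gradient descent on the
# squared error. The "hidden" scoring rule below is invented.
random.seed(0)

def true_score(features):
    # Hidden ground truth the model should recover.
    return -2.0 * features[0] - 0.5 * features[1]

# Labeled dataset, like the placements whose value was already known.
data = []
for _ in range(200):
    f = (random.random(), random.random())
    data.append((f, true_score(f)))

w = [0.0, 0.0]  # weights of a linear "network"
lr = 0.1
for _ in range(500):
    for f, y in data:
        pred = w[0] * f[0] + w[1] * f[1]
        err = pred - y
        # Adjust the connections to reduce the estimation error.
        w[0] -= lr * err * f[0]
        w[1] -= lr * err * f[1]
```

After training, `w` closely recovers the hidden coefficients, so the model generalizes to feature combinations it never saw, just as the trained network generalizes to unseen chips.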
Since the Netlist is a graph (its vertices are the components and its edges are their connections), the researchers developed a neural network capable of processing these data structures, which they named Edge-GCN (Edge Graph Convolutional Network). This neural network stores an internal representation of the vertices and edges of the Netlist, based on which it generates the estimate of its target value.
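One round of that edge-centric message passing can be sketched as follows. The update rules here are a simplification of mine (plain averaging instead of learned weight matrices), but they show the flow of information: edges are updated from their endpoint vertices, then vertices from their incident edges:

```python
# One simplified edge-centric message-passing step, in the spirit of
# Edge-GCN; the real network applies learned weight matrices rather
# than plain means.
def edge_gcn_step(v_feats, edges):
    # Edge update: mean of the two endpoint vectors.
    e_feats = {}
    for a, b in edges:
        e_feats[(a, b)] = [(x + y) / 2
                           for x, y in zip(v_feats[a], v_feats[b])]
    # Vertex update: mean of the incident edge vectors.
    new_v = {}
    for v, vf in v_feats.items():
        incident = [e for key, e in e_feats.items() if v in key]
        if incident:
            new_v[v] = [sum(e[i] for e in incident) / len(incident)
                        for i in range(len(vf))]
        else:
            new_v[v] = list(vf)
    return new_v, e_feats

# Tiny netlist: two macros wired through one standard cell.
v = {"m0": [1.0, 0.0], "m1": [0.0, 1.0], "c0": [0.5, 0.5]}
new_v, e = edge_gcn_step(v, [("m0", "c0"), ("c0", "m1")])
```

Stacking several such steps lets information propagate across the whole Netlist, which is what gives the network its internal representation of vertices and edges.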
Part 2: Reinforcement learning
Next, they focused on the second part of the problem: a new neural network takes as input the Netlist, the part to be placed, and information about the current state of the chip, and generates as output the placement for that part.
This task can be cast as a reinforcement learning problem, in which an agent (the neural network) observes the state of the task Sₜ (the Netlist and the chip) and, based on it, takes an action Aₜ (placing a piece) that generates a certain reward Rₜ.
In this way, the neural network will place all the pieces one by one. When it is finished, it will receive as a reward the target value we saw above, based on the length of the wiring, congestion, and density.
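An episode of this process can be sketched as follows. Everything here is a placeholder (grid size, a random stand-in for the policy, a crude wirelength proxy), but it captures the structure: one action per piece, and the reward only arrives once the whole floorplan is finished:

```python
import random

# Toy episode: the agent places macros one at a time on a grid; the
# reward is computed only when the whole floorplan is done. The grid
# size, policy and reward below are placeholders, not Google's.
random.seed(1)
GRID = 8

def random_policy(state, free_cells):
    # Stand-in for the trained policy network.
    return random.choice(sorted(free_cells))

def wirelength(placements):
    # Crude proxy: sum of Manhattan distances between consecutive macros.
    return sum(abs(x1 - x2) + abs(y1 - y2)
               for (x1, y1), (x2, y2) in zip(placements, placements[1:]))

def run_episode(n_macros=5):
    free = {(x, y) for x in range(GRID) for y in range(GRID)}
    placements = []
    for _ in range(n_macros):       # one action per macro
        cell = random_policy(placements, free)
        free.discard(cell)
        placements.append(cell)
    return -wirelength(placements)  # reward only at episode end

reward = run_episode()
```

Replacing `random_policy` with a trained network is what turns this loop from random search into directed exploration.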
Neural Network architecture
The neural network from the previous step (Graph conv. in the diagram) is attached to a new one. Thanks to this, the new network does not have to learn to place the parts from scratch: it reuses the representations learned in the supervised step to explore the most promising placements.
The new neural network has two parts (value network and policy network) because it learns based on an algorithm of the actor-critic family (Proximal Policy Optimization in particular). These algorithms simultaneously learn to make decisions and to evaluate their quality, using this evaluation as a guide to find the best actions.
The policy network chooses where to place the pieces, while the value network evaluates the quality of each action taken. Each time the agent acts, we compare the result obtained with the value network's evaluation. If the result is better than expected, the policy network is updated to choose that action more often; otherwise, to choose it less often.
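That "better or worse than expected" rule can be demonstrated in miniature. This is the basic actor-critic idea, not PPO itself (PPO adds clipped probability ratios on top); the single-state setup and rewards below are invented for illustration:

```python
import math

# Minimal actor-critic update on one state with two candidate positions.
# The critic ("value") predicts the reward; the actor raises the
# probability of actions that beat that prediction.
logits = [0.0, 0.0]           # actor's preferences for positions A and B
value = 0.0                   # critic's estimate of the reward
rewards = {0: 1.0, 1: -1.0}   # position A is actually better (invented)
lr = 0.1

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

for step in range(200):
    action = step % 2                    # alternate actions to illustrate
    advantage = rewards[action] - value  # better or worse than expected?
    probs = softmax(logits)
    # Policy gradient: push the chosen action's logit in the
    # direction of the advantage.
    for a in range(2):
        grad = (1.0 if a == action else 0.0) - probs[a]
        logits[a] += lr * advantage * grad
    value += lr * advantage              # critic tracks observed rewards

probs = softmax(logits)                  # position A ends up far more likely
```

After training, the actor strongly prefers the position that consistently beat the critic's expectation, which is the same dynamic that steers the placement agent toward good locations.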
This cycle of trial and error is carried out millions of times until by experimenting and observing the consequences, the actor learns to place the pieces in the right location.
The result: a neural network capable of placing the parts of a chip in 6 hours, a task that would take a team of skilled engineers weeks or months.