A Cell Generator for Static CMOS Circuits

Abstract

Logic functions can be implemented by means of circuits consisting of basic components like NAND or NOR gates or they may be implemented as a single functional cell. Currently the former approach is more prevalent since the latter cannot easily be done manually for complex designs. This approach leads to a lot of inefficiency that can be avoided if a program that can generate optimized layouts for functional cells directly from a netlist representation is used. This paper discusses one such program that generates symbolic layouts. The approach though loosely based on methods proposed by Maziasz, Hayes, Uehara and van Cleemput uses some newly improvised algorithms.

Introduction

It is usually possible to implement a complex logic function more efficiently using a single functional cell rather than implementing it with basic cells like NAND and NOR gates. A single functional cell implementation will occupy lesser area and provide smaller propagation delay. This paper describes a program that automatically generates such an optimized implementation from boolean equations describing the circuit.

The program generates a static CMOS circuit with a series parallel style NMOS stack and the PMOS stack is constrained to be the exact dual of the NMOS stack. The height of the cell is assumed to be fixed and only the width is varied. When the source or drain of two adjacent transistors may be connected by abutting them and making the diffusion area continuous or by leaving them apart and connecting them with a metal layer. Either way when two transistors are not abutted against each other there should be a diffusion gap between them. This gap is usually as wide as a transistor if not more. To minimize area we need to use as few diffusion gaps as possible. So we need to be able to generate layouts where as many transistors abutt each other as possible. It is possible to implement any static CMOS circuit as a single row of P transistors and a single row of N transistors below the P transistors with the gates of vertically aligned PMOS and NMOS transistors sharing a common input. In fact a single vertical poly line is used to implement the gates of both the transistors. This is the implementation style used by the cell generator described here. Given such an organization the area of the circuit grows linearly with the number of inputs and diffusion gaps. To minimize the area the cell generator needs to reduce the number of gaps by finding continuous stretches of transistors that can be abutted.

Fig 1. Static CMOS implementation style
                                            |  Vari    |
                                            |->able<---|
    ___        ___        ___               |   ___    |       ___
   |###|      |###|      |###|                 |###|          |###|
+---###----+---###----+---###----+       -- +---###------------###----+
|  |###|   |  |###|   |  |###|   |        | |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   | PMOS   | |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   | STACK  + |  |###|   |...|  |###|   |
|  |###|   |  |###|   |  |###|   |        F |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |   ...  i |  |###|   |...|  |###|   |
|  |###|   |  |###|   |  |###|   |        x |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |        e |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |        d |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |        ^ |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |        | |  |###|   |   |  |###|   |
+---###----+--|###|---+---###----+       -- +----##-------------##---+-
   |###|      |###|      |###|                 |###|          |###|
   |###|      |###|      |###|                 |###|          |###|
   |###|      |###|      |###|                 |###|          |###|
+---###----+---###----+---###----+          +---###------------###----+
|  |###|   |  |###|   |  |###|   |          |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |  NMOS    |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |  STACK   |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |          |  |###|   |...|  |###|   |
|  |###|   |  |###|   |  |###|   |   ...    |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |          |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |          |  |###|   |...|  |###|   |
|  |###|   |  |###|   |  |###|   |          |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |          |  |###|   |   |  |###|   |
|  |###|   |  |###|   |  |###|   |          |  |###|   |   |  |###|   |
+---###----+--|###|---+---###----+          +---###------------###----+
   |###|      |###|      |###|                 |###|          |###|
   -----      -----      -----                 -----          -----
   Poly       Poly       Poly                  Poly           Poly

|-----------> Run <--------------|-->GAP<---|-------> Run <------------|

In addition, the requirement that vertically aligned PMOS and NMOS transistors share a common input means that such a continuous stretch of transistors should describe a path through the graph of the dual circuits that have the same sequence of labels on the edges. Such a path is called a dual Eulerian path and the next section provides a brief introduction to the graph theoretic background required to understand how the cell generator works.

Theoretical Background

Two Terminal Graphs

A CMOS circuit may be represented as a two terminal series parallel multi-graph or TTSPMG. A two terminal graph may be described recursively. Any edge is a TTSPMG representing a single transistor. The label on the edge is the input at the gate of the transistor and the vertices of the edge may be considered the source and drain regions of the transistor.

eg:
    0--------0
        A

Fig. 2
If A and B are TTSPMGs then A * B and A + B are also TTSPMGs where '*' is the series operator and '+' is the parallel operator.

eg:

A * B:   0-------0-------0
             A       B

A + B:    -------
         /   A   \
        /         \
       0           0
        \    B    /
         \-------/

Fig. 3
The only possible TTSPMGs are ones produced by repeated application of these rules. The NMOS and PMOS parts of a circuit may considered as independent TTSPMGs. Given the a boolean and or equation for compliment of a function the two terminal representation of the NMOS circuit for the function may be constructed by replacing each boolean and by the series operator '*' and every boolean or by the parallel operator '+' and constructing the graph described by the resulting equation. Similarly the dual PMOS circuit may be constructed by replacing each boolean and by the parallel operator '+' and every boolean or by the series operator '*' and constructing the graph described by the resulting equation. Normally we keep only a single representation of the TTSPMG equation and generate both the NMOS and PMOS circuits from it. For this purpose we define the combined series/parallel operator */+ such that A */B is equivalent to doing A*B to the NMOS half and A+B to the PMOS half. Similarly the parallel/series operator +/* when applied to A and B does A+B to the NMOS circuit and A*B to the PMOS half. Internally the cell generator represents boolean functions as the parse tree of the TTSPMG equation describing their static CMOS implementation.

Eulerian Path

An Eulerian path through a graph is one which visits each edge of the graph once and only once. No edge is left unvisited or revisited. It can be proved that any graph that has at most two vertices of odd degree has an Eulerian path. The relevance of an Eulerian path to the cell generation problem is that maximal run of abutting transistors for any one of the stacks is an Eulerian path of its TTSPMG. However it may not always be the case that a given graph has an Eulerian path. If it does not have an Eulerian path we may need to split the graph into multiple sub-graphs each of which has an Eulerian path. It is also possible that multiple Eulerian paths exist for a graph.

Our implementation style requires the same input to be fed to the gates of vertically aligned PMOS and NMOS transistors. For this purpose we need to find an Eulerian path which is identical for both the stacks. Such a path is called a dual Eulerian path. If a single dual Eulerian path does not exist we need to find dual Eulerian paths for sub-graphs of both the stacks. In the course of subdividing the graph we are free to reorder them without changing the functionality of the circuit. It is easy to see that the problem is not easy to implement efficiently since there is the potential for a combinatorial explosion in the number of possibilities.

Solutions from Previous Work

A heuristic method of making a dual Eulerian path was described by Uehara and van Cleemput. As mentioned previously every graph with at most two vertices of odd degree has an Eulerian path. The Uehara-van Cleemput method generates a Eulerian path for every circuit graph by converting it to a graph with no vertices with odd degree. This is done by searching the logic circuit graph for gates with an even number of inputs and converting their number of inputs to an odd number by adding a pseudo input. After this transformation the circuit graph will always have an Eulerian path. The places occupied by pseudo edges in the Eulerian path corresponds to diffusion gaps. Thus the actual output is not a single run of abutting transistors, but a series of runs separated by diffusion gaps that occur in the run in positions in the Eulerian path where pseudo edges occur. As described in [1] the Uehara-vanCleemput method generates the TTSPMG for the series parallel representation and uses the same transistor ordering for the dual parallel series circuit. The same order will work for both the NMOS and PMOS circuits, but parallel and series pseudo edges may be optimized away.

Maziasz and Hayes demonstrate using an exhaustive search of all possible circuits of reasonable size that there are only 42 basic types of TTSPMG representations of static circuits and that the Uehara-vanCleemput method optimally handles only a few of the 42 cases. However their treatment of the subject is not detailed enough (and clearly written) to base an actual implementation on. Most of the available literature refers to only the Uehara-vanCleemput method. The cell generator program described here borrows the concept of the TTSPMG parse tree (called Composition tree in the original paper) from Maziasz and Hayes and applies the osuedo-edge technique to the parse tree to achieve very fast layout generation.

The Cell Generator Program

Input Representation and Restrictions

The input to the cell generator is a set of minimized boolean equations in postfix form (RPN - Reverse Polish notation). Postfix was used because of the ease manipulating equations in that form without writing a full fledged parser for boolean expressions. The input language is described below

Boolean form      Postfix form
A and B           A B *
A or  B           A B +
( A )             A
Any boolean equation may be transformed into its equivalent RPN form by repeated application of the above rules. Note that RPN does not need parenthesis for expressing the precedence.

eg:

Boolean           Postfix
(A + B)* C        A B + C *
(A + B)* (C + D)  A B + C D + *
(A + B)+C         A B + C +

The cell generator will transform each equation into a functional cell. Equations may depend on each other.

The input file consists of a set of equations specified in the format

var_name: postfix_equation ;
or
!var_name: postfix_equation ;

eg:
z1: a b * z2 +;
!z2: p q * r *;

Where var_name appears on the left hand side of only one equation while it may appear anywhere in the body of any other equation. It is possible to refer to an variable name before it is defined as in the case of the variable z2 in the previous example. All constructs are terminated by a semi-colon (*not end of line*). The !var_name form indicates that the equation given is actually the compliment of the function to be implemented. This form is more efficient since it can save an inverter. There are two additional constructs named '.input' and '.output' to specify the inputs and observable outputs respectively. The keywords VDD, VSS, 1 and 0 are predefined with their conventional meanings.

Assumptions

It is assumed that the transformation from a large logic system to a set of equations was done by a technology mapping program that knows what kind and size of equations lead to good implementations in static CMOS. The cell generator does not guarantee that it will implement each equation according to its literal specification. It retains the freedom to rearrange each equation for optimal layout structure of the circuit it generates from each equation. Each equation is implemented as a functional cell and the generator will not merge equations or split them. Technology dependent decisions are not currently made in the cell generator. It is assumed that the technology mapping program takes care of issues like limiting the maximum number of transistors in series in a stack when it splits the circuit into equations. The generator makes only a symbolic layout. Actual transistor sizing and layout are done by a post processor. The post processor also does the job of routing the metal and poly. The cell generator symbolically indicates their interconnections.

Operation

The program operates in 3 phases.

Parsing
The cell generator reads boolean equations in RPN form from an input file and uses a simple stack machine to convert it into a parse tree of the corresponding TTSPMG. The combined operators */+ and +/* are used in a manner similar to the use of composition trees in [2] to keep a single copy of the tree from which both the NMOS and CMOS circuits may be derived. It also builds a symbol table that uses two level indirection (from the parse tree) to handle forward references without using a second pass over the input.

Gate transformation and Pseudo edge addition
The parse tree is then analyzed to combine smaller gates into larger ones. For example if the original equation corresponds to (a*b)*(c+d)*e it will be transformed to AND(a,b,e,OR(c,d)) i.e., the original two input AND gates are transformed to a single 4 input AND gate since this leads to a better implementation in CMOS.

During the same search pseudo inputs are added to gates that have an even number of inputs since this helps in generating the dual Eulerian path as described in a previous section. To minimize the number of gaps the position of the pseudo edges has to be carefully adjusted. A good heuristic is to add the pseudo edge on the side of the gate that is closest to the edge of the graph in a planar representation of the logic circuit.

                         +/*
                          |
                          |
      ----------------------------------------
      |               |                      |
      |               |                      |
      |               |                      |
     */+             */+                    */+
      |               |                      |
      /\              /\        .....        /\
     /  \    ...     /  \                   /  \
    /    \          /    \                 /    \
   / ...  \        /      \               /      \
  /        \      /        \             /        \
   Logic
   Cone          <===== Pseudo ===    == Pseudo ====>
                        Edges            Edges
                        Oriented         Oriented
                        towards          towards
                        nearest          nearest
                        edge of          edge of
                        graph            graph


Fig. 4

Depending on the success of this heuristic the pseudo-transistors will not interlace much with the real ones and will turn up at either the top end of the circuit (near VDD) or the bottom end (near VSS) and may be optimized out leading to lesser number of diffusion gaps.

Bottom-up Cell Generation
In this phase the actual symbolic layout is generated. A recursive in-order traversal of the parse tree is done and the actual transistor allocation is done bottom-up as the deepest level of recursive calls return. First, transistors are allocated both in the NMOS and PMOS stack for each literal in the equation. They are ordered left to right in the same order as the leaf nodes of the graph would occur in an in-order traversal of the graph. Runs of transistors are identified and transistors in the same run are connected by abutting their source and drain. If a pseudo-edge is encountered a diffusion gap is created. Each recursive call returns the two terminal sub-graph it mapped to actual transistors and the terminals of the sub-circuit are used higher up in the recursion tree as if they are the terminals of a single transistor. Interconnections between the sub-circuits (on separate runs) are indicated by metal connections. Runs of transistors that correspond to series or parallel sets of transistors are wired up in metal. For the series sets of transistors nothing much need be done since their abutment already places them in series electrically. In case there are gaps in between, the transistors separated by gaps are placed in series by metal interconnections. The wiring for parallel transistors is more complex.


   +--------------------+---------------------+---- Terminal-1
   |                    |                     |
------+   +------+   +------+   +------+   +-------
      |   |  |   |   |      |   |  |   |   |
      +---+  |   +---+      +---+  |   +---+
      -----  |   -----      -----  |   -----
        0    |     0          0    |     0
        |    |     |          |    |     |
        A    |     B          C    |     D
             +---------------------+--------------- Terminal-2


Fig: 5a Parallel connection of even number of transistors




                 +--------------------+
                 |                    |
    Terminal-1------+   +------+   +------+   +------Terminal-2
                    |   |  |   |   |      |   |  |
                    +---+  |   +---+      +---+  |
                    -----  |   -----      -----  |
                      0    |     0          0    |
                      |    |     |          |    |
                      A    |     B          C    |
                           +---------------------+


Fig: 5b Parallel connection of an odd number of transistors

Circuit generation is slightly complicated by the fact that the even numbered stack in 5b has a terminal at either end and may be easily incorporated into the stack by abutting while the odd numbered one in 5a may be abutted only at one end and the other end needs to be protected to avoid a short circuit through the metal of terminal-1. Since this stack corresponds to one of the original even numbered gates (pseudo transistor not shown in fig 5a) there will be a gap on the side of the stack that was closest to the edge of the graph (ref Fig 4). For this configuration, care is taken during circuit generation to abut terminal-1 on the opposite side of the gap and use a metal interconnection to terminal 2 over the gap.

During the circuit generation phase all other sub-circuits that the primary output depends on are also recursively generated and each circuit is placed in its own functional cell. No functional cell is duplicated. Interconnection between the sub-circuits should be handled by the post-processor program that routes the metal. After generating the functional cell inverters are placed to correct the polarity of the output if the original equation represented the function. If the equation was given in the !var_name form (i.e. the compliment of the required function was given) then the inverter is not required.

Original Work

The method of generating the Euler path and circuit directly from the parse tree was newly improvised for this project. I haven't seen it in any of the references.

Output Format

The output of the program specifies the symbolic layout in a program specific format. With little effort th cell generator can be converted to generate a CIF style output file and also a Spice file suitable for analysis of circuit parameters. The output file is hierarchically structured with sub-circuits embedded within the layout of the top level circuit. The major constructs in the output file are:

.ntran src gate drain
.ptran src gate drain
Instantiates a transistor (n or p type) and assigns names for the source and the drain terminals and specifies the name of the variable applied to the gate of the transistor.

.metal t1 t2 t3 t4 t5 ....
Specifies that a post-processor should route a metal layer to interconnect points t1, t2 etc.

.abutt t1 t2
Specifies that terminals t1 and t2 are connected by abutting. Both t1 and t2 are source or drain terminals that would have been instantiated in a previous .ntran or .ptran section.

.alias varname t
Binds the variable named var_name to the terminal named t.

In addition the output file contains comments to help a human understand the output. The transistor order is shown in comments and the boundaries of functional cells and inverters are marked with comments.

Related Work

The cell generator described in this paper develops on ideas proposed by Uehara and van Cleemput, Maziasz and Hayes. Both groups develop ideas based on a graph theoretical perspective. Maihot, Michelli have proposed symbolic methods for generating optimal layouts. Wing et al have formulated methods for optimal generation of gate-matrix layout. Much work has been done in the area of optimizing PLA layouts too.

Further Directions

Cell generation though efficient is not often used because of the unpredictability it introduces into circuit parameters. As an example designers might prefer using a slow and inefficient standard cell based circuit whose timing characteristics they can predict over a generated layout whose propagation delay differs from the rest of a large circuit. Research needs to progress towards generating layouts that satisfy constraints on delay, noise margins etc. Another disadvantage of the static CMOS implementation described here is that under some conditions it needs more head-room to route the metal interconnect leading to an increase in the cell height. It needs to be seen if better results can be obtained by analyzing the interconnect requirements while generating the Euler path.

Conclusion

A cell generator that generates a symbolic layout from boolean equations was implemented. Since the publication of Maziasz and Hayes that describes an optimal cell generation technique lacked the clarity required to base an implementation upon a new technique that uses some of the features of both the optimal algorithm and the Uehara-van Cleemput heuristic algorithm was developed and implemented. The program has been tested on several small circuits and generates moderately good layouts.

References

[1] Sadiq M. Sait and Habib Youssef, "VLSI Physical Design Automation - Theory and Practice"
[2] Robert L. Maziasz and John P. Hayes, "Layout Optimization of Static CMOS Functional Cells"
Author:
My email address as an image - I hate spammers and their bots!

The Cgen program is available under GPL. You can download it here.