Next: 4.5 Branch Prediction
Up: 4 Defoe: An Example
Previous: 4.3 Instruction Encoding
Contents
A traditional VLIW with fixed-width MultiOps has no need to disperse
operations. However, with a compressed format like that of the
Defoe, the operations must be expanded, and NOPs inserted for
function units to which no operation is to be issued. To make the
dispersal task easy, we make the following assumptions:
- A few bits in the opcode specify the type of function unit (i.e. load/store,
simple arithmetic, complex arithmetic or branch) the operation needs.
- The compiler ensures that the instructions that comprise a MultiOp
are sorted in the same order as the function units in the processor.
This reduces the circuit complexity of the instruction dispersal stage.
For example, if a MultiOp consists of a load, a 32-bit multiply and a
branch, then the ordering (load, multiply, branch) is legal, but the
ordering (load, branch, multiply) is not legal.
- The compiler ensures that all the operations in the same MultiOp are
independent.
- The compiler ensures that the function units are not oversubscribed.
For example, it is legal to have two loads in a MultiOp, but it is
not legal to have three loads.
- A sequence of more than 6 instructions that contains no stop bit is
illegal.
- Basic blocks are aligned at 32-byte boundaries.
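
The assumptions above make dispersal a single forward scan, which can be sketched as follows. This is a minimal illustration, not the Defoe hardware: the slot layout (two load/store units, two simple arithmetic units, one complex arithmetic unit, one branch unit) is an assumption chosen to match the two-load limit and the 6-operation bound, and the names `Unit`, `SLOTS` and `disperse` are hypothetical.

```python
from enum import IntEnum


class Unit(IntEnum):
    # Hypothetical unit-type codes, standing in for the opcode's type bits.
    LOAD_STORE = 0
    SIMPLE = 1
    COMPLEX = 2
    BRANCH = 3


# Assumed order of function units in the processor (not given in the text).
SLOTS = [Unit.LOAD_STORE, Unit.LOAD_STORE, Unit.SIMPLE, Unit.SIMPLE,
         Unit.COMPLEX, Unit.BRANCH]
NOP = None


def disperse(ops):
    """Expand a compressed MultiOp into one entry per function unit.

    `ops` is a list of (unit_type, payload) pairs in compiler order.
    Raises ValueError if the ordering or unit-count assumptions are violated.
    """
    if len(ops) > len(SLOTS):
        raise ValueError("more operations than function units")
    expanded = [NOP] * len(SLOTS)
    slot = 0
    for unit, payload in ops:
        # Scan forward only: sorted input never needs to move backwards.
        while slot < len(SLOTS) and SLOTS[slot] != unit:
            slot += 1
        if slot == len(SLOTS):
            raise ValueError(f"misordered or oversubscribed: {unit.name}")
        expanded[slot] = (unit, payload)
        slot += 1
    return expanded
```

Under this assumed layout, the legal ordering (load, multiply, branch) fills the first load/store slot, the complex arithmetic slot and the branch slot, leaving NOPs elsewhere. The illegal ordering (load, branch, multiply) and a MultiOp with three loads are both rejected, because the scan never moves backwards.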
Apart from reducing wastage of memory, another reason to prefer a
compressed format VLIW over an uncompressed one is that the former
provides better I-Cache utilization. To improve performance, we use
a predecode buffer that can hold up to 8 uncompressed MultiOps. The
dispersal network can use a wide interface (say 512 bits) to the I-cache
to uncompress up to 2 MultiOps every cycle and save them in the predecode
buffer. Small loops of up to 8 MultiOps (maximum 48 operations) will
experience repeated hits in the predecode buffer. The predecode buffer
may also help lower the power consumption of a low-power VLIW processor.

Defoe supports in-order issue and out-of-order completion. Further, all
the operations in a MultiOp are issued simultaneously. If even one
operation cannot be issued, issue of the whole MultiOp stalls.
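
The effect of the predecode buffer on small loops can be illustrated with a toy model. This is only a sketch under stated assumptions: the text does not say how the buffer is indexed or which entry is replaced, so the model keys entries by a MultiOp's start address and evicts the oldest entry (FIFO). The names `PredecodeBuffer`, `fetch` and `run_loop` are hypothetical.

```python
from collections import OrderedDict

BUFFER_ENTRIES = 8    # uncompressed MultiOps held (from the text)
OPS_PER_MULTIOP = 6   # implied by "8 MultiOps (maximum 48 operations)"


class PredecodeBuffer:
    """Toy model mapping a MultiOp's start address to its expanded form."""

    def __init__(self, entries=BUFFER_ENTRIES):
        self.entries = entries
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, addr, expand):
        if addr in self.lines:
            self.hits += 1
            return self.lines[addr]
        self.misses += 1
        if len(self.lines) == self.entries:
            # Assumed FIFO replacement: drop the oldest entry.
            self.lines.popitem(last=False)
        self.lines[addr] = expand(addr)
        return self.lines[addr]


def run_loop(buffer, addrs, iterations):
    """Fetch the MultiOps at `addrs` repeatedly, as a loop body would."""
    for _ in range(iterations):
        for addr in addrs:
            # Placeholder for the real expansion done by the dispersal network.
            buffer.fetch(addr, expand=lambda a: f"expanded@{a:#x}")


small = PredecodeBuffer()
run_loop(small, addrs=[0x100 + 32 * i for i in range(8)], iterations=10)

large = PredecodeBuffer()
run_loop(large, addrs=[0x100 + 32 * i for i in range(9)], iterations=10)
```

In this model, a loop of 8 MultiOps misses only on its first iteration and hits on every later fetch. A loop of 9 MultiOps never hits under the assumed FIFO policy, since each entry is evicted just before it is needed again. A real design might use a different replacement policy and behave better on the larger loop.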
Binu K. Mathew