The address generator can directly compute array references of the
form
and vector accesses when
both loop variables come from nested loops, when one loop has been unrolled,
and, more importantly, when the inner loop has been modulo-scheduled.
For higher-dimensional arrays, the base address is repeatedly recomputed
using an ALU, and the last two dimensions are handled by the address
generator.
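
The split of work described above can be sketched in software. The
following is a minimal model, not the hardware itself: it assumes a
row-major layout and an affine address of the form
base + i*row_stride + j*col_stride, and the names affine_addresses and
addresses_3d are hypothetical, chosen only for illustration.

```python
def affine_addresses(base, row_stride, col_stride, n_outer, n_inner):
    """Yield base + i*row_stride + j*col_stride for a two-deep loop nest.

    Models what an address generator could produce on its own.
    The affine form is an assumption made for this sketch.
    """
    row = base
    for _ in range(n_outer):
        addr = row
        for _ in range(n_inner):
            yield addr
            addr += col_stride  # inner loop: step along the last dimension
        row += row_stride       # outer loop: step to the next row


def addresses_3d(base, dims, elem_size=1):
    """Yield addresses for a 3-D row-major array.

    The outermost dimension recomputes the base address (the ALU's role
    in the text); the last two dimensions are left to the generator.
    """
    d0, d1, d2 = dims
    plane_size = d1 * d2 * elem_size
    for k in range(d0):
        plane_base = base + k * plane_size  # base recomputed per plane
        yield from affine_addresses(
            plane_base, d2 * elem_size, elem_size, d1, d2
        )
```

For example, list(affine_addresses(100, 8, 1, 2, 3)) walks two rows of
three elements each, with rows eight addresses apart.
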
Another important access pattern is indirect access, in which the
values stored in one array are used as indices into another.
This is a common ingredient of neural network evaluation and can be
used to implement bit-reversed addressing for FFT. It is also a generic
access pattern: any complex access pattern can be precomputed and
stored in the index array and used at runtime to access the data in
the data array.
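
As an illustration of a precomputed access pattern, the sketch below
builds a bit-reversal index table once and then uses it to gather data
in bit-reversed order. It is a plain software analogy under stated
assumptions: the table length is a power of two, and the function names
bit_reversal_table and gather are hypothetical.

```python
def bit_reversal_table(n):
    """Precompute the bit-reversed permutation of range(n).

    n must be a power of two, as is usual for radix-2 FFTs.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    table = []
    for i in range(n):
        r = 0
        for b in range(bits):
            if i & (1 << b):
                r |= 1 << (bits - 1 - b)  # mirror bit b across the word
        table.append(r)
    return table


def gather(data, index):
    """Indirect access: read data through a precomputed index array."""
    return [data[k] for k in index]
```

The table is computed ahead of time and only read at runtime, so the
cost of the complex pattern is paid once, at the price of the storage
the table occupies.
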
Vector indirect style accesses may be done by passing an ALU-generated
address through the adder in Figure 9.7, thereby offsetting it by the
base address of the data array. The ALU address
can be computed, or it can be streamed into the ALU from SRAM by another
address generator. Using two address generators and an ALU, complicated
access patterns may be realized with high throughput. If the cost
in terms of SRAM and function unit usage becomes too high, the address
generator may be extended for other application-specific access patterns.
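
The cooperation of two address generators and an ALU can be modeled
roughly as follows. This is a behavioral sketch only and does not
describe the datapath of Figure 9.7: the SRAM is assumed to be a flat
Python list, and stream, indirect_stream, and the alu parameter are
hypothetical names introduced for illustration.

```python
def stream(sram, start, count, stride=1):
    """First address generator: stream values out of SRAM at a fixed stride."""
    for n in range(count):
        yield sram[start + n * stride]


def indirect_stream(sram, index_base, data_base, count, alu=lambda x: x):
    """Vector indirect access built from two generators and an ALU.

    The first generator streams index values from SRAM, the ALU may
    transform each one, and the result is offset by the data array's
    base address before the final read (the adder's role in the text).
    """
    for idx in stream(sram, index_base, count):
        offset = alu(idx)               # computed or passed through
        yield sram[data_base + offset]  # base address added, then read
```

With sram = [3, 0, 2, 1, 10, 20, 30, 40], the call
indirect_stream(sram, 0, 4, 4) reads the index table at addresses 0 to 3
and uses it to reorder the data stored from address 4 onward.
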
The stream address generator effectively converts the scratch-pad
memory into a vector register file that can operate over complex access
patterns and even interleave vectors for higher throughput. From an
operational perspective, associating stream address generators with
small scratch-pad memories unifies vector and VLIW architectures.