10.4.1 Instruction Level Parallelism
Figure 10.1 compares the IPC of the perception processor against the IPC measured with native performance counters on an SGI R14K processor. The benchmarks were compiled for the R14K with the highly optimizing SGI MIPSpro compiler suite. The perception processor achieved a mean IPC improvement of 3.3 times over this sophisticated superscalar out-of-order processor. Figure 10.1 also shows how the IPC is divided between the execution units and the memory system. A large fraction of the IPC improvement can be attributed directly to the memory system, which moves data into and out of the function units at a high rate. This keeps function unit utilization high, which in turn yields high IPC. Because each load/store instruction triggers an address calculation, the load/store and its address calculation are counted as two separate instructions. Although an address calculation is counted as a single instruction, it does the equivalent of several shift, mask, and add operations on a conventional processor, as explained in Section 9.6.2. These results show that the design goal of high throughput through ILP has been achieved.
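The counting rule above (one address calculation per load/store, tallied as a separate instruction) and the split of IPC between execution units and the memory system can be made concrete with a small sketch. It assumes that address calculations are charged to the memory system and that the "mean improvement" is a geometric mean of per-benchmark IPC ratios. Neither assumption is stated in the text, and the class and function names are hypothetical. All counter values used with it are illustrative, not measured data.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import geometric_mean


@dataclass
class Counts:
    """Hypothetical per-benchmark counters (illustrative, not measured data)."""
    cycles: int
    exec_ops: int  # operations issued to execution (function) units
    mem_ops: int   # load/store operations

    @property
    def addr_ops(self) -> int:
        # Each load/store triggers one address calculation, counted separately.
        return self.mem_ops

    def ipc(self) -> float:
        total = self.exec_ops + self.mem_ops + self.addr_ops
        return total / self.cycles

    def ipc_breakdown(self) -> dict[str, float]:
        # Assumption: address calculations are attributed to the memory system.
        return {
            "execution units": self.exec_ops / self.cycles,
            "memory system": (self.mem_ops + self.addr_ops) / self.cycles,
        }


def mean_ipc_improvement(pairs: list[tuple[Counts, Counts]]) -> float:
    """Geometric mean of IPC(new) / IPC(baseline) over benchmarks (assumed metric)."""
    return geometric_mean([new.ipc() / base.ipc() for new, base in pairs])


if __name__ == "__main__":
    pp = Counts(cycles=100, exec_ops=150, mem_ops=50)
    base = Counts(cycles=100, exec_ops=60, mem_ops=20)
    print(pp.ipc(), pp.ipc_breakdown())
    print(mean_ipc_improvement([(pp, base)]))
```

With these made-up counts, 150 execution operations, 50 loads/stores and 50 address calculations over 100 cycles give an IPC of 2.5, split 1.5 to the execution units and 1.0 to the memory system. A geometric mean is a common choice for averaging ratios, but the text does not say which mean produced the 3.3 figure.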