10.4.5 Energy Delay Product
Though CMOS circuits can often trade energy for performance, it is quite difficult to improve both energy and performance simultaneously. Gonzalez and Horowitz argue that the process-normalized energy-delay product (EDP) or, alternatively, performance squared per watt, which corresponds to the inverse of EDP, is a relatively implementation-neutral metric. They demonstrate that this metric causes the architectural improvements that contribute the most to both performance and energy efficiency to stand out. For example, their results demonstrate that pipelining is of fundamental importance to processor performance and energy efficiency, whereas superscalar issue makes a smaller contribution.
Figure 10.5 shows the process-normalized EDP of the four different designs. Despite their radically different architectures, the XScale's EDP is within 31.4% of the Pentium's if the outliers FFT and Fleshtone are ignored. The FFT result differs because the XScale uses a simple radix-2 algorithm instead of the optimized FFTW library used on the Pentium. The Fleshtone result stems from the fact that, for this floating-point benchmark, the XScale is modeled as an ideal implementation. The floating-point version of this algorithm has a performance problem on the Pentium, as explained in Section 10.4.3.
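To make the metric concrete, the following minimal sketch computes EDP as energy per task times delay per task and checks that performance squared per watt is its inverse. The function names and all numeric values are hypothetical and chosen only for illustration; process normalization, which the text does not detail, is left out.

```python
import math


def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product for one task, in joule-seconds."""
    return energy_j * delay_s


def perf_squared_per_watt(throughput_per_s: float, power_w: float) -> float:
    """Throughput squared divided by power, the inverse of EDP."""
    return throughput_per_s ** 2 / power_w


# Hypothetical baseline design: 2 W average power, 0.5 s per task.
power_w = 2.0
delay_s = 0.5
energy_j = power_w * delay_s      # energy per task = power * time
throughput = 1.0 / delay_s        # tasks per second

base_edp = edp(energy_j, delay_s)
inverse_edp = perf_squared_per_watt(throughput, power_w)

# Performance^2 / watt is exactly 1 / EDP.
assert math.isclose(inverse_edp, 1.0 / base_edp)

# Hypothetical variant that trades energy for performance:
# twice as fast, but twice the energy per task. Its EDP is unchanged,
# so the metric does not reward a pure energy/performance trade.
traded_edp = edp(2.0 * energy_j, delay_s / 2.0)
assert math.isclose(traded_edp, base_edp)
```

Under these assumptions, a design only scores better on EDP when it reduces energy or delay without paying for it proportionally in the other, which is why the metric highlights architectural improvements rather than circuit-level trade-offs.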
It is evident from Figure 10.5 that the perception processor has a radically better EDP, often one or two orders of magnitude better than that of its competition. It is particularly noteworthy that for FFT, where the perception processor achieves only 64% of the throughput of the Pentium, it still improves EDP by a factor of 24.5. This may be largely attributed to the higher energy efficiency of the perception processor. On average, the perception processor improves on the EDP of the XScale by a factor of 159 and is only 12 times worse than the ASIC. The perception processor is thus able to bridge the wide gap in EDP between CPUs and ASICs.
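The FFT figures above can be unpacked with a little arithmetic. The sketch below assumes that EDP is energy per task times delay per task, that delay is the reciprocal of throughput, and that process normalization does not change the ratio between the two designs; none of these assumptions is stated explicitly in the text, and the function name is hypothetical.

```python
def energy_ratio_from_edp(edp_improvement: float, relative_throughput: float) -> float:
    """Return E_ref / E_new.

    edp_improvement     = EDP_ref / EDP_new
    relative_throughput = throughput_new / throughput_ref
    """
    # Lower throughput means proportionally longer delay per task.
    delay_ratio = 1.0 / relative_throughput   # T_new / T_ref
    # EDP_ref / EDP_new = (E_ref / E_new) * (T_ref / T_new),
    # so E_ref / E_new = edp_improvement * (T_new / T_ref).
    return edp_improvement * delay_ratio


# FFT: 64% of the reference throughput, EDP better by a factor of 24.5.
fft_energy_ratio = energy_ratio_from_edp(24.5, 0.64)
print(f"{fft_energy_ratio:.1f}")  # prints 38.3
```

Under these assumptions, the perception processor would spend roughly 38 times less energy per FFT than the Pentium, which is consistent with attributing the EDP gain to energy efficiency rather than to speed.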