Natural human interfaces built on technologies like speech recognition, gesture recognition, and object detection and tracking are central to the widespread acceptance of future embedded systems. The chances of today's isolated embedded devices developing into tomorrow's ubiquitous computing environment also depend on services like secure wireless networking, media processing, and integration with visual and audio interfaces. The levels of performance and power efficiency required to achieve these goals are orders of magnitude beyond the ability of current embedded processors. Application-specific processor architectures can effectively solve some of these challenges.
The performance characteristics of a face recognition system based on well-known algorithms and of a leading research speech recognition system were analyzed. Recasting these perception algorithms, as well as DSP and encryption algorithms, onto an architecture optimized for stream processing demonstrated high levels of ILP and energy efficiency. The perception processor combines VLIW execution clusters, compiler-directed dataflow and clock gating, hardware support for modulo scheduling, and special-purpose address generators to achieve high performance at low power on perception algorithms. Operationally, the combination of stream address generators and scratch-pad memories represents a unification of the VLIW and vector styles of execution. The perception processor is a fairly minimal yet programmable hardware substrate that can mimic the dataflow found in ASICs. It achieves 1.75 times the throughput of a Pentium 4, with an energy-delay product 159 times better than that of an XScale embedded processor. Its energy-delay product is only 12 times worse than that of an ASIC implementation. This approach has a number of advantages.
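To make the role of the stream address generators more concrete, the sketch below models one as a pure function that produces the sequence of addresses a nested loop would touch. It is a minimal illustration under stated assumptions, not the perception processor's actual address generator interface: the function name `stream_addresses`, the (count, stride) loop-level description, and the element-size parameter are all hypothetical. The point it shows is that once the access pattern is described declaratively, the address sequence can be generated outside the loop body. A scratch-pad memory can then stream operands to the execution clusters without spending VLIW issue slots on address arithmetic.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def stream_addresses(base: int, elem_size: int,
                     levels: Sequence[Tuple[int, int]]) -> Iterator[int]:
    """Yield byte addresses for a loop nest described by (count, stride) pairs.

    Hypothetical model of a stream address generator (not a real hardware API).
    levels[0] is the outermost loop; strides are measured in elements.
    """
    counts = [count for count, _ in levels]
    strides = [stride for _, stride in levels]
    # itertools.product varies the last index fastest, like an innermost loop.
    for indices in product(*(range(c) for c in counts)):
        offset = sum(i * s for i, s in zip(indices, strides))
        yield base + offset * elem_size


if __name__ == "__main__":
    # Walk a 3x3 window of a row-major image that is 8 elements wide,
    # with 4-byte elements, starting at byte address 0x1000.
    window = list(stream_addresses(0x1000, 4, [(3, 8), (3, 1)]))
    print(window)
    # → [4096, 4100, 4104, 4128, 4132, 4136, 4160, 4164, 4168]
```

Describing the pattern as loop levels with strides, rather than as an explicit address list, mirrors how vector machines describe strided accesses. This is one plausible reading of the VLIW/vector unification mentioned above, offered here as an interpretation rather than as the design's documented behavior.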
Fine-grained management of communication and storage resources has been shown to improve performance and reduce energy consumption at the same time, whereas improving both simultaneously with a traditional microprocessor approach has proven problematic. The perception processor is an attractive choice when performance, power efficiency, programmability, and rapid design cycles all matter. For the first time, sophisticated real-time perception applications appear to be possible within an energy budget commensurate with the embedded space.