To understand how architectural strategies can provide high performance
for perception applications at low power levels, it is necessary to
look at the CMOS circuit dynamic power consumption equation:

$$P = A C V^{2} F \tag{3.1}$$
P is the power consumed, A is the activity factor, i.e., the fraction
of the circuit that is switching, C is the switched capacitance, V
is the supply voltage, and F is the clock frequency [109].
If a capacitance $C$ is charged and discharged by a clock signal
of frequency $F$ and peak voltage $V$, then the charge moved per cycle
is $CV$ and the charge moved per second is $CVF$. Since the charge
packet is delivered at voltage $V$, the energy dissipated per second,
i.e., the power, is $CV^{2}F$. The data power for a clocked flipflop,
which can toggle at most once per cycle, will be
$\frac{1}{2}CV^{2}F$.
When capacitances are clock gated or when flipflops do not toggle
every cycle, their power consumption will be lower. Hence, a constant
called the activity factor ($A$) is used to model the average
switching activity in the circuit. Equation 3.1
is derived by incorporating this term into the power consumption.
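The relationship can be made concrete with a minimal sketch, assuming the dynamic power is the product of the activity factor, the switched capacitance, the square of the supply voltage and the clock frequency. The function name and the numeric values below are hypothetical and chosen only for illustration.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """Dynamic CMOS power in watts, taken as P = A * C * V^2 * F.

    activity      -- activity factor A (dimensionless, 0..1)
    capacitance_f -- switched capacitance C in farads
    voltage_v     -- supply voltage V in volts
    frequency_hz  -- clock frequency F in hertz
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical circuit: 1 nF of switched capacitance at 1.2 V and 100 MHz.
clock_like = dynamic_power(1.0, 1e-9, 1.2, 100e6)  # charged and discharged every cycle
data_like = dynamic_power(0.5, 1e-9, 1.2, 100e6)   # toggles at most once per cycle

print(f"A = 1.0: {clock_like:.4f} W")
print(f"A = 0.5: {data_like:.4f} W")
```

Because the voltage enters quadratically, halving $V$ in this model cuts power to a quarter, whereas halving $A$, $C$ or $F$ only halves it.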
Custom ASICs can drastically reduce power consumption by using
specialized circuit structures and concurrency to lower $C$ and $F$,
respectively. The drawback is that custom ASICs are inflexible:
once fabricated, they cannot be reprogrammed. Their high production
costs and long design times also often make them an unattractive choice.
While programmable perception processors are more desirable than ASICs,
ASICs still represent the ``gold standard'' against which perception
processors should be compared. This is because the specialized nature
of an ASIC gives it significant power, performance and die area advantages
over a general-purpose processor. ASICs therefore represent the
best possible implementation of a particular algorithm for a given
CMOS technology.
Assume that an application is required to perform $N$ operations
every $T$ seconds to keep up with real time. Then it should be the
case that:

$$\mathit{IPC} \times F \geq \frac{N}{T}$$

Here $\mathit{IPC}$ is the average number of instructions issued per cycle,
so $\mathit{IPC} \times F$ refers to the average number of instructions issued per
second across the whole application. Further, when

$$\mathit{IPC} \times F > \frac{N}{T},$$

the processor has too much performance, i.e., its frequency is too
high and it wastes power. When handling constant-rate realtime workloads,
it is not useful to finish the work early and power down the circuit
until the next realtime deadline. The overhead of reloading the
state-holding data memories and the instruction memory may be in the range
of several thousand cycles. It is better to slow down the processor
so that it has just enough performance to meet realtime deadlines rather
than paying the reload penalty tens or hundreds of times per second,
depending on the nature of the constant-rate workload. Thus the ideal
frequency of operation is:

$$F = \frac{N}{T \times \mathit{IPC}} \tag{3.2}$$

Substituting this back in the power equation we get:

$$P = \frac{A C V^{2} N}{T \times \mathit{IPC}} \tag{3.3}$$
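The effect of this substitution can be illustrated with a short sketch. It assumes the workload is described by $N$ operations every $T$ seconds and an average of $\mathit{IPC}$ instructions issued per cycle, so that the just-sufficient frequency is $N / (T \times \mathit{IPC})$; these symbol names, the function names and the example numbers are assumptions made here for illustration, not values from the text.

```python
def ideal_frequency(n_ops, period_s, ipc):
    """Lowest clock frequency (Hz) that completes n_ops operations
    every period_s seconds at an average of ipc instructions per cycle."""
    return n_ops / (period_s * ipc)


def realtime_power(activity, capacitance_f, voltage_v, n_ops, period_s, ipc):
    """Dynamic power (W) when the clock runs at exactly the ideal frequency."""
    frequency_hz = ideal_frequency(n_ops, period_s, ipc)
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical workload: one million operations every 10 ms.
n_ops, period_s = 1e6, 0.01

for ipc in (1.0, 2.0, 4.0):
    f = ideal_frequency(n_ops, period_s, ipc)
    p = realtime_power(0.5, 1e-9, 1.2, n_ops, period_s, ipc)
    print(f"IPC = {ipc}: F = {f / 1e6:.1f} MHz, P = {p * 1e3:.2f} mW")
```

In this model, doubling the instructions issued per cycle halves the required frequency and therefore halves the dynamic power at a fixed supply voltage, which is why concurrency is attractive for constant-rate realtime workloads.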
Binu Mathew