To report the algorithmic complexity of the various phases of the
visual feature recognizer, an $N \times N$
pixel square image is assumed
in this section. The flesh tone detector applies algebraic transformations
and inequalities to each pixel. Thus its complexity is
$O(N^2)$.
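The per-pixel nature of the flesh tone pass can be sketched as follows. This is only an illustration: the normalized chromaticity transform, the threshold values and the function name are assumptions and are not the ones used by the recognizer. The point is that each pixel costs a fixed amount of arithmetic and a fixed number of comparisons, so the work is proportional to the pixel count.

```python
def flesh_tone_mask(image):
    """Return a 0/1 mask for an image given as rows of (R, G, B) tuples.

    The transform (normalized chromaticity) and the threshold box are
    illustrative assumptions, not the recognizer's actual values.
    """
    mask = []
    for row in image:                       # one pass over every row
        mask_row = []
        for r, g, b in row:                 # constant work per pixel
            total = r + g + b
            if total == 0:
                mask_row.append(0)          # pure black carries no chromaticity
                continue
            rn = r / total                  # algebraic transformation
            gn = g / total
            # Inequalities on the transformed pixel (hypothetical bounds).
            is_flesh = 0.35 < rn < 0.55 and 0.25 < gn < 0.37
            mask_row.append(1 if is_flesh else 0)
        mask.append(mask_row)
    return mask
```

Because no pixel depends on its neighbours, the loop body can be applied to pixels in any order, including one raster line at a time.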
The face detectors sweep a basic detector across all pixel locations
in the image and then rescale the image, so their complexity grows
as
$O(N^2)$ for an image of side $N$. This will be compounded by the complexity of
the base detector itself. For the Rowley method using
$M$ neurons
of length
$L$, the base detector complexity is
$O(ML)$, giving an
overall complexity of
$O(N^2 ML)$. The complexity of a Viola/Jones
style base detector with
$F$ rectangle features is
$O(F)$, which yields an overall complexity of
$O(N^2 F)$.
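As a rough check on how the sweep scales, the sketch below counts base detector invocations over a pyramid of rescaled images with side length n. The 30 pixel window side, the 1.25 rescale factor and both function names are assumptions made for illustration; they are not taken from the detectors described here. If the image shrinks by a constant factor at every step, the per-level counts form a geometric series, so the total stays within a constant multiple of the placements at full resolution, and the cost of the base detector multiplies that total.

```python
def count_window_evaluations(n, window=30, scale=1.25):
    """Count window placements for an n x n image over all pyramid levels.

    `window` and `scale` are illustrative assumptions.
    """
    total = 0
    size = float(n)
    while int(size) >= window:
        side = int(size)
        positions = side - window + 1       # placements along one axis
        total += positions * positions      # placements at this level
        size /= scale                       # shrink the image, then repeat
    return total


def detector_work(n, per_window_cost, window=30, scale=1.25):
    """Total work: window placements times the cost of one base detector run."""
    return count_window_evaluations(n, window, scale) * per_window_cost
```

Here `per_window_cost` stands for the cost of one base detector run, for example the number of multiply accumulate operations needed to evaluate all neurons or all rectangle features on a single window.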
For each region where a face is likely to be present, EigenFaces performs
a number of operations that grows with $E$, the number of Eigen vectors,
and $K$, the number of known faces. This complexity is due to the
$E$ dot products done while projecting
the test image and the
$K$ dot products on
$E$ element vectors done
while finding the most similar known face.
In all cases, the workload scales faster than the image side length $N$
as the image
size is increased, so high performance architectures are necessary
for larger images. For increased accuracy, both the Rowley and Viola
detectors need a larger number of neurons and features respectively,
leading to a linear increase in compute requirements. For EigenFaces,
increasing the discrimination by using a larger number of Eigen vectors
leads to a linear increase in the compute requirements, as does increasing
the number of known faces to check against.
Each of the phases is a natural fit for a streaming architecture.
Since the flesh tone detector works on one pixel at a time, the image may
be streamed through the processor a pixel or a raster line at a time.
The face detectors work on rectangular regions of an image, so the
ability to hold an
image window on chip and stream the
neurons or features through the processor is important. For a modest
increase in on-chip storage to about 16 KB, both the neuron and feature
descriptions can instead be held within the processor and image windows
streamed through it. Since both detectors sweep their
image windows row by row and column by column, the ability to hold
30 raster lines on chip will greatly reduce the number of image window
fetch operations. Since they both work on gray scale images, the additional
SRAM required is merely 9.3 KB for
sized images. While
they have very different conceptual backgrounds, the base detectors
of the Viola and Rowley algorithms are remarkably similar and both
involve indirect vector access and dot product operations. The Viola
algorithm uses a short vector length of nine and integer multiply
accumulate operations. The Rowley method uses longer vectors
with lengths ranging from 11 to 151, with the sizes 101, 151 and 26
covering 87.8% of all evaluated neurons. Currently, neural net evaluation
involves floating point multiply accumulate operations. Given the
limited range of the weights and of the histogram equalized image pixels, this
could possibly be converted to scaled integer arithmetic.
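One way the conversion to scaled integer arithmetic might look is sketched below. It is a minimal illustration under stated assumptions: the choice of 8 fractional bits, the function names and the use of an index list to model indirect vector access are all hypothetical and are not taken from the Rowley implementation. Weights are scaled once into integers, the inner loop then uses only integer multiply accumulate operations, and the scale is removed after the sum.

```python
def quantize(weights, frac_bits):
    """Scale floating point weights to integers with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return [int(round(w * scale)) for w in weights]


def neuron_sum_fixed(window, indices, weights_q, frac_bits):
    """Weighted input sum of one neuron using integer arithmetic only.

    window    : flattened image window, integer pixels (e.g. 0..255)
    indices   : positions in `window` this neuron reads (indirect access)
    weights_q : weights already scaled by 2**frac_bits
    """
    acc = 0
    for i, w in zip(indices, weights_q):
        acc += window[i] * w                # integer multiply accumulate
    return acc / (1 << frac_bits)           # undo the scaling once, at the end
```

With weights that are exactly representable at the chosen precision the result matches the floating point sum; otherwise the rounding in `quantize` introduces an error that would have to be checked against the detector's accuracy.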
Similarly, EigenFaces is dominated by floating point dot products, which in turn depend on floating point multiply accumulate operations. Each test image needs to be projected to face space based on the stored Eigen vectors. This is a series of dot product operations, and each stored Eigen vector may simply be streamed through the processor while the flattened image is held within the processor. The vector length is equivalent to the number of Eigen vectors. Values of 50 or more are required in practice. Identification can be done by holding the projected test image constant in the processor and streaming the known projected images through it to compute dot products. Thus, on the whole, visual feature recognition algorithms lend themselves to efficient stream processor implementations.
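The two streaming phases described above can be sketched as follows. This is a simplified illustration, not the recognizer's implementation: the function names are hypothetical, mean face subtraction is omitted, and picking the known face with the largest dot product as the most similar one is an assumption that is only reasonable when the stored projections are normalized. Python iterators stand in for data streamed from memory, while the held operand stays in a local list.

```python
def dot(a, b):
    """Dot product as a chain of multiply accumulate operations."""
    return sum(x * y for x, y in zip(a, b))


def project(flat_image, eigenvector_stream):
    """Hold the flattened image and stream the Eigen vectors past it.

    Produces one coefficient per streamed Eigen vector.
    """
    return [dot(flat_image, ev) for ev in eigenvector_stream]


def identify(projected, known_stream):
    """Hold the projected test image and stream (label, projection) pairs.

    Returns the label with the largest dot product (assumed similarity measure).
    """
    best_label, best_score = None, None
    for label, known in known_stream:
        score = dot(projected, known)
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, `identify(project(image, iter(eigenvectors)), iter(known_faces))` touches each streamed vector exactly once, which is the access pattern a stream processor favours.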