Segmentation is the process of grouping individual pixels into regions where an object might be found. A common approach is connected component analysis, which typically produces irregular regions. Because the Viola and Rowley algorithms used for face detection require rectangular regions, a simple algorithm that cuts the flesh tone bit mask into rectangles was used instead [103,83].
Two operators from mathematical morphology are applied to the bit
mask: an erosion operator followed by a dilation
operator. This has the effect of cutting away small connections and
regions that are likely to be false positives and then smoothing the
bit mask by filling in any small holes in the middle of an otherwise
acceptably sized region. All the rows of the image are then logically
OR-ed together to form a single row. This step is called vertical
separation. Runs of ``1'' values in the single row represent vertical
stripes of the image that contain objects of interest. Runs of ``0''
values represent vertical stripes that may be discarded. For each
vertical stripe, the columns are logically OR-ed to create a single
column. This is called horizontal separation. Runs of ``1'' values in
this column mark the rows spanned by each region of interest. The algorithm
can then be applied recursively within each resulting rectangle
to isolate the rectangular regions of interest. In the actual implementation,
the horizontal separation steps for all the vertical stripes are done
together in an interleaved manner. This converts
the column walk across the bitmap into a row walk, which gives better cache
performance. Recursion is stopped after two levels, since this has
empirically provided adequate results. The flesh tone bitmap is discarded
at this stage. The output is a list of the top-left and bottom-right
corner coordinates of the rectangular regions of interest, together with
a gray scale version of the image.
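
The steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation described here: the structuring element is assumed to be a 3x3 square (the text does not specify one), the bit mask is assumed to be a list of rows of 0/1 values, and all function names (`erode`, `dilate`, `runs`, `separate`, `segment`) are hypothetical. Rectangles are returned as half-open `(left, top, right, bottom)` tuples, and the interleaved row walk used for cache performance is not reproduced.

```python
def erode(mask):
    """3x3 binary erosion: a pixel survives only if its full 3x3 neighbourhood is set.
    Border pixels are always cleared (assumed boundary handling)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def dilate(mask):
    """3x3 binary dilation: a pixel is set if any pixel in its 3x3 neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(mask[yy][xx]
                                for yy in range(max(0, y - 1), min(h, y + 2))
                                for xx in range(max(0, x - 1), min(w, x + 2))))
    return out


def runs(bits):
    """Return half-open (start, end) index pairs, one per run of 1 values."""
    found, start = [], None
    for i, b in enumerate(bits):
        if b and start is None:
            start = i
        elif not b and start is not None:
            found.append((start, i))
            start = None
    if start is not None:
        found.append((start, len(bits)))
    return found


def separate(mask, x0, y0, x1, y1, depth):
    """Split the window [x0, x1) x [y0, y1) into rectangles, recursing `depth` levels."""
    rects = []
    # Vertical separation: OR all rows of the window into a single row.
    row = [int(any(mask[y][x] for y in range(y0, y1))) for x in range(x0, x1)]
    for c_start, c_end in runs(row):
        # Horizontal separation: OR the columns of this stripe into a single column.
        col = [int(any(mask[y][x] for x in range(x0 + c_start, x0 + c_end)))
               for y in range(y0, y1)]
        for r_start, r_end in runs(col):
            box = (x0 + c_start, y0 + r_start, x0 + c_end, y0 + r_end)
            if depth > 1:
                rects.extend(separate(mask, *box, depth - 1))
            else:
                rects.append(box)
    return rects


def segment(mask, depth=2):
    """Erode, dilate, then apply recursive vertical/horizontal separation."""
    opened = dilate(erode(mask))
    return separate(opened, 0, 0, len(mask[0]), len(mask), depth)


if __name__ == "__main__":
    # Two 3x3 blobs whose column ranges overlap, plus one isolated noise pixel.
    mask = [[0] * 10 for _ in range(10)]
    for y in range(1, 4):
        for x in range(1, 4):
            mask[y][x] = 1
    for y in range(5, 8):
        for x in range(3, 6):
            mask[y][x] = 1
    mask[8][8] = 1
    print(segment(mask, depth=1))  # [(1, 1, 6, 4), (1, 5, 6, 8)]
    print(segment(mask, depth=2))  # [(1, 1, 4, 4), (3, 5, 6, 8)]
```

In this example the isolated pixel is removed by the erosion step. With one level of separation the two blobs share a vertical stripe, so both rectangles are wider than the blobs they contain; the second level tightens each rectangle to its blob, which illustrates why a single pass is not always sufficient.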