Stream processors made their commercial debut in 2005 with the Cell
Broadband Engine Architecture (CBEA) from IBM, developed in
collaboration with Toshiba and Sony [14]. This architecture
is optimized for a range of compute-intensive applications, from
computer games, cryptography, and graphics transformation and lighting
to scientific workloads. The Cell consists of a 64-bit Power processor
that plays a role similar to that of the host processor in Imagine and
eight streaming units named Synergistic Processing Elements (SPEs).
Each SPE is capable of 128-bit SIMD operations on two 64-bit,
four 32-bit, eight 16-bit or sixteen 8-bit elements. An SPE consists
of two pipelines. The even pipeline executes floating point and integer
arithmetic while the odd pipeline handles branches, memory accesses
and permutations. Up to two instructions may be issued in order per
cycle to a set of seven function units. The bandwidth hierarchy consists
of a 128-word, 128-bit-wide LRF in each cluster that is filled from a 256
KB local store (SRF), which is in turn serviced by a globally coherent
DMA engine. Interestingly, the SRF also serves as the instruction
store for an SPE. As with the RAW processor, each SPE
has its own thread of execution, and the system is capable of performing
time, space and space-time multiplexing. The Cell processor was fabricated
in a CMOS process, has a peak operating frequency of 4 GHz
and achieves a SIMD speedup of 9.9 times on a set of compiled benchmarks.
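
The lane counts and storage sizes quoted above follow from simple
arithmetic on the 128-bit datapath. The sketch below is illustrative
only: the constant and function names are hypothetical, and it assumes
that KB denotes 1024 bytes and that each LRF word is one full 128-bit
register.

```python
# Figures taken from the paragraph above; all names here are illustrative.
SIMD_WIDTH_BITS = 128            # width of one SPE SIMD operation
LRF_ENTRIES = 128                # 128-word LRF, each word 128 bits wide
LOCAL_STORE_BYTES = 256 * 1024   # 256 KB local store (assumes KB = 1024 bytes)


def lanes(element_bits: int) -> int:
    """Return how many elements of the given width fit in one SIMD operation."""
    if element_bits <= 0 or SIMD_WIDTH_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the SIMD width")
    return SIMD_WIDTH_BITS // element_bits


def lrf_bytes() -> int:
    """Return the capacity of one LRF in bytes."""
    return LRF_ENTRIES * SIMD_WIDTH_BITS // 8


if __name__ == "__main__":
    for bits in (64, 32, 16, 8):
        print(f"{bits:2d}-bit elements: {lanes(bits)} lanes")
    print(f"LRF capacity: {lrf_bytes()} bytes")
    print(f"Local store to LRF ratio: {LOCAL_STORE_BYTES // lrf_bytes()}")
```

Under these assumptions the LRF holds 2 KB, so the local store is 128
times larger than the register file it fills.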