Stream processors made their commercial debut in 2005 with the Cell Broadband Engine Architecture (CBEA) from IBM, developed in collaboration with Toshiba and Sony. The architecture is optimized for a range of compute-intensive applications, from computer games, cryptography, and graphics transformation and lighting to scientific workloads. The Cell consists of a 64-bit Power processor, which serves a function similar to that of the host processor in Imagine, and eight streaming units named Synergistic Processing Elements (SPEs). Each SPE supports 128-bit SIMD operations on two 64-bit, four 32-bit, eight 16-bit, or sixteen 8-bit operands. An SPE has two pipelines: the even pipeline executes floating-point and integer arithmetic, while the odd pipeline handles branches, memory accesses, and permutations. Up to two instructions may be issued in order per cycle to a set of seven function units. The bandwidth hierarchy consists of a 128-entry, 128-bit LRF in each SPE that is filled from a 256 KB local store (SRF), which is in turn serviced by a globally coherent DMA engine. Interestingly, the SRF also serves as the instruction store for an SPE. As with the RAW processor, each SPE has its own thread of execution, and the system is capable of time, space, and space-time multiplexing. The Cell processor was fabricated in a CMOS process, has a peak operating frequency of 4 GHz, and achieves a SIMD speedup of 9.9 times on a set of compiled benchmarks.
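
The SIMD widths and storage sizes quoted above are related by simple arithmetic, which the following Python sketch makes explicit. It is only an illustration built from the figures in the text (128-bit SIMD width, 128 LRF entries, 256 KB local store); the constant and function names are hypothetical and do not come from any Cell toolchain.

```python
# Figures taken from the description above; all names are illustrative only.
SIMD_WIDTH_BITS = 128           # width of one SPE SIMD operation
LRF_ENTRIES = 128               # number of 128-bit registers in the LRF
LOCAL_STORE_BYTES = 256 * 1024  # 256 KB local store (SRF)


def lanes(element_bits: int) -> int:
    """Return how many elements of the given width fit in one SIMD operation."""
    if element_bits <= 0 or SIMD_WIDTH_BITS % element_bits != 0:
        raise ValueError(f"{element_bits}-bit elements do not tile a 128-bit vector")
    return SIMD_WIDTH_BITS // element_bits


# Lane counts for the element widths named in the text.
lane_counts = {bits: lanes(bits) for bits in (64, 32, 16, 8)}

# Capacity of the LRF in bytes: 128 entries of 128 bits each.
lrf_bytes = LRF_ENTRIES * SIMD_WIDTH_BITS // 8

# How many times larger the local store is than the LRF.
local_store_to_lrf_ratio = LOCAL_STORE_BYTES // lrf_bytes

if __name__ == "__main__":
    for bits, count in lane_counts.items():
        print(f"{bits:2d}-bit elements: {count} lanes")
    print(f"LRF capacity: {lrf_bytes} bytes")
    print(f"Local store / LRF capacity ratio: {local_store_to_lrf_ratio}")
```

Under these figures the LRF holds 2 KB, so each level of the hierarchy described above is roughly two orders of magnitude larger than the one it feeds.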