Based on the percentage of execution time spent in each phase, Amdahl's law predicts a speedup of 1.97 if GAU and HMM processing could be entirely overlapped. Clearly, a special-purpose architecture for GAU can offer significant speedup, as well as power and scaling benefits. To check for practical impediments to achieving good speedup, Sphinx was multithreaded. The parallel version of Sphinx, called PAR, runs the FE, GAU OPT, and HMM phases each on a separate processor. In effect, this models an SMP version of Sphinx 3, as well as the case where each processor is replaced by a special-purpose accelerator. As shown in Figure 5.6, PAR achieves a speedup of 1.67 over the original sequential version, and a custom accelerator is likely to perform even better. The HMM phase was further multithreaded to use four processors instead of one, but the resulting five-processor version was slower than the two-processor version because of high synchronization overhead.
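One simple way to read the 1.97 figure, assuming the overlapped phases behave as a perfectly balanced pipeline with no communication cost: the run time collapses to that of the longest stage, so the ideal speedup is 1 divided by the largest fraction of sequential time. Under that assumption, a speedup of 1.97 means the longest stage accounts for about 1/1.97 ≈ 50.8% of sequential time, and the measured 1.67 is roughly 85% of the ideal. The function names below are hypothetical, and the per-phase fractions themselves are not given in this section, so the inputs in the tests are illustrative only.

```python
import math


def overlap_speedup(phase_fractions):
    """Ideal speedup when phases run fully overlapped, one per processor.

    Assumes each entry is the fraction of sequential run time spent in one
    phase (fractions sum to 1) and that overlap is perfect, so the overall
    time equals the time of the slowest phase (the pipeline bottleneck).
    """
    if not phase_fractions or any(f <= 0 for f in phase_fractions):
        raise ValueError("fractions must be positive")
    if not math.isclose(sum(phase_fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    return 1.0 / max(phase_fractions)


def bottleneck_fraction(ideal_speedup):
    """Fraction of sequential time the slowest stage must take to cap speedup."""
    return 1.0 / ideal_speedup


def fraction_of_ideal(measured_speedup, ideal_speedup):
    """How close a measured speedup comes to the ideal overlapped speedup."""
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    print(f"bottleneck stage share: {bottleneck_fraction(1.97):.3f}")
    print(f"PAR vs ideal: {fraction_of_ideal(1.67, 1.97):.3f}")
```

This model ignores synchronization and data-transfer costs, which is exactly the overhead that made the five-processor HMM variant slower; it gives an upper bound, not a prediction of measured speedup.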