Sep 24, 2005
10 am - 1 pm
San Francisco, CA

IBM Tutorial: Hardware and Software Architectures for the CELL processor
Presented by:
Peter Hofstee and Michael Day

Held in conjunction with the 2005 International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES 2005).



  • Cell goals and rationale
  • Broadband Processor Architecture
  • Cell processor overview, speeds and feeds
  • Power core microarchitecture
  • Synergistic processor microarchitecture
  • System architecture and real-time aspects
  • Programming models
  • Prototype software stack
  • Cell applications

The Cell processor is a first instance of a new family of processors intended for the broadband era. The processors will find early use in game systems (PlayStation3(TM)), other consumer electronics applications, a wide range of embedded applications, and various forms of computational accelerators. Cell is a non-homogeneous multi-core processor, with one core (two threads) dedicated to the operating system and other control functions, and eight synergistic processors optimized for compute-intensive applications. Cell addresses two of the main limiters to microprocessor performance: increased memory latency, and performance limitations induced by system power limits. Memory latency is addressed by introducing another software-managed level of private "local" memory between the private registers and shared system memory. Data is transferred between this local memory and shared memory with asynchronous coherent DMA commands, while synergistic processor load and store commands access the local store only. This organization of memory makes it possible for the Cell processor to have over 100 memory transactions in flight at the same time, more than enough to cover memory latency. Power limitations are addressed by two main mechanisms: a non-homogeneous multi-core organization, and an ultra-high-frequency design that allows the chip to be operated at 3.2GHz at low voltage. Cell achieves between one and two orders of magnitude of performance advantage over conventional single-core processors on compute-intensive (32-bit) applications, but does so at the expense of a new programming model at the hardware level.
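
As a rough illustration of why the number of memory transactions in flight matters, the sketch below applies Little's law: to keep a memory interface busy, the bytes in flight must be at least bandwidth times latency. The function name and every numeric value here are illustrative assumptions, not figures from the tutorial.

```python
def transactions_in_flight(bandwidth_bytes_per_s: int,
                           latency_ns: int,
                           bytes_per_transaction: int) -> int:
    """Minimum number of outstanding transfers needed to cover memory latency.

    Little's law: bytes in flight = bandwidth * latency.
    Dividing by the transfer size (rounded up) gives the transaction count.
    Integer arithmetic is used throughout to avoid floating-point rounding.
    """
    numerator = bandwidth_bytes_per_s * latency_ns           # byte-nanoseconds per second
    denominator = 1_000_000_000 * bytes_per_transaction      # ns per second * bytes per transfer
    return -(-numerator // denominator)                      # ceiling division


# Hypothetical example values (assumptions chosen only for illustration):
# 25.6 GB/s of memory bandwidth, 500 ns latency, 128-byte transfers.
needed = transactions_in_flight(25_600_000_000, 500, 128)
print(needed)
```

Under these assumed values the interface needs on the order of a hundred outstanding transfers to stay busy, which suggests why issuing many asynchronous DMA commands at once can hide latency that a handful of blocking loads cannot.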