It's interesting that the Cell Broadband Engine is reviving batch-based computing as one of its preferred development paradigms.
Maybe it makes sense in certain applications: when you have lots of cores, the main problem tends to be keeping the caches synchronized with one another. So why not detach each core's internal memory from the rest of the system and move data in and out with explicit DMA operations? It is a simple idea.
On the other hand, I was surprised to see how intrusive this simple architectural change is on regular development practices. You face a whole set of new challenges, such as explicitly splitting your code and data into pieces to squeeze the maximum benefit out of the hardware.
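To make the "split your data into pieces" point concrete, here is a minimal sketch of the batch style in plain Python. It is an illustration only, not the Cell SDK: the names LOCAL_STORE_WORDS, dma_get, dma_put and scale_in_batches are hypothetical, the "local store" is just a small fixed-size array, and ordinary slice copies stand in for DMA transfers. The point is that the core never touches main memory directly; it pulls in a chunk, works on it locally, and pushes the result back.

```python
from array import array

# Hypothetical capacity of a core's private "local store", in elements.
# Deliberately tiny so the chunking is visible.
LOCAL_STORE_WORDS = 4


def dma_get(main_mem, offset, local, count):
    """Stand-in for a DMA get: copy `count` elements from main memory into the local buffer."""
    local[:count] = main_mem[offset:offset + count]


def dma_put(main_mem, offset, local, count):
    """Stand-in for a DMA put: copy `count` elements from the local buffer back to main memory."""
    main_mem[offset:offset + count] = local[:count]


def scale_in_batches(main_mem, factor, chunk=LOCAL_STORE_WORDS):
    """Multiply every element of main_mem by factor, one local-store-sized chunk at a time.

    Returns the number of simulated transfers (one get and one put per chunk).
    """
    local = array(main_mem.typecode, [0] * chunk)  # the core's private buffer
    transfers = 0
    for offset in range(0, len(main_mem), chunk):
        count = min(chunk, len(main_mem) - offset)  # last chunk may be partial
        dma_get(main_mem, offset, local, count)
        for i in range(count):  # compute only on local data
            local[i] *= factor
        dma_put(main_mem, offset, local, count)
        transfers += 2
    return transfers
```

For example, `scale_in_batches(array('i', range(10)), 3)` processes the ten elements as chunks of 4, 4 and 2. This sketch performs each copy synchronously; real code for this kind of hardware would typically overlap the next transfer with the current computation (double buffering), which is exactly the sort of restructuring that makes the model intrusive.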
I feel that we should all watch the Cell processor as an early example of how easy it is to live in a multi-core world. Specifically, what will things look like ten years from now, when we have 32 cores per chip with lots of caches at various levels?