The Sony-Toshiba-IBM Cell Broadband Engine is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE), with eight SIMD co-processing units (SPEs) integrated on-chip. We present a complexity model for designing algorithms on the Cell processor, along with a systematic procedure for algorithm analysis. To estimate the execution time of the algorithm, we consider the computational complexity, memory access patterns (DMA transfer sizes and latency), and the complexity of branching instructions. This model, coupled with the analysis procedure, simplifies algorithm design on the Cell and enables quick identification of potential implementation bottlenecks.

Irregular problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous accesses to global data structures with low degrees of locality. Few parallel graph algorithms on distributed- or shared-memory machines can outperform their best sequential implementation due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two important combinatorial algorithms, list ranking and connected components, on two types of shared-memory computers: symmetric multiprocessors (SMP) such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. Previous studies show that for SMPs performance is primarily a function of non-contiguous memory accesses, whereas for the MTA, it is primarily a function of the number of concurrent operations. We present a performance model for each machine, and use it to analyze the performance of the two algorithms. We compare the models for SMPs and the MTA and discuss how the difference affects algorithm development, ease of programming, performance, and scalability.
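
For readers unfamiliar with list ranking, the following is a minimal Python sketch of the problem, not the implementation from either paper. It assumes the list is stored as a successor array `succ` with `-1` marking the tail; the function names and this representation are illustrative choices. The first function walks the list sequentially, which shows the non-contiguous access pattern mentioned above. The second simulates pointer jumping (Wyllie's technique, a standard parallel approach) one round at a time on a single thread.

```python
def rank_sequential(succ, head):
    """Return each node's distance from the head by walking the list once.

    succ[i] is the index of the node after i, or -1 if i is the tail.
    """
    rank = [0] * len(succ)
    node, position = head, 0
    while node != -1:
        rank[node] = position
        # Pointer chase: the next index has no relation to the current one,
        # so consecutive accesses are generally non-contiguous in memory.
        node = succ[node]
        position += 1
    return rank


def rank_pointer_jumping(succ):
    """Return each node's distance to the tail using pointer jumping.

    Every round, each node adds its successor's distance to its own and then
    skips over that successor. On a parallel machine the inner loop would run
    concurrently; here it is simulated sequentially with double buffering.
    """
    n = len(succ)
    nxt = list(succ)
    dist = [0 if s == -1 else 1 for s in succ]
    while any(s != -1 for s in nxt):
        new_dist, new_nxt = dist[:], nxt[:]
        for i in range(n):
            if nxt[i] != -1:
                new_dist[i] = dist[i] + dist[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        dist, nxt = new_dist, new_nxt
    return dist


if __name__ == "__main__":
    # The list 0 -> 3 -> 2 -> 1, stored out of order on purpose.
    succ = [3, -1, 1, 2]
    print(rank_sequential(succ, head=0))
    print(rank_pointer_jumping(succ))
```

The sequential version does O(n) work but each step depends on the previous one. Pointer jumping finishes in O(log n) rounds at the cost of O(n log n) total work, and every round issues many independent memory references, which is the kind of concurrency the MTA discussion above refers to.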