Processor Partitioning: An Experimental Performance Analysis of Parallel Applications on SMP Cluster Systems

X. Wu and V. Taylor (USA)


Performance analysis, processor partitioning, parallel applications, and MPI benchmarks


Clusters of shared-memory symmetric multiprocessors (SMPs) are currently among the most common parallel computing systems, and some existing environments have between 8 and 32 processors per node. Examples include several supercomputers: DataStar p655 (P655 and P655m) and P690 at the San Diego Supercomputer Center, and Seaborg and Bassi at the DOE National Energy Research Scientific Computing Center. In this paper, we quantify the performance gap that results from using different numbers of processors per node for application execution (which we term processor partitioning), and we conduct detailed performance experiments to identify the major application characteristics that affect processor partitioning. We use the STREAM memory benchmarks and Intel’s MPI benchmarks to explore the performance impact of different application characteristics. These results are then used to explain the processor partitioning results for three NAS Parallel Application benchmarks. The experimental results indicate that processor partitioning can have a significant impact on the performance of a parallel scientific application, depending on its communication and memory requirements.
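
To make the term concrete, the following minimal sketch lists the ways a fixed number of processes could be spread over SMP nodes, which is the kind of configuration space that processor partitioning refers to. The function name `partitioning_schemes` and the example values (16 processes, at most 8 processors per node) are illustrative assumptions and are not taken from the paper's experiments.

```python
def partitioning_schemes(total_procs, max_procs_per_node):
    """Return (nodes, procs_per_node) pairs that place exactly total_procs processes.

    Hypothetical helper for illustration only: it assumes every node in a
    scheme runs the same number of processes.
    """
    schemes = []
    for ppn in range(1, max_procs_per_node + 1):
        # Keep only schemes that fill each node evenly.
        if total_procs % ppn == 0:
            schemes.append((total_procs // ppn, ppn))
    return schemes


if __name__ == "__main__":
    # Example values are assumptions, not configurations reported by the authors.
    for nodes, ppn in partitioning_schemes(16, 8):
        print(f"{nodes:3d} nodes x {ppn} processors per node")
```

Each scheme uses the same total number of processes, so any difference in measured run time between schemes would come from how communication and memory traffic are shared within and across nodes.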
