Analytical and Experimental Evaluation of SMP Superscalar

Student: Jan Ciesko
Title: Analytical and Experimental Evaluation of SMP Superscalar
Type: diploma thesis
Advisors: Veldema, R.; Philippsen, M.
State: submitted on November 2, 2010

Parallel processing has reached mainstream computing. It is a natural approach to handling large workloads and has turned out to be the right solution for overcoming the performance limitations of single-core processors, often referred to as the ILP wall, the power wall, and the memory wall. While the performance and efficiency benefits are obvious, parallel programming introduces additional challenges for platform vendors and application developers: optimizing for performance, ensuring the correctness of concurrent code, and coping with increased development complexity. Surmounting these challenges requires strenuous development effort. This creates opportunities for new libraries, programming models, and tool support that are likely to provide a competitive edge in future platform and tool releases and to contribute significantly to developer efficiency and user experience. Their practicability, however, often still needs to be evaluated.
SMPSs (SMP Superscalar), discussed in this work, is a promising high-level, task-based programming model for parallel application programming originating from the Barcelona Supercomputing Center. It consists of a source-to-source compiler and runtime libraries. Similar to OpenMP, SMPSs allows task declarations, but additionally requires a declaration of each task's input and output variables. Knowing the developer's intention and the inter-task dependencies, the SMPSs task scheduler can automatically schedule independent tasks for concurrent execution. Scheduling and execution efficiency is further increased through data renaming, locality awareness, and workload-distribution policies. Disciplined access to shared state (process-global variables) is not enforced, for performance reasons. Synchronization with non-task code is supported through barriers. For performance measurement, SMPSs offers a tracing facility that monitors events, timing, and task activity.
Although SMPSs is likely to increase development efficiency, it comes at the cost of runtime overhead whose performance impact needs to be assessed. It is also unknown to what extent SMPSs, or task-based programming models in general, are suitable for applications that demand different parallelization strategies. Therefore, this work uses the NAS Parallel Benchmark suite for performance benchmarking to gain further insight into:

  • SMPSs performance and scalability
  • The suitability of SMPSs and of task-based approaches for different parallelization strategies
  • SMPSs bottlenecks and shortcomings
  • Next-generation SMPSs

The NAS Parallel Benchmark (NPB) suite, used in this work, was developed at the NASA Ames Research Center to study the performance of parallel supercomputers. It currently consists of 11 kernels, including multigrid (MG), conjugate gradient (CG), Gauss-Seidel (BT, SP, LU), and FFT (FT) solvers, the embarrassingly parallel (EP) Marsaglia polar method, and a set of composite benchmarks (UA, DC, and DT). Together they mimic the computation and data-movement characteristics of large-scale computational fluid dynamics applications. They exhibit mostly fine-grained exploitable parallelism and are almost all iterative, requiring multiple data exchanges between threads within each iteration.

This thesis is structured as follows:

1. Implementation

  • Motivation: the need for high-level parallel programming models and suitable benchmarks
  • SMP Superscalar: description and characteristics
  • NAS Parallel Benchmark: description and specification of the kernel subset representing the object of study
  • Implementation strategy

2. Benchmarking

  • Performance, scalability and behavior profiling

3. Optimization

  • Performance tuning
  • SMPSs shortcomings and solutions
  • Solution implementation proposition and testing