Parallel code analysis on a GPU

Director:Philippsen, M.
Beginning:July 1, 2013
Coworker:Blaß, T.

In compiler construction there are analyses that propagate information along the edges of a graph and modify it, until a fix point is reached and the information no longer changes. In this project we build the ParCAn framework to accelerate such analyses by exploiting the massive parallelism of graphic cards.

In 2018 we completed our comparative study on the efficiency of graph data structures on GPUs. We analyzed the combination of 11 graph data structures, 10 static graph algorithms, 19 input graphs, and six graph exchange formats. The measurements were performed on four NVIDA GPUs with different architectures. We analyzed 752k data points for the study. We derived a decision tree that allows developer to choose the best possible graph data structure for their algorithm. We showed that our decision tree can speed up (static) graph algorithms by up to 45%. Results from this study have influenced the ParCAn framework.

In 2018 we also added more optimizations to our analyzing framework.

  • Not all instructions are relevant to an analysis. Irrelevant ones do not contribute to the analysis but only increase the runtime as analysis knowledge (stored in predicates) has to be propagated through those instructions, without any modifications. We implemented a parallel algorithm on the GPU that marks and sweeps such irrelevant instructions. Since removing redundant instructions reduces the length of the propagation paths, fewer iterations are needed before a fixpoint is reached.
  • We also implemented an algorithm that ensures that all threads of a warp have a homogenous amount of work. Without this algorithm, predicates can accumulate along control flow paths so that some threads have a higher workload than others. In this case threads with fewer work idle while others still process predicates since the execution time is dominated by the longest runtime of a thread. To avoid such imbalances, threads that finish the computation on their predicates now support other threads that still have work to do. We use characteristics of the GPU to avoid the use of slow synchronization mechanisms.

To show the effectiveness of our framework we integrated it into the LLVM compiler framework. We picked four LLVM analyses and parallelized them with ParCAn. Ample measurements show that our framework can accelerate LLVM’s compilation process by up to 40%. A publication was accepted at the "28th International Conference on Compiler Construction" and will receive the Best-Paper-Award.

watermark seal