LoopBench
Greyson Britt
Tempe, Arizona
LoopBench is a benchmarking tool that assesses how effectively the available processors accelerate common loop patterns. LoopBench was originally written for OpenCL, and we are currently working on a oneAPI version. The current LoopBench version supports GPUs and FPGAs.
Project status: Under Development
Intel Technologies
oneAPI, DPC++, Intel FPGA, Intel Integrated Graphics
Overview / Usage
Computationally intensive applications usually consist of multiple nested or flattened loops. These loops are the main building blocks of the applications, and each embodies a specific type of execution pattern. To reduce the running time of the loops, developers analyze them and attempt to parallelize them (either spatially or temporally) on the target hardware accelerators in a heterogeneous system. Unfortunately, a lack of understanding of both the loop characteristics and the ability of hardware accelerators to handle different types of loops often prevents application developers from choosing the right platform. In addition, developing accelerator-specific code is a time-consuming effort.
To address this issue, we have developed LoopBench, a benchmarking tool that assesses how effectively the available processors accelerate common loop patterns. LoopBench includes five important types of loops that commonly occur in real-world applications, and it evaluates how well different processors accelerate them. The results from LoopBench explain the architectural differences between accelerators with regard to different loop patterns. In addition, LoopBench gives developers insight into which accelerators suit their applications before they start coding. The current version of our benchmark supports both Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs), which are the most versatile and widely available accelerators.
Methodology / Approach
To understand how to choose the optimal accelerator for a given algorithm, we study the performance characteristics of common loop patterns on GPUs and FPGAs. Following this approach, we designed LoopBench, a new benchmark suite that captures the key loop patterns extracted from real-world algorithms and allows flexible testing of each type of loop by varying the following key parameters:
(1) Computational intensity, which is the total number of computational operations that each iteration of the algorithm performs. In our benchmark, it is defined as the number of cosine functions. The computational intensity can affect the size of the pipeline and the number of instructions on both FPGA and GPU. Changing this parameter shows how sensitive the performance of each platform is to the amount of computation.
(2) Dependency and concurrency degrees, which define how many iterations depend on one another and how many iterations can be executed independently.
(3) Input data size, which specifies the total number of floating-point variables that the algorithm processes. The size of the input data affects the computational load on a target platform, which can determine the suitability of one device over another.
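To make the three parameters concrete, here is a minimal Python sketch of a parameterized loop. It is not LoopBench's actual kernel code (which targets OpenCL and oneAPI). The names run_loop, concurrency_degree, intensity, and dep_distance are hypothetical, and modeling the dependency degree as a fixed distance between dependent iterations is an assumption made purely for illustration.

```python
import math


def run_loop(data, intensity, dep_distance):
    """Hypothetical loop kernel illustrating the three parameters.

    data         -- list of floats; len(data) is the input data size
    intensity    -- number of chained cosine calls per iteration
    dep_distance -- 0 means every iteration is independent; d > 0 means
                    iteration i reads the result of iteration i - d
                    (an assumed way to model the dependency degree)
    """
    out = [0.0] * len(data)
    for i, x in enumerate(data):
        # Loop-carried dependency: consume an earlier iteration's result.
        if dep_distance > 0 and i >= dep_distance:
            x += out[i - dep_distance]
        # Computational intensity: repeat the cosine 'intensity' times.
        for _ in range(intensity):
            x = math.cos(x)
        out[i] = x
    return out


def concurrency_degree(n, dep_distance):
    """Iterations that can run independently under the model above.

    With distance d, iterations i, i + d, i + 2d, ... form one chain,
    so at most d chains (capped by n) can proceed at the same time.
    """
    return n if dep_distance == 0 else min(n, dep_distance)


if __name__ == "__main__":
    values = [0.1 * k for k in range(8)]  # input data size = 8
    print(run_loop(values, intensity=4, dep_distance=0))
    print(concurrency_degree(len(values), dep_distance=2))
```

In this sketch, raising intensity lengthens the work done per iteration, raising dep_distance from 1 upward increases the number of independent chains, and the length of data sets the total load. These are the same three knobs the list above describes, under the stated assumptions.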