Declarative Data Collections for Portable Performance based on oneAPI

Zhibo

Scotland

This project introduces a high-performance declarative data collection library based on Intel oneAPI: the programmer declares a data collection only by its properties, and high-performance code is generated automatically for heterogeneous platforms such as GPUs.

Project status: Under Development

oneAPI, HPC

Intel Technologies
oneAPI, DPC++, Migrated To SYCL, XeSS, Intel NUC, Intel CPU, Intel Integrated Graphics

Overview / Usage

Modern programming languages provide programmers with rich abstractions for data collections as part of their standard libraries, e.g. containers in the C++ STL, the Java Collections Framework, or the Scala Collections API. While convenient, this approach leads users to over-specify collection data types, which limits implementation flexibility and ultimately hurts application performance.

To eliminate this over-specification, we have developed a prototype library of Collection Skeletons, which provide a novel, declarative approach to data collections. Using our framework, programmers explicitly select properties for their collections, thereby truly decoupling specification from implementation. The library is built on C++ metaprogramming, so it introduces minimal runtime overhead, and it is approachable for programmers ranging from C++ beginners to experts.
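As an illustration of what selecting a collection by its properties could look like, here is a minimal sketch using standard C++17 only. The names `Unique`, `Sorted`, `select_backend`, and `Collection` are hypothetical and are not taken from the Collection Skeletons library; the mapping from properties to containers is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <unordered_set>
#include <vector>

// Hypothetical property tags (illustrative names, not the library's API).
struct Unique {};  // the collection holds no duplicate elements
struct Sorted {};  // iteration visits elements in ascending order

// True when property P appears in the pack Props (false for an empty pack).
template <typename P, typename... Props>
inline constexpr bool has_property = (std::is_same_v<P, Props> || ...);

// Compile-time mapping from declared properties to a concrete container.
// The choices below are assumptions for this sketch.
template <typename T, typename... Props>
struct select_backend {
    static constexpr bool unique = has_property<Unique, Props...>;
    static constexpr bool sorted = has_property<Sorted, Props...>;
    using type = std::conditional_t<
        sorted,
        std::conditional_t<unique, std::set<T>, std::multiset<T>>,
        std::conditional_t<unique, std::unordered_set<T>, std::vector<T>>>;
};

// The user names only the element type and the properties; the concrete
// data structure is chosen at compile time, so no runtime dispatch is needed.
template <typename T, typename... Props>
class Collection {
public:
    using backend_type = typename select_backend<T, Props...>::type;

    void add(const T& value) {
        if constexpr (std::is_same_v<backend_type, std::vector<T>>) {
            data_.push_back(value);
        } else {
            data_.insert(value);
        }
    }

    std::size_t size() const { return data_.size(); }
    const backend_type& backend() const { return data_; }

private:
    backend_type data_;
};
```

With this sketch, `Collection<int, Unique, Sorted>` resolves to a `std::set<int>` backend, while `Collection<int>` with no properties falls back to `std::vector<int>`; the calling code is the same in both cases.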

To further improve the computational efficiency and portability of the Collection Skeletons, we are extending the prototype library to multiple platforms based on Intel oneAPI. With the extended library, programmers can still specify collections by their properties without changing any code, and the library will port the code to the best-performing available platform whenever possible. The parallelised Collection Skeletons will serve as a convenient programming model for high-performance data-centric parallelism. Ordinary users will also benefit from the concise, declarative programming paradigm.

Methodology / Approach

  1. The Concrete Data Structures as Backends

Our extended library will be based on Intel oneAPI. First, the library will integrate concurrent containers from Intel oneTBB as backends. New mapping rules will be developed, based on the pattern-matching algorithm described in our paper, to host the concrete data structures from Intel oneTBB. Second, we will develop further concurrent data structures on top of Intel oneTBB to widen the choice of concrete backends.

  2. Portability based on SYCL

With SYCL, we can port exactly the same piece of code to multiple platforms, including CPUs, GPUs and FPGAs. We are implementing an Automatic Computational Directives Discovery & Direction technique, which translates the user's code into SYCL-enabled code while preserving its functionality. For example, the technique can discover a for-loop block and insert the environment setup and other definitions that SYCL requires. The technique is transparent to programmers, so they can declare their collections with properties and write their code as usual.

  3. Parallelisation Profitability Analyser

Not every piece of a program can be parallelised, and not every piece that can be would benefit from it. We propose a Parallelisation Profitability Analyser that works at runtime. For example, the size of the input data can have a significant impact on the performance of parallel computation: if the input is too small, offloading to an accelerator may yield no speedup. Several factors are involved, and the Parallelisation Profitability Analyser will weigh all of them to choose a target. We expect the analyser to make the right decision in most situations, i.e. to pick the optimal target platform among those available. Furthermore, the runtime overhead of the analyser itself should be as small as possible.
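The kind of decision described above can be sketched as a simple threshold rule. This is a minimal illustration only: the `Target`, `Platform`, and `Thresholds` types, the function `choose_target`, and the threshold values are all hypothetical, and the real analyser is expected to consider more factors than input size and device availability.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical execution targets for a collection operation.
enum class Target { CpuSequential, CpuParallel, Gpu };

// What is available on the current machine (assumed to be probed elsewhere).
struct Platform {
    bool gpu_available;
    std::size_t cpu_threads;
};

// Illustrative cut-off sizes; real values would have to be measured.
struct Thresholds {
    std::size_t min_parallel = 10'000;    // below this, threading overhead dominates
    std::size_t min_offload = 1'000'000;  // below this, device transfer cost dominates
};

// Picks a target from the input size and the available hardware.
// The check is a handful of comparisons, so its own overhead stays small.
constexpr Target choose_target(std::size_t n, const Platform& p,
                               const Thresholds& t = {}) {
    if (p.gpu_available && n >= t.min_offload) {
        return Target::Gpu;
    }
    if (p.cpu_threads > 1 && n >= t.min_parallel) {
        return Target::CpuParallel;
    }
    return Target::CpuSequential;
}
```

Under these assumed thresholds, a 100-element input stays sequential even when a GPU is present, while a two-million-element input is offloaded to the GPU if one is available and otherwise falls back to the parallel CPU path.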

Technologies Used

Libraries used: Intel oneAPI, including oneTBB for backends, oneDPL (SYCL) for portability, DPC++ for performance evaluation, oneMKL for additional numeric computing, and oneCCL for communication between computing nodes; C++ Boost; Clang 14.0.5 and above with C++17 enabled

Hardware: Intel platforms (Core, Xeon, Iris Xe, and FPGA)

Operating Systems: Linux (RHEL and Debian). Testing on Windows is not planned at this stage.
