A DPC++ Backend for the OCCA Portability Framework

Kris Rowe

Lemont, Illinois

OCCA—an open source, portable, and vendor-neutral framework for parallel programming on heterogeneous platforms—is used by mission-critical computational science and engineering applications of public and private sector organizations, including the U.S. Department of Energy and Shell.

Project status: Published/In Market

oneAPI, HPC

Intel Technologies
oneAPI, DPC++, Intel Iris Xe, Intel Iris Xe MAX, Intel VTune

Overview / Usage

OCCA is an open source, portable, and vendor-neutral framework for parallel programming on heterogeneous platforms. The framework consists of several orthogonal components that can be used together or individually: the OCCA API and runtime, the OCCA kernel language, and the OCCA command line tool.

The OCCA API provides unified abstractions—such as device, memory, and kernel objects—which are common to other heterogeneous programming models. The OCCA runtime provides several backends—including DPC++, CUDA, HIP, OpenMP, OpenCL, and Metal—which implement the API as a set of lightweight wrappers. Language support is provided for applications written in C, C++, and Fortran.
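As a hedged sketch of these abstractions in use, the following follows OCCA's addVectors tutorial pattern; the mode string and properties for the DPC++ backend are illustrative, and error handling is omitted:

```cpp
#include <occa.hpp>
#include <vector>

int main() {
  const int N = 1 << 10;
  std::vector<float> a(N, 1.0f), b(N, 2.0f), ab(N, 0.0f);

  // Select the DPC++ backend; other modes include CUDA, HIP, and OpenMP.
  occa::device device("{mode: 'dpcpp', platform_id: 0, device_id: 0}");

  // Device allocations, mirroring C-style malloc/free
  occa::memory o_a  = device.malloc<float>(N, a.data());
  occa::memory o_b  = device.malloc<float>(N, b.data());
  occa::memory o_ab = device.malloc<float>(N);

  // JIT-compile an OKL kernel for the chosen backend, then launch it
  occa::kernel addVectors = device.buildKernel("addVectors.okl", "addVectors");
  addVectors(N, o_a, o_b, o_ab);

  o_ab.copyTo(ab.data());
}
```

Because the device, memory, and kernel objects are backend-agnostic, switching the same program to CUDA or OpenMP is a matter of changing the mode string.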

The OCCA Kernel Language (OKL) enables the creation of portable device kernels using a directive-based extension to the C language. At runtime, OCCA translates OKL code to the programming language of the chosen backend and then generates the device binary using that backend's toolchain. Alternatively, kernels can be written directly in backend-specific code (e.g., OpenCL or CUDA).
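For illustration, here is a minimal OKL kernel in the style of OCCA's addVectors example; the @tile attribute splits the loop into @outer and @inner levels, which map to work-groups and work-items on GPU backends:

```c
@kernel void addVectors(const int N,
                        const float *a,
                        const float *b,
                        float *ab) {
  for (int i = 0; i < N; ++i; @tile(16, @outer, @inner)) {
    ab[i] = a[i] + b[i];
  }
}
```

The same source is portable: the translator emits CUDA, OpenCL, DPC++, or OpenMP code from it depending on the backend selected at runtime.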

Mission-critical computational science and engineering applications from the U.S. Department of Energy and Shell rely on OCCA. For example, NekRS—a new computational fluid dynamics solver from the Nek5000 team—is used to simulate coolant flow inside small modular reactors and to design more efficient combustion engines. The development of a DPC++ backend for OCCA was jointly undertaken by the Argonne Leadership Computing Facility and Intel to support these applications on platforms using Intel Xe GPUs, including the Aurora exascale supercomputer.

Methodology / Approach

Since OCCA supports device kernels written using backend-specific code, the OCCA DPC++ backend was developed in three phases.

  1. The OCCA API was implemented. Device kernels written in DPC++ were used to verify correctness.
  2. The logic for OKL-to-DPC++ translation was implemented. To verify correctness, OKL kernels corresponding to the DPC++ kernels in step 1 were translated using the OCCA command line tool.
  3. Combined usage of the OCCA API and OKL kernels was validated using OCCA's internal test harness, microbenchmark kernels and mini-apps, and the full NekRS application.
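The translation in the second phase can be illustrated with a sketch: an OKL vector-add kernel becomes, roughly, an extern "C" wrapper that submits a DPC++ lambda over an nd_range. All names, the work-group size, and the argument-passing convention here are illustrative; the generated code differs in detail:

```cpp
#include <sycl/sycl.hpp>

// Illustrative sketch of translated output: the wrapper takes the
// backend queue and USM pointers, then launches an nd_range kernel.
extern "C" void addVectors(sycl::queue *q,
                           const int N,
                           const float *a,
                           const float *b,
                           float *ab) {
  const size_t local = 16;
  const size_t global = ((N + local - 1) / local) * local;
  q->submit([&](sycl::handler &h) {
    h.parallel_for(
        sycl::nd_range<1>{sycl::range<1>(global), sycl::range<1>(local)},
        [=](sycl::nd_item<1> it) {
          const int i = static_cast<int>(it.get_global_id(0));
          if (i < N) ab[i] = a[i] + b[i];
        });
  });
}
```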

The OCCA runtime uses the PIMPL (pointer-to-implementation) design pattern. An abstract interface is provided via a core collection of base classes; to create a new OCCA backend, developers extend these base classes and implement their virtual functions. Memory management in OCCA applications is handled through C-style malloc and free functions, and memory is passed to device kernels as pointer arguments. The DPC++ Unified Shared Memory (USM) model was used in the implementation of the OCCA DPC++ backend because it aligns most closely with OCCA's API. In contrast, the DPC++ buffer/accessor approach would have required a significant redesign of OCCA's internal API and would likely have affected many downstream projects. Finally, OKL kernels are translated to extern "C" functions that invoke a DPC++ kernel—defined as a lambda expression—using the nd_range flavor of parallel_for.

Technologies Used

Software

  • Intel oneAPI Base Toolkit
    • Intel oneAPI DPC++ Compiler
    • Intel Distribution for GDB
    • Intel Advisor
    • Intel VTune Profiler
  • CMake
  • Visual Studio Code
  • OCCA
  • NekRS

Hardware

  • Intel Iris Xe (Gen9) GPU
  • Intel Iris Xe MAX GPU
  • Intel Xe-HP GPU
  • Argonne National Laboratory's JLSE Testbeds

Acknowledgments

This work was supported by the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357, and by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation's exascale computing imperative.

Documents and Presentations

Repository

https://github.com/libocca/occa
