Migrating and Tuning a CUDA-based Stencil Computation to DPC++ Using oneAPI

Clicia Pinto

State of Bahia

We propose migrating a CUDA-based reverse time migration (RTM) application to DPC++ with the DPC++ Compatibility Tool and then tuning it. We aim to demonstrate that oneAPI can be used to build unified code that runs on different processing units, such as CPUs and GPUs, at low implementation cost.

Project status: Published/In Market

oneAPI, HPC

Intel Technologies
DevCloud, oneAPI, DPC++, Intel Integrated Graphics

Overview / Usage

A stencil operation is an iterative method that updates the value of a field at one spatial location according to the values at neighboring locations. The typical computation is a sum of products, and its cost increases with the length of the operator. Because the method is iterative, local data must be reused between consecutive iterations to improve overall performance. On accelerators this is harder, since transferring data between devices at every iteration is not feasible. Avoiding accesses to external memory matters most for higher-order Laplacian operators.
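
As an illustration of the sum-of-products form described above, the following is a minimal CPU sketch in plain C++, not the project's CUDA or DPC++ kernel. The function name `apply_stencil_2d`, the row-major layout, and the choice to leave border points at zero are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Applies a symmetric finite-difference stencil along both axes of a 2D grid
// stored in row-major order (nx rows, nz columns). coeff[0] weights the
// center point and coeff[k] weights the neighbors at distance k, so the
// operator half-length is coeff.size() - 1. Points closer to the border than
// the half-length are left at zero (an assumption of this sketch).
std::vector<float> apply_stencil_2d(const std::vector<float>& in,
                                    std::size_t nx, std::size_t nz,
                                    const std::vector<float>& coeff) {
    assert(!coeff.empty());
    assert(in.size() == nx * nz);
    const std::size_t r = coeff.size() - 1;
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = r; i + r < nx; ++i) {
        for (std::size_t j = r; j + r < nz; ++j) {
            // The center point contributes once per axis.
            float acc = 2.0f * coeff[0] * in[i * nz + j];
            // Sum of products over the neighbors on both axes.
            for (std::size_t k = 1; k <= r; ++k) {
                acc += coeff[k] * (in[(i - k) * nz + j] + in[(i + k) * nz + j] +
                                   in[i * nz + (j - k)] + in[i * nz + (j + k)]);
            }
            out[i * nz + j] = acc;
        }
    }
    return out;
}
```

With the second-order coefficients `{-2.0f, 1.0f}` this reduces to the familiar five-point Laplacian. A longer coefficient vector gives a higher-order operator and proportionally more multiply-adds and memory reads per output point.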

In several seismic imaging methods, the stencil calculation applies the coefficients of a finite-difference scheme to solve the wave equation numerically. This is the case for reverse time migration (RTM), which is widely used in the oil and gas industry to generate images of subsurface structures. Despite its advantages, the method has two major computational bottlenecks: the high number of floating-point operations in the propagation step and the difficulty of storing the wavefields in main memory. To mitigate these bottlenecks, implementations exploit the intrinsic parallelism of the tasks and optimize the use of computational resources, for example by targeting different accelerated processing units. Optimizing this method brings a large economic advantage to exploration geophysics, since it reduces the chance of errors in well drilling.
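
To make the link between the stencil and the wave equation concrete, the update below sketches one explicit time step. It assumes the constant-density 2D acoustic wave equation, a uniform grid spacing h, and second-order time stepping. The source does not state which formulation the project's RTM uses, so the symbols p (pressure), v (velocity), r (operator half-length), and c_k (symmetric stencil coefficients, c_{-k} = c_k) are illustrative.

```latex
\frac{\partial^2 p}{\partial t^2}
  = v^2 \left( \frac{\partial^2 p}{\partial x^2}
             + \frac{\partial^2 p}{\partial z^2} \right)

p^{n+1}_{i,j} = 2\,p^{n}_{i,j} - p^{n-1}_{i,j}
  + \left( \frac{v_{i,j}\,\Delta t}{h} \right)^{2}
    \sum_{k=-r}^{r} c_k \left( p^{n}_{i+k,j} + p^{n}_{i,j+k} \right)
```

Each time step reads two previous wavefields and evaluates the sum of products at every grid point, which is where both the floating-point cost and the memory pressure come from.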

We propose to explore oneAPI functionalities in order to:

  • Migrate the 2D RTM developed by SENAI CIMATEC from CUDA to DPC++ and evaluate its performance;
    • Focusing on the DPC++ Compatibility Tool;
  • Evaluate the migration process;
    • Focusing on Intel Advisor;
  • Review the migrated source code and propose adjustments to memory management.

After developing this project, we were able to:

  • Achieve a successful proof of concept by migrating the entire RTM application from CUDA to DPC++ using the Compatibility Tool;
  • Generate guidance for tuning the RTM application using oneAPI functionalities;
  • Find that the migrated source code is more readable and easier to maintain;
  • Find that oneAPI unifies the algorithm execution flow of our application in a single structure;
  • Easily perform a manual review of the migrated source code:
    • Explore different options for memory management;
    • Achieve 2x higher performance and 4.9x higher arithmetic intensity.

Methodology / Approach

  • To evaluate numerical equivalence between the original and migrated source code, we define a set of RTM parameters, run seismic imaging with both versions using the Koslov velocity model, and compare the results;
  • The DPC++ Compatibility Tool is applied cyclically (prepare, migrate, and review). Minor adjustments are expected after the first migration steps;
  • Intel Advisor is used to provide resource-utilization and performance metrics for evaluating and tuning the migrated RTM.
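
The numerical-equivalence check in the first step could be sketched as follows. The source does not say which metric was used to compare the images, so the maximum absolute difference and the relative L2 error computed here, along with the names `ImageDiff` and `compare_images`, are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of the difference between two images of equal size.
struct ImageDiff {
    double max_abs;  // largest absolute sample difference
    double rel_l2;   // ||migrated - reference||_2 / ||reference||_2
};

// Compares the image produced by the original code (reference) with the one
// produced by the migrated code. If the reference is all zeros, rel_l2 falls
// back to the absolute L2 norm of the difference.
ImageDiff compare_images(const std::vector<float>& reference,
                         const std::vector<float>& migrated) {
    assert(reference.size() == migrated.size());
    double max_abs = 0.0, diff_sq = 0.0, ref_sq = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double ref = reference[i];
        const double d = static_cast<double>(migrated[i]) - ref;
        max_abs = std::max(max_abs, std::fabs(d));
        diff_sq += d * d;
        ref_sq += ref * ref;
    }
    const double rel = ref_sq > 0.0 ? std::sqrt(diff_sq / ref_sq)
                                    : std::sqrt(diff_sq);
    return ImageDiff{max_abs, rel};
}
```

Because CUDA and DPC++ builds may order floating-point operations differently, a small tolerance on these values is usually a more realistic acceptance criterion than bitwise equality.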

Technologies Used

In this research we use an Intel Xeon CPU, an Intel Gen9 GPU, Data Parallel C++, the Intel oneAPI Base Toolkit, Intel oneAPI Parallel Studio/VTune, and the Intel DevCloud environment.

Repository

https://github.com/cs2isenaicimatec/OneAPI-solving-stencil-migration
