Molecular Dynamics using SYCL and Complete Offload analysis using Intel Advisor

Abhishek Nandy

Kolkata, WB

This simulation demonstrates how to use SYCL to parallelize the calculation of pairwise Lennard-Jones forces in a particle simulation, which can speed up the simulation and make it more efficient.

Project status: Published/In Market

oneAPI, HPC

Intel Technologies
oneAPI, DPC++

Overview / Usage

This project simulates a system of N particles that interact through the Lennard-Jones potential. The simulation is written in SYCL, a parallel programming model for heterogeneous computing, which allows the same code to run on different types of devices such as CPUs, GPUs, and FPGAs.

The simulation computes the pairwise Lennard-Jones force between particles and updates the positions and velocities of the particles at each time step. The simulation runs for a total of 1500 time steps.
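For reference, the standard 12-6 form of the Lennard-Jones potential and the force derived from it are written out below. The symbols \varepsilon (well depth) and \sigma (distance at which the potential crosses zero) are the usual textbook parameters; the values used in this project are not stated here.

```latex
V(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right],
\qquad
\mathbf{F}_{ij} = -\nabla_{\mathbf{r}_i} V(r_{ij})
  = \frac{24\varepsilon}{r_{ij}^{2}}
    \left[2\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6}\right]\mathbf{r}_{ij},
\qquad
\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j
```

The force on particle i is repulsive (directed along \mathbf{r}_{ij}) at short range and vanishes at r_{ij} = 2^{1/6}\sigma, the minimum of the potential.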

The code initializes the positions and velocities of the particles randomly and places them within a simulation box of size L. At each time step, the code calculates the pairwise forces between particles using the Lennard-Jones potential, which describes the interaction between two particles. The code applies periodic boundary conditions to account for the finite size of the simulation box.
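One common way to apply periodic boundary conditions in the force calculation is the minimum-image convention: each pair interacts through the shortest separation across the periodic copies of the box. The helper below is a minimal sketch of that idea for a single axis. The function name `minimum_image` and the assumption of a cubic box with edge length `box_len` (the L in the text) are illustrative and not taken from the project code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: shortest periodic separation along one axis.
// Assumes a cubic box with edge length box_len > 0.
inline double minimum_image(double dx, double box_len) {
    assert(box_len > 0.0);
    // Subtract the nearest whole number of box lengths so that the
    // result lies in [-box_len/2, box_len/2].
    return dx - box_len * std::round(dx / box_len);
}
```

With `box_len = 10`, particles at x = 0.5 and x = 9.5 are treated as being 1 unit apart across the boundary rather than 9 units apart through the box.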

After computing the forces, the code updates the velocities of the particles using a time step of dt and applies the updated velocities to update the positions of the particles. The code applies periodic boundary conditions to the updated positions of the particles to ensure they remain within the simulation box.
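The update described above can be sketched in plain C++ as follows. This is an illustration for one coordinate axis, not the project's kernel: the names `wrap` and `advance` are hypothetical, and unit mass is assumed unless `mass` is passed explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Map a coordinate back into [0, box_len).
inline double wrap(double x, double box_len) {
    assert(box_len > 0.0);
    double w = x - box_len * std::floor(x / box_len);
    // Guard against round-off producing exactly box_len.
    return (w >= box_len) ? 0.0 : w;
}

// One time step for one axis: update velocities from the forces, move the
// positions with the updated velocities, then wrap them into the box.
// The mass parameter is an assumption; the source does not mention it.
inline void advance(std::vector<double>& pos, std::vector<double>& vel,
                    const std::vector<double>& force,
                    double dt, double box_len, double mass = 1.0) {
    assert(pos.size() == vel.size() && pos.size() == force.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        vel[i] += dt * force[i] / mass;
        pos[i] = wrap(pos[i] + dt * vel[i], box_len);
    }
}
```

Each loop iteration touches only particle `i`, which is what makes this step straightforward to run as one parallel work-item per particle.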

The code uses the SYCL parallel_for() function to parallelize the computation of the pairwise forces and the updates of the particle positions and velocities, which can significantly speed up the simulation on supported hardware. The SYCL buffer class is used to transfer data between the host and device memory.

Methodology / Approach

This code is a C++ program that simulates a system of N interacting particles using the Lennard-Jones potential. It performs pairwise force calculations between all pairs of particles and updates the particle positions and velocities at each time step using the Verlet algorithm.

The program first initializes the particle positions and velocities randomly. Then, it uses SYCL (a C++ abstraction layer for parallelism) to parallelize the force and position/velocity updates, which are executed in separate kernel functions.
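A random initialization of this kind might look like the sketch below. The `Particles` layout (separate arrays per coordinate), the uniform distributions, the velocity bound `v_max`, and the explicit `seed` are all assumptions made for illustration; the source only states that positions and velocities are random and that positions lie inside the box.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical structure-of-arrays layout for the particle data.
struct Particles {
    std::vector<double> x, y, z;     // positions
    std::vector<double> vx, vy, vz;  // velocities
};

// Draw positions uniformly from [0, box_len) and velocities uniformly
// from [-v_max, v_max). A fixed seed makes the run reproducible.
inline Particles init_particles(std::size_t n, double box_len,
                                double v_max, unsigned seed) {
    assert(box_len > 0.0 && v_max > 0.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> pos(0.0, box_len);
    std::uniform_real_distribution<double> vel(-v_max, v_max);
    Particles p;
    for (std::size_t i = 0; i < n; ++i) {
        p.x.push_back(pos(rng));
        p.y.push_back(pos(rng));
        p.z.push_back(pos(rng));
        p.vx.push_back(vel(rng));
        p.vy.push_back(vel(rng));
        p.vz.push_back(vel(rng));
    }
    return p;
}
```

Keeping each coordinate in its own contiguous array makes it simple to hand every array to a separate device buffer later.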

At each time step, the program first launches a kernel to calculate the pairwise forces between particles. The kernel is executed in parallel across all particles using a SYCL parallel_for loop. Within the kernel, each work-item (one per particle) calculates the force exerted on its particle by all other particles in the system. The force calculation applies periodic boundary conditions and a cutoff distance that limits the range of interaction.
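The per-particle work described above can be sketched as an ordinary C++ function that could serve as the body of such a kernel. This is not the project's code: the 3D `Vec3` type, the cubic box, the parameters `eps`, `sigma`, and `r_cut`, and the minimum-image treatment of the boundaries are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Total Lennard-Jones force on particle i from all other particles.
// Pairs at or beyond r_cut are skipped. For the minimum-image step to be
// valid, r_cut should not exceed box_len / 2.
inline Vec3 force_on(std::size_t i, const std::vector<Vec3>& pos,
                     double box_len, double r_cut,
                     double eps = 1.0, double sigma = 1.0) {
    assert(i < pos.size() && box_len > 0.0 && r_cut > 0.0);
    Vec3 f{0.0, 0.0, 0.0};
    const double rc2 = r_cut * r_cut;
    for (std::size_t j = 0; j < pos.size(); ++j) {
        if (j == i) continue;
        // Separation r_i - r_j, reduced to the nearest periodic image.
        double dx = pos[i].x - pos[j].x;
        double dy = pos[i].y - pos[j].y;
        double dz = pos[i].z - pos[j].z;
        dx -= box_len * std::round(dx / box_len);
        dy -= box_len * std::round(dy / box_len);
        dz -= box_len * std::round(dz / box_len);
        const double r2 = dx * dx + dy * dy + dz * dz;
        if (r2 >= rc2 || r2 == 0.0) continue;  // outside cutoff or overlapping
        const double s2 = sigma * sigma / r2;
        const double s6 = s2 * s2 * s2;
        // 24*eps/r^2 * [2*(sigma/r)^12 - (sigma/r)^6], applied to r_i - r_j.
        const double coef = 24.0 * eps * (2.0 * s6 * s6 - s6) / r2;
        f.x += coef * dx;
        f.y += coef * dy;
        f.z += coef * dz;
    }
    return f;
}
```

Because each call reads all positions but writes only its own result, one call per particle can run concurrently without synchronization. The cost is N - 1 pair evaluations per particle, so the total work grows quadratically with N.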

After the force calculation is complete, the program launches a second kernel to update the particle positions and velocities using the Verlet algorithm. This kernel is also executed in parallel across all particles using a SYCL parallel_for loop. Within the kernel, each work-item (one per particle) updates its particle's position and velocity based on the force calculated in the previous step.

The SYCL buffer class is used to manage the data between the host (CPU) and device (GPU or other accelerator). Buffers are created for each array of particle data, and accessors are used to specify the type of access (read or write) to the data from the host or device. The SYCL queue class is used to manage the execution of kernels on the device.
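The buffer, accessor, and queue pattern described above can be illustrated with a small SYCL 2020 sketch that advances positions by `dt * v`. The function name `drift` and the use of `float` (chosen because some devices lack double-precision support) are assumptions, not details from the project. The sketch also assumes the compiler predefines `SYCL_LANGUAGE_VERSION` when building in SYCL mode, as DPC++ does with `-fsycl`; without it, the function falls back to a plain host loop so the same file still builds with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(SYCL_LANGUAGE_VERSION)
#include <sycl/sycl.hpp>
#endif

// Hypothetical example: x[i] += dt * v[i] for every particle.
inline void drift(std::vector<float>& x, const std::vector<float>& v, float dt) {
    assert(x.size() == v.size());
    if (x.empty()) return;
#if defined(SYCL_LANGUAGE_VERSION)
    sycl::queue q;  // default device selection
    {
        // Buffers wrap the host arrays for the duration of this scope.
        sycl::buffer<float, 1> x_buf(x.data(), sycl::range<1>(x.size()));
        sycl::buffer<float, 1> v_buf(v.data(), sycl::range<1>(v.size()));
        q.submit([&](sycl::handler& h) {
            // Accessors declare how the kernel uses each buffer.
            sycl::accessor xa(x_buf, h, sycl::read_write);
            sycl::accessor va(v_buf, h, sycl::read_only);
            h.parallel_for(sycl::range<1>(x.size()), [=](sycl::id<1> i) {
                xa[i] += dt * va[i];
            });
        });
    }  // Buffer destruction waits for the kernel and copies x back to the host.
#else
    // Host fallback with the same result, used when not compiled as SYCL.
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += dt * v[i];
#endif
}
```

Declaring `va` as read-only tells the runtime that the velocity data does not need to be copied back, while the read-write accessor on `xa` ensures the updated positions are visible on the host once the buffer goes out of scope.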

Overall, this code provides an example of how to use SYCL to parallelize a simulation with a large number of interacting particles. The program could be further optimized by using more efficient data structures or by optimizing the kernel functions themselves.

We ran an offload analysis of the code with Intel Advisor, and the results are encouraging.

We have two regions (loops) that can be offloaded to an accelerator. Together, these two loops account for 22% of the code.

According to the analysis, the code can be improved by 27%.

Comparison of two simulation runs, one with 1000 particles and the other with 1500 particles.

Technologies Used

oneAPI

SYCL

Intel Advisor

oneAPI Base Toolkit