Accelerate The “Stable” Three-Way QuickSort Performance Leveraging The Modern Nvidia GPGPUs

Arthur V. Ratz

Lviv, Lviv Oblast

Another approach to optimizing the performance of the classical “stable” three-way quicksort, using the Nvidia CUDA Development Toolkit, OpenMP 4.5/5.0 and Intel’s open-source Clang/LLVM compiler distribution.

Project status: Published/In Market

HPC, Cloud

Intel Technologies
Other

Overview / Usage

This project is another variant of the parallel “stable” three-way quicksort implementation previously introduced in my https://devmesh.intel.com/projects/parallel-stable-sort-performance-optimization-using-intel-parallel-studio-xe-and-intel-oneapi-hpc-toolkit project. Its main goal is to achieve a larger speed-up for the parallel “stable” three-way quicksort by offloading specific workloads to Nvidia GPUs, rather than running them on the host CPU or other acceleration targets; the result is about 36x faster than sequential quicksort execution. Unlike the previous project, this one uses the OpenMP 4.5/5.0 library with offloading capabilities and the open-source distribution of Intel’s Clang/LLVM compiler (https://github.com/llvm/llvm-project) to implement the parallel three-way quicksort in modern code.
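
To illustrate what offloading a workload with OpenMP 4.5 looks like, here is a minimal sketch and not the project's actual code. It assumes one hypothetical workload: counting how many keys fall below, at, and above a pivot, which is the bookkeeping a three-way partition needs. The names PivotCounts and count_around_pivot are invented for this example. With Clang, offloading to Nvidia devices is typically enabled with -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda; without those flags the loop simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch (not taken from the project).
struct PivotCounts {
    std::size_t less;
    std::size_t equal;
    std::size_t greater;
};

// Counts the keys below, equal to, and above the pivot.
// The loop is marked for offloading; if no device is available (or OpenMP
// is disabled), the same loop executes on the host.
PivotCounts count_around_pivot(const std::vector<std::int64_t>& keys,
                               std::int64_t pivot) {
    const std::int64_t* data = keys.data();
    const std::size_t n = keys.size();
    std::size_t less = 0, equal = 0, greater = 0;

    // map(to: ...) copies the keys to the device; the three counters are
    // copied back after the per-thread partial sums are reduced.
    #pragma omp target teams distribute parallel for \
        map(to: data[0:n]) map(tofrom: less, equal, greater) \
        reduction(+: less, equal, greater)
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < pivot) {
            ++less;
        } else if (data[i] == pivot) {
            ++equal;
        } else {
            ++greater;
        }
    }

    // Sanity check on the host: every key lands in exactly one group.
    assert(less + equal + greater == n);
    return {less, equal, greater};
}
```

The counts could then be used to compute where each of the three groups starts in the output array; how the real project splits work between host and device may differ.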

Methodology / Approach

The parallel “stable” three-way quicksort algorithm used in this project was introduced in the following articles:
  1. "An Efficient Parallel Three-Way Quicksort Using Intel C++ Compiler And OpenMP 4.5 Library" - https://software.intel.com/en-us/articles/an-efficient-parallel-three-way-quicksort-using-intel-c-compiler-and-openmp-45-library

  2. "How To Implement A Parallel "Stable" Three-Way Quicksort Using Intel C++ Compiler and OpenMP 4.5 library" - https://software.intel.com/en-us/articles/how-to-implement-a-parallel-stable-three-way-quicksort-using-intel-c-compiler-and-openmp-45

  3. "How To Implement The Parallel "Stable" Sort Using Intel® MPI Library And Deploy It To A Multi-Node Computational Cluster" - https://software.intel.com/en-us/articles/how-to-implement-a-multi-node-parallel-stable-sort-using-intel-mpi-library

  4. "How To Optimize A Parallel Stable Sort Performance Using The Revolutionary Intel® oneAPI HPC Toolkit" - https://software.intel.com/en-us/articles/how-to-optimize-the-parallel-stable-sort-performance-using-intel-oneapi-hpc-toolkit
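
For readers who have not seen the articles above, the idea of a stable three-way quicksort can be sketched sequentially as follows. This is an illustration under stated assumptions, not the algorithm from those articles: it swaps in std::stable_partition to split the range into "less than", "equal to" and "greater than" the pivot while keeping equal keys in their original order. The Item type and the function name stable_quicksort3 are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative record (an assumption): sorted by key, while value lets us
// observe whether equal keys keep their original relative order.
struct Item {
    int key;
    int value;
};

// Sorts items[lo, hi) by key; items with equal keys keep their input order.
void stable_quicksort3(std::vector<Item>& items, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) {
        return;  // zero or one element is already sorted
    }
    const int pivot = items[lo + (hi - lo) / 2].key;
    auto first = items.begin() + static_cast<std::ptrdiff_t>(lo);
    auto last = items.begin() + static_cast<std::ptrdiff_t>(hi);

    // Three-way split: [first, mid1) < pivot, [mid1, mid2) == pivot,
    // [mid2, last) > pivot. std::stable_partition preserves the relative
    // order inside each group, which is what makes the sort stable.
    auto mid1 = std::stable_partition(
        first, last, [pivot](const Item& it) { return it.key < pivot; });
    auto mid2 = std::stable_partition(
        mid1, last, [pivot](const Item& it) { return it.key == pivot; });

    const std::size_t eq_begin = static_cast<std::size_t>(mid1 - items.begin());
    const std::size_t eq_end = static_cast<std::size_t>(mid2 - items.begin());
    assert(eq_end > eq_begin);  // the pivot itself is always in the middle group

    // The middle group is final; only the outer groups need more work.
    stable_quicksort3(items, lo, eq_begin);
    stable_quicksort3(items, eq_end, hi);
}
```

Calling stable_quicksort3(v, 0, v.size()) sorts the whole vector. The two recursive calls touch disjoint ranges, which is the property a parallel version can exploit by running them as independent tasks.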

Technologies Used

Hardware:

· 2 x Nvidia GeForce GTX 1070 8 GiB GDDR5 graphics cards in SLI;

Software:

· Nvidia CUDA Development Toolkit;

· Intel’s Open-Source Clang/LLVM compiler distribution;

· OpenMP 4.5/5.0 Library with offloading capabilities;

Repository

https://github.com/arthurratz/parallel_stable_sort_nvptx64