Parallel Three-Way Quicksort Using Data Parallel C++ and Intel® oneAPI Toolkits

Arthur V. Ratz

Arthur V. Ratz

Lviv, Lviv Oblast

5 0
  • 0 Collaborators

A case-study of the classical three-way quicksort performance optimization using the revolutionary Intel® oneAPI HPC Toolkit ...learn more

Project status: Published/In Market

oneAPI, HPC

Intel Technologies
DevCloud, oneAPI, DPC++, Intel Integrated Graphics, Intel FPGA, Intel CPU, Intel vTune

Docs/PDFs [5]Code Samples [1]

Overview / Usage

In the era of modern computing, the sorting has a large influence on commercial and scientific data processing. The sorting algorithms are commonly used for solving the variety of problems in many fields of science and engineering, including financial transactions processing, mathematical and economic statistics, linear algebra and functional optimization, data compression and visualization, cryptography, linguistic systems, timeline and task scheduling, log and report analysis, artificial intelligence (AI) and data mining, etc. There’s an entire class of the fastsort algorithms used for sorting data effectively. These algorithms have the number of known implementations such as std::sort or qsort functions implemented as a part of C/C++ standard libraries. However, the most of existing fastsort implementations become less efficient, especially when sorting the various of big data. To benefit in sorting of the typically huge amounts of data having more complex user-defined abstract data types (ADT), we obviously need an approach that allows to increase the overall performance of the fastsort over its known shortcomings and limitations. In this research project, I’ve discussed about the aspects of using the various of the Intel’s HPC libraries and tools to deliver a modern code, performing the “stable” sort of big data in parallel on many-core Intel® CPUs as well as the innovative GPUs and FPGA hardware acceleration targets. This, in turn, allows to perform the sort by 2x-11x times faster, depending on the hardware acceleration platform, on which the specific workloads are executed. Additionally, I will discuss how to evaluate the performance speed-up of the parallel “stable” sort, compared to the performance of the legacy sequential sort performed by using the std::sort and qsort routines. For that purpose, I will use the two different performance metrices, such as either the execution wall-time measurement or more complex Intel® V-Tune™ Profiler to examine the performance and scalability of the modern parallel code being discussed. This project contains the number of code samples, demonstrating the parallel "stable" sort, running it on the variety of the Intel's innovative hardware acceleration targets.

What's Next...

Here's another interesting alternative of this project running the parallel stable three-way quicksort on Nvidia GPGPUs. The complete source code is available for downloading at https://drive.google.com/drive/folders/158kR9MPDbSYm0o7gGuw4dzKtoJDyqjyu?usp=sharing

Methodology / Approach

Here’s a series of articles, describing the basic concept and providing the useful guidelines for delivering a modern code, that implements an efficient parallel “stable” three-way quicksort, using Intel® Parallel Studio™ XE and the revolutionary new Intel® oneAPI HPC Toolkit:

  1. "An Efficient Parallel Three-Way Quicksort Using Intel C++ Compiler And OpenMP 4.5 Library" - https://software.intel.com/en-us/articles/an-efficient-parallel-three-way-quicksort-using-intel-c-compiler-and-openmp-45-library
  2. "How To Implement A Parallel "Stable" Three-Way Quicksort Using Intel C++ Compiler and OpenMP 4.5 library" - https://software.intel.com/en-us/articles/how-to-implement-a-parallel-stable-three-way-quicksort-using-intel-c-compiler-and-openmp-45
  3. "How To Implement The Parallel "Stable" Sort Using Intel® MPI Library And Deploy It To A Multi-Node Computational Cluster" - https://software.intel.com/en-us/articles/how-to-implement-a-multi-node-parallel-stable-sort-using-intel-mpi-library
  4. "How To Optimize A Parallel Stable Sort Performance Using The Revolutionary Intel® oneAPI HPC Toolkit" - https://software.intel.com/en-us/articles/how-to-optimize-the-parallel-stable-sort-performance-using-intel-oneapi-hpc-toolkit

Technologies Used

Intel® Parallel Studio™ XE, Intel® oneAPI HPC Toolkit, Intel® C Compiler 19.0, OpenMP 4.5, Data Parallel C (DPC++), Intel® oneAPI Threading Building Blocks (TBB)

Documents and Presentations

Repository

https://github.com/arthurratz/intel_parallel_stable_sort_oneapi

Comments (0)