Parallel Three-Way Quicksort Using Data Parallel C++ and Intel® oneAPI Toolkits
Arthur V. Ratz
Lviv, Lviv Oblast
- 0 Collaborators
A case-study of the classical three-way quicksort performance optimization using the revolutionary Intel® oneAPI HPC Toolkit ...learn more
Project status: Published/In Market
Intel Technologies
DevCloud,
oneAPI,
DPC++,
Intel Integrated Graphics,
Intel FPGA,
Intel CPU,
Intel vTune
Overview / Usage
In the era of modern computing, the sorting has a large influence on commercial and scientific data processing. The sorting algorithms are commonly used for solving the variety of problems in many fields of science and engineering, including financial transactions processing, mathematical and economic statistics, linear algebra and functional optimization, data compression and visualization, cryptography, linguistic systems, timeline and task scheduling, log and report analysis, artificial intelligence (AI) and data mining, etc. There’s an entire class of the fastsort algorithms used for sorting data effectively. These algorithms have the number of known implementations such as std::sort or qsort functions implemented as a part of C/C++ standard libraries. However, the most of existing fastsort implementations become less efficient, especially when sorting the various of big data. To benefit in sorting of the typically huge amounts of data having more complex user-defined abstract data types (ADT), we obviously need an approach that allows to increase the overall performance of the fastsort over its known shortcomings and limitations. In this research project, I’ve discussed about the aspects of using the various of the Intel’s HPC libraries and tools to deliver a modern code, performing the “stable” sort of big data in parallel on many-core Intel® CPUs as well as the innovative GPUs and FPGA hardware acceleration targets. This, in turn, allows to perform the sort by 2x-11x times faster, depending on the hardware acceleration platform, on which the specific workloads are executed. Additionally, I will discuss how to evaluate the performance speed-up of the parallel “stable” sort, compared to the performance of the legacy sequential sort performed by using the std::sort and qsort routines. For that purpose, I will use the two different performance metrices, such as either the execution wall-time measurement or more complex Intel® V-Tune™ Profiler to examine the performance and scalability of the modern parallel code being discussed. This project contains the number of code samples, demonstrating the parallel "stable" sort, running it on the variety of the Intel's innovative hardware acceleration targets.
What's Next...
Here's another interesting alternative of this project running the parallel stable three-way quicksort on Nvidia GPGPUs. The complete source code is available for downloading at https://drive.google.com/drive/folders/158kR9MPDbSYm0o7gGuw4dzKtoJDyqjyu?usp=sharing
Methodology / Approach
Here’s a series of articles, describing the basic concept and providing the useful guidelines for delivering a modern code, that implements an efficient parallel “stable” three-way quicksort, using Intel® Parallel Studio™ XE and the revolutionary new Intel® oneAPI HPC Toolkit:
- "An Efficient Parallel Three-Way Quicksort Using Intel C++ Compiler And OpenMP 4.5 Library" - https://software.intel.com/en-us/articles/an-efficient-parallel-three-way-quicksort-using-intel-c-compiler-and-openmp-45-library
- "How To Implement A Parallel "Stable" Three-Way Quicksort Using Intel C++ Compiler and OpenMP 4.5 library" - https://software.intel.com/en-us/articles/how-to-implement-a-parallel-stable-three-way-quicksort-using-intel-c-compiler-and-openmp-45
- "How To Implement The Parallel "Stable" Sort Using Intel® MPI Library And Deploy It To A Multi-Node Computational Cluster" - https://software.intel.com/en-us/articles/how-to-implement-a-multi-node-parallel-stable-sort-using-intel-mpi-library
- "How To Optimize A Parallel Stable Sort Performance Using The Revolutionary Intel® oneAPI HPC Toolkit" - https://software.intel.com/en-us/articles/how-to-optimize-the-parallel-stable-sort-performance-using-intel-oneapi-hpc-toolkit
Technologies Used
Intel® Parallel Studio™ XE, Intel® oneAPI HPC Toolkit, Intel® C Compiler 19.0, OpenMP 4.5, Data Parallel C (DPC++), Intel® oneAPI Threading Building Blocks (TBB)
Documents and Presentations
Accelerate_Sorting_for_Big_Data_Workloads_Aug2020.pdf
An_Efficient_Parallel_Three-Way_Quicksort_Using_Intel_C___Compiler_And_OpenMP_4.5_Library.pdf
Repository
https://github.com/arthurratz/intel_parallel_stable_sort_oneapi