
How To Build And Run Your Modern Parallel Code In DPC++ and OpenCL/SYCL On NVIDIA GPUs


Before We Begin…
In my previous blog (https://devmesh.intel.com/blog/724749/how-to-build-and-run-your-modern-parallel-code-in-c-17-and-openmp-4-5-library-on-nvidia-gpus) I thoroughly discussed how to build OpenMP 4.5 parallel code with the Intel LLVM/Clang compiler, targeting the CUDA-NVPTX64 capabilities of Nvidia GPGPUs. In this blog, I will discuss how to use the Intel LLVM/Clang compiler staging distribution (https://github.com/intel/llvm) to build and run OpenCL/SYCL code on Nvidia GPU acceleration targets. This feature was deprecated in the official pre-releases of the Intel oneAPI Toolkit, but it can still be useful for evaluating the performance of C++17 OpenCL/SYCL code by running it on various acceleration targets with different hardware architectures.

Hardware Requirements
To set up the development environment, all we need is a local development machine based on an Intel® Core™ or Intel® Xeon® CPU @ 3.6 GHz, with one or more Nvidia GeForce GPUs installed and SLI x2/SLI x3 enabled. Please note that the approach discussed below will not work on virtual machines that use the generic virtual graphics adapter of Hyper-V, VMware, VirtualBox, or QEMU.
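
Before going any further, it is worth confirming that the operating system actually sees the installed GPUs. A quick, optional check using the standard lspci utility:

lspci | grep -i nvidia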

Software Requirements
The approach discussed in this blog works only on a local development machine running Ubuntu Desktop 18.04.4 Bionic Beaver x86_64. Unfortunately, we cannot use the current LLVM/Clang 10.0.0 distribution release in the Microsoft Windows environment. Also, there is no need to update or downgrade the Ubuntu 18.04.4 kernel by compiling it from sources, since the latest version 5.4.0 of the Linux x86_64 kernel is already fully supported.
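
To double-check the distribution release and the kernel version of the development machine, the standard lsb_release and uname utilities can be used:

lsb_release -d
uname -r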

Installing Prerequisites
After we have installed Ubuntu Desktop 18.04.4 on the physical local development machine, our first goal is to install the prerequisite packages. To do that, we must use the following command from the Linux bash-console with root administrative privileges:

sudo apt install -y build-essential libqt5 python libelf-dev libffi-dev openssl pkg-config ninja-build git

Here we install the required build toolchain, which includes the GNU GCC C++ compiler, the make/CMake build utilities, and the development libraries needed to compile the LLVM/Clang project.
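
Optionally, we can confirm that the essential build tools are now available by checking their versions:

gcc --version
ninja --version
git --version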

Downloading And Installing Nvidia Accelerated Graphics Driver

The next step is to install the latest Nvidia graphics driver for the graphics cards installed in the development machine. Before doing that, we also need to disable and blacklist the standard Nouveau graphics driver installed during the Ubuntu Desktop 18.04 setup:

sudo bash -c "echo blacklist nouveau >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo update-initramfs -u

To disable the Nouveau driver we blacklist it and rebuild the current Ubuntu initramfs image by using the commands listed above, and then reboot the development machine.
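
After the reboot, we can optionally confirm that the Nouveau kernel module is no longer loaded; the following standard command should produce no output:

lsmod | grep nouveau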
Once the Nouveau driver has been permanently disabled, we can download and install the Nvidia Graphics Driver:
https://www.nvidia.com/en-us/geforce/drivers/
After downloading the Nvidia Graphics Driver, we run the installation file by using the following command:

sudo sh ./nvidia/NVIDIA-Linux-x86_64-450.57.run --silent

In fact, the Nvidia CUDA Development Toolkit installer can also install the Nvidia Graphics Driver but, anyway, it is strongly recommended to install the driver separately first and then install only the Nvidia CUDA development tools. Finally, after the graphics driver installation we must reboot the development machine once again for the changes to take effect.
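
Once the machine is back up, we can verify that the new driver has been loaded correctly by querying the GPUs with the nvidia-smi utility shipped with the driver:

nvidia-smi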

Downloading And Installing Nvidia CUDA Development Toolkit

Similarly, we need to install the Nvidia CUDA Development Toolkit to provide the Nvidia GPU compute capabilities:
https://developer.nvidia.com/cuda-downloads
Specifically, we need to download Nvidia CUDA Toolkit v10.1, since later versions of CUDA are not supported by the LLVM/Clang project.
After downloading the installation file, we invoke the following command to install the CUDA Toolkit:

sudo sh ./nvidia/cuda_10.1.243_418.87.00_linux.run --silent --override --run-nvidia-xconfig --toolkit

We run the CUDA installation in silent mode, installing the toolkit only, re-running the Nvidia X.Org configuration, and overriding the installer's compatibility checks. After that, we must reboot the development machine to apply the configuration changes.
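
Once the machine is back up, we can confirm the toolkit installation by checking the nvcc compiler version (this assumes the default installation prefix /usr/local/cuda):

/usr/local/cuda/bin/nvcc --version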

Downloading And Installing The Latest Version Of CMAKE

Prior to building and using the Intel DPC++ Compiler based on the Intel LLVM/Clang staging distribution, we need to download and install the latest version of the CMake utility:
https://github.com/Kitware/CMake/releases/download/v3.18.0-rc4/cmake-3.18.0-rc4-Linux-x86_64.tar.gz

To keep this process simple, we download the pre-built CMake binaries and copy the required files to the proper locations using the rsync command:

tar -xvf cmake-3.18.0-rc4-Linux-x86_64.tar.gz

sudo rsync -r ./cmake-3.18.0-rc4-Linux-x86_64/ /usr/

Now that the CMake utility has been successfully installed, we can proceed with the next maintenance steps.
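
A quick check confirms that the correct CMake version is now on the PATH:

cmake --version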

Downloading And Installing Intel Threading Building Blocks (TBB) Library (Optional)

To be able to compile and run parallel code implemented with the TBB library, it is also recommended to install the Intel Threading Building Blocks (TBB) library:

https://software.intel.com/content/www/us/en/develop/tools/threading-building-blocks.html

This can be done by using the following commands:

tar -xvf l_tbb_2020.2.217.tgz
./l_tbb_2020.2.217/install.sh --silent --install_dir=/opt/intel --accept_eula=yes
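
To verify the installation, we can source the environment script placed by the installer under /opt/intel (the same script is added to /root/.bashrc later in this blog) and inspect the TBBROOT variable it sets:

source /opt/intel/tbb/bin/tbbvars.sh intel64
echo $TBBROOT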

After all libraries and prerequisites have been successfully installed, we can now build and configure the Intel LLVM/Clang compiler distribution.

Building And Configuring Intel-LLVM/Clang Compiler

The first thing we have to do is set the DPCPP_HOME environment variable, which specifies the location of the Intel LLVM/Clang project sources, and create the directory into which those sources will be downloaded from the GitHub repository:

export DPCPP_HOME=/sycl_workspace
mkdir $DPCPP_HOME
cd $DPCPP_HOME
sudo bash -c "echo 'export DPCPP_HOME=/sycl_workspace' >> /root/.bashrc"

After that we must use the git command in bash-console to clone the project’s sources from repository:

git clone https://github.com/intel/llvm -b sycl

Now that the project sources have been successfully downloaded, we can trigger the compilation process by using these two commands:

python $DPCPP_HOME/llvm/buildbot/configure.py --cuda
python $DPCPP_HOME/llvm/buildbot/compile.py

The first command performs the basic configuration tasks. Make sure it is invoked with the --cuda argument, which enables building the compiler with NVPTX64 offloading capabilities. The second command builds the project sources and copies all binary executables to the proper locations on the system.
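
When the build finishes, the freshly built compiler driver resides under $DPCPP_HOME/llvm/build/bin, and a quick version check confirms that the build succeeded:

$DPCPP_HOME/llvm/build/bin/clang++ --version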

Finally, we must set a number of environment variables that specify the paths to the installed compiler executables and libraries:

**sudo echo "export PATH=$DPCPP_HOME/llvm/build/bin:$PATH" >> /root/.bashrc
sudo echo "export LD_LIBRARY_PATH=$DPCPP_HOME/llvm/build/lib:$LD_LIBRARY_PATH" >> /root/.bashrc
sudo echo "source /opt/intel/tbb/bin/tbbvars.sh intel64" >> /root/.bashrc
**

After that, it is highly recommended to reboot the development machine for the changes to take effect.

Building And Running OpenCL/SYCL Code In C++17 On Nvidia GPUs

To make sure that everything is working just fine, let's build and execute a simple C++17 OpenCL/SYCL program on the Nvidia GPU, following the example from the Getting Started Guide (https://intel.github.io/llvm-docs/GetStartedGuide.html).
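
The guide's sample source is not reproduced here, so below is a minimal vector-addition sketch of what such a simple_gpu.cpp might look like; the file name matches the build command that follows, while the kernel name vector_add and the exact program contents are illustrative assumptions rather than the official sample:

// simple_gpu.cpp -- a minimal SYCL vector-addition sketch, loosely modeled
// on the simple-sycl-app example from the Getting Started Guide.
#include <CL/sycl.hpp>
#include <array>
#include <iostream>

int main() {
  constexpr size_t N = 1024;
  std::array<int, N> a{}, b{}, c{};
  for (size_t i = 0; i < N; ++i) {
    a[i] = static_cast<int>(i);
    b[i] = static_cast<int>(2 * i);
  }

  {
    // The default selector picks the most capable device available at run time.
    cl::sycl::queue q{cl::sycl::default_selector{}};
    std::cout << "Running on: "
              << q.get_device().get_info<cl::sycl::info::device::name>()
              << std::endl;

    // Buffers take ownership of the host data for the duration of this scope.
    cl::sycl::buffer<int, 1> bufA(a.data(), cl::sycl::range<1>(N));
    cl::sycl::buffer<int, 1> bufB(b.data(), cl::sycl::range<1>(N));
    cl::sycl::buffer<int, 1> bufC(c.data(), cl::sycl::range<1>(N));

    q.submit([&](cl::sycl::handler &cgh) {
      auto A = bufA.get_access<cl::sycl::access::mode::read>(cgh);
      auto B = bufB.get_access<cl::sycl::access::mode::read>(cgh);
      auto C = bufC.get_access<cl::sycl::access::mode::write>(cgh);
      // One work-item per vector element.
      cgh.parallel_for<class vector_add>(
          cl::sycl::range<1>(N),
          [=](cl::sycl::id<1> idx) { C[idx] = A[idx] + B[idx]; });
    });
  } // leaving the scope synchronizes and copies the results back to the host

  // Verify the results on the host.
  for (size_t i = 0; i < N; ++i) {
    if (c[i] != a[i] + b[i]) {
      std::cout << "Mismatch at index " << i << std::endl;
      return 1;
    }
  }
  std::cout << "The results are correct!" << std::endl;
  return 0;
}

Placing the buffers in an inner scope guarantees that the results are copied back to the host arrays before they are verified.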

To build this code, we use the command shown below:

clang++ -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice simple_gpu.cpp -o simple-gpu

When the compilation of this code is completed, we run it by using the following command in the bash-console:

./simple-gpu
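
Depending on which devices are visible to the SYCL runtime, it may also be necessary to force the CUDA backend explicitly when launching the program; the Getting Started Guide linked above describes the SYCL_BE environment variable for this purpose (this usage is an assumption based on the sycl branch snapshot of this period, so consult the guide for the exact variable your build expects):

SYCL_BE=PI_CUDA ./simple-gpu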

After the program has been launched, we can also verify that the code is being executed on the Nvidia GPU. To do that, we open another bash-console and run the following command:

watch -n 0.1 nvidia-smi

After invoking this command, a continuously refreshed table with information about the Nvidia graphics cards is displayed. To make sure that the program is executed on the GPU, monitor the GPU utilization value: when it reaches 100%, the GPU is busy executing our program.

Acknowledgements
The approach introduced in this blog was also briefly discussed in https://intel.github.io/llvm-docs/GetStartedGuide.html