Performance Profiling Pi
Taufeq Razakh
You have your C/C++ program ready for CPU/GPU, congrats! Now let's identify bottlenecks and measure your code's arithmetic intensity like a performance engineer.
Project status: Published/In Market
Intel Technologies
Intel VTune
Overview / Usage
Developed as part of the discussion material for Scientific Computing and Visualization (CSCI 596), taught at USC.
Congratulations on writing your C/C++ program for CPU/GPU computing! Now that you've mastered the basics, you might be wondering how to optimize your code and get the most out of your hardware. One way is to identify bottlenecks and measure your code's arithmetic intensity (floating-point operations performed per byte of memory traffic) against your machine's limits. Profiling tools such as gprof, perf, VTune, or the CUDA profiler show where your program spends most of its time, where the hotspots are, and where optimization will pay off. So, let's dive into performance profiling and take your programming skills to the next level!
Methodology / Approach
Write a pi program that compiles and runs on:
- CPU
- GPU
Download and install the latest version of the Intel oneAPI VTune Profiler GUI from this link.
After installation, launch the GUI from the installation directory for your OS:
- Windows: [Program Files]\Intel\oneAPI\vtune
- Linux: /opt/intel/oneapi/vtune/
- macOS: /opt/intel/oneapi/vtune_profiler/
Download and install the latest version of Intel Advisor here.
We can now build the binaries using the following make commands:
make singlethreaded_pi_calc
make multithreaded_pi_calc
You should now have two executables in your working directory. Set the OMP_NUM_THREADS environment variable to limit the number of OpenMP threads:
export OMP_NUM_THREADS=2
Run the binaries and check that each prints the value of pi:
$ ./singlethreaded_pi_calc
PI = 3.141593
$ ./multithreaded_pi_calc
PI = 3.141593
Now we capture some profile reports with the following commands:
vtune -collect hotspots -result-dir rSingleThread ./singlethreaded_pi_calc
vtune -collect hotspots -result-dir rMultiThread ./multithreaded_pi_calc
vtune -collect memory-consumption -result-dir rMultiMemory ./multithreaded_pi_calc
This creates three result directories named rSingleThread, rMultiThread, and rMultiMemory. Copy them to your local machine to view the results in the VTune Profiler GUI.
Technologies Used
VTune, icpx, icpc