Performance Profiling Pi

You have your C/C++ program ready for CPU/GPU, congrats! Now let's identify bottlenecks and measure your code's arithmetic intensity like a performance engineer.

Project status: Published/In Market

HPC

Intel Technologies
Intel VTune

Overview / Usage

Developed as a part of discussion material used in Scientific Computing and Visualization (CSCI 596) taught at USC.

Congratulations on writing your C/C++ program for CPU/GPU computing! Now that you've mastered the basics, you might be wondering how to optimize your code and get the most out of your hardware. One way to achieve this is by identifying bottlenecks and measuring your code's arithmetic intensity against your machine's limits. To do this, you can use various profiling tools such as gprof, perf, VTune, or CUDA profiler. These tools can help you understand where your program is spending most of its time, where the hotspots are, and where you can optimize your code. So, let's dive into performance profiling and take your programming skills to the next level!

Methodology / Approach

Write a program that computes pi and that compiles and runs on:

  • CPU
  • GPU

Download and install the latest version of the Intel oneAPI VTune Profiler GUI.

After installation, launch the GUI from the installation directory for your OS:

  • Windows: [Program Files]\Intel\oneAPI\vtune
  • Linux: /opt/intel/oneapi/vtune/
  • macOS: /opt/intel/oneapi/vtune_profiler/

Download and install the latest version of Intel Advisor.

We can now build the binaries using the following make commands.

make singlethreaded_pi_calc

make multithreaded_pi_calc

You should now have two executables in your working directory. Set the following environment variable to limit the number of OpenMP threads:

export OMP_NUM_THREADS=2

Run both binaries and check that each prints the value of pi:

$ ./singlethreaded_pi_calc
PI = 3.141593
$ ./multithreaded_pi_calc
PI = 3.141593

Now we capture some profile reports with the following commands:

vtune -collect hotspots -result-dir rSingleThread ./singlethreaded_pi_calc

vtune -collect hotspots -result-dir rMultiThread ./multithreaded_pi_calc
vtune -collect memory-consumption -result-dir rMultiMemory ./multithreaded_pi_calc

This creates three result directories named rSingleThread, rMultiThread, and rMultiMemory. If you collected them on a remote machine, copy them to your local machine and open them in the VTune Profiler GUI to view the results.

Technologies Used

VTune, icpx, icpc

Repository

https://github.com/TaufeqRazakh/IntroToPerformanceProfiling
