Understanding MPI Reduction Algorithms
Featured

Understanding MPI Reduction Algorithms

A look into MPI Reduce operations both basic and complex.

Modern Code

  • 0 Collaborators

  • 2 Followers

Description

Collective communication functions defined by the Message Passing Interface are often used in high performance computing workflows to orchestrate collective actions amongst groups of processes. The reduce operation is useful when developers need to combine data stored at each MPI process. Current reduce implementations are highly optimized to minimize the operation’s execution time while maximizing network and process utilization. This paper explores three basic approaches: binomial, pipelined, and pipelined binary tree reductions. Both theoretical and empirical running times are discussed regarding these algorithms. Modern reduce algorithms utilize these basic approaches and captilize on their strengths. This paper also analyzes a new greedy pipelined reduction algorithm and empirically benchmarks it against current approaches. Findings show that binomial reductions are faster for smaller messages while pipelined and pipelined binary tree implementations are faster for larger messages. Furthermore, the greedy pipelined algorithm can be faster in all situations, however requires that an optimal message segment size be chosen.

Links

Source Code

Paper

Medium 0 x1koj4krwcv2ikl8onsmwkzy4q4diel 2tsdd6jueiq2iqhqovsdwixzhcvhkoi6uzs2a25juam23u4dmz35oilr2amu3uj5uz3hx4jumqh532e2upadzacywi

Ryan T. created project Understanding MPI Reduction Algorithms

Medium 57792815 26b4 4558 8843 e3162071218d

Understanding MPI Reduction Algorithms

Collective communication functions defined by the Message Passing Interface are often used in high performance computing workflows to orchestrate collective actions amongst groups of processes. The reduce operation is useful when developers need to combine data stored at each MPI process. Current reduce implementations are highly optimized to minimize the operation’s execution time while maximizing network and process utilization. This paper explores three basic approaches: binomial, pipelined, and pipelined binary tree reductions. Both theoretical and empirical running times are discussed regarding these algorithms. Modern reduce algorithms utilize these basic approaches and captilize on their strengths. This paper also analyzes a new greedy pipelined reduction algorithm and empirically benchmarks it against current approaches. Findings show that binomial reductions are faster for smaller messages while pipelined and pipelined binary tree implementations are faster for larger messages. Furthermore, the greedy pipelined algorithm can be faster in all situations, however requires that an optimal message segment size be chosen.

No users to show at the moment.