Ryan T. added links to project Understanding MPI Reduction Algorithms
A look into MPI reduce operations, both basic and complex.
Collective communication functions defined by the Message Passing Interface (MPI) are often used in high-performance computing workflows to orchestrate collective actions among groups of processes. The reduce operation is useful when developers need to combine data stored at each MPI process. Current reduce implementations are highly optimized to minimize the operation's execution time while maximizing network and process utilization. This paper explores three basic approaches: binomial, pipelined, and pipelined binary tree reductions. Both theoretical and empirical running times of these algorithms are discussed. Modern reduce algorithms build on these basic approaches and capitalize on their strengths. This paper also analyzes a new greedy pipelined reduction algorithm and empirically benchmarks it against current approaches. Findings show that binomial reductions are faster for smaller messages, while pipelined and pipelined binary tree implementations are faster for larger messages. Furthermore, the greedy pipelined algorithm can be faster in all situations, but it requires that an optimal message segment size be chosen.
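To make the small-message versus large-message trade-off concrete, the following is a minimal sketch in plain Python (no MPI library required). It simulates a binomial-tree reduce over a list of per-rank values, and it compares two rough cost estimates under the textbook latency-bandwidth model. The function names, the parameters `alpha` (per-message latency) and `beta` (per-byte transfer cost), and the cost formulas are illustrative assumptions and are not taken from the paper. The greedy pipelined algorithm is not modeled here, and the cost of the reduction operator itself is ignored.

```python
from operator import add


def binomial_reduce(values, op=add):
    """Simulate a binomial-tree reduce to rank 0.

    values[r] stands in for the data held by rank r.
    Returns (result_at_rank_0, number_of_communication_rounds).
    """
    buf = list(values)  # buf[r] is the partial result currently held by rank r
    p = len(buf)
    rounds = 0
    mask = 1
    while mask < p:
        # In this round, every rank that is a multiple of 2*mask receives
        # the partial result of the rank `mask` positions above it.
        for rank in range(0, p, 2 * mask):
            partner = rank + mask
            if partner < p:
                buf[rank] = op(buf[rank], buf[partner])
        mask <<= 1
        rounds += 1
    return buf[0], rounds


def binomial_cost(p, m, alpha, beta):
    """Assumed model: ceil(log2 p) rounds, each sending the full m-byte message."""
    rounds = (p - 1).bit_length()  # equals ceil(log2 p) for p >= 1
    return rounds * (alpha + beta * m)


def pipelined_chain_cost(p, m, s, alpha, beta):
    """Assumed model: a chain of p ranks forwarding segments of s bytes.

    The first segment needs p - 1 hops to reach the root; each of the
    remaining segments arrives one step later.
    """
    segments = -(-m // s)  # ceil(m / s) using integer arithmetic
    steps = (p - 1) + (segments - 1)
    return steps * (alpha + beta * s)


if __name__ == "__main__":
    total, rounds = binomial_reduce(range(8))
    print("sum:", total, "rounds:", rounds)

    # Hypothetical machine parameters in arbitrary time units.
    p, alpha, beta = 16, 100, 1
    for m, s in [(8, 8), (1_000_000, 10_000)]:
        print(
            "m =", m,
            "binomial:", binomial_cost(p, m, alpha, beta),
            "pipelined:", pipelined_chain_cost(p, m, s, alpha, beta),
        )
```

Under these assumed parameters the binomial estimate is lower for the 8-byte message and the pipelined estimate is lower for the 1,000,000-byte message, which is consistent with the findings summarized above. Varying `s` in `pipelined_chain_cost` also shows why the segment size matters: very small segments pay the latency term many times, while very large segments lose the overlap between pipeline stages.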
Ryan T. created project Understanding MPI Reduction Algorithms