Uniform Intercomm Creation

A question about uniformly creating an inter-communicator using MPI_Intercomm_create was posted on the ULFM mailing list. Initially, I thought it was an easy corner case that could be solved with a few barriers and/or agreements. It turns out the issue is more complicated than initially expected, with a few twists along the way. Let me detail our adventure toward writing a uniform intercomm creation function.

Before moving further, let’s clarify what MPI_Intercomm_create is about. The MPI standard is not very explicit about the scope of this function, but we can gather enough information to start the discussion (page 262, line 6):

“This call [MPI_Intercomm_create] creates an inter-communicator. It is collective over the union of the local and remote groups. Processes should provide identical local_comm and local_leader arguments within each group. Wildcards are not permitted for remote_leader, local_leader, and tag.”

In other words, if you provide two intra-communicators, a leader in each one, and a bridge communicator through which the two leaders can talk to each other, you will be able to bind the two groups of processes corresponding to the intra-communicators into an inter-communicator. Neat! Graphically speaking, this should look like: [figure]

So far so good, but what does “uniformly” mean? Based on some [...]

Continue reading: Uniform Intercomm Creation

ULFM Specification update

A new version of the ULFM specification, which accounts for the remarks and discussions from the MPI Forum meeting in Chicago in December 2013, has been posted under the ULFM Specification item. This update adds a new error code, MPI_ERR_PROC_FAILED_PENDING, to separate process failure errors from non-impacted requests when they remain pending, and it adds new examples. Head to ULFM Specification for more info.

New Usage Guide

To clarify the difference between installation/setup and usage, the old Usage Guide has been moved to ULFM Setup, and a new Usage Guide has been put in place to provide instructions and examples for using ULFM constructs in MPI code. For now, the examples section provides the code outlined in the ULFM specification, but it will eventually be amended to include more complete and unique examples. You can find both of these pages in the menu bar, under User Level Failure Mitigation.