Spurious Errors: lack of MPI Progress and Failure Detection

A very common mishap while developing parallel applications is to assume that an application running correctly at small scale will automatically run successfully at large scale. Such optimism is largely unfounded in general, but tools exist to help with the validation process. However, adding resilience to a parallel application tends to increase the likelihood of consistency errors, and unfortunately no tools currently exist to help with this process. In some cases, a few common-sense practices can save hours of debugging and improve the quality of the parallel application.

As you work on your MPI fault-tolerant application, you discover that it runs fine at small scales and on small data sets, but when you increase the number of processes or the computational load on the participating processes, spurious faults seem to be ‘injected’ for no good reason. It might be easy to blame the underlying libraries, but before we go there, consider that you may be observing an interoperability issue between the different layers of the resilient software stack. More precisely, you may be observing the effect of the lack of MPI progress on the failure detector within the MPI library.

MPI Progress (and lack thereof)

The MPI …

Uniform Intercomm Creation

A question about uniformly creating an inter-communicator using MPI_Intercomm_create has been posted on the ULFM mailing list. Initially, I thought it was an easy corner case that could be solved with a few barriers and/or agreements. It turns out this issue is more complicated than initially expected, with a few twists along the way. Let me detail our adventure toward writing a uniform intercomm creation function.

Before moving further, let’s clarify what MPI_Intercomm_create is about. The MPI standard is not very explicit about the scope of this function, but we can gather enough information to start the discussion (page 262, line 6):

“This call [MPI_Intercomm_create] creates an inter-communicator. It is collective over the union of the local and remote groups. Processes should provide identical local_comm and local_leader arguments within each group. Wildcards are not permitted for remote_leader, local_leader, and tag.”

In other words, if you provide two intra-communicators, a leader on each one, and a bridge communicator where the leaders can talk to each other, you will be able to bind the two groups of processes corresponding to each of the intra-communicators into an inter-communicator. Neat! Graphically speaking, this should look like: [figure]

So far so good, but what does “uniformly” mean? Based on some …