Logarithmic Revoke Routine

Starting with ULFM-1.0, the implementation features a logarithmic revoke operation, with a logarithmically bounded per-node communication degree. A paper describing this implementation will be presented at EuroMPI’15. The purpose of the Revoke operation is to propagate failure knowledge and to interrupt ongoing and pending communications, under the control of the user. We explain how the Revoke operation can be implemented as a reliable broadcast over the scalable and failure-resilient Binomial Graph (BMG) overlay network. An evaluation at scale, on a Cray XC30 supercomputer, demonstrates that the Revoke operation has low latency and does not introduce system noise outside of failure recovery periods.

Purpose of the Revoke Operation

If the communication pattern of the application is complex, failures have the potential to deeply disturb the application and prevent an effective recovery from being implemented. Consider the example in the figure above: as long as no failure occurs, the processes communicate in a point-to-point pattern (which we call plan A). Process Pk waits to receive a message from P(k-1), then sends a message to P(k+1) (when such processes exist). Let’s observe the effect of introducing a failure in plan A, and assume that P1 has failed. As only P2 communicates directly with P1, other processes do not …
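The idea of a reliable broadcast over the BMG overlay can be illustrated with a small, single-process C simulation. This is a minimal sketch and not the ULFM implementation. It rests on two assumptions. First, it uses the usual binomial-graph definition, in which process i links to (i + 2^k) mod n and (i - 2^k) mod n for every power of two 2^k < n. Second, it uses a plain flooding rule: each alive process forwards the notification to all of its neighbors the first time it receives it. The helper names `bmg_neighbors` and `bmg_broadcast`, and the `MAX_N`/`MAX_DEG` limits, are hypothetical.

```c
#include <string.h>

#define MAX_N   1024  /* hypothetical upper bound on the number of processes */
#define MAX_DEG 64    /* generous bound: 2 * ceil(log2(MAX_N)) = 20 */

/* Hypothetical helper: writes the distinct neighbors of `rank` in a
 * binomial graph of n processes into out[], assuming rank links to
 * (rank + 2^k) mod n and (rank - 2^k) mod n for every 2^k < n.
 * Returns the number of neighbors (at most 2 * ceil(log2 n)). */
int bmg_neighbors(int rank, int n, int out[MAX_DEG])
{
    int deg = 0;
    for (int step = 1; step < n; step <<= 1) {
        int cand[2] = { (rank + step) % n, (rank - step + n) % n };
        for (int c = 0; c < 2; c++) {
            int dup = 0;
            for (int j = 0; j < deg; j++)
                if (out[j] == cand[c])
                    dup = 1;
            if (!dup)
                out[deg++] = cand[c];
        }
    }
    return deg;
}

/* Hypothetical flooding broadcast over the BMG overlay. failed[r] != 0 marks
 * a crashed process that neither receives nor forwards; root must be alive.
 * Each alive process forwards the message to all of its neighbors the first
 * time it receives it. Fills reached[], stores the number of sends
 * (including redundant ones) in *messages, and returns how many processes
 * were reached. */
int bmg_broadcast(int root, int n, const int failed[], int reached[], long *messages)
{
    int queue[MAX_N];
    int head = 0, tail = 0;
    memset(reached, 0, (size_t)n * sizeof(int));
    *messages = 0;
    reached[root] = 1;
    queue[tail++] = root;
    while (head < tail) {
        int nb[MAX_DEG];
        int deg = bmg_neighbors(queue[head++], n, nb);
        for (int j = 0; j < deg; j++) {
            (*messages)++;                 /* a send is attempted to every neighbor */
            int p = nb[j];
            if (failed[p] || reached[p])   /* crashed peers drop it; others ignore copies */
                continue;
            reached[p] = 1;
            queue[tail++] = p;
        }
    }
    return tail;  /* every reached process was enqueued exactly once */
}
```

Consider n = 64 with processes 1, 2 and 33 crashed. A broadcast rooted at 0 reaches all 61 surviving processes, and each process sends to at most 11 neighbors, because the +32 and -32 links coincide. The per-process output degree stays bounded by 2⌈log2 n⌉, which matches the logarithmic per-node bound described above. The total message count, however, is roughly n times that degree, because flooding sends redundant copies in exchange for tolerating crashed relays.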

Logarithmic Agreement Routine

Starting with ULFM-1.0, the implementation features a purely logarithmic agreement, with no single point of failure. A paper describing this implementation will be presented at SC’15. We considered a practical agreement algorithm with the following desired properties:
- the unique decided value is the result of a combination of all values proposed by the deciding processes (a major difference from a 1-set agreement);
- failures consist of permanent crashes in a pseudo-synchronous system (data corruption, message loss, and malicious behaviors are not considered);
- the agreement favors failure-free performance over the failure case, striving to exchange a logarithmic number of messages in the absence of failures.

To satisfy this last requirement, we introduced a practical, intermediate property called Early Returning: the capacity of an early-deciding algorithm to return before the stopping condition (early or not) is guaranteed. As soon as a process can determine that the decision value is fixed (unless it fails itself), it is allowed to return. However, because a process may return early, later failures may compel it to participate in additional communications. Therefore, the decision must remain available after the processes return, in order to serve unexpected message exchanges until the stopping condition can be established. Unlike a regular early stopping algorithm, not all processes decide and …
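The sketch below is a small sequential C simulation that shows one way proposed values can be combined in a logarithmic number of rounds when no failure occurs. It is a minimal sketch and not the agreement algorithm from the SC’15 paper. It ignores failures and Early Returning entirely and uses a plain binomial tree rooted at rank 0. It also assumes a bitwise AND as the combining operator, mirroring the flag semantics of the ULFM agreement. The function `tree_and_agree` and the `MAX_N` bound are hypothetical.

```c
#include <stdint.h>

#define MAX_N 1024  /* hypothetical upper bound on the number of processes */

/* Failure-free sketch: combines the values proposed by n processes
 * (n <= MAX_N) with a bitwise AND, first reducing along a binomial tree
 * toward rank 0, then broadcasting the result back down the same tree.
 * Writes the decided value of every rank into decided[] and returns the
 * number of communication rounds, which is 2 * ceil(log2 n). */
int tree_and_agree(const uint32_t proposed[], int n, uint32_t decided[])
{
    uint32_t acc[MAX_N];
    int rounds = 0, top = 0;

    for (int i = 0; i < n; i++)
        acc[i] = proposed[i];

    /* Reduce: at distance d, rank r (an odd multiple of d) sends its partial
     * AND to rank r - d. */
    for (int d = 1; d < n; d <<= 1, rounds++) {
        for (int r = d; r < n; r += 2 * d)
            acc[r - d] &= acc[r];
        top = d;
    }

    /* Broadcast: rank 0 holds the combined value; walk the tree back down. */
    decided[0] = acc[0];
    for (int d = top; d >= 1; d >>= 1, rounds++)
        for (int r = d; r < n; r += 2 * d)
            decided[r] = decided[r - d];

    return rounds;
}
```

With 13 processes, the reduction takes 4 rounds and the broadcast takes 4 more. Each phase sends n - 1 messages, and each rank takes part in O(log n) exchanges. In this plain tree, a single crash would cut off its whole subtree, which is why a failure-resilient agreement needs considerably more machinery than this sketch.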

ULFM 1.0 Announced

The major 1.0 milestone has been reached for the User Level Failure Mitigation (ULFM) compliant fault-tolerant MPI. We have focused on improving performance, both before and after the occurrence of failures. The list of new features includes:
- Support for the non-blocking version of the agreement, MPI_COMM_IAGREE.
- Compliance with the latest ULFM specification draft. In particular, the MPI_COMM_(I)AGREE semantic has changed.
- A new algorithm to perform agreements, with a truly logarithmic complexity in the number of ranks, which translates into huge performance boosts in MPI_COMM_(I)AGREE and MPI_COMM_SHRINK. Meet us at SC’15 to learn more about the novel algorithm we designed!
- A new algorithm to perform communicator revocation. MPI_COMM_REVOKE performs a reliable broadcast with a fixed maximum output degree, which scales logarithmically with the number of ranks. Meet us at EuroMPI’15 to learn more about the Revoke algorithm we designed!
- Improved support for our traditional network layers:
  - TCP: fully tested
  - SM: fully tested (with the exception of XPMEM, which remains unsupported)
- Added support for high-performance networks:
  - Open IB: reasonably tested
  - uGNI: reasonably tested
- The tuned collective module is now enabled by default (reasonably tested); expect a huge performance boost compared to the former basic default setting.
- Back-ported PBS/ALPS fixes from Open MPI.