ULFM Specification

The specification for a Process Fault Tolerance chapter in the MPI Standard, titled User Level Failure Mitigation (ULFM), is a minimal set of changes necessary for applications and libraries to include fault tolerance techniques and to construct further forms of fault tolerance (transactions, strongly consistent collectives, etc.).

This specification document is based on the upcoming MPI 4.1 standard. It is the version currently under evaluation by the MPI standardization body, where you can follow discussions and updates.

Older drafts of the specification:

Fault Tolerance Chapter (updated Feb. 2017)

Fault Tolerance Chapter (updated Jan. 2015)

Fault Tolerance Chapter (updated May 18, 2014)

Fault Tolerance Chapter (updated Jan. 4, 2014)

ulfm-mpi31 (Updated on December 10th 2013)

ulfm-mpi31 (Updated on August 16th 2013)

ulfm.pdf (Updated July 30, 2012)