Please use any of the following publications to reference ULFM:
- Bland, W., Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J.J. “Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications August 2013 27: 244-254, doi:10.1177/109434201348823
This is a list of publications related either to the User Level Failure Mitigation proposal or to different types of use-case scenarios.
- Losada, Nuria, Bouteiller, Aurelien and Bosilca, George “Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications,” Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’19)
- Losada, Nuria, Bosilca, George, Bouteiller, Aurelien, Gonzalez, Patricia and Martin, Maria J “Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, Vol. 91, pages 450-464, 2018
- Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J.J. “An Evaluation of User-Level Failure Mitigation Support in MPI,” Computing, Springer, 2013, ISSN 0010-485X, http://dx.doi.org/10.1007/s00607-013-0331-3
- Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J. “An Evaluation of User-Level Failure Mitigation Support in MPI,” Proceedings of Recent Advances in Message Passing Interface – 19th European MPI Users’ Group Meeting, EuroMPI 2012, Springer, Vienna, Austria, September 23 – 26, 2012.
- Bland, W., Bosilca, G., Bouteiller, A., Herault, T., Dongarra, J. “A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, ut-cs-12-693, February 24, 2012.