SC’16 Tutorial

The ULFM team is happy to announce that we will be teaching a day-long tutorial on fault tolerance at SC’16 (somewhat similar to last year's tutorial). The tutorial will cover multiple theoretical and practical aspects of dealing with faults. It targets a wide scientific community, from scientists trying to understand the challenges posed by different types of failures to advanced users with prior experience in fault-related topics who want a more precise understanding of the tools available for dealing with faults efficiently. The tutorial is divided into two parts, one theoretical (covering the different existing approaches and their modeling) and one practical. The slides for the two parts are available (theory and practice), as well as the hands-on examples. Unlike in previous years, we have embraced new technologies to make it easier for the public to interact with ULFM: enjoy the ULFM docker. More information about the tutorial can be found here. Enjoy our promotional video 😉 See you all in Salt Lake City, UT!!!

SC’15 Tutorial

The ULFM team is happy to announce that we will be teaching a day-long tutorial on fault tolerance at SC’15 (somewhat similar to last year's tutorial). The tutorial will cover multiple theoretical and practical aspects of dealing with faults. It targets a wide scientific community, from scientists trying to understand the challenges posed by different types of failures to advanced users with prior experience in fault-related topics who want a more precise understanding of the tools available for dealing with faults efficiently. Get the slides (part1, part2) and the examples. More information about the tutorial can be found here. Enjoy our promotional video 😉 See you all in Austin, TX!!!

Preparing for June MPI Forum meeting

In preparation for the June MPI Forum meeting, the specification has received some updates. The most prominent changes are: The exposed memory in an RMA window may be completely undefined after a failure has occurred. MPI_Comm_agree now performs a bitwise (binary) AND on the flag argument. Examples have been corrected to use error classes, instead of error codes, when relevant. The latest version is available in the ULFM specification area.
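To illustrate what a bitwise AND over the flag argument means in practice, here is a minimal sketch in plain C. It makes no MPI calls. The function `simulate_agree` and the bit names `CHECKPOINT_OK` and `DATA_VALID` are hypothetical and exist only for this illustration. It assumes each array entry is the flag value contributed by one participating process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit assignments, chosen only for this illustration. */
#define CHECKPOINT_OK 0x1
#define DATA_VALID    0x2

/*
 * Local model of the agreed value, NOT an MPI or ULFM function.
 * flags[i] is assumed to be the flag contributed by participant i.
 * Returns the bitwise AND of all contributions, so a bit survives
 * only if every participant set it.
 */
int simulate_agree(const int *flags, size_t nprocs)
{
    assert(flags != NULL || nprocs == 0);

    int result = ~0; /* all bits set: the identity element for AND */
    for (size_t i = 0; i < nprocs; i++) {
        result &= flags[i];
    }
    return result;
}
```

For example, if three participants contribute `CHECKPOINT_OK | DATA_VALID`, `CHECKPOINT_OK | DATA_VALID`, and `CHECKPOINT_OK`, the agreed value is `CHECKPOINT_OK`: the `DATA_VALID` bit is cleared because one participant did not set it. This lets an application pack several independent yes/no decisions into a single agreement.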