Try the Docker packaged ULFM fault tolerant MPI

To support the SC’16 Tutorial, we have designed a self contained Docker image. This packaged docker image contains everything you need to compile, and run the tutorial examples, in a contained sandbox. Docker can be seen as a lightweight virtual machine, running its own copy of an operating system, but without the heavy requirement of a full-blown hypervisor. We use this technology to package a very small Linux distribution containing gcc, mpicc, and mpirun, as needed to compile and run natively your fault tolerant MPI examples on your host Linux, Mac or Windows desktop, without the effort of compiling a production version of ULFM Open MPI on your own. Content: 1. A Docker Image with a precompiled version of ULFM Open MPI 1.1. 2. The tutorial hands-on example. 3. Various tests and benchmarks for resilient operations. 4. The sources for the ULFM Open MPI branch release 1.1. Using the Docker Image 1. Install Docker You can install Docker quickly, either by downloading one of the official builds from http://docker.io for MacOS and Windows, or by installing Docker from your Linux or MAcOS package manager (i.e. yum install docker, apt-get docker-io, brew/port install docker-io). Please refer to the Docker installation Continue reading Try the Docker packaged ULFM fault tolerant MPI

ULFM Specification update

A new version of the ULFM specification accounting for remarks and discussions going on at the MPI Forum Meetings in 2016 has been posted under the ULFM Specification item. This new update has very few semantic changes. It clarifies the failure behavior of MPI_COMM_SPAWN, and corrects the output values of error codes and status objects returned from functions completing in error. Head to ULFM Specification for more info.

SC’16 Tutorial

The ULFM team is happy to announce that we will be teaching a day-long tutorial on fault tolerance at SC’16 (somewhat similar to last year tutorial). The tutorial will cover multiple theoretical and practical aspects of dealing with faults. It targets a wide scientific community, starting from scientists trying to understand the challenges of different types of failures, and up to advanced users with prior experience with fault-related topics that want to get a more precise understanding of the available tools allowing them to efficiently deal with faults. The tutorial was divided in two parts, one theoretical (covering the different existing approaches and their modeling), and one practical. The slides for the 2 parts are available (theory and practice), as well as the handon examples. Unlike the previous years, we have embraced new technologies to facilitate the public interaction with ULFM: enjoy the ULFM docker. More information about the tutorial can be found here. Enjoy our promotional video 😉 See you all in Salt Lake City, UT !!!