Following the success of the first joint tutorial with the VeloC team, we decided to follow up with a second edition of this joint tutorial at EuroMPI’18. Bogdan Nicolae, Franck Cappello and George Bosilca will present the tutorial, titled “Resilience in parallel applications”. It covers two complementary fault management techniques that empower application developers to deal with various types of failures directly at the application level, increasing the opportunities to reduce the resilience overhead through holistic support from all layers: hardware and software, as well as the parallel programming paradigm. The tutorial highlights application-driven solutions for surviving faults and provides a basic understanding of their expected costs at scale. The presented solutions cover two complementary approaches:
- application-defined checkpoint-restart (as demonstrated through the VeloC runtime); and
- user-level failure mitigation (as demonstrated through the ULFM extension to the MPI standard).
The tutorial will use the following slide decks: Introduction, VeloC and ULFM, as well as a set of examples for VeloC and ULFM. For the hands-on session, participants are expected to bring their own laptop running Windows, Linux or Mac OS X, with Docker installed.
Using the Docker Image
- Install Docker
- Docker can be seen as a “lightweight” virtual machine, a perfect way to quickly set up a tutorial execution environment. You will need basic knowledge of Docker, which is available either from the documentation or from a cheat sheet.
- Docker is available for a wide range of systems (MacOS, Windows, Linux).
- You can install Docker quickly, either by downloading one of the official builds for MacOS or Windows, or by installing Docker from your package manager, for example with one of:
yum install docker
apt-get install docker.io
port install docker-io
- Validate your Docker installation by running the following in a terminal:
docker run hello-world
- Pull the pre-built tutorial Docker image, which contains all the libraries (ULFM and VeloC) needed to complete the tutorial, into your Docker installation:
docker pull bnicolae/veloc-tutorial
- Source the Docker aliases in a terminal using
source dockervars.sh
or, on Windows,
call dockervars.bat
Both files are in the examples tarball. These aliases redirect the “make”, “mpicc”, “mpif90”, “mpiexec” and “mpirun” commands to execute inside the Docker container instead of in the local environment (pretty nifty). Beware: the aliases must be loaded in each new shell where you want to use Docker.
- Get the tutorial hands-on archive and untar it. On Linux and Mac OS X:
tar -zxvf eurompi18-handson.tgz
Then go to the tutorial hands-on directory:
cd eurompi18
- Before going further, make sure the Docker aliases are correctly loaded by running
alias
or you will be able to neither compile nor run the examples. You can now type
make
to compile the examples, and you can execute the generated examples in the Docker container using
mpirun -np 4 *example*
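To give an idea of how the redirection of “make”, “mpicc” and “mpirun” into Docker can work, here is a minimal sketch of a dockervars.sh-style wrapper. It is not the script shipped in the tarball, whose contents may differ. The image name comes from the pull step above; the helper name in_tutorial_container, the DOCKER_BIN variable and the /sandbox mount point are assumptions made for illustration. The sketch uses shell functions rather than aliases, so that it also works in non-interactive scripts.

```shell
# Hypothetical sketch of a dockervars.sh-style wrapper; the real script may differ.
# The image name is the one pulled earlier; /sandbox is an assumed mount point.
TUTORIAL_IMAGE="bnicolae/veloc-tutorial"
DOCKER_BIN="${DOCKER_BIN:-docker}"   # overridable, e.g. DOCKER_BIN=echo for a dry run

# Run one command inside a throwaway container, with the current directory
# mounted so that sources and generated binaries stay on the host.
in_tutorial_container() {
    "$DOCKER_BIN" run --rm -v "$PWD:/sandbox" -w /sandbox "$TUTORIAL_IMAGE" "$@"
}

# One wrapper per redirected tool.
make()    { in_tutorial_container make "$@"; }
mpicc()   { in_tutorial_container mpicc "$@"; }
mpif90()  { in_tutorial_container mpif90 "$@"; }
mpiexec() { in_tutorial_container mpiexec "$@"; }
mpirun()  { in_tutorial_container mpirun "$@"; }
```

With wrappers like these loaded, typing make in the hands-on directory would run make inside the container against the mounted directory. Because they are functions, type mpirun (rather than alias) shows whether a wrapper is active in the current shell, and setting DOCKER_BIN=echo prints the Docker command line instead of executing it.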