ULFM has reached the 1.1 milestone, a minor release that squashes a few bugs identified by our users and developers.
The focus has been on improving stability, extending feature coverage for intercommunicators, and following the updated specification for MPI_ERR_PROC_FAILED_PENDING.
- Added the MPI_ERR_PROC_FAILED_PENDING error code, as per the newer specification revision. It is now properly returned from point-to-point, non-blocking ANY_SOURCE operations.
- Aliased MPI_ERR_PROC_FAILED, MPI_ERR_PROC_FAILED_PENDING and MPI_ERR_REVOKED to the corresponding standard-blessed extension names MPIX_ERR_xxx.
- Support for intercommunicators:
  - Support for the blocking version of the agreement, MPI_COMM_AGREE, on intercommunicators.
  - MPI_COMM_REVOKE tested on intercommunicators.
- Completely disabled (via .ompi_ignore) many untested components.
- Changed the default ORTE failure notification propagation aggregation delay from 1s to 25ms.
- Added an OMPI internal failure propagator; failure propagation between SM domains is now immediate.
- Bugfixes:
  - SendRecv would not always report MPI_ERR_PROC_FAILED correctly.
  - SendRecv could incorrectly update the status with errors pertaining to the Send portion of the SendRecv.
  - Revoked send operations are now always completed or remotely cancelled, and can no longer deadlock.
  - Cancelled send operations to a dead peer no longer trigger an assert when the BTL reports that same failure.
  - Repeated calls to operations returning MPI_ERR_PROC_FAILED will eventually return MPI_ERR_REVOKED when another process revokes the communicator.
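
To illustrate how an application might react to the three error classes mentioned above, here is a minimal sketch of a dispatch helper. It is one possible policy, not the ULFM implementation. The names `ft_action`, `ft_classify`, `ft_action_name` and the `HAVE_ULFM` build flag are hypothetical, and the numeric stand-in values exist only so the sketch compiles without an MPI installation; a real build takes the MPIX_ERR_xxx values from the MPI headers and passes in the error class obtained with MPI_Error_class.

```c
#include <assert.h>

#ifdef HAVE_ULFM              /* hypothetical build flag, not defined by ULFM */
#include <mpi.h>
#include <mpi-ext.h>          /* ULFM extension names (MPIX_ERR_xxx) */
#else
/* Stand-in values (assumption) so the sketch compiles without MPI.
 * The real values are provided by the MPI headers. */
enum {
    MPI_SUCCESS = 0,
    MPIX_ERR_PROC_FAILED = 1001,
    MPIX_ERR_PROC_FAILED_PENDING = 1002,
    MPIX_ERR_REVOKED = 1003
};
#endif

/* What the caller should do next (hypothetical policy). */
typedef enum {
    FT_CONTINUE,       /* no failure reported */
    FT_ACK_AND_WAIT,   /* ANY_SOURCE request still pending: acknowledge
                          the failures, then wait on the request again */
    FT_RECOVER,        /* a peer failed: start application recovery */
    FT_JOIN_RECOVERY,  /* another process revoked the communicator */
    FT_ABORT           /* error unrelated to process failure */
} ft_action;

/* Map an MPI error class to a recovery action. */
ft_action ft_classify(int err_class)
{
    if (err_class == MPI_SUCCESS)                  return FT_CONTINUE;
    if (err_class == MPIX_ERR_PROC_FAILED_PENDING) return FT_ACK_AND_WAIT;
    if (err_class == MPIX_ERR_PROC_FAILED)         return FT_RECOVER;
    if (err_class == MPIX_ERR_REVOKED)             return FT_JOIN_RECOVERY;
    return FT_ABORT;
}

/* Human-readable name of an action, for logging. */
const char *ft_action_name(ft_action a)
{
    static const char *const names[] = {
        "continue", "ack-and-wait", "recover", "join-recovery", "abort"
    };
    assert(a >= FT_CONTINUE && a <= FT_ABORT);
    return names[a];
}
```

In an application, the result of a wait on a non-blocking ANY_SOURCE receive would be passed through `ft_classify`. Because repeated calls that return MPI_ERR_PROC_FAILED may eventually return MPI_ERR_REVOKED, both cases need a handler.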
Get the source and happy hacking,
The ULFM team