The current static usage model of HPC systems is becoming increasingly inefficient. Several factors drive this. System architectures keep growing in complexity and heterogeneity. Coupled applications are used more widely. Strong scaling with extreme-scale parallelism is needed, and complex, dynamic workflows are increasingly common. As a consequence, research is growing on malleable systems, middleware, and applications that can adjust their resource usage dynamically to maximize efficiency.
Malleability allows the computation and storage resources to be adjusted dynamically, both for individual applications and for the global system. Such malleable systems, however, face a series of fundamental research challenges, including: Who initiates changes in resource availability or usage? How are these changes communicated? How can the optimal usage be computed? How can applications cope with dynamically changing resources? What should malleable programming models and abstractions look like? How should resource management frameworks for malleable systems be designed? What API should be exposed to applications?
This tutorial will present techniques for achieving malleability in high-performance computing, as well as high-level parallel programming models and programmability techniques that improve application malleability. The main part of the tutorial will be devoted to demonstrating three things. The first is FlexMPI, a framework for HPC malleability. The second is Limitless, an HPC monitoring system that collects information from applications and systems. The third is the use of AI and ML techniques to steer malleability in systems and applications. Finally, we will show how to apply the presented solutions to two use cases: Wacom++ and Nek5000.
- Knowledge of MPI and C/C++
- HPC users
- Programmers of HPC applications
- HPC system administrators
- Researchers on HPC optimization
- Students interested in parallel and distributed programming
Conditions for accessing the tutorial
- Laptop with Linux installed and container support
- ADMIRE project. https://admire-eurohpc.eu/
- Webinar on FlexMPI. https://admire-eurohpc.eu/dissemination/webinars/flexmpi/