ROOM 2. Malleability Techniques and in HPC

Tutorial. Malleability Techniques and in HPC

Monday September 18, 2023

08:00 - 12:00 hours

Instructor

Prof. Dr. Jesús Carretero, Computer Science and Engineering Department, Universidad Carlos III de Madrid, Spain

Program:

1.- System and system architecture considerations in designing malleable architectures.

2.- Emerging software designs to achieve malleability in high-performance computing.

3.- High-level parallel programming models and programmability techniques to improve application malleability.

4.- FlexMPI framework for HPC malleability.

5.- Limitless: Getting information from applications and systems.

6.- Use of AI and ML techniques to steer malleability in systems and applications.

7.- Experiences and use cases applying malleability to HPC applications: Wacom++ and Nek5000.

Information

The current static usage model of HPC systems is becoming increasingly inefficient. This is driven by the continuously growing complexity and heterogeneity of system architectures, combined with the increased use of coupled applications, the need for strong scaling with extreme-scale parallelism, and the increasing reliance on complex and dynamic workflows. As a consequence, we see a rise in research on malleable systems, middleware, and applications, which can adjust their resource usage dynamically in order to maximize efficiency.

Malleability lets a system dynamically adjust computation and storage resources to the needs of individual applications on the one hand and of the global system on the other. Such malleable systems, however, face a series of fundamental research challenges: Who initiates changes in resource availability or usage? How are those changes communicated? How is the optimal usage computed? How can applications cope with dynamically changing resources? What should malleable programming models and abstractions look like? How should resource management frameworks for malleable systems be designed? What should the API for applications be?

This tutorial presents techniques for achieving malleability in high-performance computing, along with high-level parallel programming models and programmability techniques that improve application malleability. The main part of the tutorial is devoted to demonstrating FlexMPI, a framework for HPC malleability; Limitless, an HPC monitoring system that collects information from applications and systems; and the use of AI and ML techniques to steer malleability in systems and applications. Finally, we will show how to apply these solutions to two use cases: Wacom++ and Nek5000.

Student prerequisites

  • Knowledge of MPI and C/C++
  • Experience as an HPC user

Audience

  • Programmers of HPC applications
  • HPC system administrators
  • Researchers on HPC optimization
  • Students interested in parallel and distributed programming

Conditions for accessing the tutorial

  • Laptop with Linux installed and container support
