Title: Studying Degree and Sources of Non-Determinism in MPI Applications via Graph Kernels
Abstract: As the scientific community prepares to deploy an
increasingly complex and diverse set of applications on exascale platforms, the need to assess
reproducibility of simulations and identify the root causes of reproducibility failures increases
correspondingly. One of the greatest challenges to reproducibility at exascale is the
inherent non-determinism at the level of inter-process communication. The use of non-deterministic
communication constructs is necessary to boost performance, but communication non-determinism can
also hamper software correctness and result reproducibility.
In this talk we propose a software framework for identifying the percentage and sources of
communication non-determinism. We model parallel executions as directed graphs and leverage
graph kernels to characterize run-to-run variations in inter-process communication. We demonstrate
the effectiveness of graph kernel similarity as a proxy for non-determinism by showing that these
kernels can quantify the type and degree of non-determinism present in communication patterns. To
demonstrate our framework's ability to quantify runtime non-determinism and link it to root sources,
we show results for an adaptive mesh refinement application, where our framework automatically
quantifies the impact of function calls on non-determinism, and a Monte Carlo application, where
our framework automatically quantifies the impact of parameter configurations on non-determinism.
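
To make the idea in the abstract concrete, the following is a minimal sketch, not the framework presented in the talk. It assumes each run is recorded as a directed event graph (nodes are labeled MPI events, edges are program-order and message edges) and uses a simplified Weisfeiler-Lehman-style subtree kernel, chosen here only for illustration. All names (wl_features, kernel_similarity, run_a, run_b) and the toy graphs are hypothetical.

```python
from collections import Counter
from math import sqrt


def wl_features(nodes, edges, iterations=2):
    """Count Weisfeiler-Lehman-style labels of a directed graph.

    nodes: dict mapping node id -> initial label (e.g. "recv@0")
    edges: list of (src, dst) pairs
    """
    # Collect the predecessors of every node (incoming edges only).
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)

    labels = dict(nodes)
    features = Counter(labels.values())
    for _ in range(iterations):
        # Refine each label with the sorted labels of its predecessors.
        labels = {
            n: labels[n] + "(" + ",".join(sorted(labels[p] for p in preds[n])) + ")"
            for n in nodes
        }
        features.update(labels.values())
    return features


def kernel_similarity(g1, g2, iterations=2):
    """Cosine-normalized kernel value: near 1.0 for identical graphs."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    dot = sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())
    norm = sqrt(sum(v * v for v in f1.values())) * sqrt(sum(v * v for v in f2.values()))
    return dot / norm


# Toy example (assumed): rank 0 posts two wildcard receives, and ranks 1
# and 2 each send one message. The two runs differ only in match order.
events = {"s1": "send@1", "s2": "send@2", "r0a": "recv@0", "r0b": "recv@0"}
program_order = [("r0a", "r0b")]
run_a = (events, program_order + [("s1", "r0a"), ("s2", "r0b")])
run_b = (events, program_order + [("s2", "r0a"), ("s1", "r0b")])

print("run_a vs run_a:", kernel_similarity(run_a, run_a))
print("run_a vs run_b:", kernel_similarity(run_a, run_b))
```

In this sketch, a similarity below 1 between two runs of the same program and input indicates that the communication pattern changed from run to run; averaging such scores over many run pairs would give one possible measure of the degree of non-determinism.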
Bio: Michela Taufer is an ACM Distinguished Scientist and holds the Jack Dongarra Professorship
in High Performance Computing in the Department of Electrical Engineering and Computer Science at
the University of Tennessee Knoxville (UTK). She earned her undergraduate degrees in Computer
Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science
from the Swiss Federal Institute of Technology (ETH), Switzerland. From 2003 to 2004 she was a
La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of
California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on
interdisciplinary projects in computer systems and computational chemistry.
Michela has a long history of interdisciplinary work with scientists. Her research interests
include scientific applications on heterogeneous platforms (i.e., multi-core platforms and
accelerators); performance analysis, modeling and optimization; Artificial Intelligence (AI)
for cyberinfrastructures (CI); and AI integration into scientific workflows, computer simulations,
and data analytics. She has been serving as the principal investigator of several NSF
collaborative projects. She also has significant experience in mentoring a diverse population
of students on interdisciplinary research. Michela's training expertise includes efforts to
spread high-performance computing participation in undergraduate education and research as
well as efforts to increase the interest and participation of diverse populations in
interdisciplinary studies.