CARLA 2024

Marta Mattoso

University: Federal University of Rio de Janeiro

Country: Brazil

Trusting data science workflows in HPC

Trusting outcomes from scientific workflows depends on the reproducibility and traceability of their execution. Despite progress in containers that support workflow reproducibility, and advances in provenance data capture for traceability, executing in HPC remains challenging. Large-scale workflows have distributed, “isolated” executions of components that are part of a larger data science workflow, which limits the integration of provenance data and the composition of container images. When workflows adopt scientific machine learning models as part of their execution, one more level of complexity is added to the software stack and to the provenance data derivation. This talk presents these limitations and discusses current approaches to composing containers and capturing provenance data towards trusting data science workflows in HPC.

Bio

Marta Mattoso is a Full Professor at COPPE-Federal University of Rio de Janeiro. Her topics of interest in Data Science include aspects of large-scale data management. Among her interests is provenance data to support human analytics during the parallel execution of many computing tasks in high-performance environments. She has supervised more than 90 graduate students. She was the 2005 Honored Researcher at the Brazilian Database Conference. She is a CNPq level 1B research productivity fellow and a Rio de Janeiro State Scientist fellow. She coordinates research projects funded by CNPq, CAPES, and Faperj, as well as collaboration projects with INRIA, France, since 2001. She served as a Mercator Fellow at DFG, Germany, in the Fonda project (2020-2023). She is a member of the body of experts on the WorkflowsRI project in the USA. She is a founding member of the Brazilian Computing Society.