Thank you for the great introduction and also thank you for inviting me.
It's really exciting to see that more people than I expected have joined out of interest in I/O.
As Georg already mentioned, I have a very diverse background today, but I originally started out in network and I/O research.
So this is still one of the things that is very close to my heart.
And today I'm trying to capture the activities that my group and the people I have been collaborating with over the last couple of years have been working on to provide better insight into and understanding of I/O on HPC systems.
This may not come as a surprise to you, especially if you're interested in HPC, but our systems have become very, very complex, not only in terms of hardware infrastructure but also software infrastructure.
We see very heterogeneous system architectures, a very diverse mix of components, including
storage components.
And at the same time, we have seen a rise in new computational paradigms.
We see new parameters that become important when it comes to system tuning.
And at this point, we have come to a stage where fully and optimally understanding the whole picture is almost impossible, especially for humans.
Still, specifically for I/O, we do need experts to look into and tweak applications.
And what my group is really targeting is improving the accessibility of insights and knowledge about how your application is behaving and how the system is behaving.
So it all comes down to this: we're seeing more and more heterogeneous and complex infrastructures.
At the same time, we've seen a shift in the way workloads are being deployed on HPC systems.
Specifically, if I look back on my own career, when I started as a PhD student working on I/O topics, that was around 2014, 2015, the field was mostly dominated by what we today call traditional HPC, meaning we had very bulk-synchronous I/O phases, oftentimes write or read phases, mostly targeting checkpoint and restart files for simulation workloads.
And even though that was already pretty complex to optimize, it was somewhat well understood, and our tools were mostly in place.
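As an aside, here is a minimal sketch of what such a bulk-synchronous checkpoint phase typically looks like, assuming MPI-IO with one shared file; the file name, per-rank slice size, and contiguous layout are illustrative assumptions, not details from the talk:

```c
/* Hypothetical sketch of a bulk-synchronous checkpoint phase:
   all ranks compute, then stop together and collectively write
   their slice of the state into one shared checkpoint file. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;                       /* doubles per rank (assumed size) */
    double *state = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++)
        state[i] = rank + i;                     /* stand-in for computed state */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes a contiguous slice at its own offset; the
       collective call is what makes this phase "bulk synchronous". */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, state, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(state);
    MPI_Finalize();
    return 0;
}
```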
But especially if you attend different conferences, you've seen that there is a much more diverse mix of HPC workloads today, specifically what we call emerging HPC workloads. There you have anything from machine learning, big data analytics, and multi-step workflows up to deep learning and artificial intelligence being deployed and run on HPC systems today.
To give you a better understanding of how popular machine learning has become in recent years, this is a study that was performed on the Summit supercomputer at Oak Ridge and published three years ago. They looked into different science domains and what kinds of methodologies they were using to accelerate their jobs.
And the interesting part was that almost all of the jobs used some form of machine learning to accelerate their code.
And at the same time, the breakdown showed that basically every science domain running on the system was using machine learning at that point, meaning that the optimizations that had been performed up to the time Summit was deployed were no longer ideal for the big picture of the workload mix that you see today.
And that's where we arrive at the main motivation for the work that I'm really interested in. You see very different kinds of I/O today, and their performance characteristics all differ, meaning that today HPC I/O is more than just checkpointing and bulk-synchronous I/O.