Thank you for the great introduction and also thank you for inviting me.
It's really exciting to see that more people than I expected have joined out of interest in I/O.
As Georg already mentioned, I have a very diverse background today, but I originally started out in network and I/O research.
So this is still one of the things that is very close to my heart.
And today I'm trying to capture the activities that my group and the people I have been collaborating with over the last couple of years have been working on to provide better insight into and understanding of I/O on HPC systems.
This may not come as a surprise to you, especially if you're interested in HPC, but our systems have become very, very complex, not only in terms of hardware infrastructure but also software infrastructure.
We see very heterogeneous system architectures, a very diverse mix of components, including
storage components.
And at the same time, we have seen a rise in new computational paradigms.
We see new parameters that become important when it comes to system tuning.
And at this point, we have come to a stage where fully and optimally understanding the whole picture is almost impossible, especially for humans.
Still, specifically for I/O, we do need experts to look into and tweak applications.
And what my group is really targeting is improving the accessibility of insights and knowledge about how your application is behaving and how the system is behaving.
So it all comes down to this: we're seeing more and more heterogeneous and complex infrastructures.
At the same time, we've seen a shift in the way workloads are being deployed on HPC systems.
Specifically, if I look back on my own career, when I started as a PhD student working on I/O topics, that was around 2014, 2015, the field was mostly dominated by what we today call traditional HPC, meaning we had very bulk-synchronous I/O phases, oftentimes write or read phases, mostly targeting checkpoint and restart files for simulation workloads.
And even though that was already pretty complex to optimize, it was somewhat well understood, and our tools were mostly in place.
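As an aside, here is a minimal sketch of what such a bulk-synchronous checkpoint phase typically looks like, assuming MPI-IO with one shared file; the file name, per-rank slice size, and contiguous layout are illustrative assumptions, not details from the talk:

```c
/* Hypothetical sketch of a bulk-synchronous checkpoint phase:
   all ranks compute, then stop together and collectively write
   their slice of the state into one shared checkpoint file. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;                       /* doubles per rank (assumed size) */
    double *state = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++)
        state[i] = rank + i;                     /* stand-in for computed state */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes a contiguous slice at its own offset; the
       collective call is what makes this phase "bulk synchronous". */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, state, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(state);
    MPI_Finalize();
    return 0;
}
```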
But especially if you attend different conferences, you've seen that there is a much more diverse mix of HPC workloads today, specifically what we call emerging HPC workloads. There you have anything from machine learning, big data analytics, and multi-step workflows up to deep learning and artificial intelligence being deployed and run on HPC systems today.
To give you a better understanding of how popular machine learning has become in recent years, this is a study that was performed on the Summit supercomputer at Oak Ridge and published three years ago. They looked into different science domains and what kinds of methodologies they were using to accelerate their jobs.
And the interesting part was that almost all of the jobs used some form of machine learning to accelerate their code.
And at the same time, the breakdown showed that basically every science domain running on the system was using machine learning at that point, meaning that the optimizations that had been performed up to the time Summit was deployed were no longer ideal for the big picture of the workload mix that you see today.
And that's where we arrive at the main motivation for the work that I'm really interested in. You see very different kinds of I/O today, and their performance characteristics all differ, meaning that today HPC I/O is more than just checkpointing and bulk-synchronous I/O.