Hello, everybody. My name is Thomas Gruber. Today I'm giving a talk together with my colleague
Georg Hager. We both work here for NHR@FAU. I'm originally a software developer,
but I'm using Slurm from time to time, and I'm digging into the more complex stuff
when it comes to Slurm. So today we are talking about best practices and some advanced usage,
in case you have a special workflow or a special case. Okay, we start with the Slurm basics.
Unfortunately, I cannot remove that at the top; I don't know why, so let's just ignore that line.
So, of course, a good starting point is always our documentation. We try to put a lot
of work into the documentation so that it's up to date and you find all the information that you need
there. We have a general one, which is basically doc.nhr.fau.de. There's, of
course, a Slurm-specific page below it. And we have a cluster-specific page for each of the clusters,
which explains a little bit what the specialties are when it comes to Slurm on that
machine. Then, of course, there's the official Slurm documentation, with a
documentation page for each of the Slurm commands, so srun, sbatch, squeue and whatever they are all called.
Most of the information there is much better, or more condensed, than our web pages. But, of course,
they have to cover all the different versions and so on. For people coming from other computing
centers that use PBS or LSF or other batch schedulers, there's a Rosetta PDF,
the second link, the lower one, which basically tells you how the commands can be translated from,
say, PBS to Slurm. So it makes it easy to change computing centers. And there are, of course,
whole tutorials about Slurm. So if you have a specific workflow, it might be documented there,
and they describe how they think you should do it. Okay, let me start with some terminology,
just so that we are all on the same page. A job is an allocation of resources to a user,
and you can use it to run any computation for a specific amount of time. A partition is a set of
nodes. On most systems you would think of it as the cluster, but a partition is just a part of the
cluster, and these nodes all share some special property, like the same hardware architecture or
the same GPU type and so on. There can be constraints put on each partition when it comes to job size,
maximum job run time, who is allowed to use it, and things like that. In our documentation and so on
we mostly call them queues, because that's what is in the end happening
there. A task is how many instances of your program are executed, so you can have
multiple tasks below one job. But, sorry, in Slurm a task is something quite specific:
a task in Slurm is like an MPI process. In the normal MPI world, you call them MPI processes;
in the Slurm world, that's a task. So you start multiple tasks for one MPI application. Then a
job step: if you call srun multiple times, or mpiexec multiple times, or something like that,
each of these calls is counted as one step, so one step in your job where you use the resources that
are allocated for you.
are allocated for you. QoS or quality of service, it's a limit on a per group basis about the
quality of service you get. We could say this group only gets the good nodes or the bad nodes,
what we don't have. But this would be one of those things, what quality of service means, or
only the nodes which have a CPU frequency above 3 GHz, for example, because we don't want to be fast.
And then SLURM selects the right nodes which fulfill this quality of service agreement and
schedule your job there. GRES is a generic resource. In our cases, we just have one
generic resource, and that's the GPUs. There are other centers which provide other
generic resources as well; an FPGA, for example, can be seen as a
generic resource, and things like that. And then the CPU: Slurm uses the term CPU, which is very
ambiguous nowadays because CPU has a different meaning depending on who you ask. In Slurm,
a CPU is equivalent to the lowest processing unit that a chip has. In most cases,
that's a hyperthread if you have SMT enabled. In our center, most systems do not have SMT,
so it's the same thing as a CPU core. I call them hardware threads because that's more
specific than a CPU; a CPU is the whole thing you put into a PC, with multiple hardware threads
and so on. So yeah, it's a different terminology when it comes to Slurm.
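As a hedged sketch of how QoS, GRES, and Slurm's notion of a CPU show up as batch script options; the QoS name, the GPU type a100, and all the counts are just placeholders and not specific to our clusters:

    #SBATCH --qos=example_qos      # hypothetical quality-of-service name
    #SBATCH --gres=gpu:a100:1      # generic resource: one GPU of type a100
    #SBATCH --ntasks=4             # 4 tasks, i.e. 4 MPI processes
    #SBATCH --cpus-per-task=8      # 8 "CPUs" per task in Slurm terms, i.e. 8 hardware threads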
Okay, so let's go and look at the first batch script we have. One of the main things you
should always do in your batch script is start it with #!/bin/bash -l, so that you get a login shell.
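Just to illustrate what such a script can look like (the job name, queue name, time limit, module, and application name are placeholders, not recommendations for a particular cluster):

    #!/bin/bash -l                  # login shell, so your environment is set up properly
    #SBATCH --job-name=myjob        # a name for the job
    #SBATCH --partition=example     # which partition/queue to submit to
    #SBATCH --time=00:30:00         # requested wall-clock time
    #SBATCH --output=myjob.%j.out   # output file, %j is replaced by the job ID

    module load mymodule            # load whatever environment your application needs
    srun ./my_application           # run the application as one job step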
Speaker: Thomas Gruber, Dr. Georg Hager, NHR@FAU
Date: December 17, 2024
Slides: 2024-12-17-HPC-Cafe-Slurm.pdf
Abstract: The presentation will give an overview of how to best use Slurm to run batch jobs on the NHR clusters. Special topics such as data staging, multi-GPU jobs, chain jobs, dependencies, proper distribution of work, etc., will receive extra attention. This talk should be interesting to everyone who wants to get the most out of Slurm or who needs to optimize their batch workflow.
More about past HPC Café events at: https://hpc.fau.de/teaching/hpc-cafe/