Today's topic is GPU Top Trumps. That's what this reminded me of. In German the game is called Supertrumpf or Quartett. You have these trump cards, and you say "my card accelerates faster," and the other person says "yeah, but mine is heavier," or whatever. You pick a category, and if your card wins in that category you get both cards, and that way you try to win cards.
I've mocked this up; the game doesn't exist, it's only in my imagination. I only have those three cards, even though I pretend with the serial numbering that there are more. There aren't. But I've mocked it up a little bit, just for fun, to show what the categories could be. And actually I think, at least for these three cards, the game would be kind of balanced, because in every category some card would be able to beat the others at least somewhat. For example, with the A100 you could beat everyone in the bandwidth category.
But then again, on the other hand... wait, what is it, does the Intel CPU lose in everything? Oh, that looks bad. Yeah, I think it loses in every category to the AMD CPU. Okay, too bad. There's always a card that sucks; I guess that's how it is. Okay, but we're going to set that aside.
These are some of the categories: I have in here the DRAM capacity, clock frequencies, bandwidth, and then a bunch of floating-point execution rates. These differ across all of the GPUs, but all of the GPUs have a somewhat similar structure; that's what GPU designs have kind of converged to. This is an Nvidia marketing graphic: this big image in the background is meant to be a schematic of the whole chip. You see a lot of these repetitive structures, subunits called SMs, or streaming multiprocessors in Nvidia speak, all centered around this large level-two cache in the middle. And then, probably too small to see properly, around the edges there are the HBM interfaces, the interfaces to the memory chips, which move data to and from memory. The breakdown of one of these SMs shows a bunch of execution units, and the number of execution units that each vendor puts into each of these subunits differs a little. The number of SMs doesn't differ too much between vendors, but the number of execution units they put inside differs somewhat. Not too different, though.
This is another marketing graphic, this time from AMD, just for balance. Again you see lots of repetitive structures, and if you zoom in on these, though that's definitely too small for you to see, there's a bunch of units that do stuff. What do these units do, and why are there different units at all? Because we need to differentiate a bunch of things, and this depends very much on the application.
The thing here is floating-point number formats: there are different kinds of floating-point operations depending on the precision they operate at. The big thing that GPUs traditionally do very well is the top one, 32-bit single precision, or FP32. Computer graphics usually just uses FP32; after a while the field standardized on that because it works well, and this is what GPUs traditionally do very well. For the traditional HPC market, single precision is not quite enough. People want double precision, sometimes because they're too lazy to think about whether less precision would also work, so they just standardize on double precision. This is the format on the bottom, and I have these bars as a kind of representation of how many bits are allocated for each part: the mantissa, the exponent, and then one sign bit. You can see double precision is, well, twice as wide as single precision; makes sense, I guess. So most HPC applications use double precision.
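The mantissa/exponent/sign split of those bars can be made concrete in a few lines of Python. This is a small sketch (not from the talk): it pulls the bit fields out of the IEEE 754 FP32 and FP64 encodings, where FP32 is 1 sign + 8 exponent + 23 mantissa bits and FP64 is 1 + 11 + 52.

```python
import struct

def fp32_fields(x: float) -> tuple[int, int, int]:
    """Split an FP32 value into its (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF         # 23 bits
    return sign, exponent, mantissa

def fp64_fields(x: float) -> tuple[int, int, int]:
    """Split an FP64 value into its (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, biased by 1023
    mantissa = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, mantissa

print(fp32_fields(-1.5))  # (1, 127, 4194304): negative, 2^0, mantissa 0.5
print(fp64_fields(-1.5))
```

Note that the exponent fields are stored with a bias (127 for FP32, 1023 for FP64), which is why the same value -1.5 shows different raw exponent bits in the two formats.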
That said, there are a bunch of HPC applications that can make use of single precision and do well with it. Among them, for us, it's mostly MD. MD is fine with single precision, I guess because it's so chaotic that pretty much everything diverges very quickly anyway, and then it's only statistics, so the full precision doesn't add much anyway. I think lattice Boltzmann also works somewhat well with reduced precision, and like I've said, computer graphics uses it a lot. In recent years all these even smaller precision formats have really come into play. The oldest of them is FP16, or half precision, and then there are formats like TF32, as Nvidia calls it, which is I think
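The formats mentioned here can be compared by their bit budgets. As a quick sketch, with bit counts taken from the IEEE 754 standard and Nvidia's TF32 description rather than from the talk itself:

```python
# Bit budgets (sign, exponent, mantissa) for the formats discussed.
# FP16/FP32/FP64 are IEEE 754 formats; TF32 is Nvidia's 19-bit format
# (FP32's exponent range with FP16's mantissa precision).
formats = {
    "FP64": (1, 11, 52),
    "FP32": (1, 8, 23),
    "TF32": (1, 8, 10),
    "FP16": (1, 5, 10),
}

for name, (sign, exp, mant) in formats.items():
    width = sign + exp + mant
    eps = 2.0 ** -mant  # relative rounding step ("machine epsilon" scale)
    print(f"{name}: {width:2d} bits total, relative precision ~{eps:.1e}")
```

The mantissa width sets the relative precision, while the exponent width sets the dynamic range; TF32 keeps FP32's range but only FP16's precision.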
Access: Open access
Duration: 00:39:22 min
Recording date: 2023-10-24
Uploaded: 2023-10-25 17:56:04
Language: en-US
What’s so special about those NVIDIA H100 GPUs that are (almost) as valuable as gold? This month’s HPC Cafe shines a light on the capabilities and differences of all the different GPUs currently sold for HPC. We try to answer questions like: How does the older A100 GPU, which the NHR@FAU has in spades, hold up to its newer sibling? Why does Alex also have another type of GPU, the A40? What’s the deal with the GPUs by “the other vendor,” which are powering the fastest (documented) supercomputer in the world? Can the new competitor, Intel, build GPUs that power supercomputers instead of just notebooks? What is an “APU” and why does everyone want to build one?
Speaker: Dominik Ernst, NHR@FAU
Slides: https://hpc.fau.de/files/2023/10/toptrumps.pdf
Material from past events is available at: https://hpc.fau.de/teaching/hpc-cafe/