HPC Cafe on October 24, 2023: GPU Top Trumps!

Today's topic is GPU top trumps. That's what I've been reminded of. In German it's called Supertrumpf or Quartett: you have these trump cards, and you say "my card accelerates faster", and then another person says "yeah, but mine is heavier", or whatever. You need to pick a category, and if your card wins, you get both cards, and that way you try to trade cards.

I've mocked this up; this game doesn't exist, it's only in my imagination. I only have those three cards, even though I pretend with the serial numbering that there are more cards; there aren't. But I've mocked it up a little bit, just for fun, to show what the categories could be. And actually, I think at least for these three cards the game would be kind of balanced, because in every category you would be able to beat the other cards at least somewhat. For example, with the A100 you could beat everyone in the bandwidth category. But then again, on the other hand... what is it, does the Intel CPU lose in everything? Oh, that looks bad. Yeah, I think it loses in everything to the AMD CPU. Okay, too bad. There's always a card that sucks; I guess that's how it is. Okay, but we're going to put that aside.

These are some of the categories. I have in here the DRAM capacity, clock frequencies, bandwidth, and then a bunch of floating-point execution rates. These differ across all of the GPUs.
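As an aside, here is a minimal sketch of what one of those cards could look like as data; the categories mirror the slide, but the struct, names, and example numbers are my own illustration, not the actual mock-up:

```cuda
#include <cstdio>

// Hypothetical "top trumps" card with the categories from the slide.
// Field names and the example numbers below are illustrative only.
struct GpuCard {
    const char* name;
    int    dram_gib;       // DRAM capacity in GiB
    double clock_ghz;      // clock frequency in GHz
    double bandwidth_gbs;  // memory bandwidth in GB/s
    double fp64_tflops;    // double-precision execution rate
};

int main() {
    GpuCard a = {"Card A", 80, 1.41, 2039.0, 9.7};
    GpuCard b = {"Card B", 48, 1.74,  696.0, 0.6};
    // Pick a category; the higher value wins the round, top-trumps style.
    const GpuCard& winner = (a.bandwidth_gbs > b.bandwidth_gbs) ? a : b;
    printf("Bandwidth round goes to %s\n", winner.name);
    return 0;
}
```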

All of the GPUs have a somewhat similar structure; that's what GPU designs have kind of converged to. This is an Nvidia marketing graph. We have this big image in the background, this thing around here, which is supposed to be a schematic of the whole chip. You see a lot of these repetitive structures, the subunits called SMs, or streaming multiprocessors in Nvidia speak, all centered around this large level-two cache here in the middle. And then, it's probably too small to see properly, around the edges there are these HBM interfaces, the interfaces to the memory chips, to move data to and from memory. The breakdown of one of these SMs shows a bunch of execution units, and the number of execution units that each of the vendors puts into each of these subunits differs a little bit. The numbers of SMs themselves don't differ too much between vendors, but the number of execution units they put in differs somewhat. Not too different, though.
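If you want these per-chip numbers for whatever card sits in your own machine, the CUDA runtime reports most of them; a minimal sketch using the standard cudaGetDeviceProperties call (assumes nothing beyond an installed CUDA toolkit):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        printf("GPU %d: %s\n", d, p.name);
        printf("  SMs:              %d\n", p.multiProcessorCount);
        printf("  L2 cache:         %d KiB\n", p.l2CacheSize >> 10);
        printf("  DRAM capacity:    %zu GiB\n", p.totalGlobalMem >> 30);
        printf("  Memory bus width: %d bit\n", p.memoryBusWidth);
    }
    return 0;
}
```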

This is another marketing image, this time from AMD, just to keep the balance. You see again lots of repetitive structures, and if you zoom in on these (definitely too small for you to see here), there's a bunch of units that do stuff. What do these units do, and why are there different units at all? Because we need to differentiate a bunch of things, and this depends very much on the application. The thing with floating-point number formats is that there are different kinds of floating-point operations, depending on what precision they operate at.

The big thing that GPUs traditionally do very well is the top one: 32-bit single precision, or FP32. Computer graphics standardized on FP32 after a while, because it works well, and this is what GPUs traditionally do very well. For the traditional HPC market, single precision is not quite enough. People want double precision, partly because they're too lazy to think about whether less precision would also work; they just standardized on double precision. This is the format at the bottom, and I have these bars as a rough representation of how many bits are allocated for the mantissa, the exponent, and then one sign bit. You can see that double precision is, well, twice as wide as single precision; makes sense, I guess. So most HPC applications use double precision.
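To make those bars concrete, here's a small sketch that pulls an FP32 value apart into the three fields (standard IEEE 754 layout; plain host code):

```cuda
#include <cstdio>
#include <cstdint>
#include <cstring>

// IEEE 754 widths: FP32 = 1 sign + 8 exponent + 23 mantissa bits;
// FP64 = 1 sign + 11 exponent + 52 mantissa bits (twice as wide overall).
int main() {
    float x = -1.5f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            // reinterpret the raw bits
    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFFu;  // stored with a bias of 127
    uint32_t mantissa = bits & 0x7FFFFFu;      // implicit leading 1 not stored
    printf("%f: sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
           x, sign, exponent, (int)exponent - 127, mantissa);
    return 0;
}
```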

That said, there are a bunch of HPC applications that can make do with single precision and run well. Among them, for us, it's mostly MD. MD is fine with single precision, I guess because it's so chaotic that pretty much everything diverges very quickly anyway, and then it's only statistics, so the full precision doesn't make much sense anyway. I think lattice Boltzmann also works somewhat well with reduced precision, and, like I've said, computer graphics uses it a lot. In recent years, all these even smaller precision formats have really come into play. The oldest one of them is FP16, or half precision, and then there are formats like TF32, as Nvidia calls it, which, I think, keeps FP32's 8-bit exponent but only a 10-bit mantissa like FP16.
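To get a feel for how much precision those small formats drop, a sketch: the TF32 line here just truncates FP32's mantissa to 10 bits, which approximates (but isn't exactly) the rounding the hardware does, while __float2half is the regular CUDA half-precision conversion:

```cuda
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <cuda_fp16.h>

// TF32 keeps FP32's 8-bit exponent but only 10 mantissa bits; we
// simulate it by zeroing the 13 low mantissa bits (truncation only,
// real tensor cores round).
float to_tf32_ish(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~((1u << 13) - 1u);
    memcpy(&x, &bits, sizeof bits);
    return x;
}

int main() {
    float x = 1.0f / 3.0f;
    __half h = __float2half(x);  // FP16: 5-bit exponent, 10-bit mantissa
    printf("FP32 : %.9f\n", x);
    printf("TF32~: %.9f\n", to_tf32_ish(x));
    printf("FP16 : %.9f\n", __half2float(h));
    return 0;
}
```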

Part of a video series:
Part of a chapter: HPC Café
Accessible via: Open access
Duration: 00:39:22 min
Recording date: 2023-10-24
Uploaded on: 2023-10-25 17:56:04
Language: en-US

What’s so special about those NVIDIA H100 GPUs that are (almost) as valuable as gold? This month’s HPC Cafe shines a light on the capabilities and differences of all the different GPUs currently sold for HPC. We try to answer questions like: How does the older A100 GPU, which the NHR@FAU has in spades, hold up to its newer sibling? Why does Alex also have another type of GPU, the A40? What’s the deal with the GPUs by “the other vendor,” which are powering the fastest (documented) supercomputer in the world? Can the new competitor, Intel, build GPUs that power supercomputers instead of just notebooks? What is an “APU” and why does everyone want to build one?

Speaker: Dominik Ernst, NHR@FAU

Slides: https://hpc.fau.de/files/2023/10/toptrumps.pdf

Material from past events is available at: https://hpc.fau.de/teaching/hpc-cafe/

 
