Welcome to today's HPC Cafe. It's nice to see so many people here in person; that's exceptional, and I'm really happy about it. So thanks a lot for coming. There are also a lot of people in the Zoom, so whenever you have a question, put it in the Zoom chat, or just speak up, raise your hand, whatever. Let's keep this interesting. My topic today is Computer Architecture 101 for Scientists, because we know that people who use our parallel computers, or any parallel computer, are sometimes challenged or intrigued by the complexity of the hardware they're using, and sometimes they don't even know what to do. They're just using the scripts and setups they inherited from their coworkers without actually thinking about what's going on, and that leads to some problems, some of which I will detail.
So I think it's good to tell you a little bit about how our computer works, how our supercomputer works, and what the basic concepts are that you're up against when you're running parallel code on a supercomputer, without going too much into detail. Of course, I could talk about this for a week or so without pause. Bear with me; I think I can keep it below 45 minutes. All right, so that's my plan. I'll tell you something about hardware architecture; about parallel programs and how they map to the hardware; a little bit about performance, what it is, how we can assess and judge it; about limitations of parallelism, so how parallel a program can be; and some best practices that give you a guideline for what to do and how to assess the performance and scalability of your code.

So let's start simple. This slide shows the anatomy of a CPU compute cluster at a very simple level; we'll go into more detail later. The purpose of a parallel computer, of any computer for that matter, is to execute code, to execute instructions. And for you as a scientist doing numerics, it is even more specific: the purpose of a computer is to do arithmetic, usually floating-point arithmetic. And that happens in the so-called execution units of a compute core. Each processor today has a couple of compute cores, maybe two if you have a wimpy tablet PC, or maybe 36 if you are on a Fritz compute node, but it's always several cores. Each core can execute a program, maybe two if hyperthreading is enabled, but let's not go into those details.
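As a side note (a sketch added for illustration, not part of the talk): on Linux you can check what your machine reports with the commands "nproc" or "lscpu", or ask the OS directly from C. A minimal sketch, assuming a POSIX system; note that with SMT ("hyperthreading") enabled this counts hardware threads, typically twice the number of physical cores:

/* Query the number of logical CPUs the OS currently sees.
   With SMT enabled this is hardware threads, not physical cores. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical CPUs online: %ld\n", n);
    return 0;
}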
And this is the stuff that happens to make your program work, to solve your problem; it happens in the so-called execution units. These execution units do the multiplications, additions, divisions, whatever, that solve your numerical problem. And they account for a surprisingly small number of transistors: if you look at the whole processor as a bunch of transistors, then the execution units are really tiny bits of that. The rest goes into other stuff. And the other stuff comprises, for example, the L1 and L2 caches. Caches are small, fast memories, and the CPU tries to keep data that you use very often within these caches, so that if you reuse it from time to time, you can access the data more quickly.
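To make this concrete, here is a small timing sketch (an added illustration with assumed, typical sizes): it sweeps repeatedly over a 32 KiB array that fits in the L1 cache and over a 512 MiB array that only fits in main memory. Compile with something like "gcc -O3 -ffast-math" so the sum vectorizes and the loads, not the addition latency, dominate; the cache-resident case then sustains far more loads per second:

/* Temporal locality demo: repeated sweeps over an array that fits in L1
   vs. one that only fits in main memory. Sizes are typical assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double gloads_per_s(size_t n, int reps) {
    double *a = malloc(n * sizeof *a);
    if (!a) { perror("malloc"); exit(EXIT_FAILURE); }
    for (size_t i = 0; i < n; ++i) a[i] = 1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    double s = 0.0;
    for (int r = 0; r < reps; ++r)        /* reps sweeps over the same data */
        for (size_t i = 0; i < n; ++i)
            s += a[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double dt = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    printf("(checksum %.0f) ", s);        /* keeps the loops from being optimized away */
    free(a);
    return (double)n * reps / dt / 1e9;   /* billions of loads per second */
}

int main(void) {
    /* 4096 doubles = 32 KiB: L1-resident; 64 Mi doubles = 512 MiB: RAM-bound */
    printf("%.2f Gload/s small\n", gloads_per_s(4096, 1 << 18));
    printf("%.2f Gload/s large\n", gloads_per_s((size_t)64 << 20, 16));
    return 0;
}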
It turns out, if you dig deeper into this, that data transfer, the delay of data access, is the most important performance-limiting factor in a computer. So if you can somehow make your program use less data, or make more use of data that is close to the CPU, like in the L1 and L2 caches, then your program tends to be faster. So data transfer is often the number one issue, and that's why we have these caches. L1 caches are typically a couple of tens of kilobytes, like 32, 48, or 64 kilobytes; L2 caches are on the order of half a megabyte to one megabyte. And that's the stuff that makes up one core, that is attached to one core: each core has registers, execution units, and nowadays an L1 and an L2 cache.
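The same point shows up with spatial locality. The following sketch (again an added illustration) sums a large matrix twice: once row by row with unit stride, so every byte of each (typically 64-byte) cache line gets used, and once column by column with a 32 KiB stride, so each cache line delivers only one element; compiled with "gcc -O2", the second traversal is typically several times slower:

/* Spatial locality demo: row-major vs. column-major traversal of a matrix
   that is far larger than any cache (128 MiB). */
#include <stdio.h>
#include <time.h>

#define N 4096
static double a[N][N];                    /* zero-initialized, 128 MiB */

static double secs(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + 1e-9 * t.tv_nsec;
}

int main(void) {
    double s = 0.0, t;

    t = secs();
    for (int i = 0; i < N; ++i)           /* row-major: unit stride, every  */
        for (int j = 0; j < N; ++j)       /* byte of a cache line is used   */
            s += a[i][j];
    printf("rows:    %.3f s\n", secs() - t);

    t = secs();
    for (int j = 0; j < N; ++j)           /* column-major: 32 KiB stride,   */
        for (int i = 0; i < N; ++i)       /* one element per line loaded    */
            s += a[i][j];
    printf("columns: %.3f s (sum %.1f)\n", secs() - t, s);
    return 0;
}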
Now, on a chip we usually have a couple of cores; as I said, two is maybe the minimum, and nowadays you can buy chips with up to 36 cores from Intel and even more from others. A bunch of those cores are put on the chip, and they usually share a common L3 cache. How much of the cache hierarchy is shared depends on the particular architecture, but usually the L3 cache is not private to each core; it's shared among a bunch of cores. And that's important, as you will see later.
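If you want to see how cores, sockets, and shared caches are laid out on your own machine, the hwloc library (and its "lstopo" command) can show you the whole topology. A minimal sketch, assuming hwloc 2.x is installed (compile with -lhwloc):

/* Print a few topology counts via hwloc (assumes hwloc >= 2.0). */
#include <stdio.h>
#include <hwloc.h>

int main(void) {
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    printf("sockets (packages): %d\n",
           hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PACKAGE));
    printf("physical cores:     %d\n",
           hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));
    printf("L3 cache domains:   %d\n",
           hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_L3CACHE));
    printf("hardware threads:   %d\n",
           hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));
    hwloc_topology_destroy(topo);
    return 0;
}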
So let's say up to 64-ish cores nowadays on a chip. And that's the thing that you put in a socket. So what we call a socket, that's the thing that you find in the shop. So the CPU…
Access: open access
Duration: 00:47:46
Recorded: 2023-06-13
Uploaded: 2023-06-17 00:16:03
Language: en-US
Topic: Computer Architecture 101 for Scientists, and what it means for your cluster jobs
Speaker: Dr. Georg Hager, Head of Training & Support, NHR@FAU
Abstract:
Not knowing how a supercomputer works can literally cost you the better part of your precious resource allocation. We introduce the basic concepts of parallel computer architecture and how they impact the way cluster users should think about their compute jobs. Cores, sockets, caches, memory, networking, and data storage all play a role for performance; as a user, knowing your way around the supercomputer’s architecture can help you get more “bang” for your CPU/GPU time “buck.” If you are intrigued (or even intimidated) by all this “complicated” hardware stuff but always wanted to know more, this event is for you. Turns out, it’s not that complicated at all.
Material from past events is available on: https://hpc.fau.de/teaching/hpc-cafe/