9 - Architectures of Supercomputers

Welcome back to the university. I hope you all had a good time over the turn of the year.

I think last time we stopped somewhere around these slides on MPI. I decided to drop the MPI coding example from the lecture; if anyone is interested in that, we can do it in the exercises. I would like to continue now with the actual content of the lecture, simply because I realized that we don't have as much time left in this term as I was expecting. Usually the teaching time of the winter term runs until early or even mid-February. This year it ends at the end of January, so we have one or two lectures fewer than I was expecting. So let's just continue with the review of the fastest systems on the Top500 list. We are currently in November 2009, and we are looking at Jaguar.

Jaguar was located at Oak Ridge National Laboratory. It surpassed the previous fastest system, Roadrunner. Roadrunner was not really a system that was loved by programmers, as I said last time. Jaguar, in contrast, was a rather convenient system to program for, because it used just standard Opteron processors: no accelerators, no really complicated architecture. It was more or less a standard Cray system. So it was not one of a kind; it was simply what everyone else was buying at the time, just in a slightly bigger version. It was also quite a lot faster than Roadrunner. I think Roadrunner had 1.1 petaflops, and Jaguar reached roughly 1.75, so this is a step up of about 60%, quite a lot. However, it stayed at the top of the Top500 list for just one year, until Tianhe-1A (I think no one here can correct me, so I can claim that was the correct pronunciation) surpassed it.

I said before that the architecture is more or less standard. I think if we go back over the past, probably even the past ten years, and look at all Cray systems, we will always see some sort of 3D torus network. The version Jaguar was using was the SeaStar network, which had 9.6 gigabytes per second of bandwidth per 3D direction, or per cardinal dimension. It had a dedicated processor for managing network traffic, and each node was connected via a 6.4 gigabyte-per-second link.

If you look at this, you may wonder: why would I have a slower link connecting the node, when the network itself is actually faster? Why would I ever need the higher bandwidth there? The reason is, of course, that this is a torus network, so I might have to route traffic from neighboring nodes through this node. Those 9.6 gigabytes per second are not exclusively for the connected node; it might need to share them if I'm not using just nearest-neighbor communication. That's the reason for that.
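To make the torus argument concrete, here is a minimal C sketch (an illustration added for clarity, not code from the lecture or from Cray, and the torus dimensions are made up) that computes the six wraparound neighbors of a node in a 3D torus. These six links are exactly the ones whose 9.6 GB/s a node may have to share with traffic routed through it.

```c
#include <stdio.h>

/* Hypothetical torus dimensions, not Jaguar's real machine size. */
#define NX 8
#define NY 8
#define NZ 8

/* Wrap a coordinate around the torus in one dimension. */
static int wrap(int c, int n) { return (c % n + n) % n; }

int main(void) {
    int x = 3, y = 0, z = 7;          /* some node in the torus       */
    int d[6][3] = {                   /* +/- one step per dimension   */
        {+1, 0, 0}, {-1, 0, 0},
        {0, +1, 0}, {0, -1, 0},
        {0, 0, +1}, {0, 0, -1}
    };

    /* Each of these six links runs at 9.6 GB/s (per the lecture), while
       the node itself injects at only 6.4 GB/s, because the links also
       carry traffic routed through this node on behalf of other nodes. */
    for (int i = 0; i < 6; i++) {
        printf("neighbor %d: (%d, %d, %d)\n", i,
               wrap(x + d[i][0], NX),
               wrap(y + d[i][1], NY),
               wrap(z + d[i][2], NZ));
    }
    return 0;
}
```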

Each node had two Opteron processors connected to this network link, and each processor had its own RAM. That's just to be expected, because we have a cache-coherent non-uniform memory access, or ccNUMA, architecture, just like every multi-core dual-socket design in today's supercomputers, except, I think, for BlueGene/Q, which has just a single socket per node.
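As a side note on why the per-socket RAM matters in practice: on a ccNUMA node, Linux typically places a memory page on the socket of the thread that first touches it. A minimal OpenMP sketch of this first-touch idiom (my own illustration under that assumption, not code from the lecture) could look like this:

```c
#include <stdlib.h>

#define N (1 << 24)   /* array size, arbitrary for the illustration */

/* Compile with OpenMP enabled, e.g. cc -fopenmp first_touch.c */
int main(void) {
    double *a = malloc(N * sizeof *a);
    if (!a) return 1;

    /* First touch: each thread initializes the chunk it will later work
       on, so those pages end up in the RAM of its own socket instead of
       all landing next to the thread that allocated the array. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 0.0;

    /* ... later compute loops should use the same static schedule so
       each thread mostly accesses memory local to its socket ... */

    free(a);
    return 0;
}
```

The point of keeping the same static schedule in the compute loops is that each thread then works mostly on pages attached to its own socket instead of pulling data across the inter-socket link.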

What is interesting: I think this is the first slide in this lecture where we see the cooling system. The cooling system was actually rather important for Jaguar to achieve a much better power efficiency than Roadrunner. What they did was use a sort of liquid cooling with an evaporator; they let the coolant flow from the bottom to the top of the rack, and at the top of the rack the hot coolant was cooled back down.

What's similar in this architecture to the BlueGene/Q architecture we've seen before, or to the Roadrunner architecture, is the modular and hierarchical design. Compute blades consisted of multiple nodes: in this case we have four nodes per blade, eight blades per chassis, and three chassis per rack. That's actually similar to Roadrunner. They were using, I think, standard 19-inch racks. And if we try to visualize this, you can already tell that per rack that's a lot of nodes, right?
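Just to put a number on "a lot of nodes", here is a quick sketch of the packaging arithmetic with the figures from the slide (the rack count below is a placeholder, not Jaguar's actual count):

```c
#include <stdio.h>

int main(void) {
    int nodes_per_blade    = 4;   /* from the slide */
    int blades_per_chassis = 8;   /* from the slide */
    int chassis_per_rack   = 3;   /* from the slide */
    int racks              = 100; /* placeholder, not Jaguar's real rack count */

    int nodes_per_rack = nodes_per_blade * blades_per_chassis * chassis_per_rack;
    printf("nodes per rack: %d\n", nodes_per_rack);               /* 96 */
    printf("nodes in %d racks: %d\n", racks, racks * nodes_per_rack);
    return 0;
}
```

With four nodes per blade, eight blades per chassis and three chassis per rack, that is 96 nodes in a single rack.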

Of course they didn't stop at one rack; they had a lot of racks. Actually, I think Jaguar, if it hadn't been upgraded later on, would still be in the top ten of the fastest systems. They did upgrade it later on: they didn't scrap the whole system, they rather upgraded it. But we'll see that later in the lecture. What's also interesting: if we visualize each node as such a bubble here, we can also visualize the I/O nodes as yellow bubbles, and the red bubbles are, I think, the interconnects to the external network. So if you want to do I/O on this system efficiently
