104 - NHR PerfLab Seminar 2025-12-09: Why (and how) matrix processing? [ID:61348]

Thanks for inviting me to give this talk.

I changed the talk a little bit from what's in the announcement. The talk in the announcement was "Why matrix processing?" And then last night, when I was doing my final thinking about what to present, I said, maybe I'll add a "how" here. So: why and how matrix processing.

And if you allow me to digress a little bit, I think matrix processing is one of those examples that feeds on itself, right? You do something well, more people use it, which encourages you to do something better next time, which even more people use. So it's one of those virtuous cycles where good implementation leads to good use, which leads to good implementation. And I hope to convey why and how that is the case.

Okay.

So let's start with... oh, the slide didn't move. Just a moment. Oh, there we go. Okay. Let's start with: where does the need for matrix processing come from?

If you take a large language model, which is a relatively modern application of computing engines, let's see: where does the time go? There's a particular model that you can download from Hugging Face called Falcon; this is the Falcon 7-billion-parameter model. And you can run it using PyTorch. Then you do some instrumentation and you see what percentage of the time, when this is run on a single thread, goes into doing two operations.
One operation is called GEMM, which, if you are familiar with the BLAS, you know what it is. But if you're not, just remember that GEMM means matrix-matrix multiplication. And the other operation that is common is GEMV, which is matrix-vector multiplication.
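To make the two kernels concrete, here is a minimal NumPy sketch of GEMM and GEMV; the matrix sizes are arbitrary illustrative choices, not figures from the talk, and NumPy's `@` operator stands in for the optimized BLAS routines that PyTorch ultimately calls.

```python
# Illustrative sketch of the two BLAS kernels named in the talk:
# GEMM (matrix-matrix multiply) and GEMV (matrix-vector multiply).
# Shapes are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))   # 64 x 128 matrix
B = rng.standard_normal((128, 32))   # 128 x 32 matrix
x = rng.standard_normal(128)         # length-128 vector

C = A @ B   # GEMM: matrix times matrix, result is 64 x 32
y = A @ x   # GEMV: matrix times vector, result has length 64

print(C.shape)  # (64, 32)
print(y.shape)  # (64,)
```

Under the hood, `A @ B` dispatches to the BLAS level-3 routine (GEMM) and `A @ x` to the level-2 routine (GEMV), which is why profiles of LLM inference attribute the bulk of the cycles to these two calls.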

And as you can see in that profile (I will talk about the X axis in a moment), the Y axis is the percentage of all the cycles: is the computer doing GEMM, or is it doing GEMV, and what is the sum of both? You see that more than 95% of the time, more like

Part of a video series / chapter: NHR@FAU PerfLab Seminar
Access: Open Access
Duration: 01:06:32 min
Recording date: 2025-12-09
Uploaded: 2026-01-20 13:05:49
Language: en-US
Speaker: Dr. José Moreira, IBM

Abstract:
It took vector processing approximately 30 years to move from the domain of supercomputers to mainstream computing. In comparison, matrix processing hit the ground running and has arguably been mainstream since inception. Today, matrix processing units are pervasive in modern CPUs and GPUs, as well as in special-purpose processors. In this talk, we will review the principles of matrix processing and discuss its inherent advantage over scalar and vector processing. We will also discuss different approaches to implementing matrix processing units at the instruction-set architecture and micro-architecture level, and the applications and relevance of matrix processing in both AI/ML and more traditional HPC applications. We will also go over some of the challenges and limitations of matrix processing and argue that modern processing elements must offer a balance of scalar, vector, and matrix processing.

For a list of past and upcoming NHR PerfLab seminar events, please see: https://hpc.fau.de/research/nhr-perflab-seminar-series/