Thanks for inviting me to give this talk.
I changed the talk a little bit from what's in the announcement. The talk in the announcement was "Why Matrix Processing," and then last night, when I was doing my final thinking about what to present, I said maybe I'll add a "how" here. So: why and how matrix processing.
And if you allow me to digress a little bit, I think matrix processing is one of those examples that feeds on itself. You do something well, more people use it, which encourages you to do something better next time, which even more people use. So it's one of those virtuous cycles where good implementation leads to good use, which leads to good implementation. And I hope to convey why and how that is the case.
Okay. So let's start with: where does the need for matrix processing come from?
If you take a large language model, which is a relatively modern application of computing engines, let's see where the time goes. There's a particular model that you can download from Hugging Face called Falcon; this is the Falcon 7-billion-parameter model, and you can run it using PyTorch. Then you add some instrumentation and look at what percentage of the time, when this is run on a single thread, goes into two operations.
One operation is called GEMM; if you are familiar with BLAS, you know what it is, but if you're not, just remember that GEMM means matrix-matrix multiplication. The other common operation is GEMV, which is matrix-vector multiplication.
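
(To make the two operations concrete, here is a minimal PyTorch sketch; the shapes are arbitrary and chosen only for illustration, not taken from the talk.)

    import torch

    A = torch.randn(64, 128)   # a matrix
    B = torch.randn(128, 32)   # another matrix
    x = torch.randn(128)       # a vector

    C = torch.matmul(A, B)     # GEMM: matrix-matrix multiply -> shape (64, 32)
    y = torch.mv(A, x)         # GEMV: matrix-vector multiply -> shape (64,)

In BLAS terms these correspond to the GEMM and GEMV routines; the key difference is that GEMM reuses each operand many times, while GEMV touches the matrix only once.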
And as you can see from that profile, and I will talk about the X axis in a moment, the Y axis is the percentage of all the cycles: is the computer doing GEMM, is it doing GEMV, and what is the sum of both? You see that more than 95% of the time, more like…
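
(A minimal sketch of the kind of instrumentation described here, assuming the tiiuae/falcon-7b checkpoint on Hugging Face and PyTorch's built-in profiler; this is not the speaker's exact setup, and the op names under which GEMM and GEMV appear depend on the PyTorch version.)

    import torch
    from torch.profiler import profile, ProfilerActivity
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.set_num_threads(1)  # single-threaded run, as in the talk

    name = "tiiuae/falcon-7b"  # assumed Hugging Face checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()

    inputs = tokenizer("Hello", return_tensors="pt")
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        with torch.no_grad():
            model.generate(**inputs, max_new_tokens=16)

    # GEMM typically surfaces as aten::mm / aten::addmm and GEMV as
    # aten::mv in the profile; comparing their CPU time against the
    # total gives percentages like those shown on the slide.
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))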
Abstract:
It took vector processing approximately 30 years to move from the domain of supercomputers to mainstream computing. In comparison, matrix processing hit the ground running and has arguably been mainstream since inception. Today, matrix processing units are pervasive in modern CPUs and GPUs, as well as in special-purpose processors. In this talk, we will review the principles of matrix processing and discuss its inherent advantage over scalar and vector processing. We will also discuss different approaches to implementing matrix processing units at the instruction-set architecture and micro-architecture level, and the applications and relevance of matrix processing in both AI/ML and more traditional HPC applications. We will also go over some of the challenges and limitations of matrix processing and argue that modern processing elements must offer a balance of scalar, vector, and matrix processing.
For a list of past and upcoming NHR PerfLab seminar events, please see: https://hpc.fau.de/research/nhr-perflab-seminar-series/