Hi, I'm Hiroki Otomo from Tokyo Institute of Technology and thank you for inviting me
to this seminar.
I'm very glad to be able to talk about my recent research, DGEMM on Integer Tensor Cores.
So before talking about my research, I'd like to introduce myself briefly.
So my name is Hiroki Otomo and I'm a PhD candidate at Tokyo Institute of Technology.
My defense is already over, so I received my PhD this September.
My research interests lie in mixed-precision computing, randomized numerical linear algebra,
quantum circuit simulation, and HPC processors, including GPUs and other accelerators.
So before talking about double-precision emulation, I would like to introduce the motivation
of our research.
In recent years, significant advancements in deep learning have been achieved,
and many processors are equipped with mixed-precision or low-precision matrix multiplication units.
These units take advantage of the fact that deep learning computations are tolerant of
low-precision arithmetic and rely heavily on matrix multiplication.
On the other hand, traditional HPC applications require higher accuracy, such as FP32,
FP64, or even higher.
So there is a gap between the precision that deep learning hardware provides and the
precision that HPC applications require.
This gap motivates the central question of our research:
can deep learning processors be used for HPC applications?
I believe the answer is yes, and I would like to present an example answer to this question
in this talk. But it's not easy.
A similar discussion has appeared elsewhere, as shown on the left of this slide.
So I'll start introducing my research.
So before talking about double-precision emulation, I'd like to introduce single-precision
emulation, that is, single-precision matrix multiplication on Tensor Cores.
The two methods are very similar, but there are important differences,
and single-precision emulation is the more straightforward method, so it is easier to understand.
So I will introduce single-precision emulation first, and then double-precision emulation.
So first, what is NVIDIA Tensor Core?
An NVIDIA Tensor Core is a mixed-precision matrix multiply-and-add unit on
NVIDIA GPUs.
The inputs of the Tensor Core are low-precision, such as FP16 or TF32, while the internal
computation and the output are FP32.
TF32 is a floating-point format with an 8-bit exponent, the same as FP32,
and a 10-bit mantissa, the same as FP16.
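To make the format concrete, here is a minimal Python sketch of a TF32-style conversion. It emulates the reduced mantissa by truncating the low 13 mantissa bits of an FP32 bit pattern; note this is a simplification, since the actual hardware conversion rounds to nearest rather than truncating.

```python
import struct

def to_tf32(x: float) -> float:
    """Emulate FP32 -> TF32 by truncating the mantissa to 10 bits.

    TF32 keeps FP32's 8-bit exponent but only 10 of its 23 mantissa bits,
    so zeroing the low 13 bits of the FP32 bit pattern mimics the format.
    (A simplification: real hardware rounds to nearest instead of truncating.)
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # FP32 bit pattern
    bits &= 0xFFFFE000                                   # clear low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 2**-10 fits in TF32's 10 mantissa bits; 2**-11 does not and is dropped.
print(to_tf32(1.0 + 2**-10))  # representable, survives the conversion
print(to_tf32(1.0 + 2**-11))  # truncated back to 1.0
```

The conversion never changes the exponent, which is why TF32 keeps FP32's dynamic range while losing precision.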
And the throughput of Tensor Cores is very high.
For example, on the NVIDIA A100 and H100 GPUs, the TF32 Tensor Cores are about
7 to 8 times faster than the FP32 computing units,
and the FP16 Tensor Cores are about 15 to 16 times faster than the FP32 computing units.
So we would like to utilize these high-throughput computing units to improve the performance
of HPC applications.
Now, we want to compute single-precision matrix multiplication on Tensor Cores.
But we have a problem.
Can we compute SGEMM on Tensor Cores?
The answer is, unfortunately, no.
As I mentioned before, the input matrices of the Tensor Core are low-precision,
so we need to convert the input FP32 matrices to a low-precision format, and this
conversion discards mantissa bits, which degrades the accuracy of the result.
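The accuracy loss from this input conversion can be seen in a small NumPy experiment (the matrix size and random seed here are arbitrary choices for illustration). Rounding the FP32 inputs to FP16 before the multiply, as a plain FP16 Tensor-Core GEMM would, leaves an error well above FP32 accuracy even when the multiplication itself is done in FP32.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256), dtype=np.float32)
b = rng.standard_normal((256, 256), dtype=np.float32)

exact = a @ b  # FP32 reference result

# Round the inputs to FP16 first, then multiply in FP32, so that only
# the input conversion contributes to the error.
a16 = a.astype(np.float16).astype(np.float32)
b16 = b.astype(np.float16).astype(np.float32)
lossy = a16 @ b16

rel_err = np.max(np.abs(exact - lossy)) / np.max(np.abs(exact))
print(f"relative error from FP16 input conversion: {rel_err:.2e}")
```

The error is on the order of FP16's mantissa resolution, orders of magnitude worse than what SGEMM delivers, which is why a straightforward cast-and-multiply does not work.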
Presenters
Accessible via
Open access
Duration
00:35:29 min
Recording date
2023-09-05
Uploaded on
2023-09-08 19:06:03
Language
en-US