94 - NHR Perflab Seminar 2025-07-22: (Re-)Configurable Processor Arrays On-Chip—Low Energy/High Performance Loop Nest Acceleration [ID:58527]

Part of a video series:

HPC4FAU / NHR@FAU

Part of a chapter:

NHR@FAU PerfLab Seminar

Presenters

Dr. Georg Hager

Accessible via

Portal only

Duration

00:53:47 (hh:mm:ss)

Recording date

2025-07-22

Uploaded on

2025-08-08 19:16:03

Language

en-US

Speaker: Prof. Dr. Jürgen Teich, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)

Date and time: Tuesday, July 22, 2025, 2:00 p.m. CEST

Slides

Abstract:

Semiconductor technology has advanced so rapidly over the last decade that 100 or more processing elements (PEs) can now be integrated on a single chip. In many high-performance computing and embedded-systems applications, parallel on-chip computing is considered essential for achieving high energy efficiency as well. To accelerate a wide class of nested-loop programs, tightly coupled arrays of programmable, lightweight processing elements are the most promising match.

In the first part of this talk, we discuss the need for a closely intertwined co-design of architecture and compiler for loop nests. For example, to save a substantial amount of energy, the data locality of variables must be preserved during compilation and must match the local interconnect between processing elements. We also study block partitioning as a way to map even huge loop nests to schedules of accelerator-matched function calls and thereby achieve scalability.
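The block-partitioning idea can be illustrated with a minimal Python sketch. This is not the compiler flow presented in the talk: the matrix-multiplication loop nest, the tile sizes, and the function names `tile_ranges`, `accelerator_call`, and `tiled_matmul` are all assumptions chosen for illustration. The sketch only shows how a loop nest that is larger than the array can be cut into blocks, each of which becomes one accelerator-sized call in a sequential schedule.

```python
def tile_ranges(n, tile):
    """Yield (start, stop) pairs covering range(n) in chunks of at most `tile`."""
    for start in range(0, n, tile):
        yield start, min(start + tile, n)


def accelerator_call(a, b, c, i_rng, j_rng, k_rng):
    """Hypothetical stand-in for one accelerator-matched function call.

    Accumulates the part of C += A * B that falls inside one block.
    In this sketch it runs sequentially; on an array of PEs, the (i, j)
    iterations of a block are the ones that could run in parallel.
    """
    for i in range(*i_rng):
        for j in range(*j_rng):
            acc = c[i][j]
            for k in range(*k_rng):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def tiled_matmul(a, b, rows=4, cols=4, depth=8):
    """Multiply a (n x m) by b (m x p) as a schedule of block-sized calls.

    rows x cols models an assumed PE array size; depth is an assumed bound
    on how many k iterations one call processes.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    calls = 0
    for i_rng in tile_ranges(n, rows):
        for j_rng in tile_ranges(p, cols):
            for k_rng in tile_ranges(m, depth):
                accelerator_call(a, b, c, i_rng, j_rng, k_rng)
                calls += 1
    return c, calls
```

Calling `tiled_matmul(a, b)` returns the product together with the number of block calls that were scheduled, so the same fixed-size call can be reused for arbitrarily large inputs. Blocks at the border are simply smaller, which is why `tile_ranges` clamps the upper bound.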

The second part of the talk is dedicated to tightly coupled processor arrays (TCPAs), a class of massively parallel arrays of locally interconnected PEs, and to the corresponding compilation concepts. The PEs of a TCPA have a specialized yet tiny instruction set and a tiny local instruction memory. Unlike coarse-grained reconfigurable arrays (CGRAs), TCPAs enable the parallel execution of iterations of multiple loops, rather than just the innermost loop, for many computationally intensive applications. Besides introducing the main architectural building blocks of TCPAs, the presentation covers the corresponding steps and transformations in application mapping, which start from a functional programming language and involve symbolic loop compilation. Finally, the talk presents a concrete TCPA chip design called ALPACA, which our group recently designed and had manufactured in 22 nm technology.