So the agenda is, first we'll do a short introduction, then we look into how LLMs actually work,
especially the attention mechanism.
We go into scaling laws, and then we look into the different training types of LLMs.
We look into the evaluation and after that into parallelization techniques and also how
we use LLMs on our HPC systems.
First, the introduction. Basically, a little bit more than two years ago, ChatGPT was released
and it definitely changed the world as we knew it before.
For me it took away a lot of the boring kind of work I didn't like to do, and it made me much
more productive at writing Python scripts, data processing, whatever.
I really like to use it there.
And this is the current state-of-the-art leaderboard for both commercially available and some open-source LLMs.
We see OpenAI o1, a reasoning model, which sits at the top, close to DeepSeek, the
Chinese model, which was released some weeks ago.
You will see Claude, which is also a commercial model, but also a lot of open-source models
like Grok and Llama 3, for example.
And the open-source models are the ones that matter here, because those are the models we can use on our HPC systems, right?
The closed ones we don't have access to.
So if we just look at the open source LLMs for this talk, there's also a separate leaderboard
from Hugging Face, which you can look up.
So the most popular open-source LLMs are basically the Llama and Llama 3 series and Mistral AI.
Especially for German,
I like the Mistral models.
They're quite good at German and other European languages.
And the new kid on the block, DeepSeek, which is still hard to host, but has excellent results
on those benchmarks.
So let's hop right into how these LLMs actually work.
So basically they're next word predictors.
And we start with a sequence of words. Just for clarification, I use a sequence
of words instead of tokens to make it easier to understand;
the actual LLMs produce tokens instead of words.
But if we have a sequence of, let's say, "HPC is all we", the transformer, the GPT model,
will predict probabilities for the next word, which in this case is, I think, "need", right?
"HPC is all we need."
This is basically the principle how it works.
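As a rough sketch, here is what that next-word prediction step looks like in code, assuming the Hugging Face transformers library and the small "gpt2" checkpoint purely as an illustrative stand-in, not the models from the leaderboard:

```python
# Minimal sketch of next-token prediction, assuming transformers + "gpt2" as a stand-in model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The prompt from the example; the model assigns a probability to every
# possible next token, ideally ranking something like "need" highly.
inputs = tokenizer("HPC is all we", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Probabilities for the token that follows the last position of the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>10s}  {prob.item():.3f}")
```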
And if we look into the details, so again, those words represent tokens.
They all have their corresponding embeddings.
For example, HPC has its meaning encoded into an embedding, which is static in this case.
Then "is" has its own embedding, "all" has its own embedding, "we" as well, and so on.
And this happens in a static way:
the embeddings are just looked up, and they contain the meaning of the single word, which
is not too helpful yet.
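A minimal sketch of that static embedding lookup, again assuming the transformers library and "gpt2" as a stand-in checkpoint:

```python
# Sketch of the static embedding lookup: every token id maps to a fixed vector,
# independent of the surrounding context at this stage.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

token_ids = tokenizer("HPC is all we", return_tensors="pt").input_ids

embedding_table = model.get_input_embeddings()   # an nn.Embedding module
static_embeddings = embedding_table(token_ids)   # simple table lookup per token

print(static_embeddings.shape)  # (1, number_of_tokens, hidden_size)
```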
So the GPT model uses tokens instead, to provide a balance between linguistic flexibility
and computational efficiency.
Words are basically split up into subword pieces, so the models can handle and understand
different languages better.
But there are still problems with numbers and also with code.
And one point here is that tokenizers matter a lot.
But for the latest models, the tokenizers have become quite good,
even for, let's say, the German language, which is globally a niche language, right?
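A small sketch of such a tokenizer at work, again with "gpt2" purely as an illustrative stand-in; the exact splits depend entirely on which tokenizer a given model uses:

```python
# Sketch of subword tokenization: long compound words (common in German) and
# numbers tend to be split into several pieces, frequent English words often stay whole.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

for text in ["need", "Hochleistungsrechnen", "3.14159"]:
    pieces = tokenizer.tokenize(text)
    print(f"{text!r} -> {pieces}")
```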
Large Language Models (LLMs) are revolutionizing the way we interact with artificial intelligence, and the open-source community plays a pivotal role in driving their accessibility and innovation. This talk delves into the inner workings of LLMs, exploring their foundational mechanisms and architectures. Additionally, we examine how these models can be efficiently trained on high-performance computing (HPC) systems, leveraging state-of-the-art scaling strategies and principles derived from scaling laws. By understanding these methodologies, attendees will gain valuable insights into the challenges and opportunities of developing and deploying LLMs in diverse computational environments.