15 - An Introduction to Generative Modeling

Okay, cool. Thanks, Daniel, thanks for the kind introduction and for having me. I think it's 16 years by now, almost 15 and a half; we can meet in the middle.

It's a pleasure to be back.

And, yeah, the goal I have for today is really to give you a very quick introduction to deep generative modeling, because I understand there is a mixed audience here: some students and researchers, faculty, people even more experienced than I am, but also some junior people.

And this is a very hot topic in deep learning, and I thought, you know, let's set the stage a little bit and give you a really quick overview. As Daniel said, I have a longer version of this on my website, with recordings that go on for longer and where I also go slower.

In any case, there is plenty of time for asking questions along the way. The most important thing is that you get out of the seminar what you want to get out of it; I can talk about whatever I want to talk about, but it's more important that I talk about what matters to you. Okay, so let's look at deep generative modeling, and I'll skip all the introduction that you've heard in the news about deep fakes and all that; we have all read that.

So, really, the goal in generative modeling is: you are given a bunch of samples, these x's, and what you want is to learn the underlying probability distribution. So you assume they come from some distribution, say, all the cat images in the world. And you cannot write down this distribution elegantly with mathematics, so you try to learn it from data. That's the key goal.

There are a bunch of challenges. Typically n is large, so you have high-dimensional data sets; think about images, where the number of pixels is pretty large. But also the distribution of x can be really complicated: definitely non-Gaussian, typically multimodal, and the support can be disjoint and really rough. That's what we're looking at in the most general cases that people are interested in.

So, what's the key idea? The idea is that you learn the distribution by learning how to generate samples.

So you have a generator, let's call it G_theta.

And theta here is going to denote the parameters or weights. I'm not going to be specific about what type of weights we have; we'll sweep all of that under the rug, since it is really a question of how you design a network architecture.

And I guess you'll hear about that in other talks, but not today. The most important thing is that there are weights we need to train.

So you train those weights so that you can sample from a latent distribution, take those samples, let's call them z, plug them into the generator, and get mapped to R^n, and hopefully what you produce matches the characteristics of x.

So, the latent distribution today will just be a Gaussian, and the dimension q here will be, let's say, less than or equal to n. We'll first keep it equal to n, and then we'll make it less than n.
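To make the setup concrete, here is a minimal sketch of such a generator, assuming a tiny two-layer MLP whose randomly initialized weights stand in for the trainable parameters theta; the architecture, sizes, and nonlinearity are illustrative choices, not from the talk.

```python
import numpy as np

# Hypothetical generator G_theta: R^q -> R^n, here a small two-layer MLP.
q, n, h = 2, 2, 64                    # latent dim, data dim, hidden width (illustrative)
rng = np.random.default_rng(0)
theta = {
    "W1": rng.normal(scale=0.1, size=(h, q)), "b1": np.zeros(h),
    "W2": rng.normal(scale=0.1, size=(n, h)), "b2": np.zeros(n),
}

def generator(z, theta=theta):
    """Map latent samples z of shape (batch, q) to data space R^n."""
    hidden = np.tanh(z @ theta["W1"].T + theta["b1"])
    return hidden @ theta["W2"].T + theta["b2"]

# Sampling: draw z from the standard Gaussian latent distribution and push it through G.
z = rng.standard_normal((5, q))
x_generated = generator(z)            # shape (5, n)
```

In practice the weights would of course be trained rather than left at their random initialization; how to train them is exactly the question the rest of the talk addresses.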

Okay, so say you can do this, say you can find a generator. I want to convince you that you have then really learned at least two important things about the distribution you're interested in.

So you can estimate the density: if I give you an x, I can quantify the likelihood of that sample using my model, my generator. Namely, I can integrate out, I can marginalize over the z here. The p(z) is a Gaussian, so I can evaluate it; I can compute densities as much as I like, really easily. And then there is the likelihood term, which tells us, if I take a z and plug it into my generator, how close that sample is to the x I'm interested in. So both of these things are computable.

The only thing that's maybe not so easy is computing the integral, because it will be quite high-dimensional. And of course I should correct this: there is a dz here, we integrate over z. So much for that.
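As a hedged sketch of that density estimate: the marginalization described above can be written as p(x) = ∫ p(x | z) p(z) dz and approximated by Monte Carlo samples from the latent Gaussian. The Gaussian form of the likelihood p(x | z), with a fixed width sigma, is my assumption for the "how close is G(z) to x" term, not something specified in the talk.

```python
import numpy as np

def estimate_density(x, generator, q, sigma=0.1, num_samples=10_000, seed=0):
    """Monte Carlo estimate of p(x) = E_{z ~ N(0, I_q)}[ p(x | z) ]."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((num_samples, q))       # samples from the latent Gaussian p(z)
    gx = generator(z)                               # G_theta(z_k), shape (num_samples, n)
    n = x.shape[-1]
    # Assumed Gaussian likelihood p(x | z_k), measuring how close G(z_k) is to x.
    sq_dist = np.sum((gx - x) ** 2, axis=1)
    log_px_given_z = -0.5 * sq_dist / sigma**2 - 0.5 * n * np.log(2 * np.pi * sigma**2)
    # p(x) ~ (1/K) sum_k p(x | z_k), averaged in a numerically stable way.
    return np.exp(np.logaddexp.reduce(log_px_given_z) - np.log(num_samples))
```

As the talk notes, the estimate is cheap to write down, but the Monte Carlo average converges slowly when the dimensions grow, which is exactly the difficulty with this integral.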

And, yeah, in principle you can estimate the density. The main use for today is sampling, though. Once you have the generator, it's really easy to produce, say, new images of the content you're interested in: you sample from your latent distribution, which is simple, you plug it into the generator, and there you have a new image. That's the ultimate goal for today in generative modeling. But density estimation is also important to see as an option; it can actually be interesting for many applications and can also be used for the training.

Okay, so here's a picture of the whole thing for one problem. The latent distribution here is nice and simple, a nice Gaussian; everything looks kind of pretty.

Here's your somewhat messier data distribution.

And in order to get, or learn, this distribution, what we can think about doing is going from the latent distribution to the generated distribution and making those as similar as possible.

And then you immediately run into the question: okay, how do you actually compare distributions? That is going to be the crux here, because you need to be able to quantify how close these distributions are so that you can, for example, improve the updates of the weights. Otherwise, how are you going to train the weights? And I'm going to present you three main approaches.

They're pretty old by now, probably a few years old at least, which in machine learning is ancient, but they form the foundation of most of the methods that people are still developing these days.

Okay, so here are two examples.

One really silly, easy one, and one also silly but a little bit bigger.

And I want to use these also to motivate the notation of the latent space and so on. The first one really happens in a two-dimensional data space, so here's my data set: it's just points in 2D. It's called the moons data set; you can find it in scikit-learn.

And if you think about what the distribution could be here, you definitely see it's a bimodal distribution, and the support of the distribution is not connected: there's a gap between these two half moons.
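For reference, a minimal sketch of loading this dataset from scikit-learn; the sample count and noise level are illustrative choices, not values from the talk.

```python
from sklearn.datasets import make_moons

# The two-dimensional "moons" dataset mentioned above.
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)
print(X.shape)   # (1000, 2): two data dimensions, a bimodal distribution with disconnected support
```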

So, I want to take this distribution, and my latent distribution, in both cases actually, will just be the two-dimensional standard Gaussian.

So then I have two tasks here. In the moons data set, I ultimately want to generate samples. I want to find a generator so that I can get from the left to the right, from the red dots to the blue squares.

And here I can actually estimate the density: I make it so that we can actually compute the inverse of the generator, and thereby you can compute the density without doing the integral that I showed you.
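The standard change-of-variables formula is what makes this work; stated in the notation above (a textbook fact, not written out verbatim in the transcript), for an invertible generator with n = q,

$$ p_X(x) = p_Z\bigl(G_\theta^{-1}(x)\bigr)\,\bigl|\det \nabla_x G_\theta^{-1}(x)\bigr|, $$

so evaluating the density only requires the inverse map and its Jacobian determinant rather than the high-dimensional integral from before.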

So that's the first class of problems. And I can invert this mapping for two reasons. The obvious reason is that n is equal to q, and both are equal to two, so there is hope here.

And the other reason is that I still have to make sure, by construction, that I can invert whatever I define as the generator. A more difficult and more general problem, though, is to allow different dimensionality. So here, for example, I have this image data set of handwritten digits, where n is about 800, and q I still want to keep at two.

So basically I want to sample points in these two latent coordinates and generate these images. Here there's no hope to even think about an inverse, and that's going to complicate the training, which is one of the primary focuses for today.
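As a sketch of this dimensionality gap, here is one way to load a handwritten-digits dataset of roughly that size; using OpenML's 'mnist_784' is my assumption, since the talk only says that n is about 800.

```python
from sklearn.datasets import fetch_openml

# Handwritten digits; 'mnist_784' (28 x 28 = 784 pixels) is an assumed stand-in
# for the dataset shown in the talk.
X, _ = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
n = X.shape[1]   # 784 data dimensions
q = 2            # latent dimension kept at two, as in the talk
print(n, q)
```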

Okay, so I will walk through these two examples, which are on the website I shared in the Slack chat; sorry, the Zoom chat.

Part of a video series

Accessible via: Open access

Duration: 00:55:59 min

Recording date: 2021-05-18

Uploaded on: 2021-05-24 01:18:56

Language: en-US

Lars Ruthotto (Emory University) on "An Introduction to Generative Modeling":

Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions from a finite number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution. Developing DGMs has become one of the most hotly researched fields in artificial intelligence in recent years. The literature on DGMs has become vast and is growing rapidly.
Some advances have even reached the public sphere, for example, the recent successes in generating realistic-looking images, voices, or movies; so-called deep fakes.
Despite these successes, several mathematical and practical issues limit the broader use of DGMs: given a specific dataset, it remains challenging to design and train a DGM and even more challenging to find out why a particular model is or is not effective. To help students contribute to this field, this talk provides an introduction to DGMs and provides a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN). We illustrate the advantages and disadvantages of these basic approaches using numerical experiments. Our goal is to enable and motivate the reader to contribute to this proliferating research area. Our presentation also emphasizes relations between generative modeling and optimal transport.
