Welcome back to deep learning. Today we want to talk about a couple of the more advanced GAN concepts, in particular conditional GANs and CycleGANs. So let's have a look at what I have here on my slides. It's part four of our unsupervised deep learning lecture, and we first start with conditional GANs. One problem that we had so far is that the generator creates a fake generic image, but that image is not specific to a certain condition or characteristic. Let's say you have text-to-image generation; then of course the image should depend on the text, so you need to be able to model this dependency somehow. Likewise, if you want to generate digits, you don't want to generate an arbitrary one; you want to generate specifically the digit zero, one, two, three, and so on, so you have to put in some condition. This can be done by encoding the condition, which is introduced in reference 15.
The idea is that you essentially split the input into the latent vector z, which carries the random observation, and the condition, which is encoded in a conditioning vector y. You concatenate the two and use them to generate something. The discriminator then gets the generated image, but it also gets access to the conditioning vector y, so it knows what it is supposed to see alongside the specific output of the generator. So both the generator and the discriminator receive the conditioning.
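As a minimal sketch of this concatenation, assuming for illustration flattened images and a one-hot class label as the condition (the layer sizes and dimensions here are made up, not taken from the slides), this could look as follows in PyTorch:

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    # hypothetical dimensions: 100-d latent z, 10-d one-hot condition y, 28x28 images
    def __init__(self, z_dim=100, y_dim=10, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + y_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, y):
        # concatenate the latent vector z with the conditioning vector y
        return self.net(torch.cat([z, y], dim=1))

class ConditionalDiscriminator(nn.Module):
    def __init__(self, img_dim=784, y_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + y_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, y):
        # the discriminator also receives the condition y next to the (real or fake) image x
        return self.net(torch.cat([x, y], dim=1))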
This again results in a two-player min-max game that can be described by essentially the same loss as before, except that both the discriminator and the generator now additionally receive the conditioning y in the loss.
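Written out, this conditional objective from reference 15 is the familiar GAN value function with the condition y appearing on both sides:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]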
So how does this work? You add a conditional feature like smiling, gender, age, or other properties of the image, and the generator and discriminator then learn to operate in those modes. This leads to the property that you are able to generate a face with a certain attribute, and the discriminator learns whether this is a plausible face given that specific attribute. Here you see different examples of generated faces: the first row shows random samples, the second row is conditioned on the attribute old age, and the third row is given the condition old age plus smiling. You can see that with the conditioning vector you are still able to produce similar images, but you can actually add those conditions on top.
This allows us to create very nice applications like image-to-image translation. Here you see several examples of inputs and outputs: you can translate labels to street scenes, aerial images to maps, labels to facades, black-and-white to color, day to night, and edges to photos. The idea is that we use the label image as the conditioning vector, which leads to the observation that domain translation is simply a conditional GAN. The positive examples given to the discriminator consist of the handbag photo together with the edges of the handbag, and the negative examples are constructed by feeding the edges of the handbag to the generator, generating a handbag, and then giving that pair to the discriminator. So you can see that we are able to generate really complex images just by using conditional GANs.
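As a rough sketch of how those positive and negative discriminator examples could be assembled (the function and variable names here are purely illustrative, not from the lecture):

import torch

def discriminator_examples(edges, real_photos, generator):
    # edges:       conditioning images, e.g. handbag edge maps, shape (N, C, H, W)
    # real_photos: the matching real handbag photos, shape (N, C, H, W)
    fake_photos = generator(edges)                              # generator turns edges into fake handbags
    positive = torch.cat([edges, real_photos], dim=1)           # (condition, real) pairs -> labeled "real"
    negative = torch.cat([edges, fake_photos.detach()], dim=1)  # (condition, fake) pairs -> labeled "fake"
    # detach() keeps generator gradients out of the discriminator update
    return positive, negative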
Now, a key problem here is of course that the two images need to be aligned: your conditioning image, like the edge image here, has to exactly match the respective handbag photo, and if they don't match, you wouldn't be able to train this. So for domain translation using conditional GANs you need exact matches. In many cases you don't have access to exact matches. Let's say you have a scene that shows zebras; you will probably not find a paired data set that shows exactly the same scene with horses, so you cannot simply use a conditional GAN. The key ingredient here is the so-called cycle consistency loss: you couple GANs with trainable inverse mappings. The key idea is that you have one conditional GAN G that takes x as its conditioning image and generates some new output, and if you take this new output and use it as the conditioning variable of F, it should produce x again. So you use the conditioning variables to form a loop, and the key requirement is that G and F should essentially be inverses of each other: if you take F of G of x, you should end up with x again, and of course if you take G of F of y, you should end up with y again.
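In the original CycleGAN paper, this forward and backward consistency is enforced with an L1 penalty, the cycle consistency loss:

\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]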
This gives rise to the following concept: you take two GANs and two discriminators. One generator G generates Y from X and one generator F generates X from Y, and you still need the discriminator D_X and the discriminator D_Y. For the losses, you of course keep the typical discriminator losses, the original GAN losses for D_X and D_Y, which are coupled with the respective generators, and the main addition on top of them is the cycle consistency loss that ties G and F together.
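Putting this together, the full objective from the CycleGAN paper combines the two adversarial losses with a weighted cycle consistency term:

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)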