Welcome back to deep learning! In the last video, we discussed the different algorithms regarding generative adversarial networks. Today, in the fifth part of our lecture, we want to look at some more tricks of the trade concerning GANs.
One trick that can help you quite a bit is one-sided label smoothing. Here, you replace the targets of the real samples with a smoothed version: instead of a probability of one, you use a probability of 0.9. You do not do the same for the fake samples, i.e., you do not change their label from zero, because otherwise you would reinforce incorrect behavior: the generator would then produce samples that resemble the samples it already makes rather than the data. The benefits are that you prevent the discriminator from giving very large gradients to the generator, and you also prevent extrapolation that encourages extreme samples.
Is balancing between the generator and the discriminator necessary? No, it's not. GANs work by estimating the ratio of the data density and the model density, and this ratio is estimated correctly only when the discriminator is optimal, so it's fine if your discriminator overpowers the generator. When the discriminator gets too good, your gradients may of course vanish; then you can use tricks like the non-saturating loss or the Wasserstein GAN that we talked about earlier. You may also run into the problem that your generator's gradients get too large, and in this case you can use the trick of label smoothing. Of course, you can also
work with deep convolutional GANs: this is the DCGAN, where you use deep convolutional networks for the generator and the discriminator. You replace pooling layers with strided convolutions and, in the generator, transposed convolutions, and you remove fully connected hidden layers in favor of deeper architectures. The generator typically uses ReLU activations, except for the output layer, which uses a tanh (tangens hyperbolicus); the discriminator uses leaky ReLU activations for all layers. Both networks use batch normalization. If you do that, you may end up with the following problem: looking at some generation results, there may be a very strong intra-batch correlation, i.e., within a batch all of the generated images look very similar.
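This intra-batch correlation has a simple mechanical cause: with standard batch normalization, every sample is normalized with statistics computed from the whole mini-batch, so the samples become coupled. A minimal NumPy sketch (not from the lecture) makes this visible:

```python
# Why batch normalization couples samples within a mini-batch: each
# sample is normalized with statistics of the whole batch, so changing
# one sample changes the normalized output of every other sample.
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch (rows = samples) with its own batch statistics."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    return (x - mu) / (sigma + eps)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))

out_a = batch_norm(batch)
batch_modified = batch.copy()
batch_modified[0] += 10.0            # perturb only the first sample ...
out_b = batch_norm(batch_modified)

coupled = not np.allclose(out_a[1], out_b[1])  # ... sample 1 still changes
```

Perturbing only sample 0 shifts the batch mean and standard deviation, so the normalized version of sample 1 changes as well, even though sample 1 itself is untouched.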
And this brings us to the concept of virtual batch normalization: you don't want to use a single batch normalization instance over both mini-batches, the real and the generated one. You could use two separate batch normalizations or, even better, virtual batch normalization; in case this is too expensive, you can choose instance normalization, where for each sample you subtract its mean and divide by its standard deviation. For virtual batch normalization, you create a reference batch R of random samples and fix it once at the start of the training. Then, for each x_i of the current mini-batch, you create a new virtual batch that is the union of the reference batch and x_i, and you compute the mean and standard deviation of this virtual batch; note that you always need to forward-propagate R in addition to the current batch. This then allows you to normalize x_i with these statistics. So this may be somewhat expensive, but we have seen that it is very useful for stabilizing the training and removing the intra-batch correlations. There's also the idea of
historical averaging: here, you add a penalty term that punishes weights that are far away from their historical average, and this historical average of the parameters can be updated in an online fashion. Similar tricks from reinforcement learning can also work for generative adversarial networks, for example experience replay: you keep a replay buffer of past generations and occasionally show them to the discriminator, and you keep checkpoints of past generators and discriminators and occasionally swap them in for a few iterations. If you do all this, then you can train models like the DCGAN. Here are generated bedrooms after just one epoch, and you can see that the model is able to produce quite a few different bedrooms. So it's very interesting what kind of diversity you can actually achieve in terms of generation.
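The virtual batch normalization procedure described above, i.e., normalizing each x_i with the statistics of the reference batch R joined with x_i, can be sketched as follows (a NumPy toy example on feature vectors, not the lecture's implementation):

```python
# Virtual batch normalization sketch. Assumed shapes: rows are samples,
# columns are features.
import numpy as np

rng = np.random.default_rng(0)

# Reference batch R: drawn once at the start of training, then fixed.
R = rng.normal(size=(64, 16))

def virtual_batch_norm(x_i, R, eps=1e-5):
    """Normalize a single sample x_i with the statistics of R union {x_i}."""
    virtual = np.vstack([R, x_i[None, :]])   # the virtual batch
    mu = virtual.mean(axis=0)
    sigma = virtual.std(axis=0)
    return (x_i - mu) / (sigma + eps)

batch = rng.normal(loc=3.0, size=(8, 16))    # current mini-batch
normalized = np.array([virtual_batch_norm(x, R) for x in batch])
```

Because the statistics depend only on R and the sample itself, the normalized output of one sample no longer depends on the other samples in the mini-batch, which is exactly what removes the intra-batch correlation.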
Another interesting observation is that you can do vector arithmetic on the generated images. For example, you can take the mean of the latent codes of three instances of "man with glasses", subtract the mean of "man without glasses", and add the mean of "woman without glasses"; what you get is "woman with glasses". So you can really use this trick for constrained generation, in order to generate something for which you potentially don't have a conditioning variable. The GANs thus learn a distributed representation that disentangles the concept of gender from the concept of wearing glasses. If you're interested, you can find further material in the links below.
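As a toy sketch of this latent-space arithmetic (the z_* arrays and the generator are hypothetical stand-ins; with a trained DCGAN, the z vectors would be latent codes whose generated images show the named concepts):

```python
# Latent-space vector arithmetic, sketched in NumPy.
import numpy as np

rng = np.random.default_rng(0)
z_man_glasses = rng.normal(size=(3, 100))      # three samples per concept
z_man_no_glasses = rng.normal(size=(3, 100))
z_woman_no_glasses = rng.normal(size=(3, 100))

# Average each concept over three samples to reduce sample-specific noise,
# then combine: "man with glasses" - "man" + "woman" ~ "woman with glasses".
z_new = (z_man_glasses.mean(axis=0)
         - z_man_no_glasses.mean(axis=0)
         + z_woman_no_glasses.mean(axis=0))

# image = generator(z_new)  # hypothetical trained DCGAN generator would
#                           # decode z_new to a woman with glasses
```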
Presenters
Available via
Open Access
Duration
00:18:09 min
Recording date
2020-10-12
Uploaded on
2020-10-12 22:26:21
Language
en-US
Deep Learning - Unsupervised Learning Part 5
In this last video on unsupervised learning, we introduce some more advanced GAN concepts to avoid mode collapse and strong intra-batch correlation using virtual batch normalization, unrolled GANs, and minibatch discrimination.
For reminders when a new video is released, follow on Twitter or LinkedIn.
Further Reading:
A gentle Introduction to Deep Learning
Links
Link - Variational Autoencoders:
Link - NIPS 2016 GAN Tutorial of Goodfellow
Link - How to train a GAN? Tips and tricks to make GANs work (careful, not everything is true anymore!)
Link - Ever wondered about how to name your GAN?