30 - Deep Learning - Plain Version 2020 [ID:21164]

Welcome back to deep learning. Today we want to talk about the final part of the architectures, and in particular we want to look into learning architectures.

Okay, part five: learning architectures.

The idea here is that we want to have self-developing network structures that can be optimized with respect to accuracy and floating point operations. Of course, you could simply do that with a grid search, but typically that is too time-consuming. So there have been a couple of approaches to do that, and one of the ideas, here in reference [22], is to use reinforcement learning. You train a recurrent neural network (RNN) to generate model descriptions of networks, and you train this RNN with reinforcement learning to maximize the expected accuracy.

Of course, there are also many other options: reinforcement learning for small building blocks that are then transferred to large CNNs, genetic algorithms, and energy-based approaches. There are actually plenty of ideas that you could follow, but they are all very expensive in terms of training time. If you want to look into those approaches, you really need a large cluster, because otherwise you are not able to actually perform the experiments. So there are not too many groups in the world that are able to do this kind of research right now.

You can see here that many elements we have seen earlier pop up again, for example the separable convolutions. On the left-hand side you see the normal cell, which kind of looks like an inception module. The right-hand side kind of looks like later versions of the inception modules, where you have these separations that are concatenated and that also use residual connections. All of this has been determined only by architecture search. The performance on ImageNet is on par with the squeeze-and-excitation networks at lower computational cost, and of course optimization is also possible for different sizes, for example for mobile platforms.
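
To make the reinforcement learning idea above a bit more concrete, here is a minimal sketch of a policy-gradient (REINFORCE) search loop. It is not the method of reference [22]: the RNN controller is replaced by a simple table of logits per decision, and training a child network is replaced by a made-up reward function. The search space, `toy_reward`, and all hyperparameters are assumptions chosen only for illustration.

```python
import math
import random

# Hypothetical search space: one choice per decision (names are illustrative only).
SEARCH_SPACE = {
    "kernel_size": [1, 3, 5],
    "num_filters": [16, 32, 64],
    "use_skip": [False, True],
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(arch):
    """Made-up stand-in for 'train the child network, return validation accuracy'."""
    score = 0.5
    score += 0.2 if arch["kernel_size"] == 3 else 0.0
    score += 0.1 * SEARCH_SPACE["num_filters"].index(arch["num_filters"])
    score += 0.1 if arch["use_skip"] else 0.0
    return score


def sample_architecture(logits, rng):
    """Sample one option per decision from the current policy."""
    arch, picks = {}, {}
    for name, options in SEARCH_SPACE.items():
        probs = softmax(logits[name])
        idx = rng.choices(range(len(options)), weights=probs)[0]
        arch[name], picks[name] = options[idx], idx
    return arch, picks


def search(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Simplified controller: independent logits per decision instead of an RNN.
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = 0.0  # moving average of rewards, reduces gradient variance
    for _ in range(steps):
        arch, picks = sample_architecture(logits, rng)
        reward = toy_reward(arch)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE: d log p(idx) / d logit_k = 1[k == idx] - p_k
        for name, idx in picks.items():
            probs = softmax(logits[name])
            for k in range(len(probs)):
                grad = (1.0 if k == idx else 0.0) - probs[k]
                logits[name][k] += lr * advantage * grad
    # Report the most probable option per decision under the learned policy.
    return {
        name: opts[max(range(len(opts)), key=lambda k: logits[name][k])]
        for name, opts in SEARCH_SPACE.items()
    }


best = search()
print(best)
```

In a real setting every call to the reward function means training a network, which is why the lecture stresses that these approaches need a large cluster.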

ImageNet: where are we? We see that the ImageNet classification error has now dropped below five percent in most of the submissions. Substantial and significant improvements are more and more difficult to show on this data set, and the last official challenge was held at CVPR in 2017.
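
As a reminder of how such an error rate is computed, here is a small sketch. It assumes that the five percent quoted above refers to the top-5 error, the metric commonly reported for the ImageNet challenge; the scores and labels below are made up for illustration and are not ImageNet results.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Made-up class scores for 3 images over 6 classes (illustrative data only).
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
    [0.02, 0.08, 0.10, 0.20, 0.25, 0.35],
]
labels = [2, 1, 0]

print(top_k_error(scores, labels, k=1))  # 2 of 3 images are wrong at top-1
print(top_k_error(scores, labels, k=5))  # 1 of 3 images is wrong at top-5
```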

It is now continued on Kaggle. New data sets are needed, for example for 3D scenes and human-level understanding, and those data sets are currently being generated.

There are, for example, the MS COCO data set and the Visual Genome data set, which have replaced ImageNet as the state-of-the-art data set. Of course, there are also different research directions, like speed and size of networks for mobile applications, and in these situations ImageNet may still be a suitable challenge.

So let's come to some conclusions. We see that using one-by-one filters to reduce the number of parameters and to add regularization is a very common technique.
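
The parameter saving from one-by-one filters can be checked with a bit of arithmetic. The sketch below compares a direct 3x3 convolution with a 1x1 bottleneck followed by a 3x3 convolution. The channel counts (256, 64, 256) are assumed values for illustration and are not taken from the lecture.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in a 2D convolution layer."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Illustrative channel counts (assumed, not taken from the lecture).
c_in, c_mid, c_out = 256, 64, 256

# One 3x3 convolution applied directly to all 256 input channels.
direct = conv_params(c_in, c_out, 3)

# A 1x1 convolution first reduces 256 channels to 64, then the 3x3 follows.
bottleneck = conv_params(c_in, c_mid, 1) + conv_params(c_mid, c_out, 3)

print(direct)      # 590080
print(bottleneck)  # 164160
```

With these assumed sizes the bottleneck variant needs roughly 3.6 times fewer parameters for the same input and output shape.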

Inception modules are really nice because they allow you to find the right balance between convolution and pooling. Residual connections are a recipe that has been used over and over again, and we have also seen that some of the new architectures can actually be learned. So we see a rise of deeper models, from five layers to more than a thousand. However, often a smaller net is sufficient. Of course, this depends on the amount of training data.
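
The residual recipe mentioned in these conclusions can be sketched in a few lines: the block computes y = x + F(x), so the input always has a direct path to the output. This is a minimal sketch with dense layers in plain Python rather than convolutions; the helper names and the weights are assumptions for illustration only.

```python
def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, w1, b1, w2, b2):
    """y = x + F(x), where F is a small two-layer network."""
    f = linear(relu(linear(x, w1, b1)), w2, b2)
    return [xi + fi for xi, fi in zip(x, f)]


x = [1.0, -2.0, 0.5]
dim = len(x)
zeros_w = [[0.0] * dim for _ in range(dim)]
zeros_b = [0.0] * dim
# Arbitrary illustrative weights for the first layer of F.
w1 = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.0], [0.5, 0.0, 0.1]]

# With the last layer of F set to zero, the block passes x through unchanged.
y = residual_block(x, w1, zeros_b, zeros_w, zeros_b)
print(y)  # [1.0, -2.0, 0.5]
```

Because F only has to model the difference to the identity, stacking many such blocks does not force the signal through every nonlinearity, which is one common explanation for why very deep residual models remain trainable.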

You can only train those really big networks if you have sufficient data, and we have seen that sometimes it also makes sense to build wider layers instead of deeper networks. You remember, we have already seen that in the universal approximation theorem: if we had infinitely wide layers, then maybe we could fit everything into a single layer.

Okay, so that already brings us to the outlook on the next couple of videos, and what we want to talk about is recurrent neural networks. We will look into long short-term memory cells and into truncated backpropagation through time, which is a key element in order to be able to train those recurrent networks. The long short-term memory cell is one of the key ideas that has been driven by Schmidhuber and Hochreiter. Another idea, which came from Cho, is the gated recurrent unit, which can somehow be a bridge between the traditional recurrent cells and the long short-term memory cells. Well, let's look at some comprehension questions.

So what are the advantages of deeper models in comparison to shallow networks? Why can we say that residual networks learn an ensemble of shallow networks? You remember I hinted on that slide

Accessible via: Open access
Duration: 00:07:07 min
Recording date: 2020-10-12
Uploaded on: 2020-10-12 18:56:20
Language: en-US

Deep Learning - Architectures Part 5

This video discusses learning-to-learn options for architecture search and first results.

References

[1] Klaus Greff, Rupesh K. Srivastava, and Jürgen Schmidhuber. “Highway and Residual Networks learn Unrolled Iterative Estimation”. In: International Conference on Learning Representations (ICLR). Toulon, Apr. 2017. arXiv: 1612.07771.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. “Deep Residual Learning for Image Recognition”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, June 2016, pp. 770–778. arXiv: 1512.03385.
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. “Identity mappings in deep residual networks”. In: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 2016, pp. 630–645. arXiv: 1603.05027.
[4] J. Hu, L. Shen, and G. Sun. “Squeeze-and-Excitation Networks”. In: ArXiv e-prints (Sept. 2017). arXiv: 1709.01507 [cs.CV].
[5] Gao Huang, Yu Sun, Zhuang Liu, et al. “Deep Networks with Stochastic Depth”. In: Computer Vision – ECCV 2016, Proceedings, Part IV. Cham: Springer International Publishing, 2016, pp. 646–661.
[6] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. “Densely Connected Convolutional Networks”. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, July 2017. arXiv: 1608.06993.
[7] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances In Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1097–1105.
[8] Yann A LeCun, Léon Bottou, Genevieve B Orr, et al. “Efficient BackProp”. In: Neural Networks: Tricks of the Trade: Second Edition. Vol. 75. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 9–48.
[9] Y LeCun, L Bottou, Y Bengio, et al. “Gradient-based Learning Applied to Document Recognition”. In: Proceedings of the IEEE 86.11 (Nov. 1998), pp. 2278–2324.
[10] Min Lin, Qiang Chen, and Shuicheng Yan. “Network in network”. In: International Conference on Learning Representations. Banff, Canada, Apr. 2014. arXiv: 1312.4400.
[11] Olga Russakovsky, Jia Deng, Hao Su, et al. “ImageNet Large Scale Visual Recognition Challenge”. In: International Journal of Computer Vision 115.3 (Dec. 2015), pp. 211–252.
[12] Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition”. In: International Conference on Learning Representations (ICLR). San Diego, May 2015. arXiv: 1409.1556.
[13] Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber, et al. “Training Very Deep Networks”. In: Advances in Neural Information Processing Systems 28. Curran Associates, Inc., 2015, pp. 2377–2385. arXiv: 1507.06228.
[14] C. Szegedy, Wei Liu, Yangqing Jia, et al. “Going deeper with convolutions”. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2015, pp. 1–9.
[15] C. Szegedy, V. Vanhoucke, S. Ioffe, et al. “Rethinking the Inception Architecture for Computer Vision”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016, pp. 2818–2826.
[16] Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning”. In: Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17). San Francisco, Feb. 2017. arXiv: 1602.07261.
[17] Andreas Veit, Michael J Wilber, and Serge Belongie. “Residual Networks Behave Like Ensembles of Relatively Shallow Networks”. In: Advances in Neural Information Processing Systems 29. Curran Associates, Inc., 2016, pp. 550–558.
[18] Di Xie, Jiang Xiong, and Shiliang Pu. “All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation”. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, July 2017. arXiv: 1703.01827.
[19] Lingxi Xie and Alan Yuille. Genetic CNN. Tech. rep. 2017. arXiv: 1703.01513.
[20] Sergey Zagoruyko and Nikos Komodakis. “Wide Residual Networks”. In: Proceedings of the British Machine Vision Conference (BMVC). BMVA Press, Sept. 2016, pp. 87.1–87.12.
[21] K Zhang, M Sun, X Han, et al. “Residual Networks of Residual Networks: Multilevel Residual Networks”. In: IEEE Transactions on Circuits and Systems for Video Technology PP.99 (2017), p. 1.
[22] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, et al. Learning Transferable Architectures for Scalable

Further Reading:
A gentle Introduction to Deep Learning
