Welcome back to deep learning. Today we want to continue talking about unsupervised methods, and we will look into a very popular technique: the so-called autoencoders. Okay, so here are our slides, part two of our lecture, and the topic is autoencoders.
Well, the concept of the autoencoder is that we want to use the ideas of feed-forward neural networks, and you could say that a feed-forward neural network is a function of x that produces some encoding y. Now, the problem is how we can generate a loss in such a constellation, and the idea is rather simple: we add an additional layer on top, and the purpose of that layer is to compute a decoding. So we have another layer g of y, and g of y produces some x hat, and the loss we can then define is that x hat and x need to be the same. So the autoencoder tries to learn an approximation of the identity. Well, that sounds rather simple, and to be honest, if we have exactly the same number of nodes in the input and in the hidden layer for y, then the easiest solution would probably be the identity. So why is this useful at all?
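As an illustration that is not part of the spoken lecture, here is a minimal sketch of this encoder–decoder structure in PyTorch. The layer sizes, the activations, and the use of a mean-squared-error reconstruction loss are my own assumptions for a toy example:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: f encodes x to y, g decodes y back to x_hat."""
    def __init__(self, input_dim=784, code_dim=64):
        super().__init__()
        # Encoder f(x) -> y
        self.f = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        # Decoder g(y) -> x_hat
        self.g = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        y = self.f(x)        # encoding
        x_hat = self.g(y)    # decoding / reconstruction
        return x_hat

model = Autoencoder()
criterion = nn.MSELoss()     # reconstruction loss: x_hat should match x
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)      # dummy batch standing in for real data
x_hat = model(x)
loss = criterion(x_hat, x)   # the "target" is the input itself
loss.backward()
optimizer.step()
```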
Well, let's look at some loss functions. What you typically use is a loss function that operates on x and some x prime, and it can be proportional to a negative log-likelihood function where you have p of x given x prime; the resulting functions are then similar to the ones we've seen earlier in this class. You can use the squared L2 norm: if you assume your probability density to be a normal distribution with uniform variance, you end up with the L2 loss, which is simply x minus x prime encapsulated in a squared L2 norm. Of course, you can also do things like cross-entropy: if you assume a Bernoulli distribution, you see that we end up exactly with our cross-entropy, which is simply the negative sum over the xi's times the logarithm of x prime i, plus one minus xi times the logarithm of one minus x prime i. Remember that if you want to use it this way, your x's need to be in the range of probabilities, so if you want to apply this kind of loss function, you may want to use it in combination with a sigmoid (or softmax) at the output.
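Written out compactly, the two reconstruction losses just described are as follows, with x' denoting the reconstruction:

```latex
% Squared L2 loss (Gaussian assumption with uniform variance):
L_{2}(x, x') = \lVert x - x' \rVert_2^2

% Cross-entropy loss (Bernoulli assumption, x_i and x'_i in [0, 1]):
L_{\mathrm{CE}}(x, x') = -\sum_i \Big[ x_i \log x'_i + (1 - x_i) \log (1 - x'_i) \Big]
```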
Okay, so here are some typical strategies to construct such autoencoders, and I think one of the most popular ones is the undercomplete autoencoder. Here, you enforce information compression by using fewer neurons in the hidden layer: you try to find a transform that essentially performs a dimensionality reduction to the hidden layer, then you expand from this hidden layer back onto the original data domain and try to find a solution that produces minimum loss. So you try to learn a compression here. By the way, if you do this with linear layers and the squared L2 norm, you essentially learn a principal component analysis (PCA). If you use non-linear layers, you end up with something like a non-linear generalization of PCA.
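A small sketch of this special case, assuming a purely linear undercomplete autoencoder (the dimensions are my own choices): trained with the squared error, such a model learns the same subspace as the leading principal components, although not necessarily the orthonormal components themselves.

```python
import torch
import torch.nn as nn

# Undercomplete, purely linear autoencoder: 784 -> 32 -> 784.
# With an MSE loss it spans the same subspace as the top 32 principal components.
encoder = nn.Linear(784, 32, bias=False)
decoder = nn.Linear(32, 784, bias=False)

x = torch.rand(64, 784)                  # dummy data batch
x_hat = decoder(encoder(x))
loss = nn.functional.mse_loss(x_hat, x)  # squared L2 reconstruction error
loss.backward()
```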
There are also things like the sparse autoencoder, and here we have a different idea: we even increase the number of neurons. Now you may ask why you would increase the number of neurons, since then you could find an even simpler solution like the identity and simply neglect a couple of those neurons. So this idea will not work straightforwardly; you have to enforce sparsity, which is also what coins the name sparse autoencoder. Here, you enforce sparsity in the activations using some additional regularization; for example, you can do this with an L1 norm on the activations in y. Remember that the sparsity in the sparse autoencoder stems from sparsity in the activations, not from sparsity in the weights, because if you look at the identity, you see it is simply a diagonal matrix with ones on the diagonal, so it would also be a very sparse solution. So again: enforce the sparsity on the activations, not on the weights.
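A sketch of how such an activation-sparsity penalty could look in practice; the overcomplete layer size and the penalty weight are my own assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 2048), nn.ReLU())  # more neurons than inputs
decoder = nn.Linear(2048, 784)
sparsity_weight = 1e-4                                     # strength of the L1 penalty

x = torch.rand(32, 784)                                    # dummy batch
y = encoder(x)                                             # hidden activations
x_hat = decoder(y)

# Reconstruction term plus an L1 penalty on the activations y (not on the weights!)
loss = nn.functional.mse_loss(x_hat, x) + sparsity_weight * y.abs().mean()
loss.backward()
```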
What else can be done? Well, you can use autoencoder variations; you can combine the idea essentially with all the recipes we've learned so far in this class. You can build convolutional autoencoders: there, you replace the fully connected layers with convolutional layers, and you can optionally also add pooling layers.
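A possible convolutional autoencoder could be sketched as follows; the channel counts, strided convolutions (instead of pooling), and transposed convolutions for upsampling are assumptions of mine, not prescribed by the lecture:

```python
import torch
import torch.nn as nn

# Convolutional autoencoder: fully connected layers replaced by convolutions;
# strided convolutions downsample, transposed convolutions upsample back.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
    nn.Sigmoid(),
)

x = torch.rand(8, 1, 28, 28)             # dummy image batch
x_hat = decoder(encoder(x))
loss = nn.functional.mse_loss(x_hat, x)
loss.backward()
```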
There is also the denoising autoencoder, which is a very interesting concept. There, you corrupt the input with noise, and the target is the noise-free original sample. This results in a trained system that does not just perform something like a dimensionality reduction or find a sparse representation, but at the same time also performs denoising. You could argue that this is an additional regularization that is similar to dropout, but essentially applied to the input layer.
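A minimal sketch of this training setup, assuming additive Gaussian noise as the corruption and a simple fully connected autoencoder (both are my own choices):

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)

x_clean = torch.rand(32, 784)                        # dummy clean batch
x_noisy = x_clean + 0.1 * torch.randn_like(x_clean)  # corrupt the input with noise

x_hat = autoencoder(x_noisy)                         # reconstruct from the corrupted input
loss = nn.functional.mse_loss(x_hat, x_clean)        # target is the noise-free original
loss.backward()
```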
There is also a very interesting paper