47 - Deep Learning - Plain Version 2020 [ID:21181]

Welcome back to deep learning! Today we want to continue talking about unsupervised methods, and we will look into a very popular technique: the so-called autoencoders.

Okay, so here are our slides, part two of our lecture, and the topic is autoencoders.

Well, the concept of the autoencoder is that we want to reuse the ideas of feed-forward neural networks, and you could say that a feed-forward neural network is a function of x that produces some encoding y. Now the problem is how we can generate a loss in such a configuration, and the idea is rather simple: we add an additional layer on top, and the purpose of this layer is to compute a decoding. So we have another layer g of y, and g of y produces some x hat. The loss that we can then define is that x hat and x need to be the same. So the autoencoder tries to learn an approximation of the identity. Well, that sounds rather simple, and to be honest, if we have exactly the same number of nodes in the input and in the hidden layer for y, then the easiest solution would probably be the identity. So why is this useful at all?
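To make this structure concrete, here is a minimal sketch in PyTorch; the framework choice, the class name `Autoencoder`, the layer sizes, and the use of a squared-error reconstruction loss are illustrative assumptions, not prescribed by the lecture (the possible loss functions are discussed next).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: encoder f(x) = y, decoder g(y) = x_hat."""
    def __init__(self, in_dim=784, hidden_dim=64):
        super().__init__()
        # Encoder f: maps the input x to a (here: smaller) code y.
        self.f = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # Decoder g: maps the code y back onto the input domain.
        self.g = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        y = self.f(x)        # encoding
        x_hat = self.g(y)    # decoding / reconstruction
        return x_hat

model = Autoencoder()
loss_fn = nn.MSELoss()                       # squared L2 reconstruction loss
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)                      # dummy batch standing in for real data
opt.zero_grad()
x_hat = model(x)
loss = loss_fn(x_hat, x)                     # x_hat and x should be the same
loss.backward()
opt.step()
```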

Well, let's look at some loss functions. What you typically use is a loss function that operates on x and some reconstruction x prime, and it can be chosen proportional to a negative log-likelihood of p of x given x prime. The resulting functions are then of a similar kind as the ones we've seen earlier in this class. You can use the squared L2 norm: if you assume the probability density function to be a normal distribution with uniform variance, you end up with the L2 loss, which is simply x minus x prime, encapsulated in a squared L2 norm.
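Written out (this just formalizes the spoken description; sigma squared denotes the assumed uniform variance), the Gaussian assumption reduces the negative log-likelihood, up to constants, to the squared L2 loss:

```latex
-\log p(x \mid x') \;=\; \frac{1}{2\sigma^2}\,\lVert x - x' \rVert_2^2 \;+\; \text{const.}
\qquad\Longrightarrow\qquad
L(x, x') \;=\; \lVert x - x' \rVert_2^2
```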

Of course, you can also do things like the cross-entropy: if you assume a Bernoulli distribution, you end up exactly with the cross-entropy loss, which is simply the sum over i of x i times the logarithm of x prime i, plus one minus x i times the logarithm of one minus x prime i. Remember that if you want to use it this way, your x's need to be in the range of probabilities, so if you want to apply this kind of loss function, you may want to use it in combination with a softmax function.
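Written out (the leading minus sign, which makes this a loss to be minimized, is left implicit in the spoken description), the Bernoulli assumption yields the cross-entropy loss:

```latex
L(x, x') \;=\; -\sum_i \Big[\, x_i \log x'_i \;+\; (1 - x_i)\,\log\big(1 - x'_i\big) \Big]
```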

Okay, so here are some typical strategies to construct such autoencoders, and I think one of the most popular ones is the undercomplete autoencoder. Here you enforce information compression by using fewer neurons in the hidden layer: you try to find a transform that essentially performs a dimensionality reduction into the hidden layer, and then you try to expand from this hidden layer back onto the original data domain and find a solution that produces minimum loss. So you try to learn a compression here. By the way, if you do this with linear layers and the squared L2 norm, you essentially learn a principal component analysis (PCA). If you use non-linear layers, you end up with something like a non-linear generalization of PCA.
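In formulas (a compact restatement of the PCA connection; the matrices W and W' and the code dimension k are my notation, and the data x_j is assumed to be centered):

```latex
y = W x,\qquad \hat{x} = W' y,\qquad W \in \mathbb{R}^{k \times n},\; k < n:
\qquad
\min_{W,\,W'} \;\sum_j \lVert x_j - W' W x_j \rVert_2^2
```

The optimal linear code spans the same k-dimensional subspace as the first k principal components, which is why the linear undercomplete autoencoder with squared L2 loss essentially recovers PCA.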

There are also things like the sparse autoencoder, and here we have a different idea: we even increase the number of neurons. Now you may ask why you would increase the number of neurons; then the network could find a much simpler solution, like the identity, and simply neglect a couple of those neurons. So this idea does not work straightforwardly, and you have to enforce sparsity, which also coins the name sparse autoencoder. Here you enforce sparsity in the activations using some additional regularization, for example an L1 norm on the activations in y, as in the sketch below. Remember that the sparsity in the sparse autoencoder stems from sparsity in the activations, not from sparsity in the weights, because if you look at the identity mapping, you see that it is simply a diagonal matrix with ones on the diagonal, so that would also be a very sparse solution. So again: enforce the sparsity on the activations, not on the weights.
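A minimal sketch of that regularization, again assuming PyTorch; the layer sizes and the penalty weight `l1_weight` are illustrative choices, not from the lecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Overcomplete autoencoder: the code y has MORE units than the input.
encoder = nn.Sequential(nn.Linear(784, 2048), nn.ReLU())
decoder = nn.Linear(2048, 784)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

l1_weight = 1e-4                 # illustrative value
x = torch.rand(32, 784)          # dummy batch

opt.zero_grad()
y = encoder(x)                   # activations of the hidden code
x_hat = decoder(y)

# Reconstruction term plus an L1 penalty on the ACTIVATIONS y
# (not on the weights) to enforce sparse codes.
loss = F.mse_loss(x_hat, x) + l1_weight * y.abs().mean()
loss.backward()
opt.step()
```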

What else can be done? Well, you can build autoencoder variations: you can combine the autoencoder essentially with all the recipes we've learned so far in this class. You can build convolutional autoencoders, where you replace the fully connected layers with convolutional layers, and you can optionally also add pooling layers.
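A compact sketch of such a convolutional autoencoder, again assuming PyTorch; the channel counts, the 28x28 grayscale input size, and the use of strided convolutions in place of the optional pooling step are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Encoder: strided convolutions downsample the input (standing in for pooling).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
)
# Decoder: transposed convolutions upsample back to the input resolution.
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
)

x = torch.rand(8, 1, 28, 28)          # dummy batch of images
x_hat = decoder(encoder(x))
loss = F.mse_loss(x_hat, x)           # reconstruction loss as before
```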

There is also the denoising autoencoder, which is a very interesting concept. There you corrupt the input with noise, and the target is the noise-free original sample. This results in a trained system that does not just perform something like dimensionality reduction or find a sparse representation, but at the same time also performs denoising. You could argue that this is an additional regularization that is similar to dropout, but essentially applied to the input layer.
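A minimal sketch of that training setup, again assuming PyTorch; the additive Gaussian corruption and the noise level `sigma` are illustrative choices, not from the lecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Linear(64, 784)

sigma = 0.3                                  # illustrative noise level
x = torch.rand(32, 784)                      # clean dummy batch

x_noisy = x + sigma * torch.randn_like(x)    # corrupt the INPUT with noise
x_hat = decoder(encoder(x_noisy))

# The target is the noise-free original sample, so the network learns to denoise.
loss = F.mse_loss(x_hat, x)
loss.backward()
```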

There's also a very interesting paper

Part of a video series

Accessible via: Open Access

Duration: 00:19:50 min

Recording date: 2020-10-12

Uploaded on: 2020-10-12 21:46:21

Language: en-US

Deep Learning - Unsupervised Learning Part 2

In this video, we show fundamental concepts of autoencoders (AEs), ranging from undercomplete and sparse AEs, through stacked and denoising AEs, to Variational Autoencoders.

For reminders about new videos, follow us on Twitter or LinkedIn.

Further Reading:
A gentle Introduction to Deep Learning
