2 - Deep Learning [ID:12101]

Welcome, everybody, to deep learning, and sorry that I am only showing up at this appointment now, but I had to travel quite a bit. Now we can go ahead and enjoy the teachings about deep learning. What you have seen so far was the introduction, and in the last lecture we had technical problems and could not cover very much. I heard that you got up to about slide number 12; is that roughly where you were when the projector broke down? Yes, no, maybe? Okay.

You have heard a bit about feed-forward neural networks, so let's see where we stand. You talked about universal approximation and about classification trees and how to map them onto networks. The gist of universal approximation is that one hidden layer is essentially enough to approximate any continuous function on a compact set, where the compact set is essentially the distribution of your data. So everything we do works on the same distribution that you have seen in the training data set.
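To give a constructive feel for this, here is a minimal NumPy sketch of my own, not from the lecture: the target function, the interval, and the number of hidden units are arbitrary example choices. A single hidden layer of steep sigmoids builds a staircase that approximates a continuous function on a compact interval.

    import numpy as np

    def sigmoid(z):
        # Numerically stable logistic function (avoids overflow for large |z|).
        e = np.exp(-np.abs(z))
        return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

    # Target: a continuous function on a compact set, here sin(x) on [0, 2*pi] (illustrative choice).
    f = np.sin
    a, b, n_hidden = 0.0, 2.0 * np.pi, 50
    steepness = 200.0                         # steep sigmoids act almost like step functions

    grid = np.linspace(a, b, n_hidden + 1)    # partition of the interval
    biases = -steepness * grid[1:]            # each unit switches on at one interior grid point
    weights = f(grid[1:]) - f(grid[:-1])      # output weights: increments of f between grid points

    def one_hidden_layer_net(x):
        h = sigmoid(steepness * x[:, None] + biases)   # hidden activations, shape (len(x), n_hidden)
        return f(a) + h @ weights                      # staircase approximation of f

    x = np.linspace(a, b, 1000)
    print("max abs error:", np.max(np.abs(one_hidden_layer_net(x) - f(x))))

Increasing n_hidden makes the staircase finer and drives the error down, which is the constructive intuition behind the theorem.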

That said, there are certain functions that can be represented much better if you construct them not with a single-hidden-layer network but with, let's say, two layers or even more. Later in the lecture we will hear about exponential feature reuse: the deeper you build your network, the more paths emerge through it, and many of the features can be reused along those paths.

The main point I want you to understand here is that it is not just about universal approximation. Universal approximation tells us that one layer is enough, but when you stack layers on top of each other, the representational power of the network increases. We had an example where a network with something like six neurons could not model the function very well, but when we increased to just seven neurons arranged in two layers, we were able to model the function exactly, with zero error. By the way, with two layers any decision tree can be approximated by a neural network without error. The idea is simply that you use the first layer to create the partitions, and with these partitions you can then model every patch in the second layer and assign it a class. So the first layer essentially takes on the modeling work of forming the partitions, and in the second layer you just assign one class to the respective partition. This works for every decision tree.
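To make the construction a bit more tangible, here is a minimal NumPy sketch of my own; the tree, its thresholds, and the hard-threshold activation are illustrative choices, not the lecture's example. The first layer computes one indicator per split, and the second layer combines them into one unit per leaf region that carries class one.

    import numpy as np

    def step(z):
        # Hard threshold used as an idealized activation for the construction.
        return (z > 0).astype(float)

    # Toy axis-aligned decision tree on x = (x1, x2), illustrative values:
    #   if x1 <= 0.5 -> class 0, else if x2 <= 0.3 -> class 1, else -> class 0.
    # Layer 1: one unit per split condition "feature > threshold".
    W1 = np.array([[1.0, 0.0],    # unit A fires when x1 > 0.5
                   [0.0, 1.0]])   # unit B fires when x2 > 0.3
    b1 = np.array([-0.5, -0.3])
    # Layer 2: one unit per leaf of class 1, here the region "A and not B",
    # realized as a thresholded sum of the first-layer indicators.
    W2 = np.array([[1.0], [-1.0]])
    b2 = np.array([-0.5])

    def tree_as_network(x):
        h1 = step(x @ W1.T + b1)      # partition indicators
        h2 = step(h1 @ W2 + b2)       # leaf indicator
        return h2.ravel()             # predicted class (0 or 1)

    print(tree_as_network(np.array([[0.2, 0.1], [0.8, 0.1], [0.8, 0.9]])))  # -> [0. 1. 0.]

Each leaf region of the tree is a conjunction of split conditions, and each such conjunction becomes one second-layer unit, which is why the construction works for every decision tree.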

Another thing you should take away from this is that the mechanism we use in neural networks is very powerful: one layer is enough to model essentially any continuous function, and with a second or deeper layers you can model really complex systems. This is also what deep learning is about: we start modeling very complex systems in a rather compact form. So this is the gist of the universal approximation theorem, that we can approximate any continuous function on a compact set. The main problem we still have is that we do not know how to actually determine the parameters. We only know that a solution exists, but we do not know how to get there, and this is one fundamental problem that we still need to solve. So far we have only learned that there is potentially a solution, or some good solutions, but not how to actually find them.

Now we have to go from activations to classifications, and this is why we introduce the so-called softmax function.
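The softmax is defined properly later in the lecture; as a quick preview, here is a minimal sketch of the standard softmax, which turns a vector of activations into positive values that sum to one. The max-subtraction is only a common numerical-stability trick, and the example scores are arbitrary.

    import numpy as np

    def softmax(z):
        # Subtracting the maximum does not change the result but avoids overflow in exp.
        e = np.exp(z - np.max(z))
        return e / np.sum(e)

    scores = np.array([2.0, 1.0, 0.1])   # example activations for three classes
    probs = softmax(scores)
    print(probs, probs.sum())            # positive values that sum to 1

These outputs can then be read as class probabilities and compared against the one-hot ground truth that we define next.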

So far we had described the ground truth as y and the estimate as y hat, and we used essentially minus one and one, because we had two classes. But this only works for a two-class problem. If you want to go to multiple classes, you have to model the output in a different way; you cannot just take a single number as the class output. Instead, we use a vector. We use bold script to indicate vectors, and we now have a set of scalars indexed from 1 to K, where K is the number of classes. With this we can handle many classes; we just need an output vector that is big enough to cover all of them. If you have a hundred classes, then your output vector has a hundred entries. For every index, the entry is zero if it is not the correct class and one if it is the correct class. This is also called a one-hot encoding: in the ground truth vector y, not in the estimate y hat, exactly one entry is one and all other entries are zero.
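As a tiny sketch of this encoding (the number of classes and the label index are just example values of my own):

    import numpy as np

    def one_hot(label, num_classes):
        # Ground truth vector y: one at the index of the correct class, zero elsewhere.
        y = np.zeros(num_classes)
        y[label] = 1.0
        return y

    print(one_hot(2, 5))   # class index 2 out of 5 classes -> [0. 0. 1. 0. 0.]

The estimate y hat, by contrast, will typically hold the softmax outputs and therefore has several non-zero entries.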

Part of a video series
Accessible via: Open access
Duration: 01:24:10 min
Recording date: 2019-10-29
Uploaded on: 2019-10-29 21:49:02
Language: en-US
Tags: analytic regression gradient function network