Let's assume that the solution we found before was not good enough.
Is the only option then to give up, or can we do something else?
The something else you can do is to go in the direction of deep feedforward neural networks.
So why deep?
At first it looks like a contradiction: you know that with one hidden layer you can, in principle, do everything. So why should I take several hidden layers to come to an output?
The fact that one hidden layer can in principle do everything does not mean that this is the optimal solution for your input-output problem.
It is, however, more complicated to learn, because in the forward pass you go from an input vector to something else, then to something else, and to something else again, which means the relationship between input and output becomes very indirect.
So what can you do so that this indirectness, in the forward and in the backward pass, does not show up as crazy solutions at the end?
What you have to do is rearrange your deep neural network in such a way that every hidden layer knows: I have to contribute something good to the final solution.
The way to do this, and it is an architectural analogy to the boosting algorithm of Freund and Schapire, is the following. Going from the inputs through one hidden layer to an output is the normal feedforward neural network; in principle it can do everything.
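As a concrete reference point, here is a minimal sketch of that baseline in PyTorch; the lecture does not prescribe any framework, and the names and sizes (n_in, n_hidden, n_out) are placeholders I chose, not values from the lecture.

```python
import torch
import torch.nn as nn

class OneHiddenLayerNet(nn.Module):
    """Baseline: input -> one nonlinear hidden layer -> linear output."""
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)   # input -> hidden layer
        self.out = nn.Linear(n_hidden, n_out)     # hidden layer -> output

    def forward(self, x):
        # a single wide nonlinear hidden layer is what the universal
        # approximation statement refers to
        return self.out(torch.tanh(self.hidden(x)))

net = OneHiddenLayerNet(n_in=4, n_hidden=16, n_out=1)
print(net(torch.randn(8, 4)).shape)   # torch.Size([8, 1])
```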
But now let's take another hidden layer that sits above the first one. So that we do not lose information, we also make a direct connection from the input to this next hidden layer.
And so that this new layer learns something as well, we present the target value a second time at its output. But how can I couple the different layers so that they really help each other?
The way to do so is to say: the first hidden layer has more or less learned the solution, and this solution is handed as an offset to the next stage. The output of the second stage is then a sum of the value coming from the first stage and the value coming from the new hidden layer.
In other words, the output generated here is a superposition of the offset coming from below and the new information coming from the new layer, and hopefully this is able to give us a better solution than the original output.
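A minimal sketch of this coupling, again in PyTorch and under the assumption that the combination is a plain sum; the class and attribute names (ResidualStage, head) are my own, not from the lecture.

```python
import torch
import torch.nn as nn

class ResidualStage(nn.Module):
    """One additional stage: it sees the raw input via a direct connection
    and adds its own correction on top of the offset from the stage below."""
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        self.head = nn.Linear(n_hidden, n_out)

    def forward(self, x, offset):
        # output = offset coming from below + new information from this stage
        return offset + self.head(torch.tanh(self.hidden(x)))
```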
So the idea is that, step by step, the higher levels only have to learn the residual error; they do not have to focus on the same thing that the level below has already done. The second stage does not see the same deviation between output and target, because its output is already better than the one below.
At the first stage, the output is only the computation of the first hidden layer, and output minus target gives you an error; at the second stage the output should be better, and the residual between output and target is only a small error, which is propagated back to give you, step by step, a more refined explanation of the final output.
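In symbols (my own notation, not shown in the lecture): with input $x$ and target $y$, the first stage produces $\hat{y}_1 = f_1(x)$ and sees the error $y - \hat{y}_1$; the second stage produces $\hat{y}_2 = \hat{y}_1 + f_2(x)$, so the residual $y - \hat{y}_2$ it passes on is already smaller, and each further stage $f_k$ only has to model the residual $y - \hat{y}_{k-1}$ left over by the stages below it.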
So why is this a better solution than our universal approximation statement? With one very large hidden layer you can explain everything, but then you really have to have a very large hidden layer to satisfy the universal approximation theorem.
And if you have a very large hidden layer and something changes in one part of it, then in parallel something has to change elsewhere to keep everything consistent, so that in superposition your output is still explained. Here, in contrast, you have the chance to say: okay, let's try it, and if it is not good enough, let's hand the responsibility to the next layer, and the next, and so on. This is a more sequential type of learning.
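One way to read this sequential idea in code is a boosting-like, stage-wise training loop, sketched here with the ResidualStage from above; the toy data, loss, and hyperparameters are made up for illustration, and the lecture may equally well mean end-to-end training with auxiliary targets at every stage.

```python
import torch

# toy data and sizes, invented for illustration
n_in, n_hidden, n_out, num_stages = 4, 16, 1, 3
x = torch.randn(256, n_in)
y = torch.sin(x.sum(dim=1, keepdim=True))

offset = torch.zeros(x.shape[0], n_out)      # before the first stage nothing is explained
stages = []
for _ in range(num_stages):
    stage = ResidualStage(n_in, n_hidden, n_out)
    opt = torch.optim.SGD(stage.parameters(), lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        pred = stage(x, offset)              # offset + new correction from this stage
        loss = ((pred - y) ** 2).mean()      # full target, but only a small residual remains
        loss.backward()
        opt.step()
    offset = stage(x, offset).detach()       # freeze what has been explained so far
    stages.append(stage)
```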
Now here you see such a thing in the software: the input goes to all the different hidden layers, all the hidden layers produce their outputs, the outputs are stacked on top of each other, and then you have the final output.
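One possible reading of that software picture, sketched again in PyTorch (the actual software shown in the lecture is not reproduced here), is a network in which the same input feeds every hidden layer and the per-layer contributions are summed into the final output:

```python
import torch
import torch.nn as nn

class StackedResidualNet(nn.Module):
    """The input feeds every hidden layer; the final output is the
    superposition (sum) of all per-layer contributions."""
    def __init__(self, n_in, n_hidden, n_out, num_stages):
        super().__init__()
        self.hiddens = nn.ModuleList(nn.Linear(n_in, n_hidden) for _ in range(num_stages))
        self.heads = nn.ModuleList(nn.Linear(n_hidden, n_out) for _ in range(num_stages))

    def forward(self, x):
        out = torch.zeros(x.shape[0], self.heads[0].out_features, device=x.device)
        for hidden, head in zip(self.hiddens, self.heads):
            out = out + head(torch.tanh(hidden(x)))   # stack each stage's output on top
        return out

net = StackedResidualNet(n_in=4, n_hidden=16, n_out=1, num_stages=3)
print(net(torch.randn(8, 4)).shape)   # torch.Size([8, 1])
```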