Good.
So now let's go on.
Before lunch we stopped at the point that the framework of the HCNN is a nice idea, but to bring it to practical success we need additional ideas, and not only one.
The first point we discussed was the handling of the learning rate, which is simple.
The next point was that with such large systems you have to think about sparsity and how to handle it.
In the end you can handle this with a simple rule of thumb: 50 divided by the dimensionality of s, that is, the dimensionality of the matrix here.
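As a minimal sketch of that rule of thumb (not the lecturer's code; the function name and mask construction are my own assumptions), one could build a random sparsity mask for the state transition matrix like this:

```python
import numpy as np

def sparse_transition_mask(dim_s, rng=None):
    """Random 0/1 mask for a dim_s x dim_s state transition matrix.

    Rule of thumb from the lecture: keep roughly 50 / dim_s of the
    connections, i.e. about 50 incoming connections per state neuron.
    """
    rng = np.random.default_rng() if rng is None else rng
    density = min(1.0, 50.0 / dim_s)          # e.g. dim_s = 500 -> 10% non-zeros
    return (rng.random((dim_s, dim_s)) < density).astype(float)

mask = sparse_transition_mask(500)
print(int(mask.sum()), "non-zero connections out of", 500 * 500)
```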
So you have to live with large sparse matrices if you are really interested in large dynamical systems. That is the basis; otherwise you are unable to do the computation for large systems.
The next topic was how to improve the memory of the system, and this is a big step in making it practical. Because if you do this teacher forcing at each step, it is too much help. In the beginning you need it.
Please, when you start with this, first of all start with p equal to zero and then run it to as good a solution as possible. What is as good as possible? If it is not good, then simply increase the dimensionality of s to the point where the error against the targets is really near zero. Exactly zero is nonsense, but you should reach a small value there. Then you have a dimensionality which is large enough.
Combined with the dimensionality, you have the sparsity level.
About the length of the unfolding we do not have to discuss, because we unfold along the whole time series. So there are not many metaparameters to think about.
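The dimensioning procedure just described could be summarized in a small heuristic loop. This is only a sketch of the advice above; `train_hcnn` is a hypothetical placeholder for whatever training routine is used, and the tolerance and growth factor are my own assumptions.

```python
def choose_state_dimension(train_hcnn, start_dim=50, tol=1e-3, max_dim=2000):
    """Grow dim(s) until the training error against the targets is near zero.

    'Exactly zero is nonsense', so we only ask for a small tolerance.
    train_hcnn is a placeholder: it should train one model and return the
    final target error for the given state dimension and sparsity.
    """
    dim_s = start_dim
    while dim_s <= max_dim:
        sparsity = min(1.0, 50.0 / dim_s)                 # rule of thumb from above
        error = train_hcnn(dim_s=dim_s, sparsity=sparsity,
                           teacher_forcing_dropout=0.0)   # start with p = 0
        if error < tol:
            return dim_s                                  # large enough
        dim_s *= 2                                        # still too small
    raise RuntimeError("target error not reached below max_dim")
```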
The only point is that with such a long unfolding, over a hundred steps, long memory is definitely something you have to think about. And we had three different possibilities for this.
Large sparse networks alone, large sparse networks in combination with partial teacher forcing, or with LSTM. And if you really have large sparse networks, then it looks like the LSTM is not reasonable, because the learning itself absorbs it into your network part. Only if you have smaller systems, not large sparse ones, could the LSTM be reasonable. But partial teacher forcing is reasonable in every case.
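To make partial teacher forcing concrete, here is a minimal sketch of how I read it: at each unfolded time step the observation replaces the model's expectation only for a random subset of components, so p = 0 recovers full teacher forcing, matching the advice above to start with p equal to zero. The function name and the exact mixing rule are assumptions, not the lecturer's implementation.

```python
import numpy as np

def partial_teacher_forcing(expectation, observation, p, rng):
    """Replace the expectation by the observation, except for a random
    fraction p of the components, which keep the network's own expectation.

    p = 0 -> full teacher forcing (every component corrected)
    p = 1 -> no forcing at all (pure closed-loop iteration)
    """
    keep_model = rng.random(expectation.shape) < p
    return np.where(keep_model, expectation, observation)

rng = np.random.default_rng(0)
y_hat = np.array([0.20, -0.10, 0.40])    # expectation at time t
y_obs = np.array([0.25,  0.00, 0.30])    # observed target at time t
print(partial_teacher_forcing(y_hat, y_obs, p=0.3, rng=rng))
```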
I have shown it to you in this exercise here.
It's starting here.
So this is a bad solution, even after 30,000 epochs of learning, and even after checking that the learning rate is OK, in the sense that the distribution of the parameters in the matrix A is fine. So you could say that even the start distribution here is OK: it is not too far away from a uniform distribution, and it is not artificially shrunk between some minus value and some plus value. It is not that we clipped it at minus 2.5 and plus 2.5; no, it is what the system itself is learning. So from a technical viewpoint you have a good solution. Nevertheless, from the viewpoint of the outcome and the generalization behavior, it is awful.
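As a small illustration of the check described above, that the entries of A are not artificially squeezed to a boundary such as ±2.5, one could look at simple summary statistics of the learned weights. This diagnostic and its threshold handling are my own sketch, not the lecturer's tooling.

```python
import numpy as np

def check_weight_distribution(A, clip_value=2.5):
    """Report whether the non-zero entries of A pile up at +/- clip_value.

    A large fraction sitting exactly at the boundary would indicate an
    artificial constraint (clipping or a bad learning rate) rather than a
    distribution the system has learned by itself.
    """
    w = A[A != 0.0]                          # ignore weights removed by the sparse mask
    at_boundary = np.mean(np.isclose(np.abs(w), clip_value, atol=1e-3))
    print(f"range [{w.min():.3f}, {w.max():.3f}], std {w.std():.3f}")
    print(f"fraction at +/-{clip_value}: {at_boundary:.1%}")
```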
So we have one more instrument.