So, good morning, everyone. I have just repeated the material from yesterday before the recording started. Now I will go on with new material.
The starting point of all of this is that we will continue to speak about large closed dynamical systems. As an easy architecture, a large closed dynamical system is the picture above. But this is unsolvable, so we had to invent the architecture of teacher forcing, something that I did 12 years ago. Then you get this picture here, which allows you to say: now, in the past, I have the learning here, so that we can have an extended learning. But if you are really able to learn down to zero error, then there is no more information flow along the green lines here, and you have solved the original problem. So this is the HCNN architecture. You can say it is the most basic architecture with which you can try to learn large closed dynamical systems.
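To make the forcing mechanism concrete, here is a minimal NumPy sketch of one possible reading of such a closed system with output teacher forcing in the past and a purely closed run into the future. The state layout (observables as the first components of the state), the transition s_{t+1} = tanh(A·s_t) and all dimensions are assumptions for this sketch, not the exact formulation from the slides.

```python
import numpy as np

# Minimal sketch (assumed formulation, not the lecture's exact equations):
# observables are the first n_obs components of the hidden state, the closed
# dynamics is s_{t+1} = tanh(A s_t), and teacher forcing replaces the expected
# observables by the observed ones before the state is propagated.

rng = np.random.default_rng(0)
n_obs, n_state = 3, 20
A = rng.normal(scale=0.1, size=(n_state, n_state))   # transition matrix (would be learned)

def step(s, y_obs=None):
    """One time step; if an observation is given, apply output teacher forcing."""
    y_expect = s[:n_obs]                    # expectation = first state components
    if y_obs is not None:
        s = s.copy()
        s[:n_obs] -= (y_expect - y_obs)     # correction vanishes once the error is zero
    return np.tanh(A @ s), y_expect

history = rng.uniform(-1, 1, size=(10, n_obs))   # toy preprocessed observations
s = np.zeros(n_state)
for y_obs in history:                            # past: unfold with teacher forcing
    s, _ = step(s, y_obs)

forecasts = []
for _ in range(5):                               # future: closed system, no corrections
    forecasts.append(s[:n_obs])
    s, _ = step(s)
```

Once training drives the observation error to zero, the correction term in `step` disappears and the closed dynamics alone reproduces the history, which is exactly the point about the green lines above.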
What I want to do now is to show you that even more ideas are needed to learn these historical consistent neural networks. About the naming: first of all, historical, because you learn only one history, since you have to extend the unfolding along the whole time series; and consistent, because you have a symmetric handling of the past and the future. First of all, if you work with data, you have to think about preprocessing of the data. The systems here are so powerful that you can reduce the preprocessing to relatively simple types. Here are three different descriptions I will show you, and in these HCNN environments I have never used anything more complicated than these three examples. So what is the first type of preprocessing?
So the idea is that the original raw data are capital Y. And the data which are presented
to the learning of the neural network are the small y here. So the first idea is to
say I do a linear transformation such that the raw data, I compute the average value
of the raw data. And then I do this linear transformation. The obvious consequence of
this is that you have data which are fluctuating around zero because you have subtracted the
average value here. And you have a scaling parameter which makes this thing small enough
so that except outliers that the most of the data are in the interval between minus one
and plus one. So this is a linear transformation. You have to find the average value of the
raw data and the scaling parameter so that this thing here is between one and minus one.
This is an easy computation not only for the preprocessing but also for the post-processing. Please keep in mind that your customers are not interested in a forecast of these transformed values; they want a forecast of the time series in raw-data units. Which means that in a post-processing step you have to invert the preprocessing. And that is easy if you have such a linear transformation: if you have a forecast in this scaled range, then you multiply by one over A and add the offset, and you have the forecast in a form that your customer will like. I say it again: they do not like the forecast of the transformed values, they like the forecast of the values they actually want.
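A minimal sketch of this first preprocessing type and its inversion, assuming a toy series and one particular choice of the scaling parameter (the lecture only requires that most scaled values end up between minus one and plus one):

```python
import numpy as np

# First preprocessing type: centre the raw data around zero and scale so that
# (apart from outliers) most values lie in [-1, 1]. The concrete choice of the
# scaling parameter a below is an assumption for this sketch.

Y_raw = np.array([102.0, 98.5, 110.2, 95.3, 101.7])   # toy raw time series

mean = Y_raw.mean()
a = 1.0 / np.abs(Y_raw - mean).max()   # one possible scaling parameter

y = a * (Y_raw - mean)                 # data presented to the network, fluctuating around zero

# Post-processing: invert the linear transformation so the forecast is
# reported in raw-data units, which is what the customer wants.
y_forecast = 0.3                       # some network output in the scaled range
Y_forecast = y_forecast / a + mean     # multiply by 1/a and add the offset back
```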
The second possibility is for time series that have a trend. But first, an example for the first case above: let's speak about temperature. Normally in our area the temperature is always a positive number, most of the time between zero and 30 or 35 degrees Celsius. Which means that if you apply such a transformation, you get something which fluctuates around zero. This is only for the computation, not that I like such low temperatures. And then the next possibility is if you speak about things which might have a trend. As I told you before, trends in the data you present to the network are not good, because the neural network adjusts all the parameters in between so that the information flows live between minus one and plus one, because that is the limitation of the tanh. And if you had a trend from the past into the future direction, then in the generalization you would always sit in the saturation of the tanh, on one side or the other.
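To make the saturation argument concrete, here is a tiny illustration with made-up numbers: a trending input quickly pushes tanh into its flat region, while centred and scaled data stay where the function still reacts.

```python
import numpy as np

trend = np.arange(0.0, 10.0)              # raw input with a trend, keeps growing
centred = (trend - trend.mean()) / 5.0    # roughly rescaled into [-1, 1]

print(np.tanh(trend)[-3:])    # -> values ~1.0: saturated, almost no sensitivity left
print(np.tanh(centred)[-3:])  # -> clearly below 1: the network can still react
```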