Welcome to the last lecture of AI at FAU.
So we are coming to an end.
There is no lecture AI 3, which is a pity, because there is more AI than I have covered here, or than we have covered together, I think we should say.
I would like to complete the reinforcement learning material we started yesterday, then wrap up what we have learned and answer any questions that may have come up in the last days.
Maybe the most important question first: when is the exam?
I've received confirmation this morning when I met Mr Hoffman that the exam will be Tuesday.
But apparently he had scheduled it at 12:30 instead of 14:00, as I had asked him and announced, and so he said: oh yeah, then I'll move it to 14:00, not a problem.
And now there's an email from him saying, we have to talk, so I don't know.
But we do have a room: it's Hörsaal 11.
Okay so I hope to resolve this today when I actually get him on the phone.
The Nachklausur for KI 1 is going to be on Monday at 10:30 in H10, in case any of you are affected.
So we know that, even though the official database doesn't know it yet.
So that's the state of the never-ending story of the date and time.
Any questions so far on the admin stuff?
We do plan to do the corrections directly and then have an exam review in the days after.
Good, so reinforcement learning is a form of unsupervised learning, and unsupervised learning is learning without labeled examples.
It is a slightly more tedious way of learning, because you don't have examples you can just optimize against, but it is in a way more realistic.
So we are learning from rewards, which in this case we also call reinforcements, and they can come at the end or be hints from the environment to the agent in between.
So the topic of having rewards actually points us in the direction in which a solution to reinforcement learning could lie.
We introduced rewards as part of Markov decision processes, and the main idea in reinforcement learning is to look at reinforcement learning as an MDP, with the only difference that in Markov decision processes the reward function was totally observable, whereas in reinforcement learning it is only partially observable.
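As a quick recap of what a totally observable reward function buys you: in an MDP the agent knows R and can evaluate the expected discounted utility of a policy directly. A minimal sketch of that quantity, with discount factor gamma; the exact notation here is my reconstruction, not a quote from the slides:

    U^\pi(s) = E\left[ \sum_{t=0}^{\infty} \gamma^t R(S_t) \,\middle|\, S_0 = s \right]

In reinforcement learning the agent has to estimate this quantity from the reinforcements it actually observes, because R itself is not given.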
You should think of these as delayed rewards: the reinforcements don't come after every action.
In MDPs we had a reward for every action.
Here the reinforcements really come at intervals or at the end, so we interpret that as a reward function which is only partially observable.
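To make the delayed-reward picture concrete, here is a minimal Python sketch, not from the lecture, of an agent estimating state utilities purely from observed episodes under a fixed policy (direct utility estimation). The toy chain environment, the single terminal reward of 1.0, and the discount factor 0.9 are all illustrative assumptions:

    from collections import defaultdict

    GAMMA = 0.9  # discount factor; 0.9 is an illustrative choice

    def run_episode(policy, env_step, start_state):
        """Follow a fixed policy until the environment signals termination."""
        trajectory = []  # (state, reward received on this step) pairs
        state = start_state
        while state is not None:
            next_state, reward = env_step(state, policy[state])
            trajectory.append((state, reward))
            state = next_state
        return trajectory

    def direct_utility_estimate(trajectories):
        """Estimate U(s) as the average discounted return observed from s."""
        returns = defaultdict(list)
        for traj in trajectories:
            g = 0.0
            for state, reward in reversed(traj):  # accumulate future rewards
                g = reward + GAMMA * g
                returns[state].append(g)
        return {s: sum(gs) / len(gs) for s, gs in returns.items()}

    # Toy chain 0 -> 1 -> 2 -> done: the only nonzero reward arrives on the
    # final transition, i.e. the reinforcement is delayed until "exam day".
    def env_step(state, action):
        if state == 2:
            return None, 1.0   # terminal step: the delayed reward shows up
        return state + 1, 0.0  # intermediate steps look reward-free

    policy = {0: "study", 1: "study", 2: "study"}
    episodes = [run_episode(policy, env_step, 0) for _ in range(10)]
    print(direct_utility_estimate(episodes))  # {2: 1.0, 1: 0.9, 0: 0.81}

Every intermediate step reports a reward of 0.0, so from the agent's point of view the reward function is only partially revealed; averaging the observed returns is exactly the passive setting where the agent recovers utilities without ever being handed R.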
In theory, in fiction, you are getting a reward after every action, except nobody tells you what it is, which is realistic.
You come to the AI lectures every Wednesday and Thursday and you get a reward for that, even if you don't know it, by learning: it is not directly observable, but apparently you are getting something out of it, or you are expecting to.
And then of course the day of reckoning is Tuesday: ultimately you get your reinforcement in the exam.
Of course there are intermediate rewards in getting points for your homeworks as well.
So your actions get partially observable rewards.