Okay, so we're still in the intro phase of machine learning and we are essentially looking
at a problem that plagues all learning paradigms, which is the problem of overfitting
versus underfitting. So machine learning is an optimization process and we're trying to
optimize essentially as an agent what we do with respect to this external performance
measure, and at the moment we are looking at machine learning from examples. Depending on
what kind of examples we get, which only partially mirror the underlying processes we really
want to learn, there is always the tendency that instead of learning the underlying processes
we just learn the examples we've seen so far, which of course is only fair, because that's
the only thing we're given. So the question is: can we do
something about overfitting? And can we do something about underfitting? Underfitting
we can normally solve relatively easily by looking at more examples, if we have
them. Overfitting can also sometimes be cured by more examples because they will naturally
generalize the behavior but we want to do something about that actively. So we want
to sometimes generalize our solutions by just looking at them in and of themselves. That
sometimes helps. And we looked at one example of that, which was decision tree pruning.
Remember, decision tree learning finds us a nice decision tree, and the question is: can we
make that better? The idea here is that we go through the terminal nodes of that tree and,
for every one of those, look at whether it offers us enough information gain. And since we
can already compute the information gain, the only real question is: how much is enough, so
that below that we can think of a node as irrelevant? Fortunately, statistics has an answer
for us, by just using standard significance tests. The idea for these significance tests is
that we want to use the information gain as a measure and compare the information gain that
a particular terminal node gives us to the information gain we would expect under the null
hypothesis, namely that everything is just random. And if it is sufficiently near randomness,
for some value of "sufficiently near", we can say this node doesn't give us enough information,
so we throw it out, and we can make our decision trees smaller. And of course smaller decision
trees make fewer decisions, so possibly they generalize better. That's the idea.
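To make that concrete, here is a minimal sketch of such significance-based pruning, assuming a binary classification setting where every node stores the counts of positive and negative examples that reach it; the Node structure, the function names, and the use of SciPy's chi-squared distribution are illustration choices, not something fixed by the lecture.

```python
# A minimal sketch (not from the lecture) of significance-based pruning of a decision
# tree for a binary (positive/negative) classification problem. Node layout, names and
# significance level are assumptions made for illustration.
from dataclasses import dataclass, field
from typing import Optional
from scipy.stats import chi2


@dataclass
class Node:
    attribute: Optional[str] = None                  # None means this node is a leaf
    children: dict = field(default_factory=dict)     # attribute value -> child Node
    p: int = 0                                        # positive examples reaching this node
    n: int = 0                                        # negative examples reaching this node


def deviation(node: Node) -> float:
    """How far the actual split deviates from what an irrelevant attribute would give."""
    total = node.p + node.n
    if total == 0:
        return 0.0
    delta = 0.0
    for child in node.children.values():
        # expected counts under the null hypothesis "the attribute is just random"
        p_hat = node.p * (child.p + child.n) / total
        n_hat = node.n * (child.p + child.n) / total
        if p_hat > 0:
            delta += (child.p - p_hat) ** 2 / p_hat
        if n_hat > 0:
            delta += (child.n - n_hat) ** 2 / n_hat
    return delta


def prune(node: Node, alpha: float = 0.05) -> Node:
    """Post-order pruning: collapse terminal splits that are not significant at level alpha."""
    if node.attribute is None:
        return node
    node.children = {v: prune(c, alpha) for v, c in node.children.items()}
    if all(c.attribute is None for c in node.children.values()):
        threshold = chi2.ppf(1 - alpha, df=len(node.children) - 1)
        if deviation(node) < threshold:
            node.attribute, node.children = None, {}  # split is near random: make it a leaf
    return node
```

The shape of the test is the whole point: compare the counts a split actually produces to the counts a completely irrelevant attribute would be expected to produce, and collapse the split whenever the difference is not statistically significant.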
And then there are some standard tricks of the trade. For instance, we look at the errors
as a sum of squared deviations: we know something about how such a sum of squared deviations
should be distributed, and comparing our quantity against that distribution gives us a
measure. And by some statistics voodoo we know that if this quantity up there is of that
size, then a statistician would say it's significant. Don't erase me. Okay? That's the idea.
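For reference, the usual textbook form of that quantity (my reconstruction, not a quote from the slide) is the following: with $p$ and $n$ the positive and negative examples reaching the node and $p_k$, $n_k$ the ones ending up in the $k$-th of its $v$ children,

$$\Delta \;=\; \sum_{k=1}^{v}\left(\frac{(p_k-\hat p_k)^2}{\hat p_k}+\frac{(n_k-\hat n_k)^2}{\hat n_k}\right),
\qquad \hat p_k = p\cdot\frac{p_k+n_k}{p+n},\quad \hat n_k = n\cdot\frac{p_k+n_k}{p+n}.$$

Under the null hypothesis that the attribute is irrelevant, $\Delta$ is approximately $\chi^2$-distributed with $v-1$ degrees of freedom, so "of that size" means: above the critical value of that distribution at the chosen significance level.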
What we've been seeing here, and really the whole semester, is that AI has been on a huge shopping tour everywhere:
into logic, into probabilities, into statistics, into control theory, into all kinds of things
and has been kind of reinterpreting them in the context of trying to build intelligent
agents and actually getting down to the nitty gritty details and making those things efficient
and practical. But many of the ideas come from other sciences. The same
thing happens, by the way, with psychology and philosophy and all of those kinds of things.
Here AI actually takes in everything that's needed and munges it up under the heading
of "we make intelligent agents". Which is why, in this course, I'm kind of always alternating
between showing you funny agent pictures and essentially doing math. You could think of
this as an applied math course. For many intents and purposes it is, but it's one where
we're implementing the math more than applied mathematicians usually do.
Here we're on a shopping tour into statistics; information gain and so on came from a
shopping tour into information theory and control theory, and so on.
What we looked at next was not only optimizing for the best hypothesis under
some criteria, but also finding the right hypothesis. One of the ways we're doing that