Okay. So, the quiz seems to be over. I definitely gave you too much time, but that's fine. And there seems to be nothing really wrong, no red flags. The first one here, basically reducing an MDP to a POMDP, is something you might want to look at again, and decision trees here as well.
Yeah, in any case. So, are there any questions after a week of being free to work on the challenge? By the way, the questions have been better lately. Of course, if there are only seven people present, I shouldn't be hoping for too much. But remember, asking questions is an extremely effective way of learning, and in a smaller group it might be even more effective than in a bigger one.
So, what are we doing at the moment? We're talking about machine learning, and in particular about a variant called supervised learning, where we learn from examples: we are shown positive and negative examples and try to mimic them. The setup is what is called an inductive learning problem: we have a hypothesis set, the kind of functions we are willing to learn; we have a target function, which is the thing we actually want to learn; and we have a number of examples, which are input-output pairs of that target function. And remember, the target function does not have to be in the hypothesis set. These are the pieces we're playing with all the time. I'm recapping a little bit from last time so that we get a smoother transition into the new material.
So, we were counting error rates: showing the algorithm unseen examples and seeing how many of those it gets right. And the hope is that the more training examples we show it, the better the error rate gets, until it becomes essentially zero.
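In symbols, one standard way of writing this (E is a set of N unseen test examples, h the learned hypothesis):

$$\mathrm{ErrorRate}(h) \;=\; \frac{1}{N}\,\bigl|\{(x,y) \in E : h(x) \neq y\}\bigr|$$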
We've then generalised that, and this is a very important concept: error rates are very simplistic. They don't make a difference between the various ways of getting things wrong. And as our example of email spam classification shows, the different kinds of mistakes are not equally bad: misclassifying a legitimate message as spam is not the same as letting a spam message through.
So, we generalise this idea to general loss functions. A loss function says: if for an input x we expect the output y of the unseen example but get ŷ instead, then we incur a loss for that triple, L(x, y, ŷ). Very often this is independent of x: if the loss only depends on expecting y and getting ŷ, we write L(y, ŷ), and that's the typical kind of loss function we almost always have. In the ham/spam example, we have one of these special loss functions that is independent of where we incur the loss.
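For concreteness, an illustrative asymmetric loss for the spam example might look like this (the specific numbers 1 and 10 are my assumption; the lecture only fixes that the two mistakes cost differently):

$$L(y, \hat y) \;=\; \begin{cases} 0 & \text{if } \hat y = y,\\ 1 & \text{if } y = \mathit{spam},\ \hat y = \mathit{ham},\\ 10 & \text{if } y = \mathit{ham},\ \hat y = \mathit{spam}. \end{cases}$$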
Now the idea is that instead of minimising error rates, we minimise the loss. And that's just like maximising expected utility, except that here we minimise the expected loss. So, we've basically reformulated everything we do in terms of losses. The generalisation loss is really the expected loss: the sum, over all possible input-output pairs, of the loss each pair incurs, weighted by the probability of that example occurring. And then we just need to minimise that. Ideally, that's what we should be learning.
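In symbols, with $h$ our hypothesis and $P(x, y)$ the probability of encountering the example $(x, y)$:

$$\mathrm{GenLoss}_{L}(h) \;=\; \sum_{(x,y)} L\bigl(y, h(x)\bigr)\, P(x, y)$$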
The problem is that we don't know the probabilities; we have no way of knowing them. So, this is not something we can actually do in practice. What we do instead is take what we have, namely look only at the examples, and compute the empirical loss. That is something we can actually minimise, and that's what we do instead. Even though it's a very poor substitute, it's the best we have.
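On a training set $E$ of $N$ examples this becomes an average we can actually compute (the lecture describes it in words; this is the standard definition):

$$\mathrm{EmpLoss}_{L,E}(h) \;=\; \frac{1}{N} \sum_{(x,y) \in E} L\bigl(y, h(x)\bigr)$$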
And there are various reasons why this is not optimal, why there is a difference between the generalisation loss and the empirical loss. The problem might not be realisable, meaning the function we're gunning for is not even in the hypothesis set. We might have variance over different subsets of examples. We might have noise. And we might run into computational complexity: if the hypothesis space is huge, then searching for the right function among lots and lots of choices might take more cycles than we have available for learning. So there is a big difference, which makes it somewhat surprising that machine learning actually works. But it does, in many situations, and the difference also explains where it doesn't work.
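Before we move on: to make this recipe concrete, here is a minimal sketch of minimising the empirical loss over a tiny finite hypothesis set. The threshold classifiers, the asymmetric loss values, and the data are all invented for illustration:

```python
# Minimal sketch: empirical-loss minimisation over a finite hypothesis set.
# Hypotheses, loss values, and data are invented for illustration only.

# Training examples: (message length, true label).
examples = [(120, "ham"), (15, "spam"), (200, "ham"), (10, "spam"), (30, "ham")]

# Asymmetric loss L(y, y_hat): flagging ham as spam costs more than missing spam.
def loss(y, y_hat):
    if y == y_hat:
        return 0.0
    return 10.0 if (y == "ham" and y_hat == "spam") else 1.0

# A tiny hypothesis set: threshold classifiers "spam if the message is short".
hypotheses = {f"t={t}": (lambda x, t=t: "spam" if x < t else "ham")
              for t in (5, 20, 50, 150)}

def empirical_loss(h):
    # EmpLoss(h): average loss of h over the training examples.
    return sum(loss(y, h(x)) for x, y in examples) / len(examples)

# Pick the hypothesis with the smallest empirical loss.
best = min(hypotheses, key=lambda name: empirical_loss(hypotheses[name]))
print(best, empirical_loss(hypotheses[best]))  # t=20 happens to fit perfectly here
```

On this made-up data the threshold t=20 gets every example right; on real data even the minimiser would typically incur some loss, and, as just discussed, a small empirical loss is no guarantee of a small generalisation loss.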
And the idea is that we can use this for regularisation. Remember that regularisation is the situation where we're trying to optimise the hypothesis space at the same time as the hypothesis within that space; the idea is basically to co-optimise the two. In this case, the cost we're going to minimise is the empirical loss plus the complexity of the hypothesis. We're allowing a parameter lambda in there because it might just be that the loss and the complexity are measured on different scales, and lambda lets us trade the one off against the other.
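Written out, with λ as the trade-off parameter just mentioned:

$$\hat h \;=\; \operatorname*{argmin}_{h \in \mathcal{H}} \Bigl[\, \mathrm{EmpLoss}_{L,E}(h) \;+\; \lambda \cdot \mathrm{Complexity}(h) \,\Bigr]$$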