15 - Artificial Intelligence II [ID:57517]

Okay. So, the quiz seems to be over. I definitely gave you too much time, but that's fine. And there seem to be no real red flags. The first one here, basically reducing MDPs to POMDPs, is something you might want to look at again, and decision trees here as well.

Yeah, in any case. So, are there any questions after a week of being free to work on the challenge? By the way, the questions have been getting better. Of course, if there are only seven people present, I shouldn't be hoping for too much. But remember, asking questions is a good way of learning and is extremely effective. And in a smaller group it might even be more effective than in a bigger group.

So, what are we doing at the moment? We're talking about machine learning, and in particular about a variant called supervised learning, where we learn from examples: we see positive and negative examples and try to mimic them. The setup is what is called an inductive learning problem. We have a hypothesis set, the kind of functions we're willing to consider; we have a target function, which is the thing we actually want to learn; and we have a couple of examples, which are input-output pairs of that target function. And remember, the target function does not have to be in the hypothesis set. These are the notions we're juggling all the time.

I'm recapping a little from last time so that we get a smoother transition into the new material. We have been counting error rates: basically showing the algorithm unseen examples and seeing how many of those it gets right.
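As a toy sketch (the hypothesis and the data below are invented for illustration), counting the error rate on unseen examples looks like this:

```python
def error_rate(hypothesis, examples):
    """Fraction of unseen (x, y) examples the hypothesis gets wrong."""
    wrong = sum(1 for x, y in examples if hypothesis(x) != y)
    return wrong / len(examples)

# Hypothetical hypothesis: call an email spam if it contains "winner".
h = lambda text: "winner" in text

test_set = [
    ("you are a winner", True),
    ("meeting at noon", False),
    ("winner takes all", True),
    ("lunch tomorrow?", False),
]
print(error_rate(h, test_set))  # 0.0 on this tiny set
```

Note that the error rate only counts how often the hypothesis is wrong, not how it is wrong; that is exactly the limitation discussed next.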

And the hope is that the more training examples we show it, the lower the error rate gets, until it becomes essentially zero. And we've generalised that; that's a very important concept. Error rates are very simplistic: they make no difference between the various ways of getting things wrong. And as our example of email spam classification shows, the different ways of getting it wrong are not equally bad. So, we generalise this idea to general loss functions. A loss function says: if for an input x we're expecting output y in the unseen example, but we're getting ŷ instead, then we assign that triple a loss. Very often this is independent of x; if we only have the loss of expecting y and getting ŷ, that's the typical kind of loss function we almost always have. And in the ham/spam example, we have one of these special loss functions that is independent of where we incur the loss. And now the idea is that, instead of minimising error rates, we actually minimise the loss. That's the mirror image of maximising expected utility: here we minimise the expected loss. And so we've basically reformulated everything we do in terms of losses. The generalisation loss is really the expected loss, which is the sum over all pairs of the loss they incur, weighted by the probability of the example occurring. And then we just need to minimise that; ideally, that's what we should be learning. The problem is we don't know the probabilities. We have no way of knowing them.
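In symbols — writing h for a hypothesis, L for the loss, and P for the (unknown) probability of an example occurring; the notation is assumed here rather than fixed in the lecture — the generalisation loss just described, and the ideal choice, are:

```latex
\mathrm{GenLoss}(h) \;=\; \sum_{(x,y)} L\bigl(y,\, h(x)\bigr)\, P(x,y),
\qquad
h^{*} \;=\; \operatorname*{arg\,min}_{h \in \mathcal{H}} \mathrm{GenLoss}(h)
```

The sum runs over all possible input-output pairs, which is exactly why the unknown P makes this quantity uncomputable in practice.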

So, this is not something we can actually do in practice. What we do instead is take what we have, namely the examples, and compute the empirical loss: the average loss over the examples we were given. That is something we can actually minimise, and that's what we do instead. Even though it's a very poor substitute, it's the best we have. And there are various reasons why this is not optimal, why there is a difference between the generalisation loss and the empirical loss. The problem might not be realisable, meaning the function we're gunning for is not even in the hypothesis set. We might have variance over different subsets of examples. We might have noise. And we might have computational complexity: if the hypothesis space is huge, then searching for the right function among that much choice might take more cycles than we have available for learning. So there is a big difference, which makes it somewhat surprising that machine learning actually works. But it does in many situations, and this also explains where it doesn't work.

And the idea is that we can use this for regularisation. Remember that regularisation is the situation where we're trying to optimise the hypothesis space at the same time as the hypothesis within that space; the idea is to basically co-optimise them. In this case, the cost we are going to minimise is the empirical loss plus the complexity. We're allowing a parameter lambda in there because it might just be that the scales on
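A minimal sketch of the last two ideas — an asymmetric loss as in the ham/spam example, the empirical loss as the average over the examples, and the regularised cost EmpLoss + λ·Complexity. All names, numbers, and complexity values are invented for illustration:

```python
# Asymmetric loss (invented numbers): losing a real mail (ham classified
# as spam) hurts more than letting a spam mail through.
def loss(y, y_hat):
    if y == y_hat:
        return 0.0
    return 10.0 if (y == "ham" and y_hat == "spam") else 1.0

def emp_loss(hypothesis, examples):
    """Average loss of the hypothesis over the given examples."""
    return sum(loss(y, hypothesis(x)) for x, y in examples) / len(examples)

def cost(hypothesis, examples, complexity, lam=0.5):
    """Regularised cost: empirical loss plus lambda times complexity."""
    return emp_loss(hypothesis, examples) + lam * complexity

examples = [("buy now", "spam"), ("see you at 5", "ham"),
            ("cheap pills", "spam"), ("draft attached", "ham")]

# Two hypothetical hypotheses with hand-assigned complexity scores.
h_simple = lambda text: "spam"  # classify everything as spam
h_rule = lambda text: "spam" if "buy" in text or "cheap" in text else "ham"

for h, c in [(h_simple, 1), (h_rule, 3)]:
    print(emp_loss(h, examples), cost(h, examples, c))
```

The λ parameter weighs complexity against empirical loss, which matters precisely because the two terms need not live on the same scale.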

Part of a video series
Accessible via: open access
Duration: 01:23:00 min
Recorded: 2025-06-24
Uploaded: 2025-06-25 02:10:33
Language: en-US
