Okay. So what we looked at next was not only optimizing for the best hypothesis under some criterion, but also finding the right hypothesis space. And one of the ways we're doing that is by using probability theory again. What we're doing in this whole part is looking at the process of predicting the future. For that, we need to assume, essentially, that the future is somewhat like the past, so that we have some basis on which to work. And that's really what the i.i.d. hypothesis gives us: it says that all the examples are independent of each other and identically distributed, so the underlying probability distributions don't change over time. If we have something like this, we have a basis to make computations with. If this hypothesis doesn't hold, for instance because some country changes a law that affects what you can build into cars, and you're learning the probabilities of, say, the quality outcome of cars, then something fundamental has changed and all bets are off. You might as well discard the old examples, because then we don't have the identically distributed assumption anymore.
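In the usual textbook notation (this spelling-out is an assumption, not quoted from the clip), the two parts of the i.i.d. assumption read roughly as follows, where E_j denotes the j-th example:

```latex
% Independence: an example does not depend on the examples that came before it
P(E_j \mid E_{j-1}, E_{j-2}, \ldots) = P(E_j)

% Identical distribution: the underlying distribution itself does not change
P(E_j) = P(E_{j+1}) = P(E_{j+2}) = \ldots
```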
But given that, we can look at probabilities of errors. So instead of just observing error rates, we can actually look at... no, here we're actually still looking at error rates; sorry, wrong introduction, I misremembered from yesterday. That part comes next. So what we also looked at was the question of how to evaluate whether your learning algorithm gave you a good result, and the answer is very simple.
You have a test set: from your examples, you hold back certain ones, and even though they are evidence you could train with, you use them for evaluation instead, because we don't know the future yet. Then there are a couple of tricks you can do, like making different partitions of the whole set so that each example gets used both for training and for evaluation, as in the cross-validation sketch below.
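As a rough sketch of these two ideas, holdout splitting and k-fold cross-validation could look like the Python below. This is only an illustrative sketch, not the exact procedure from the clip, and `learn` and `error_rate` are assumed placeholders for whatever learner and error measure are in use:

```python
import random

def holdout_split(examples, test_fraction=0.2, seed=0):
    """Hold back a fraction of the examples as a test set."""
    examples = examples[:]                      # don't shuffle the caller's list
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * (1 - test_fraction))
    return examples[:cut], examples[cut:]       # (training set, test set)

def k_fold_cross_validation(examples, learn, error_rate, k=10):
    """Partition the examples into k folds; each fold is used once for
    evaluation while the remaining folds are used for training, so every
    example serves both purposes.  Returns the average error rate."""
    folds = [examples[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        eval_set = folds[i]
        train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        hypothesis = learn(train_set)
        errors.append(error_rate(hypothesis, eval_set))
    return sum(errors) / k
```

For example, `k_fold_cross_validation(examples, my_tree_learner, my_error_rate, k=10)` would average ten error estimates, each measured on a fold that was not used for training.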
Here we come to what I wanted to say: how do you select a good model? Of course, learning, as we've defined it, is always with respect to a fixed hypothesis space H. So in a way, it's an optimization problem that optimizes over H. In real life, we don't get that luxury, so we have to find H as well. The problem of finding a better H and the problem of learning inside H have to interleave, and we looked essentially at a wrapper algorithm where you gradually bump up the hypothesis size (assuming you have some kind of size measure) and look at the results you're getting via holdout cross-validation. Then you can observe how well you're doing in a given hypothesis space H, here given by some kind of tree size for decision tree learning.
Then you can see empirically that with very small tree sizes you mostly underfit, because you're not able to represent the target function. At some point you start overfitting, and somewhere in between is the sweet spot, which is something you can actually measure via holdout cross-validation, and then you just stop when it's best: you find a local minimum and hope it's the global one. A sketch of that wrapper is given below.
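A minimal sketch of the wrapper, under the assumption that `learner_for_size(size)` returns a learner restricted to hypotheses of at most the given size (e.g. a decision-tree learner with a node cap) and reusing the `k_fold_cross_validation` helper sketched above:

```python
def model_selection(examples, learner_for_size, error_rate, k=10, max_size=50):
    """Wrapper around the learner: keep bumping up the hypothesis-space size
    (for decision trees, e.g. a bound on the number of nodes), estimate the
    error of each size via k-fold cross-validation, and keep the size with
    the lowest validation error."""
    best_size, best_error = 1, float("inf")
    for size in range(1, max_size + 1):
        learn = learner_for_size(size)        # learner restricted to this H
        error = k_fold_cross_validation(examples, learn, error_rate, k)
        if error < best_error:                # still improving: small H underfits
            best_size, best_error = size, error
        elif error > best_error:              # getting worse again: overfitting,
            break                             # so stop at the (local) minimum
    # retrain on all the data with the selected hypothesis-space size
    return learner_for_size(best_size)(examples), best_size
```

Stopping at the first size where the cross-validation error rises again is exactly the "find a local minimum and hope it's the global one" behaviour; the hypothesis finally returned should still be judged on a test set that was held back before any of this started.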
Presenters
Accessible via
Open access
Duration
00:06:10 min
Recording date
2021-03-30
Uploaded on
2021-03-31 11:16:43
Language
en-US
Recap: Evaluating and Choosing the Best Hypothesis (Part 1)
Main video on the topic is in chapter 8, clip 7.