So, welcome back to pattern recognition.
So today we want to look a bit more into model assessment, and in particular we want to
know how to estimate statistics, and above all the performance, robustly on datasets
of fixed size.
So the problem is that we want to determine the bias and variance of some learning algorithm,
and of course we want to estimate its performance on data that we have not seen yet.
So we want to estimate the performance with respect to unknown distributions.
From what we've seen so far, bias and variance will change with varying samples.
So we need resampling techniques that can be used to create more informative
estimates of general statistics.
Formally, we can express this in the following way.
So let's suppose we want to estimate a parameter vector theta that depends on a random sample
set x ranging from x_1 to x_n.
Then we can assume that we have an estimator of theta, but we do not know its distribution.
So the resampling methods try to estimate the bias and the variance of this estimator
using subsamples of x.
This then brings us to the jackknife. The jackknife uses the so-called pseudo-value,
which we index here with i, as a function of x, and this pseudo-value is determined from the estimator in
the following way.
You take n times the value estimated on x and then you subtract n minus
one times the value estimated on the set where element i is omitted.
You can then rewrite this by splitting up the multiplication with n and pulling
the remaining estimator on x into the right-hand part.
This gives you the estimator on x minus n minus one times the difference between the estimator where
the element i is missing and the complete estimator.
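Written out as equations, the definition and its rearrangement look as follows. The notation is assumed for illustration: the hat marks an estimate, x_(i) denotes the sample set with x_i omitted, and the tilde marks the i-th pseudo-value.

```latex
\tilde{\theta}_i(\mathbf{x})
  = n\,\hat{\theta}(\mathbf{x}) - (n-1)\,\hat{\theta}(\mathbf{x}_{(i)})
  = \hat{\theta}(\mathbf{x}) - (n-1)\bigl(\hat{\theta}(\mathbf{x}_{(i)}) - \hat{\theta}(\mathbf{x})\bigr)
```

Expanding the second form gives back the first: the two terms in n-1 times the full estimate combine with the leading estimate to n times the full estimate.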
So you can essentially use this pseudo-value to assess how our estimator behaves
if the i-th value is missing.
You estimate the bias from the difference between those two models, and then you subtract it
from the model that you estimated on the complete data.
So we assume that the bias trend can be estimated from the difference between estimations
on different sets, and here we construct these sets by omitting one of the samples
in the estimation process.
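The pseudo-values and the bias estimate described above can be sketched in Python with NumPy. This is a minimal sketch, assuming the estimator is any function that maps a 1-D sample array to a scalar; `pseudo_values` and `ml_variance` are hypothetical names. The ML variance (normalized with one over n) is used as an assumed example estimator because its bias is known, so averaging its pseudo-values should recover the variance normalized with one over n minus one.

```python
import numpy as np

def pseudo_values(estimator, x):
    """Jackknife pseudo-values: n * theta(x) - (n - 1) * theta(x without x_i)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_full = estimator(x)
    # Leave-one-out estimates, one per omitted sample i
    theta_loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * theta_full - (n - 1) * theta_loo

def ml_variance(x):
    # Maximum-likelihood variance, normalized with 1/n (biased)
    return np.mean((x - np.mean(x)) ** 2)

x = np.array([2.0, 4.0, 4.0, 5.0, 9.0])
pv = pseudo_values(ml_variance, x)
# theta(x) minus the mean pseudo-value equals (n-1) * (mean LOO estimate - theta(x)),
# i.e. the jackknife bias estimate; the mean pseudo-value is the bias-corrected value
bias_estimate = ml_variance(x) - pv.mean()
print(pv.mean(), np.var(x, ddof=1))  # both about 6.7
```

For these five samples the bias estimate is negative (about -1.34), as expected for the ML variance, which underestimates the true variance.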
So the jackknife principle is that the pseudo-values are treated as independent
random variables with mean theta. Then, using the central limit theorem, the
estimators for the mean and the variance of the pseudo-values can be
determined: the mean is the average over all pseudo-values, and the variance is one
over n minus one times the sum of the squared differences between the pseudo-values and their mean.
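The two estimates above can be collected in one helper. This is a sketch under stated assumptions, not the lecture's implementation: `jackknife` and `squared_mean` are hypothetical names, and the squared sample mean is an assumed example estimator of the squared true mean, which it overestimates by sigma^2/n on average.

```python
import numpy as np

def jackknife(estimator, x):
    """Return the mean and the variance (1/(n-1) normalization) of the pseudo-values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_full = estimator(x)
    theta_loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    pseudo = n * theta_full - (n - 1) * theta_loo
    theta_jack = pseudo.mean()  # jackknife estimate: mean of the pseudo-values
    var_pseudo = np.sum((pseudo - theta_jack) ** 2) / (n - 1)
    # var_pseudo / n is the usual jackknife variance of theta_jack itself
    return theta_jack, var_pseudo

def squared_mean(s):
    # Hypothetical example estimator: biased estimate of the squared true mean
    return np.mean(s) ** 2

x = np.array([1.0, 3.0, 4.0, 8.0])
theta_jack, var_pseudo = jackknife(squared_mean, x)
print(theta_jack)  # about 13.83 = 16 - 26/12, i.e. the squared mean minus s^2/n
```

Here the jackknife removes the sigma^2/n bias term: the plain estimate is 16, and the jackknife estimate subtracts the sample variance (26/3) divided by n = 4.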
So let's look at an example. The estimator for the sample mean is simply the
mean value of x.
The pseudo-values can then be determined as n times the mean value of x minus n minus
one times the mean value where x_i is missing, and if you actually write this out, this is
nothing else than x_i.
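For the sample mean, the step from the pseudo-value to x_i takes a single line. The notation is assumed here: x-bar is the mean of all n samples and x-bar_(i) the mean with x_i omitted.

```latex
\tilde{\theta}_i = n\,\bar{x} - (n-1)\,\bar{x}_{(i)}
  = \sum_{j=1}^{n} x_j - \sum_{j \neq i} x_j = x_i,
\qquad \text{with } \bar{x}_{(i)} = \frac{1}{n-1}\sum_{j \neq i} x_j
```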
So you can then determine the jackknife estimate, and the jackknife estimate
is simply given as the mean. So the mean doesn't change under jackknifing, but
you see that the variance does change: it is not normalized with one over n
but with one over n minus one. So we have a tendency to estimate a higher variance.
If you have large sample numbers, this doesn't change that much, but if you have
rather low sample numbers, then you generally have to estimate the variance higher than
what you get from the typical ML estimate.
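The effect of the one over n minus one normalization can be checked numerically. This is a minimal sketch with NumPy, assuming standard-normal samples drawn with a fixed seed; `compare_variances` is a hypothetical name. For the sample mean, the leave-one-out means can be computed without a loop, and the ratio of the jackknife variance to the ML variance is n/(n-1) regardless of the data.

```python
import numpy as np

def compare_variances(x):
    """Jackknife the sample mean and return (jackknife variance, ML variance)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Leave-one-out means: (sum of all samples - x_i) / (n - 1)
    loo_means = (x.sum() - x) / (n - 1)
    pseudo = n * x.mean() - (n - 1) * loo_means  # equals x up to rounding
    var_jack = np.var(pseudo, ddof=1)            # normalized with 1/(n-1)
    var_ml = np.var(x)                           # ML estimate, normalized with 1/n
    return var_jack, var_ml

rng = np.random.default_rng(0)
for n in (5, 1000):
    var_jack, var_ml = compare_variances(rng.normal(size=n))
    print(n, var_jack / var_ml)  # ratio n/(n-1): about 1.25 and about 1.001
```

For n = 5 the jackknife variance is 25 % larger than the ML estimate, while for n = 1000 the difference shrinks to about 0.1 %.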
So let's have a look at the case where the sample estimator is the variance. Here we
Accessible via: Open access
Duration: 00:14:03 min
Recording date: 2020-11-16
Uploaded on: 2020-11-17 01:18:58
Language: en-US
In this video, we look into how to estimate reliable performance measures on finite data.
This video is released under CC BY 4.0. Please feel free to share and reuse.
Music Reference: Damiano Baldoni - Thinking of You