So, is it even sensible to have these all-powerful networks?
We have already seen that using fully connected layers everywhere is not really the way to go.
Instead, we use convolutional layers, batch normalization, et cetera.
And maybe we can take this concept even further if we know more about our problem.
But let's start with the weakly supervised part.
So learning with limited annotations, what does that even mean?
We have seen for supervised learning that we can achieve impressive results.
So here you see an example output of Mask R-CNN, where we clearly see the person boundaries and the boundaries of the airplane, and different instances of persons are detected.
So you've talked about this last week.
And a prerequisite for this quality of segmentation and object detection is that we, on the one hand, have large amounts of training data.
So remember ImageNet, remember MS COCO, for example.
And on the other hand, that this data comes with consistent and high-quality annotations, so that the network can really learn the concepts based on the training data and the labels that it has.
But let's take a closer look at what getting high-quality annotations actually means.
So if we just want to classify what is in an image as you can see here, it's relatively
fast.
So for MS COCO, which has around 90 classes, they had a pretty smart way of annotating it.
It took around 27 seconds per image to say which objects are contained in the image.
And in this case it's, for example, dog and bottle.
So this is the only label that is provided for the image in this very first case.
And we call these image-level labels because we don't know how many objects there are, we don't know what their relationship to each other is, and we also don't know what their exact boundaries are, for example.
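To make the idea concrete, here is a minimal sketch of what an image-level label looks like as a training target: a multi-hot vector over the class vocabulary that only says which classes are present. The class list and the encoding function here are illustrative, not from any real dataset pipeline.

```python
# Image-level (weak) labels: for each image we only record WHICH classes
# appear, not where they are, how many instances there are, or their masks.
# CLASSES is a made-up, tiny vocabulary for illustration.
CLASSES = ["dog", "bottle", "person", "airplane"]

def image_level_label(present_classes):
    """Encode the set of classes present in an image as a multi-hot vector."""
    return [1 if c in present_classes else 0 for c in CLASSES]

# The lecture's example image contains a dog and a bottle:
label = image_level_label({"dog", "bottle"})
print(label)  # [1, 1, 0, 0]
```

Note that this single vector is all the supervision the network gets per image in the weakly supervised setting; counts, relationships, and boundaries are lost.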
Then the next step in the pipeline is so-called instance spotting.
This is now a localization task: where are the objects located?
And if we already have the class labels, this can be done in another 14 seconds.
But with that we are already at almost half a minute per image.
If you think about ImageNet, for example, this scales pretty rapidly in the number of working hours you need to annotate such a dataset.
Now, instance segmentation, so really creating dense boundaries for these objects and getting segmentation masks, takes around 80 seconds.
And this is not per image but per instance in the image.
These are all average values; of course there are objects and instances in such an image that are more or less complicated to segment, but generally it's around 80 seconds per instance.
Now, if we want to get to annotations like this, so really dense pixel labels where we don't have unnecessary overlap between the different classes, it takes around 1.5 hours, and this is again per image.
But just looking at this, you can see how easily this annotation effort scales upwards and how much money we actually need to spend to get good-quality annotations, and we haven't even talked about obtaining the data itself.
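The scaling argument above can be made concrete with a back-of-the-envelope calculation using the per-image and per-instance times quoted in the lecture (27 s for image-level labels, another 14 s for instance spotting, 80 s per instance mask, 1.5 h for full dense pixel labels). The dataset size and the average instance count per image are assumed numbers, purely for illustration.

```python
# Rough annotation-cost estimate from the times quoted above.
# N_IMAGES and AVG_INSTANCES are illustrative assumptions, not COCO figures.
N_IMAGES = 100_000
AVG_INSTANCES = 7  # assumed average number of object instances per image

classify_h = N_IMAGES * 27 / 3600                                # image-level labels
spotting_h = N_IMAGES * (27 + 14) / 3600                         # + instance spotting
masks_h    = N_IMAGES * (27 + 14 + AVG_INSTANCES * 80) / 3600    # + per-instance masks
dense_h    = N_IMAGES * 1.5                                      # full dense pixel labels

print(f"image-level labels:  {classify_h:,.0f} hours")
print(f"+ instance spotting: {spotting_h:,.0f} hours")
print(f"+ instance masks:    {masks_h:,.0f} hours")
print(f"dense pixel labels:  {dense_h:,.0f} hours")
```

Even with these rough assumptions, the jump from image-level labels to dense pixel labels is two orders of magnitude in annotation hours, which is exactly why weak labels are so attractive.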
So on the one hand we have the issue of getting data.
This is a big problem in the medical domain, but it's even harder to find a physician or
Available via: Open access
Duration: 01:17:03 min
Recorded on: 2020-01-28
Uploaded on: 2020-01-28 17:09:03
Language: en-US