Deep Learning - Plain Version 2020

Welcome back to Deep Learning. Today I want to talk to you about a couple of advanced topics, in particular sparse annotations. We know that data quality and annotation are extremely costly, and in the next couple of videos we want to talk about some ideas on how to save annotations. So the topics will be weakly supervised learning and self-supervised learning. Okay, so let's look at our slides and see what I have for you. The topic is weakly and self-supervised learning, and we start today by looking into limited annotations and some definitions. Later we will look into self-supervised learning for representation learning.

So what's the problem with learning with limited annotations? Well, so far we

had supervised learning, and we've seen impressive results achieved with large amounts of training data and consistently high-quality annotations. Here you see an example where we had annotations for instance-based segmentation, and there we simply assumed that all of these annotations are there, that we can use them, and that they are maybe even publicly available, so it's no big deal. But in most cases that's actually not true. So

typically you have to annotate yourself, and annotation is very costly. If you look at image-level class labels, you will spend approximately 20 seconds per sample. Here you can see, for example, the image with a dog. There are also ideas to make this faster, for example by instance spotting, which you can see in reference 11. If you then go to instance segmentation, you actually have to draw outlines, and that's at least 80 seconds per annotation that you have to spend. And if you go on to dense pixel-level annotations, you can easily spend one and a half hours annotating an image like this one; you can see that in reference 4. Now the difference between weakly supervised learning

and strong supervision is shown in this graph. Here you see that if we have image labels, we can of course train to classify image labels, and that would essentially be supervised learning; likewise, training on bounding boxes to predict bounding boxes, and training with pixel labels to predict pixel labels. Of course, you could also abstract from pixel labels to bounding boxes, or from bounding boxes to image labels, and all of that would still be strong supervision.
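As a rough sanity check on the annotation costs quoted above (about 20 seconds for an image label, at least 80 seconds for an instance outline, and around an hour and a half for dense pixel labels), you can turn the per-sample times into total annotation effort. This is a back-of-the-envelope sketch; the dataset size of 10,000 images is a hypothetical choice for illustration:

```python
# Back-of-the-envelope annotation cost, using the per-sample times
# quoted in the lecture. The dataset size below is a hypothetical
# illustration, not a number from the lecture.

SECONDS_PER_SAMPLE = {
    "image_label": 20,        # ~20 s per image-level class label
    "instance_outline": 80,   # >= 80 s per instance outline
    "dense_pixel": 90 * 60,   # ~1.5 h per densely annotated image
}

def annotation_hours(kind: str, num_samples: int) -> float:
    """Total annotation effort in hours for num_samples annotations."""
    return SECONDS_PER_SAMPLE[kind] * num_samples / 3600.0

if __name__ == "__main__":
    for kind in SECONDS_PER_SAMPLE:
        print(f"{kind}: {annotation_hours(kind, 10_000):.0f} hours")
```

For 10,000 images this already gives roughly 56 hours of labeling for image-level labels versus 15,000 hours for dense pixel masks, which is exactly why weak supervision is attractive.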

Now, the idea of weakly supervised learning is that you start with image labels and go to bounding boxes, or you start with bounding boxes and try to predict pixel labels. This is the key idea in weakly supervised learning: you want to use sparse, few annotated examples and still create much more powerful predictors. The key ingredients for weakly supervised learning are priors: you use explicit and implicit priors about shape and size, contrast, and motion, which can be used, for example, to shift bounding boxes; the class distribution, since some classes are much more frequent than others; and similarity across images.
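To make the class-distribution prior concrete, one common way to use it is as a regularizer: the average predicted class distribution over a batch is pushed toward the known class frequencies. This is a minimal sketch of that idea; the function name and formulation are illustrative, not taken from the lecture:

```python
import math

def class_prior_penalty(batch_probs, prior):
    """KL divergence KL(mean_prediction || prior) over a batch.

    One way to exploit a known class distribution as a weak
    supervision signal: penalize the model when its average
    predictions over a batch deviate from the known class
    frequencies. This term would be added to the training loss.
    """
    n_classes = len(prior)
    # Average the predicted distributions over the batch.
    mean_pred = [sum(p[c] for p in batch_probs) / len(batch_probs)
                 for c in range(n_classes)]
    # KL(mean_pred || prior); terms with q == 0 contribute 0.
    return sum(q * math.log(q / p)
               for q, p in zip(mean_pred, prior) if q > 0)
```

If the batch-averaged predictions match the prior, the penalty is zero; the more the predictions collapse onto a few classes, the larger it grows.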

Of course, you can also use hints: image labels, bounding boxes, and image captions can be used as weakly supervised labels, as well as sparse temporal labels that are then propagated over time, or scribbles and clicks inside objects. Here are a couple of examples of such sparse annotations with scribbles and clicks. There are some general approaches. One, from

labels to localization, would be to use a pre-trained classification network and then, for example, apply tricks like in the lecture on visualization to produce a qualitative segmentation map. There, we had the idea of backpropagating the class label into the image domain in order to produce such labels. Now, the first problem is that this classifier was never trained for localized decisions, and the second problem is that good classifiers don't automatically yield good maps. So let's look into another idea, and the key idea here

is to use global average pooling. Let's rethink the fully convolutional networks and what we've been doing there. You remember that we can replace fully connected layers, which have only a fixed input size, by equivalent convolutions. And if you do so, you see that if we have some input image and we convolve it with a tensor, then essentially we get one output. Now, if we have multiple of those tensors, then we essentially get multiple channels. And if we now start moving our convolution masks across the image domain, you can see that if we have a larger input image, then our outputs will also grow with the input size.
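The global-average-pooling idea mentioned above leads to class activation maps (CAMs): each channel of the last convolutional layer is averaged into one number, a linear layer maps these averages to class scores, and the very same class weights, applied per spatial location, yield a coarse localization map. A minimal sketch with plain Python lists; shapes and names are illustrative:

```python
def global_average_pool(features):
    """features: list of C channel maps, each an H x W list of lists."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in features]

def class_scores(features, weights):
    """weights[k][c]: weight from channel c to class k in the
    linear classifier that follows global average pooling."""
    pooled = global_average_pool(features)
    return [sum(w_c * a for w_c, a in zip(w_k, pooled)) for w_k in weights]

def class_activation_map(features, weights, k):
    """CAM for class k: the same class weights applied per location,
    CAM_k(y, x) = sum_c weights[k][c] * features[c][y][x]."""
    H, W = len(features[0]), len(features[0][0])
    return [[sum(weights[k][c] * features[c][y][x]
                 for c in range(len(features)))
             for x in range(W)]
            for y in range(H)]
```

Because global average pooling and the linear layer commute, averaging the CAM over all spatial locations recovers exactly the class score, which is why the map highlights the regions that drove the classification.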

Part of a video series

Access: Open access

Duration: 00:12:13 min

Recording date: 2020-10-12

Uploaded: 2020-10-12 22:36:20

Language: en-US

Deep Learning - Weakly and Self-Supervised Learning Part 1

In this video, we discuss weak supervision and demonstrate how to create class activation maps for localization and how to get from bounding boxes to pixel segmentations.


