54 - Deep Learning - Plain Version 2020 [ID:21188]

Welcome back to deep learning. So today we want to discuss the single shot detectors

and how we can actually approach real-time object detection.

Okay, so this is the fourth part of segmentation and object detection: the single shot detectors. So can't

we just use the region proposal network as a detector in a you-only-look-once fashion?

And this is the idea of YOLO, which is a single shot detector. You only look once: you combine

the bounding box prediction and the classification into a single network. And this is done by

subdividing the image essentially into S × S cells. And for every cell you do in

parallel the class probability map computation and you produce bounding boxes and confidence.
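
As a rough illustration of this grid layout, here is a minimal sketch. It assumes the settings reported for the original YOLO (S = 7, B = 2, C = 20), assumes that each bounding box carries five numbers (x, y, w, h, confidence), and assumes that the box values come before the class scores in each cell's vector. The function names are hypothetical and only serve the illustration.

```python
def yolo_output_size(S, B, C):
    """Number of values predicted for an S x S grid with B boxes and C classes."""
    return S * S * (5 * B + C)


def cell_index(x, y, S):
    """Grid cell (row, col) that contains a point given in normalized [0, 1] coordinates."""
    col = min(int(x * S), S - 1)  # clamp so that x == 1.0 falls into the last column
    row = min(int(y * S), S - 1)
    return row, col


def split_cell_vector(values, B, C):
    """Split one cell's prediction into B boxes (5 values each) and C class scores.

    Assumed layout: [box_1, ..., box_B, class_1, ..., class_C].
    """
    if len(values) != 5 * B + C:
        raise ValueError("expected 5 * B + C values per cell")
    boxes = [tuple(values[5 * i:5 * i + 5]) for i in range(B)]
    class_scores = list(values[5 * B:])
    return boxes, class_scores


if __name__ == "__main__":
    print(yolo_output_size(7, 2, 20))
    print(cell_index(0.5, 0.5, 7))
```

With these assumed settings, each cell holds 5 * 2 + 20 = 30 values, so the network output has 7 * 7 * 30 entries.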

And this then gives you for each cell B bounding boxes, each with a confidence score, and the class

confidence that is produced from a CNN. So the CNN predicts S × S × (5B + C)

values, where C is the number of classes. In the end, to produce the final object detection,

you compute the overlap of the bounding box with the respective class probability map

and this then allows you to compute the average within this bounding box to produce the final

class of that respective object. And this way you are able to solve complex scenes like

this one and this is really real time. So there is YOLO 9000 which is an improved version

of YOLO which is advertised as better, faster and stronger. So it's better because batch

normalization is used and they also do high-resolution classification to improve the mean average

precision by up to 6%. The anchor boxes that are found by clustering over the training

data improve the recall by 7%, and training over multiple scales allows YOLO 9000 to detect

objects at different resolutions more easily. It's faster because it's using a different

CNN architecture that speeds up the forward pass and it's stronger because it has this

hierarchical detection on a tree that allows combining different object detection data

sets. All in all, this allows YOLO 9000 to detect up to 9000 classes in real time or faster.
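
The anchor boxes found by clustering over the training data, mentioned above, can be sketched as a k-means over box shapes. This is only a minimal sketch under assumptions: it uses 1 - IoU as the distance between a box and a cluster center, takes the first k boxes as initial centers, and uses the hypothetical names `iou_wh` and `kmeans_anchors`. It is not the reference implementation.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=20):
    """Cluster (width, height) pairs; the closest center is the one with the highest IoU."""
    centroids = list(boxes[:k])  # assumption: simple deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Minimizing 1 - IoU is the same as maximizing IoU.
            best = max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            clusters[best].append(box)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[j])  # keep the center of an empty cluster
        if new_centroids == centroids:
            break  # assignments no longer change the centers
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    shapes = [(1, 1), (10, 10), (3, 1), (8, 12)]
    print(kmeans_anchors(shapes, k=2))
```

Using IoU instead of the Euclidean distance keeps large boxes from dominating the clustering, because the overlap measure does not depend on the absolute box size.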

There's also the Single Shot MultiBox Detector (SSD) in reference 24. It's a popular alternative

to YOLO. It is also a single shot detector: like YOLO, it needs only one forward pass through the CNN.

It's called MultiBox because this is the name of the bounding box regression technique

in reference 15, and it's obviously an object detector. It differs from YOLO in several

aspects but shares the same core idea. Now you have this problem with multiple resolutions

and in particular if you think about tasks like histological images that have a very

very high resolution, then you can also work with detectors like RetinaNet. RetinaNet

is essentially using a ResNet CNN encoder/decoder, so very similar to what we've already seen

in image segmentation, and then it's using a feature pyramid network that allows you to couple

the different feature maps that are produced from the input image with those that are generated

from the decoder. So you could say it's very similar to a U-Net, but in contrast to U-Net

it does a class and box prediction using a subnet on each of the scales of the feature

pyramid network. So you could say it's a single shot detector that uses a U-Net-like structure simultaneously

with the class and box prediction. Also, it uses the focal loss that we will talk about in

a couple of slides. Let's look a bit at the trade-off between speed and accuracy. You can see that

generally networks that are very accurate are not so fast. So here you see on the x-axis

the GPU time and on the y-axis the overall mean average precision, and you can see that

you can combine architectures like single shot detectors, a region-based fully convolutional

network (R-FCN) or ideas like Faster R-CNN in combination with different feature extractors like Inception-

ResNet, Inception and so on. And this allows us to produce many different combinations

and you can see that if you spend more time on the computation then you typically can

also increase the accuracy, and this is reflected in this graph. Class imbalance is key

to tackling the speed-accuracy trade-off. All of those single shot detectors evaluate many

hypothesis locations. Most of them are really easy negatives. So this imbalance is not addressed

by the current training, and in classical methods we typically dealt with this using hard negative

mining. Now the question is: can we change the loss function to pay less attention to

easy examples. And this idea exactly brings us to the focal loss and here we can essentially
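
The idea of paying less attention to easy examples can be sketched as follows. This is a minimal sketch of the commonly used binary focal loss, which scales the cross-entropy by (1 - p_t) to the power of gamma. The defaults gamma = 2 and alpha = 0.25 are assumptions taken from the values usually quoted for RetinaNet, and the function names are hypothetical.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for predicted object probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy down-weighted by (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Easy negative: background location, network is already confident (p = 0.01).
    print(cross_entropy(0.01, 0), focal_loss(0.01, 0))
    # Hard negative: background location that the network mistakes for an object (p = 0.9).
    print(cross_entropy(0.9, 0), focal_loss(0.9, 0))
```

For the easy negative, the modulating factor shrinks the loss by several orders of magnitude, while the hard negative keeps most of its weight. With gamma = 0 the expression reduces to a class-weighted cross-entropy.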

Part of a video series

Access: Open access

Duration: 00:08:11 min

Recording date: 2020-10-12

Uploaded: 2020-10-12 22:06:31

Language: en-US

Deep Learning - Segmentation and Object Detection Part 4

In this video, we look at some ideas on how to perform object detection really quickly. This leads to single shot detectors of which YOLO is one of the most popular ones. If you are in need of multi-scale object detection, Retina-Net is a popular choice.


Additional References
nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation
X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery
Retina-net Figure by Marc Aubreville

Further Reading:
A gentle Introduction to Deep Learning
