Welcome back to deep learning. So today we want to talk about more advanced methods of image segmentation.
So let's look at our slides and you can see this is part two of our small series on image segmentation and object detection.
Now the key idea that we need to know about is how to integrate the context knowledge.
So just using this encoder decoder structure that we talked about in the last video will not be enough to get a good segmentation.
So the key concept is that you somehow have to tell your method what happens where in order to get a good segmentation mask.
So you need to balance the local and the global information. This is very important because the local information is crucial for good pixel accuracy,
while the global context is important in order to figure out the classes correctly, and CNNs typically struggle with this balance.
So we now need some good ideas how to incorporate this context information.
Now, one of the first approaches to do so, by Long et al., essentially used an upsampling path consisting of learnable transposed convolutions.
Their key idea was to add links combining the final prediction with previous lower layers at finer strides.
Additionally, they applied one-by-one convolutions after the pooling layers, and the resulting predictions were then added up to combine the local predictions with the global structure.
So the network topology is a directed acyclic graph with skip connections from lower to higher layers.
And therewith you can then refine a coarse segmentation.
So let's look at this idea in some more detail. You can see the ground truth here on the bottom right, which has a very high resolution.
If you simply used your CNN and upsampled, you would get a very coarse result, as shown on the left-hand side.
So what are Long et al. proposing? Well, they propose to reuse the information from the previous downsampling steps, which still have a higher resolution,
and to combine it within the decoder branch using a sum to produce a more highly resolved output.
And of course you can then do this again in the decoder branch, and you can see that this way we can upsample the segmentation and reuse the information from the encoder branch in order to produce better, more highly resolved results.
So you can introduce those skip connections, and they produce much better segmentations than if you just used the decoder and upsampled that information.
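To make this concrete, here is a minimal PyTorch sketch of such FCN-style skip fusion. The channel sizes and variable names are illustrative assumptions, not taken from the original paper: a one-by-one convolution predicts class scores from a higher-resolution encoder stage, the coarser score map is upsampled with a learnable transposed convolution, and the two are summed.

```python
import torch
import torch.nn as nn

# A minimal sketch of FCN-style skip fusion; channel sizes and names are
# illustrative, not those of the original paper.
n_classes = 21

score_shallow = nn.Conv2d(512, n_classes, kernel_size=1)   # 1x1 prediction, finer resolution
score_deep = nn.Conv2d(4096, n_classes, kernel_size=1)     # 1x1 prediction, coarser resolution
upsample2x = nn.ConvTranspose2d(n_classes, n_classes,
                                kernel_size=4, stride=2, padding=1)

shallow_feat = torch.randn(1, 512, 28, 28)   # features before the last pooling
deep_feat = torch.randn(1, 4096, 14, 14)     # deepest, coarsest features

fused = score_shallow(shallow_feat) + upsample2x(score_deep(deep_feat))
print(fused.shape)  # torch.Size([1, 21, 28, 28]) -- summed at the finer resolution
```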
So you see integrating context knowledge is key. In SegNet a different approach was taken.
So here you also have this convolutional encoder-decoder structure, and the key idea was that in the upsampling steps you reuse the indices from the max pooling in the downsampling steps of the encoder, such that you get a better resolved decoder.
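A minimal PyTorch sketch of this pooling-index trick, with illustrative sizes: the encoder pooling remembers where each maximum came from, and the decoder unpooling places the values back at exactly those positions.

```python
import torch
import torch.nn as nn

# A minimal sketch of reusing max-pooling indices (sizes illustrative).
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)        # encoder feature map
pooled, indices = pool(x)             # downsample and remember the argmax positions
decoded = unpool(pooled, indices)     # upsample into exactly those positions
print(pooled.shape, decoded.shape)    # (1, 64, 16, 16) and (1, 64, 32, 32)
```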
So this is already a pretty good idea to integrate context knowledge, and an even better idea is demonstrated in the U-Net. Here the network consists of an encoder branch, a contracting path that captures the context, and a decoder branch that performs a symmetric expansion for the localization.
So the encoder follows the typical structure of a CNN, and each decoder step consists of an upsampling step and a concatenation with the feature maps of the corresponding encoder step.
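Here is a minimal PyTorch sketch of one such decoder step; the channel sizes are illustrative assumptions. Note that, unlike the sum in the FCN, the U-Net concatenates the encoder features with the upsampled decoder features along the channel axis.

```python
import torch
import torch.nn as nn

# A minimal sketch of one U-Net decoder step (channel sizes illustrative).
up = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)  # learnable 2x upsampling
conv = nn.Sequential(
    nn.Conv2d(256, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
)

enc_feat = torch.randn(1, 128, 64, 64)   # skip connection from the encoder
dec_feat = torch.randn(1, 256, 32, 32)   # coarser features from one level below

x = torch.cat([enc_feat, up(dec_feat)], dim=1)  # concatenation, not a sum
print(conv(x).shape)  # torch.Size([1, 128, 64, 64])
```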
The training strategy also relies on data augmentation: non-rigid deformations, rotations, and translations were used, which gave the U-Net an additional kick in performance, and you can say that the U-Net is essentially the state-of-the-art method for image segmentation.
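As a rough illustration, such an augmentation pipeline could look as follows in torchvision; the parameter values here are made up for this sketch and are not those of the original paper.

```python
import torchvision.transforms as T

# A sketch of a U-Net-style augmentation pipeline (parameters illustrative).
# For segmentation, the identical random transform must also be applied
# to the label mask, not just to the image.
augment = T.Compose([
    T.ElasticTransform(alpha=50.0, sigma=5.0),         # non-rigid deformation
    T.RandomAffine(degrees=15, translate=(0.1, 0.1)),  # rotation and translation
])
```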
This is also where the U-Net gets its name from: its shape. You get this U structure because you have a high resolution on the fine levels, then you downsample to a lower resolution, and then you have the decoder branch that upsamples everything again.
And the key ingredient here is the skip connections that connect the respective levels of the decoder and the encoder, and this way you can get very, very good image segmentations.
It's quite straightforward to train, and this paper has been cited thousands of times; every day you can check the citation count, and it has already increased.
So Olaf Ronneberger really published a very important paper here, and it dominates the entire scene of image segmentation.
By the way, many, many modifications have been proposed since Ronneberger published this paper, and a very recent paper demonstrated that if you apply the U-Net to many different image segmentation tasks,
let's say ten different tasks, then the original definition as published by Ronneberger still outperforms, on average, all the other improvements that have been suggested since the original publication.
So that's a very interesting observation and I will also put the link to this paper in the description of this video.
Well, what else? You can see that there are many additional approaches that can be combined with the U-Net, for example dilated convolutions, and many of these very small changes have been suggested.
They may be useful for a particular task, but for general image segmentation, the U-Net has been shown to still outperform such approaches.
Still, there are things like network stacks that can be very beneficial.
There are even multi-scale networks that go further into this idea of using the image at different scales.
You can also defer the context modeling to another network, you can incorporate recurrent neural networks, and a very nice idea is to refine the resulting segmentation maps using a conditional random field.
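For the conditional random field refinement, a common post-processing sketch uses the third-party pydensecrf package; the pairwise parameters below are illustrative defaults rather than tuned values, and the function name is made up for this example.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

# A sketch of CRF post-processing with the third-party pydensecrf package.
# `probs`: softmax output of the network, shape (n_classes, H, W).
# `image`: the RGB input, shape (H, W, 3), dtype uint8.
def crf_refine(probs, image, n_iter=5):
    n_classes, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))      # negative log-probabilities
    d.addPairwiseGaussian(sxy=(3, 3), compat=3)      # smoothness term
    d.addPairwiseBilateral(sxy=(80, 80), srgb=(13, 13, 13),
                           rgbim=np.ascontiguousarray(image),
                           compat=10)                # appearance term: respect image edges
    q = d.inference(n_iter)
    return np.argmax(q, axis=0).reshape(h, w)        # refined label map
```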
So let's look at some of these additional approaches in more detail, such that you can see what we're talking about.
So the idea is that you use dilated convolutions, which support exponentially expanding the receptive field without losing resolution; you introduce a dilation rate that controls the spacing of the sampled positions.
You then stack these on top of each other such that the receptive field grows exponentially while the number of filter parameters only grows linearly.
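A minimal PyTorch sketch of this stacking, with illustrative channel sizes: three 3x3 convolutions with dilation rates 1, 2, and 4 keep the resolution (padding matches the dilation) while the receptive field grows roughly exponentially.

```python
import torch
import torch.nn as nn

# Stacked dilated convolutions with exponentially growing dilation rates:
# every layer keeps the same 3x3 kernel, so the parameter count grows
# only linearly, while padding preserves the spatial resolution.
layers = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, dilation=1, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, dilation=4, padding=4), nn.ReLU(),
)

x = torch.randn(1, 64, 32, 32)
print(layers(x).shape)  # torch.Size([1, 64, 32, 32]) -- resolution is kept
```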
So in specific applications where a broad range of magnifications occurs, this can be very useful.
So it really depends on your application.
Examples of this are DeepLab, ENet, and the multi-scale context aggregation module in reference 28.
The main issue here, of course, is that there's no efficient implementation available, so the benefit is somewhat unclear.
Another approach that I would like to show you here is the so-called stacked hourglass network.
So here the idea is that you use something very similar to a U-Net, but you put an additional trainable part into the skip connections.
That's essentially the main idea.
You can then take this hourglass module and stack several of them behind each other.
So you essentially have multiple refinement steps after each other, you always return to the original resolution, and you can plug in a second network essentially as an artifact correction network.
Now, what's really nice about this kind of hourglass network approach is that you return to the original resolution.
And let's say you're predicting several classes at the same time, then you end up with several segmentation masks for the different classes.
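A minimal sketch of this stacking idea in PyTorch; the hourglass module itself is assumed to be any small encoder-decoder that returns to the input resolution, and all names and sizes here are illustrative assumptions rather than the original architecture.

```python
import torch
import torch.nn as nn

# A sketch of stacking hourglass modules: each stage predicts one map per
# class at full resolution, and later stages refine earlier predictions.
class StackedHourglass(nn.Module):
    def __init__(self, hourglass_factory, n_stacks, n_feats, n_classes):
        super().__init__()
        self.stages = nn.ModuleList([hourglass_factory() for _ in range(n_stacks)])
        self.heads = nn.ModuleList(
            [nn.Conv2d(n_feats, n_classes, kernel_size=1) for _ in range(n_stacks)]
        )
        self.remap = nn.ModuleList(
            [nn.Conv2d(n_classes, n_feats, kernel_size=1) for _ in range(n_stacks - 1)]
        )

    def forward(self, x):
        outputs = []  # one prediction per stage, for intermediate supervision
        for i, (stage, head) in enumerate(zip(self.stages, self.heads)):
            feats = stage(x)
            pred = head(feats)        # per-class maps at the original resolution
            outputs.append(pred)
            if i < len(self.remap):
                x = x + feats + self.remap[i](pred)  # feed the refinement forward
        return outputs

# Illustrative usage with a trivial stand-in for the hourglass module:
make_hg = lambda: nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
model = StackedHourglass(make_hg, n_stacks=2, n_feats=32, n_classes=5)
preds = model(torch.randn(1, 32, 64, 64))
print([p.shape for p in preds])  # two stages of (1, 5, 64, 64) predictions
```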
Deep Learning - Segmentation and Object Detection Part 2
In this video, we discuss ideas on how to improve on image segmentation. In particular, skip connections as used in the U-Net have been applied very successfully here. Also, we look into other advanced methods such as stacked hourglasses, convolutional pose machines, and conditional random fields in combination with deep learning.
Additional References
nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation
X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery
Further Reading:
A gentle Introduction to Deep Learning