Welcome back to deep learning. Today we want to continue talking about common practices, and the topic we are interested in today is class imbalance. A very typical problem is that one class, often the most interesting one, is not very frequent. This is a challenge for all machine learning algorithms. Take the example of fraud detection: out of 10,000 transactions, 9,999 are genuine and only one is fraudulent. So if you classify everything as genuine, you get 99.99 percent accuracy. A model that misclassifies one out of 100 transactions, even though it may actually be useful, would only reach 99 percent accuracy. So this is of course a very hard problem, and in particular in screening applications you have to be very careful, because just assigning everything to the most common class would still give you very good accuracy. It doesn't have to be credit cards; detecting mitotic cells, for example, is a very similar problem. A mitosis is a cell undergoing cell division, and mitoses are very important because, as we already heard in the introduction, if you count the cells under mitosis you know how aggressively a cancer is growing. So this is a very important feature, but you have to detect these cells correctly, and they make up only a very small portion of the cells in the tissue. The data of this class is therefore seen much less often during training, and losses like the L2 loss or the cross entropy do not account for this imbalance, so they are not very responsive to the rare class. One thing that you can do, for example, is resampling.
The idea is that you balance the class frequencies by sampling the classes differently. You can undersample, which means that you throw away a lot of the training data of the most frequent classes; this way you train a classifier on data in which both classes are seen approximately equally often. A minimal sketch of this idea follows below.
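The following is a minimal sketch of random undersampling on a toy, hypothetical data set; the arrays, class counts, and seed are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data set: 9,990 samples of class 0 ("genuine") and
# only 10 samples of class 1 ("fraudulent"), each with 4 features.
features = rng.normal(size=(10_000, 4))
labels = np.array([0] * 9_990 + [1] * 10)

minority_idx = np.flatnonzero(labels == 1)
majority_idx = np.flatnonzero(labels == 0)

# Undersampling: keep all minority samples and draw only as many
# majority samples as there are minority samples.
kept_majority = rng.choice(majority_idx, size=minority_idx.size, replace=False)
balanced_idx = rng.permutation(np.concatenate([minority_idx, kept_majority]))

balanced_features = features[balanced_idx]
balanced_labels = labels[balanced_idx]
print(np.bincount(balanced_labels))  # [10 10] -> both classes equally frequent
```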
The disadvantage of this approach is that you are not using all of the available data, and of course you don't want to throw away data. Another technique is oversampling: you simply sample more often from the underrepresented classes, and in this case you can use all of the data. The disadvantage is of course that it can lead to rather heavy overfitting towards the less frequently seen examples. A sketch of oversampling via weighted sampling follows below.
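As a hedged illustration, here is one possible way to implement oversampling with PyTorch's WeightedRandomSampler; the tensors and class counts are again made up:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced data: class 0 is very frequent, class 1 is rare.
features = torch.randn(10_000, 4)
labels = torch.cat([torch.zeros(9_990, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])

# Each sample is weighted by the inverse frequency of its class, so rare
# samples are drawn much more often (sampling with replacement).
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=64, sampler=sampler)

# Each epoch now sees both classes roughly equally often, but the few
# minority samples are repeated many times (risk of overfitting).
```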
Combinations of under- and oversampling are also possible. An advanced resampling technique that tries to avoid the shortcomings of both is the synthetic minority oversampling technique (SMOTE), but it is rather uncommon in deep learning. The underfitting caused by undersampling can be reduced by taking a different subset of the majority class after each epoch, which is quite common, and you can also use data augmentation to help reduce overfitting for underrepresented classes, so you essentially augment the samples that are seen less frequently more strongly. This is a very typical choice. A sketch of the per-epoch resampling idea follows below.
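The following is a minimal sketch of drawing a fresh majority-class subset every epoch, continuing the toy NumPy setup from the undersampling example above (the training step itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy imbalanced data as before.
features = rng.normal(size=(10_000, 4))
labels = np.array([0] * 9_990 + [1] * 10)
minority_idx = np.flatnonzero(labels == 1)
majority_idx = np.flatnonzero(labels == 0)

for epoch in range(3):
    # Draw a fresh majority subset every epoch, so over many epochs most of
    # the majority data is still used despite undersampling in each epoch.
    kept_majority = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    epoch_idx = rng.permutation(np.concatenate([minority_idx, kept_majority]))
    epoch_features, epoch_labels = features[epoch_idx], labels[epoch_idx]
    # ... train the model for one epoch on (epoch_features, epoch_labels) ...
```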
Instead of fixing the data, you can of course also try to adapt the loss function to be robust with respect to class imbalance, and here you typically weight the loss with the inverse class frequency. This gives you the weighted cross entropy, where you introduce an additional weight w_k for each class k, and w_k is simply determined as the inverse class frequency. A minimal sketch follows below.
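Here is a minimal sketch of weighted cross entropy using PyTorch's CrossEntropyLoss; the class counts, logits, and labels are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical class counts: 9,990 samples of class 0 and 10 of class 1.
class_counts = torch.tensor([9_990.0, 10.0])
class_weights = 1.0 / class_counts                    # w_k = inverse class frequency
class_weights = class_weights / class_weights.sum()   # optional normalization

# PyTorch's cross entropy accepts a per-class weight vector directly.
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Hypothetical logits for a batch of 4 samples and their labels.
logits = torch.randn(4, 2)
targets = torch.tensor([0, 0, 1, 0])
loss = criterion(logits, targets)  # mistakes on class 1 are weighted ~1000x more
print(loss.item())
```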
More common in segmentation problems are losses like a Dice-based loss, built on the Dice coefficient, which is a very typical measure for evaluating segmentations; a minimal sketch follows below.
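The following is a minimal sketch of a soft Dice loss for binary segmentation; this is one common formulation, assuming per-pixel sigmoid outputs, not the only possible variant:

```python
import torch

def soft_dice_loss(logits, targets, eps=1e-6):
    # Soft Dice loss for binary segmentation: logits and targets are assumed
    # to have shape (batch, H, W), with targets in {0, 1}.
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2))
    denominator = probs.sum(dim=(1, 2)) + targets.sum(dim=(1, 2))
    dice = (2.0 * intersection + eps) / (denominator + eps)  # per-image Dice coefficient
    return 1.0 - dice.mean()

# Toy example: random predictions against a sparse foreground mask.
logits = torch.randn(2, 32, 32)
targets = (torch.rand(2, 32, 32) > 0.95).float()
print(soft_dice_loss(logits, targets).item())
```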
Instead of the class frequency, the weights can also be adapted with regard to other considerations, but we are not discussing them in this lecture. This already brings us to the end of this part, and in the final lecture on common practices
we will now discuss measures of evaluation and how to evaluate our models appropriately.
So thank you very much for listening and goodbye.