10 - Seminar Meta Learning (SemMeL) - Jingwei Song - Optimization as a Model for Few-Shot Learning [ID:28106]

Welcome, everybody. Today we have another presentation on meta-learning, and we will look into Optimization as a Model for Few-Shot Learning. The presentation will be given by Jingwei Song. Jingwei, the stage is yours.

Okay, thank you. Hello everyone. Today I want to introduce a paper called Optimization as a Model for Few-Shot Learning, written by Sachin Ravi and Hugo Larochelle. My presentation consists of four parts: the first part is the introduction, the second part is the task description, the third part, which is the most important one, is the LSTM-based meta-learner model, and the fourth part is the evaluation.

First, the introduction. As we know, deep learning has shown great success in a variety of tasks with large amounts of labeled data, for example in image classification. However, it performs poorly when only a few labeled examples are available. So why does gradient-based optimization fail in the face of few labeled examples? Because it requires many iterative steps over many examples to perform well, and optimizers such as momentum, Adagrad, Adadelta, and Adam perform poorly under the constraint of a set number of updates. Moreover, for each separate dataset considered, the network has to start from a random initialization of its parameters, which hurts its ability to converge to a good solution. We therefore introduce an LSTM-based meta-learner optimizer. The meta-learner captures both the short-term knowledge within a task and the long-term knowledge common among all the tasks. Given only a set number of updates, the meta-learner model is trained to converge the learner classifier to a good solution quickly on each task.

For the task description, we can now set up the problem as few-shot learning. Here is an example of the meta-learning setup. As we can see, we have the meta-training set and the meta-test set, and each row represents one task. For each task, we have a training set Dtrain and a test set Dtest. In this illustration, we are considering a one-shot, five-class classification task: we have one example from each of five classes in the training set and two examples for evaluation in the test set.

We want to optimize the learner's parameters θ using the LSTM-based meta-learner on the training set Dtrain, and we evaluate its generalization on the test set Dtest.
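
To make this episodic setup concrete, here is a minimal sketch in Python (not from the talk): the data is a toy dictionary of placeholder examples grouped by class, and sample_episode is a hypothetical helper that builds one task's Dtrain and Dtest. The number of evaluation examples per class used here is only illustrative.

```python
import random

def sample_episode(examples_by_class, n_way=5, k_shot=1, n_test=2):
    """Build one few-shot task: Dtrain with k_shot examples per class,
    Dtest with n_test held-out examples per class."""
    classes = random.sample(list(examples_by_class), n_way)
    d_train, d_test = [], []
    for label, cls in enumerate(classes):
        pool = random.sample(examples_by_class[cls], k_shot + n_test)
        d_train += [(x, label) for x in pool[:k_shot]]   # one example per class (1-shot)
        d_test  += [(x, label) for x in pool[k_shot:]]   # held-out evaluation examples
    return d_train, d_test

# Toy data: each "class" holds a handful of placeholder examples.
examples_by_class = {f"class_{c}": [f"img_{c}_{i}" for i in range(5)] for c in range(20)}
d_train, d_test = sample_episode(examples_by_class)
print(len(d_train), len(d_test))  # 5 training examples (1-shot, 5-way) and the test examples
```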

We now move to the description of the proposed model for meta-learning. Let's start with equation one, which is the standard update rule used to train deep neural networks. Here θ_{t-1} are the parameters of the learner after t-1 updates, α_t is the learning rate at time t, L_t is the loss optimized by the learner for its t-th update, ∇_{θ_{t-1}} L_t is the gradient of that loss with respect to the parameters θ_{t-1}, and θ_t are the updated parameters of the learner.
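
For reference, the update rule the talk refers to as equation (1) can be written out from these definitions as:

```latex
% Equation (1): standard gradient-descent update of the learner's parameters
\theta_t = \theta_{t-1} - \alpha_t \,\nabla_{\theta_{t-1}} L_t
```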

Now we set the cell state of the LSTM to be the parameters of the learner. Equation two is the cell-state update rule of the LSTM meta-learner: we set c_t equal to θ_t, and the candidate cell state c̃_t equal to -∇_{θ_{t-1}} L_t. We can see that equations one and two resemble each other if f_t = 1, c_{t-1} = θ_{t-1}, i_t = α_t, and c̃_t = -∇_{θ_{t-1}} L_t.
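
Written out, the LSTM cell-state update referred to as equation (2), together with the substitutions just described, looks as follows:

```latex
% Equation (2): LSTM cell-state update used as the learner's parameter update
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t ,
\qquad \text{with } c_t = \theta_t , \quad \tilde{c}_t = -\nabla_{\theta_{t-1}} L_t .
```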

Continuing with equation two, we define parametric forms for i_t and f_t so that the meta-learner can determine their optimal values through the course of the updates. Both the learning rate i_t and the forget gate f_t are functions of the current parameter values θ_{t-1}, the current gradient ∇_{θ_{t-1}} L_t, the current loss L_t, and the previous gate value i_{t-1} or f_{t-1}. So for i_t, the meta-learner should be able to finely control the learning rate so as to train the learner
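
As a sketch of what these parametric forms can look like, reconstructed from the description above (the weight matrices W_I, W_F and biases b_I, b_F stand for the meta-learner's learned parameters, and σ is the sigmoid function):

```latex
i_t = \sigma\bigl(W_I \cdot [\,\nabla_{\theta_{t-1}} L_t,\; L_t,\; \theta_{t-1},\; i_{t-1}\,] + b_I\bigr)
f_t = \sigma\bigl(W_F \cdot [\,\nabla_{\theta_{t-1}} L_t,\; L_t,\; \theta_{t-1},\; f_{t-1}\,] + b_F\bigr)
```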

Part of the video series: Seminar Meta Learning (SemMeL)
Accessible via: Open Access
Duration: 00:20:48 min
Recording date: 2021-01-18
Uploaded on: 2021-01-18 11:18:47
Language: en-US

Presentation by Jingwei Song.

Abstract: Though deep neural networks have shown great success in the large data domain, they generally perform poorly on few-shot learning tasks, where a model has to quickly generalize after seeing very few examples from each class. The general belief is that gradient-based optimization in high capacity models requires many iterative steps over many examples to perform well. Here, we propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime. The parametrization of our model allows it to learn appropriate parameter updates specifically for the scenario where a set amount of updates will be made, while also learning a general initialization of the learner network that allows for quick convergence of training. We demonstrate that this meta-learning model is competitive with deep metric-learning techniques for few-shot learning.

Paper Link:
https://openreview.net/forum?id=rJY0-Kcll

Tags

meta learning