Welcome everybody. So today we have another presentation on meta-learning, and today we will look into Optimization as a Model for Few-Shot Learning. The presentation will be given by Jingwei Song. Jingwei, the stage is yours.
Okay, thank you. Hello everyone. Today I want to introduce a paper called Optimization as a Model for Few-Shot Learning, written by Sachin Ravi and Hugo Larochelle. My presentation consists of four parts.
The first part is the introduction, the second part is the task description, the third part, which is the most important part, is the LSTM-based meta-learner model, and the fourth part is the evaluation.

First, the introduction. As we know, deep learning has shown great success in a variety of tasks with large amounts of labeled data, for example in image classification. However, it performs poorly when only a few labeled examples are available. So why does gradient-based optimization fail in the face of few labeled examples? Because it requires many iterative steps over many examples to perform well, and optimizers such as momentum, Adagrad, Adadelta, and Adam perform poorly under the constraint of a set number of updates. In addition, for each separate dataset considered, the network has to start from a random initialization of its parameters, which hurts its ability to converge to a good solution after only a few updates. So the paper introduces an LSTM-based meta-learner optimizer. The meta-learner captures both the short-term knowledge within a task and the long-term knowledge common among all the tasks. Given only a set number of updates, the meta-learner model is trained to converge the learner classifier to a good solution quickly on each task.

Now for the task description, where we set up the problem as few-shot learning. Here is an example of the meta-learning setup. As we can see, we have the meta-training set and the meta-test set, and each row represents one task. For each task, we get a training set D_train and a test set D_test. In this illustration, we are considering a one-shot, five-class classification task: we have one example from each of five classes in the training set and two examples for evaluation in the test set.
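As a rough sketch of this episodic setup (the function name `sample_episode`, the class-indexed dataset format, and the per-class query count are hypothetical illustrations, not taken from the paper), one N-way, K-shot task could be sampled like this:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=2):
    """Sample one few-shot task: a small D_train and D_test (illustrative sketch).

    dataset: dict mapping class label -> list of examples (assumed format).
    """
    classes = random.sample(list(dataset.keys()), n_way)
    d_train, d_test = [], []
    for label in classes:
        examples = random.sample(dataset[label], k_shot + n_query)
        d_train += [(x, label) for x in examples[:k_shot]]   # k_shot labeled examples per class
        d_test  += [(x, label) for x in examples[k_shot:]]   # held-out examples for evaluation
    return d_train, d_test
```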
Now we want to optimize the parameters θ of the learner using the LSTM-based meta-learner on the training set D_train, and we evaluate generalization on the test set D_test.
We now move to the description of the proposed model for meta-learning. Let's start with equation one, which is the standard update rule used to train a deep neural network. Here θ_{t-1} are the parameters of the learner after t-1 updates, α_t is the learning rate at time t, L_t is the loss optimized by the learner for its t-th update, ∇_{θ_{t-1}} L_t is the gradient of that loss with respect to the parameters θ_{t-1}, and θ_t are the updated parameters of the learner.
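Written out, this is equation (1) of the paper:

```latex
\theta_t = \theta_{t-1} - \alpha_t \,\nabla_{\theta_{t-1}} L_t
```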
Now we set the cell state of the LSTM to be the parameters of the learner. Equation two, shown below, is then the update rule for training based on the LSTM meta-learner: we set c_t equal to θ_t and the candidate cell state c̃_t equal to -∇_{θ_{t-1}} L_t.
We can see that equation one and equation two coincide if f_t = 1, c_{t-1} = θ_{t-1}, i_t = α_t, and c̃_t = -∇_{θ_{t-1}} L_t.
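In symbols, the LSTM cell-state update (equation two) and the identification used here are:

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
\qquad
c_t = \theta_t, \quad \tilde{c}_t = -\nabla_{\theta_{t-1}} L_t,
```

so that choosing f_t = 1, c_{t-1} = θ_{t-1}, and i_t = α_t recovers the gradient-descent update of equation one.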
Now we continue to consider equation two. We define parametric forms for i_t and f_t so that the meta-learner can determine their optimal values through the course of the updates. The learning rate (the input gate i_t) and the forget gate f_t are functions of the current parameter values θ_{t-1}, the current gradient of the loss ∇_{θ_{t-1}} L_t, the current loss L_t, and the previous gate value i_{t-1} or f_{t-1}, as written out below. So for i_t, the meta-learner should be able to finely control the learning rate so as to train the learner quickly while avoiding divergence.
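In the paper, these gates are parametrized roughly as sigmoids over the listed inputs, with weights W_I, W_F and biases b_I, b_F:

```latex
i_t = \sigma\big(W_I \cdot [\nabla_{\theta_{t-1}} L_t,\ L_t,\ \theta_{t-1},\ i_{t-1}] + b_I\big),
\qquad
f_t = \sigma\big(W_F \cdot [\nabla_{\theta_{t-1}} L_t,\ L_t,\ \theta_{t-1},\ f_{t-1}] + b_F\big)
```

As an illustration only, here is a minimal NumPy sketch of a single meta-learner update step. The flat parameter vector, the per-coordinate gate weights of shape (4,), and the function name `meta_update` are simplifying assumptions for this sketch; the paper's implementation additionally shares gate parameters across coordinates and preprocesses the gradient and loss inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def meta_update(theta_prev, grad, loss, i_prev, f_prev, W_I, b_I, W_F, b_F):
    """One LSTM-style meta-learner update of the learner parameters (sketch).

    theta_prev: learner parameters after t-1 updates (1-D array)
    grad:       gradient of the loss w.r.t. theta_prev (same shape)
    loss:       scalar loss L_t, broadcast to every coordinate
    i_prev, f_prev: previous input/forget gate values (same shape as theta_prev)
    W_I, W_F:   assumed gate weight vectors of shape (4,); b_I, b_F: scalar biases
    """
    loss_vec = np.full_like(theta_prev, loss)
    # Gate inputs per coordinate: [gradient, loss, parameter, previous gate value]
    i_t = sigmoid(np.stack([grad, loss_vec, theta_prev, i_prev], axis=-1) @ W_I + b_I)
    f_t = sigmoid(np.stack([grad, loss_vec, theta_prev, f_prev], axis=-1) @ W_F + b_F)
    c_tilde = -grad                              # candidate cell state: negative gradient
    theta_t = f_t * theta_prev + i_t * c_tilde   # c_t = f_t * c_{t-1} + i_t * c_tilde_t
    return theta_t, i_t, f_t
```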
Presenter: Jingwei Song
Access: Open Access
Duration: 00:20:48 min
Recording date: 2021-01-18
Uploaded on: 2021-01-18 11:18:47
Language: en-US
Presentation by Jingwei Song.
Abstract: Though deep neural networks have shown great success in the large data domain, they generally perform poorly on few-shot learning tasks, where a model has to quickly generalize after seeing very few examples from each class. The general belief is that gradient-based optimization in high capacity models requires many iterative steps over many examples to perform well. Here, we propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime. The parametrization of our model allows it to learn appropriate parameter updates specifically for the scenario where a set amount of updates will be made, while also learning a general initialization of the learner network that allows for quick convergence of training. We demonstrate that this meta-learning model is competitive with deep metric-learning techniques for few-shot learning.
Paper Link:
https://openreview.net/forum?id=rJY0-Kcll