Welcome back to Pattern Recognition. Today we want to start talking a little bit about optimization. We'll have a couple of refresher elements in case you don't know that much about different ideas in optimization, and you will see that they're actually fairly simple and will turn out to be quite useful for your further work in pattern recognition.
So today's topic is optimization, and of course this is crucial for many things in pattern recognition: you want to find solutions of complex objective functions, a problem that arises in pattern analysis, machine learning, artificial intelligence, and the like. Optimization also has many faces: there's discrete optimization, combinatorial optimization, genetic algorithms, gradient descent, unconstrained and constrained optimization, linear programming, convex optimization, and so on. Here we essentially need some basic knowledge of optimization so that you are able to follow the remaining lecture, and this is why we want to talk about a couple of ideas on how to optimize. Of course, everybody typically has his or her own favorite algorithm, and therefore we will show some of the ideas. What we will focus on in this class in particular is the optimization of problems that are convex. So here you see the formal
definition of convexity: a function f, possibly a high-dimensional one mapping from R^d onto a scalar value, is convex if the domain of f is a convex set and if for all points x, y within this domain and all theta between 0 and 1 we have f(theta x + (1 - theta) y) <= theta f(x) + (1 - theta) f(y). Here theta x + (1 - theta) y is essentially a linear interpolation between x and y in the input space, and theta f(x) + (1 - theta) f(y) is the corresponding linear interpolation in the output domain. So if you move in the input space from x to y, the linear interpolation between f(x) and f(y) will always be greater than or equal to the function value along the original path. You can also see that a function is concave if -f is convex. In a geometric interpretation, we can say that the line segment connecting (x, f(x)) and (y, f(y)) lies above the graph of f: if you can connect any two points on the graph with a line and this line lies above (or on) the graph of f, you have a convex function.
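This chord-above-graph inequality is easy to verify numerically. Below is a minimal sketch; the test function f(x) = x**4 and the sampling ranges are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Assumed convex test function (not from the lecture): f(x) = x**4.
def f(x):
    return x ** 4

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x, y = rng.uniform(-5.0, 5.0, size=2)     # two random input points
    theta = rng.uniform(0.0, 1.0)             # interpolation weight
    lhs = f(theta * x + (1.0 - theta) * y)    # value on the graph
    rhs = theta * f(x) + (1.0 - theta) * f(y) # value on the chord
    ok = ok and (lhs <= rhs + 1e-9)           # chord lies above (or on) the graph
print(ok)  # True for a convex f
```

Repeating the same check with a non-convex function (e.g. f(x) = sin(x) over a full period) would quickly find sampled points that violate the inequality.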
A very typical example is a parabola.

Let's look a bit into unconstrained optimization. Here we generally assume that we have some function f from R^d to R that is twice differentiable. The unconstrained problem is then just the minimization problem x* = argmin_x f(x), where x* is the optimal point. For this particular set of functions, namely convex functions, a necessary and sufficient condition for the minimum is a zero crossing of the function's gradient: if you have a convex function and you find a position x* where the gradient at this position is zero, then you must have a minimum. The great thing is that if you have a convex function and seek to find the minimum, then following the negative gradient direction will eventually bring you there. Also, by the way, you will find a global minimum: a convex function has only one global minimum, so gradient descent methods here will always result in the global minimum. This is also very nice in terms of the optimization, because it essentially means you are not really dependent on the initialization. It doesn't matter where you start; the gradient will always lead you to the global minimum. Now most methods in unconstrained optimization follow this
iterative scheme: you initialize somewhere with some x_0, which you can essentially choose randomly or set to a fixed value, and then you do iteration steps that produce a new x_{k+1} from the old x_k given some function g, where g is the update function. There are different versions of implementing g: one could be just following the negative gradient direction; another could be shrinkage functions, i.e. functions that allow you to produce a smaller function value for a given input x. The iterations terminate if there is only a small change, meaning that the difference between x_{k+1} and x_k falls below a threshold that we introduce here as epsilon, so there is no further significant change.

Now let's consider a very simple update scheme that I already hinted at: we choose the update function to be the old position plus some step size t_k times Delta x_k, where Delta x_k is the search direction in the k-th iteration.
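Putting these pieces together, here is a minimal sketch of gradient descent with backtracking line search on a convex quadratic. The objective, the matrix A, the vector b, and the line-search parameters alpha and beta are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Assumed convex quadratic objective: f(x) = 0.5 * x^T A x - b^T x,
# with an illustrative symmetric positive definite A and vector b.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

def gradient_descent(x0, eps=1e-8, alpha=0.3, beta=0.8, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        d = -grad(x)                  # search direction: negative gradient
        t = 1.0                       # backtracking line search: shrink t
        while f(x + t * d) > f(x) + alpha * t * (grad(x) @ d):
            t *= beta                 # until sufficient decrease holds
        x_new = x + t * d             # update: x_{k+1} = x_k + t_k * Delta x_k
        if np.linalg.norm(x_new - x) < eps:
            return x_new              # stop: no further significant change
        x = x_new
    return x

x_star = gradient_descent(np.zeros(2))
print(np.allclose(A @ x_star, b, atol=1e-4))  # True: the gradient vanishes at x*
```

Since the objective is convex, the result is independent of the starting point x_0; replacing `np.zeros(2)` with any other initialization converges to the same x*.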
Access: open access
Duration: 00:20:20 min
Recording date: 2020-11-08
Uploaded: 2020-11-08 14:48:06
Language: en-US
In this video, we look into optimization using gradient descent and back-tracking line search.
This video is released under CC BY 4.0. Please feel free to share and reuse.