25 - Pattern Recognition [PR] - PR 21 [ID:23069]

Welcome back to Pattern Recognition. Today we want to start talking a little bit about optimization. We'll go through a couple of refresher elements in case you don't know that much about the different ideas in optimization, and you will see that they're actually fairly simple and will turn out to be quite useful for your further work in pattern recognition.

So today's topic is optimization, and of course this is crucial for many things in pattern recognition: we want to find solutions to complex objective functions, a problem that arises in pattern analysis, machine learning, artificial intelligence, and the like. Optimization also has many faces: there is discrete optimization, combinatorial optimization, genetic algorithms, gradient descent, unconstrained and constrained optimization, linear programming, convex optimization, and so on. Here we essentially need some basic knowledge of optimization so that you are able to follow the remaining lecture, and this is why we want to talk about a couple of ideas on how to optimize. Of course, everybody typically has his or her own favorite algorithm, so we will show some of the ideas; what we will focus on in this class is the optimization of problems that are convex.

So here you see the formal definition of convexity. A function f: R^D → R, so probably a high-dimensional input of dimension D that is mapped onto a scalar value, is convex if the domain of f is a convex set and if for all points x, y within this domain and all θ between 0 and 1 we have the relation f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y). On the left-hand side, θx + (1 − θ)y is essentially a linear interpolation between x and y, so as you move in the input space from x to y, f is evaluated at every point on that segment. The right-hand side, θf(x) + (1 − θ)f(y), is again a linear interpolation, this time in the output domain between f(x) and f(y), and the inequality says that this interpolation is always greater than or equal to the value along the original function. A function is concave if −f is convex. In a geometric interpretation, we can say that the line segment connecting (x, f(x)) and (y, f(y)) lies above the graph of f: if you can connect any two points on the graph with a line segment and that segment lies above the graph of f, you have a convex function.
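The inequality is easy to verify numerically. Here is a minimal sketch, assuming Python with NumPy, that samples random point pairs and interpolation weights for the test function f(x) = x² (an illustrative choice, not code from the lecture):

```python
import numpy as np

# Numerically check the convexity inequality
#   f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
# for the test function f(x) = x**2.

def f(x):
    return x ** 2

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-10.0, 10.0, size=2)
    theta = rng.uniform(0.0, 1.0)
    lhs = f(theta * x + (1 - theta) * y)     # point on the graph
    rhs = theta * f(x) + (1 - theta) * f(y)  # point on the chord
    assert lhs <= rhs + 1e-9, "convexity inequality violated"
```

The chord value (right-hand side) never drops below the graph value (left-hand side), which is exactly the geometric picture of the line segment lying above the graph.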

A very typical example is a parabola.

Let's look a bit into unconstrained optimization. Here we generally assume that we have some function f: R^D → R that is twice differentiable. The unconstrained problem is then just the solution of the minimization problem x* = argmin_x f(x), where x* is the optimal point. For this particular class of functions, namely convex functions, a necessary and sufficient condition for the minimum is a zero crossing of the function's gradient: if you have a convex function and you find a position x* where the gradient ∇f(x*) is zero, then you must have a minimum. And the great thing is that if you have a convex function and you seek the minimum, then following the negative gradient direction will eventually bring you there. This is already a very nice property, and, by the way, the minimum you find in a convex function is a global one: a convex function has no local minima other than the global minimum. So gradient descent methods will always result in global minima here, which is also very nice in terms of the optimization, because it essentially means you are not really dependent on the initialization. It doesn't matter where you start; the gradients will always lead you to the global minimum.

Most methods in unconstrained optimization follow an iterative scheme: you initialize with some x_0, which you can choose essentially randomly or set to a fixed value, and then you do iteration steps, producing a new x_{k+1} from the old x_k via some update function g, i.e. x_{k+1} = g(x_k). There are different ways of implementing g: one could be just following the negative gradient direction; another could be shrinkage functions, i.e. functions that are guaranteed to produce a smaller function value for a given input x. The iterations terminate when there is only a small change, meaning that the difference between x_{k+1} and x_k falls below a threshold that we introduce here as ε, so there is no further significant change.

Now let's consider a very simple update scheme that I already hinted at: we choose the update function to be the old position plus some step size t_k times Δx_k, i.e. x_{k+1} = x_k + t_k · Δx_k, where Δx_k is the search direction in the
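The iterative scheme above can be sketched in code. This is a minimal illustration, assuming Python with NumPy: the search direction is the negative gradient, the step size t_k is chosen by a backtracking line search (the topic of this video, per the description below), and iteration stops when the change falls below ε. The test function f(x) = ‖x‖², the Armijo parameters α and β, and all names here are illustrative choices, not values from the lecture:

```python
import numpy as np

# Sketch of the iterative scheme x_{k+1} = x_k + t_k * dx_k with the
# negative gradient as search direction and a backtracking line search
# (Armijo sufficient-decrease condition) for the step size t_k.

def f(x):
    return float(x @ x)          # simple convex test function f(x) = ||x||^2

def grad_f(x):
    return 2.0 * x               # its gradient

def backtracking_step(x, dx, t=1.0, alpha=0.3, beta=0.8):
    # Shrink t until f decreases sufficiently along the direction dx.
    while f(x + t * dx) > f(x) + alpha * t * (grad_f(x) @ dx):
        t *= beta
    return t

def gradient_descent(x0, eps=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = -grad_f(x)                      # search direction
        t = backtracking_step(x, dx)         # step size t_k
        x_new = x + t * dx
        if np.linalg.norm(x_new - x) < eps:  # stopping criterion
            return x_new
        x = x_new
    return x

x_star = gradient_descent([3.0, -2.0])
print(x_star)  # converges toward the global minimum at the origin
```

Because f is convex, the result is independent of the starting point: any initialization converges to the same global minimum.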

Part of a video series:

Accessible via: Open Access

Duration: 00:20:20 min

Recording date: 2020-11-08

Uploaded: 2020-11-08 14:48:06

Language: en-US

In this video, we look into optimization using gradient descent and back-tracking line search.

This video is released under CC BY 4.0. Please feel free to share and reuse.


Music Reference: Damiano Baldoni - Thinking of You
