17 - Artificial Intelligence II

Right, so you were talking with Dennis about neural networks.

All right, and we're talking only about the simplest of networks, namely feedforward networks. They are essentially directed acyclic graphs. In comparison to human brains, that's not an accurate model, but the math is kind of simple. Networks with cycles we call recurrent networks, and as you probably know from a basic computer architecture class, you can only have state in a network of gates if you have cycles; that gives you flip-flops and memory and all those kinds of things, and that's not what we're looking at. The surprising thing is that even in this very, very simple feedforward network, we can do stuff. The simplest thing you looked at was perceptron networks.

Networks that are just one layer; and if you think about it, every little neuron is actually a binary classifier, or has this binary classifier in it, possibly softened by a logistic function. If we have a perceptron network, that's essentially how it looks: an input layer, an output layer, and a couple of weights. If we have a full network, between any input and any output neuron we have a weight, and the weight matrix really tells us what the function is. In general, the function is something like a multi-dimensional threshold function, a cliff; that's what a perceptron can be, essentially a multi-dimensional linear classifier smoothed over by a sigmoid. If you look at that, the math of this is very easy.
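To make that concrete, here is a minimal sketch of such a single-layer computation (the weights and numbers below are made up for illustration, not from the lecture): each output neuron takes a weighted sum of its inputs and pushes it through a hard threshold or a logistic function.

```python
import numpy as np

def logistic(z):
    # smooth "soft threshold" replacing the hard step function
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, W, b, soft=True):
    """One-layer perceptron: every output neuron is a linear
    classifier on the input, optionally smoothed by a sigmoid."""
    z = W @ x + b                          # weighted sum per output neuron
    return logistic(z) if soft else (z > 0).astype(float)

# tiny illustrative example: 3 inputs, 2 output neurons
W = np.array([[0.5, -1.0, 0.2],
              [1.5,  0.3, -0.7]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 0.0, 1.0])
print(perceptron_forward(x, W, b))
```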

If you have a bigger network, with hidden layers in between, things get more interesting. We can represent more functions, as you've seen; we get essentially bigger polynomials, which means more interesting functions. The idea in neural networks is basically that we can adjust the functions they correspond to. We have some input activation which gets pushed, or fed forward, through the network, and the weights change what happens in the network, so you get an output pattern. You can imagine that the more weights we have, the more interesting the functions become.
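A minimal sketch of that feed-forward idea with one hidden layer in between (the layer sizes and random weights below are only illustrative assumptions): the activation is simply pushed through one weight matrix after the other.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, layers):
    """Push an input activation through a list of (W, b) layers;
    each layer applies its weights and a sigmoid nonlinearity."""
    a = x
    for W, b in layers:
        a = logistic(W @ a + b)
    return a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # input (3) -> hidden (4)
          (rng.normal(size=(2, 4)), np.zeros(2))]   # hidden (4) -> output (2)
print(feed_forward(np.array([1.0, 0.5, -1.0]), layers))
```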

The kind of converse to this is that single-layer perceptrons are relatively boring things; there is only a relatively small set of functions you can build with them. In particular, there are Boolean functions like the XOR function where you can already see that the data is not linearly separable, so a single-layer perceptron cannot learn that function. If the Boolean function you want to represent is XOR, we can find a decision tree for XOR, but a single-layer perceptron can't do the trick. Which is why you almost never hear about them anymore. We have networks that look more like this, that have multiple layers.
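One standard way to see this (the construction below is an illustration, not taken from the lecture): XOR is "OR but not AND", and each of those pieces is linearly separable, so two threshold units feeding a third compute XOR even though no single unit can.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

def xor_net(a, b):
    """Two hidden threshold units (OR and AND) plus one output
    unit computing OR-and-not-AND, i.e. XOR."""
    x = np.array([a, b])
    h_or  = step(np.array([1.0, 1.0]) @ x - 0.5)   # fires if a or b
    h_and = step(np.array([1.0, 1.0]) @ x - 1.5)   # fires if a and b
    return step(1.0 * h_or - 1.0 * h_and - 0.5)    # or, but not and

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", int(xor_net(a, b)))
```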

Somebody, and I must say I envy this person for the good idea, had the wonderful idea to say: these things here are much better than those kinds of things, and we call them deep learning. Isn't that a wonderful word? It makes you instantly famous, because you're doing deep stuff. It means nothing but that we have more than two layers, more than just the input and the output layers. There's nothing more to it. But it sounds great, doesn't it? Okay, so we have to do deep learning. It just means we have to understand this math rather than just that math.

Okay, so as a kind of exercise, you looked at perceptron learning, and the idea is that since we're in supervised learning, we actually have examples, and the examples say that an input pattern corresponds to an output pattern. So what could be easier than to say: we feed the input pattern forward, we get an output, and then we just measure how wrong the output is. We call that the loss, and then we just minimize that loss. Very simple. There are a couple of questions left over. How do we measure the loss? Empirically, the squared error is a good measure, mostly because it's convex. The square is a convex function, which means it has a unique global minimum. Wonderful. Okay, that's essentially the reason why we like the squared error measurement.
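Written out in the usual notation (which may differ slightly from the lecture's slides), the squared-error loss of a hypothesis h_w on training examples (x_j, y_j) is

\[
  L(\mathbf{w}) \;=\; \sum_{j} \bigl( y_j - h_{\mathbf{w}}(x_j) \bigr)^2
\]

and this is the quantity we minimize over the weights.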

Okay, and so we basically do optimization by gradient descent. Since the math is essentially linear, we can do the calculations very easily; we have differentiable functions. All of that is very nice, so we can just do what you did in high school, namely realize that at a minimum the partial derivatives have to be zero. That's what we're looking for with gradient descent.
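As a rough sketch of that loop (a single sigmoid unit on a made-up training set; the learning rate and iteration count below are arbitrary choices, not the lecture's): compute the squared error, take its partial derivatives with respect to the weights, and step downhill.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up training data: inputs X and target outputs y (here: the OR function)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

w = np.zeros(2)
b = 0.0
alpha = 0.5                                  # learning rate

for epoch in range(1000):
    out = logistic(X @ w + b)                # feed the inputs forward
    err = out - y                            # how wrong the output is
    loss = np.sum(err ** 2)                  # the squared-error loss we minimize
    # partial derivatives of the loss w.r.t. w and b (chain rule through the sigmoid)
    grad_z = 2 * err * out * (1 - out)
    w -= alpha * (X.T @ grad_z)
    b -= alpha * np.sum(grad_z)

print(np.round(logistic(X @ w + b), 2))      # should be close to [0, 1, 1, 1]
```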
