6 - Artificial Intelligence II [ID:52578]

Okay, then let's start.

I'm noticing the numbers dwindling.

Is that a Thursday thing? I guess we will find out next week.

Okay, so to recap, I think this is currently probably the most important slide we have.

Namely, the general cooking recipe for how to compute the probabilities that you're interested in, given the probabilities that we know.

Usually the probabilities that we know are conditional probabilities to some extent.

And generally the first thing you should do is try to derive the full joint probability distribution.

Which you usually do by fixing some order of all the variables that we're interested in.

Applying the chain rule and then trying to exploit anything we know with respect to independence and conditional independence.

To somehow factor the full joint probability distribution into a product of the probabilities that we actually know.
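For reference, with some fixed ordering X_1, ..., X_n of the variables, that chain-rule step reads

\[ P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid X_1,\dots,X_{i-1}), \]

and the independence and conditional independence assumptions are what let us drop most of the conditioning variables in each factor.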

Once we have that, we know that we can solve the problem at all in the sense of we know that any kind of probability that we're interested in we can derive in some way.

Then for some given setup that we're interested in, i.e. usually we have some variable that we want to query for.

And we have a bunch of evidence variables and we have a bunch of unknown variables.

Plug the formula that you have derived for the full joint probability distribution into this general equation for enumeration.

And then, again, ideally try to exploit everything we know to simplify this expression as much as possible.

And then we end up with a system of equations.

One equation for each value of the domain of the query variable.

Then we can compute the distribution.

We can compute this marginalization over all the unknowns.

We can normalize and then we're good.
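Written as a formula (in standard notation, since the slide itself is not reproduced here): for a query variable X, evidence e, and unknown variables Y,

\[ P(X \mid \mathbf{e}) = \alpha\, P(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y}), \]

with one such sum per value in the domain of X, and alpha the normalization constant that makes the resulting values sum to one.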

We've done that in an extended example with the Wumpus setup.

This is just a summary of everything we've done so far.

And then we've talked about Bayesian networks.

This is the running example that we're going to use.

So we assume we have some alarm at home.

We are on a holiday so we don't actually notice when the alarm goes off.

What we notice is that either Mary or John, my two neighbors, call me on my holiday to tell me, hey, by the way, your alarm went off.

That might be because someone is actually trying to burgle our house or it might be because there is an earthquake going on.

And both John and Mary are not necessarily deterministically bound to call us if our alarm goes off.

John might miss the alarm for some reason, or he might call us saying the alarm went off even though it was just his phone ringing.

Mary might miss the alarm entirely because she's listening to music.

Everything in here is basically probabilistic.

We can model this as the following Bayesian network where we basically just draw arrows to represent the dependencies.

So both burglary and earthquake impact the probability that my alarm goes off.

And my alarm going off impacts the probabilities that John might call me or that Mary might call me.

Again, we're using the same convention as we used earlier with respect to conditional independence, i.e.

we consider these arrows to indicate some quote unquote causal influence in the sense that if we have two nodes and they have a common parent,

then by that we mean to say that given the common parent, we assume those two variables to be conditionally independent.

And in particular, the ones that have no incoming arrows at all, we assume to be independent stochastically.
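Writing B, E, A, J, M for Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls, this convention amounts to assumptions such as

\[ P(J, M \mid A) = P(J \mid A)\, P(M \mid A) \qquad\text{and}\qquad P(B, E) = P(B)\, P(E). \]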

We can do that for that particular example.

We just plug in all of the things we know.

We compute the full joint probability distribution by, as usual, applying the chain rule,

partitioning everything into the probabilities that we know.

Then we plug it into the equation for enumeration.

And then we end up with this particular expression here.
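Spelled out (reconstructed from the dependency structure described above, since the slide is not shown here), the factorization is

\[ P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A), \]

and, for example, querying Burglary given that both John and Mary call gives

\[ P(B \mid j, m) = \alpha \sum_{e} \sum_{a} P(B)\, P(e)\, P(a \mid B, e)\, P(j \mid a)\, P(m \mid a). \]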

In the general case, we can now define a Bayesian network as a directed acyclic graph where each node in the graph is associated with a random variable

and a conditional probability table with respect to that variable.

Given the parents of the node in the graph, with, again, the convention that every X_i is assumed to be conditionally independent, given its parents, of any conjunction of its non-descendants,

i.e. of everything that is not downstream of that particular variable.

And then we can compute the full joint probability distribution by using the chain rule.
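Concretely, the full joint then factors as

\[ P(X_1,\dots,X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{parents}(X_i)\big). \]

To make the whole recipe concrete, here is a minimal Python sketch of inference by enumeration on the alarm network. The conditional probability table values are illustrative placeholders (the numbers commonly used in the textbook version of this example), not values taken from the lecture.

```python
# Minimal sketch: inference by enumeration on the alarm network.
# CPT values below are illustrative placeholders, not from the lecture.
from itertools import product

P_B = {True: 0.001, False: 0.999}                # P(Burglary)
P_E = {True: 0.002, False: 0.998}                # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,  # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                  # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}                  # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Full joint P(b, e, a, j, m) via the network's chain-rule factorization."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    p *= P_J[a] if j else 1.0 - P_J[a]
    p *= P_M[a] if m else 1.0 - P_M[a]
    return p

def query_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m): sum out E and A, then normalize."""
    unnormalized = {
        b: sum(joint(b, e, a, j, m) for e, a in product((True, False), repeat=2))
        for b in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())     # normalization constant
    return {b: alpha * p for b, p in unnormalized.items()}

print(query_burglary(j=True, m=True))            # roughly {True: 0.28, False: 0.72}
```

With these placeholder numbers, the query returns a burglary probability of roughly 0.28 once both neighbors have called, which runs the enumerate-then-normalize recipe end to end.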
