Okay, time to start. Remember, we are looking at the decision-theoretic foundations of agents that can survive, or do well, in a partially observable, possibly non-deterministic world. This is mostly about two things, three things actually. One is estimating the likelihoods of the possible worlds in a belief state; that is basically the "what is the world like" part. Then of course estimating the utility of possible actions. And the third ingredient is essentially choosing those actions that maximize the expected utility. At the end we had agents that probably use, or could use, something like this: a decision network, which is really just a Bayesian network plus two new kinds of nodes that we looked at yesterday. Action nodes, that is the square treatment node in the center here, and utility nodes, which compute the utility of the action nodes, in this case of the treatment. And then you can just argmax. The expected utility is what you have down here, and then you argmax over the actions. In this case we only have one action node, but there might be multiple of them. Here, the treatment is actually, there might be multiple things your doctor does to treat you, but here we have just put them into one big node.
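To make that argmax over expected utilities concrete, here is a minimal sketch in Python. The disease probabilities, treatments, and utility table are made-up illustrative numbers, not the ones on the slide.

```python
# Minimal decision-network sketch: one chance node (Disease), one action
# node (Treatment), and one utility node U(Disease, Treatment).
# All numbers are hypothetical, purely for illustration.

# Belief state: P(Disease | evidence), e.g. after observing some symptoms.
belief = {"flu": 0.7, "cold": 0.3}

# Utility table U(disease, treatment).
utility = {
    ("flu",  "antivirals"): 80, ("flu",  "rest"): 40,
    ("cold", "antivirals"): 50, ("cold", "rest"): 70,
}

def expected_utility(treatment, belief):
    """EU(a) = sum over possible worlds s of P(s) * U(s, a)."""
    return sum(p * utility[(disease, treatment)] for disease, p in belief.items())

# Choose the action that maximizes expected utility (the argmax step).
actions = ["antivirals", "rest"]
best = max(actions, key=lambda a: expected_utility(a, belief))
print(best, expected_utility(best, belief))
```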
Okay. Now there are two things missing for a generally intelligent agent that can survive in even more complex worlds. One is that we don't have a notion of time. We are currently only looking at episodic environments, namely environments that we can handle without representing time explicitly. And the other one, and that is one of the shortcomings of this network, is that we haven't reasoned about information-gathering actions. All of the actions we have talked about actually do something in the world that changes the world, like a treatment, or like putting an airport in the center of Nuremberg or something like this. So the next two topics are really looking at information-gathering actions. And that's a very typical thing of intelligent beings, of agents: we see that it might actually be a good idea to get more information so that we can make better decisions. Now, in our framework, where we're choosing actions via their utility, which is a value-based, utility-based decision, we can only seamlessly fit these information-gathering actions, like diagnosis and so on, into the picture if we can actually give information a value. We could again assess information-gathering decisions by looking at what people do, what their preferences are, whether their preference now is to actually gather more information, but it turns out that we don't need to do that: we can actually compute the value of information just from everything we already have. So that's what I want to do now.
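As a preview of what that computation looks like, here is a small sketch of the value of perfect information of a test, continuing the toy treatment example from above. The formula is the standard one, VPI(E) = sum over e of P(e) times the expected utility of the best action given e, minus the expected utility of the best action without the observation; the test model and all numbers are again made-up for illustration.

```python
# Value of perfect information (VPI) sketch, with made-up numbers.
# VPI(Test) = sum_e P(Test=e) * max_a EU(a | Test=e)  -  max_a EU(a),
# i.e. how much better we expect to do after observing the test result.

belief = {"flu": 0.7, "cold": 0.3}                      # P(Disease) before the test
utility = {("flu", "antivirals"): 80, ("flu", "rest"): 40,
           ("cold", "antivirals"): 50, ("cold", "rest"): 70}
actions = ["antivirals", "rest"]

# Hypothetical test: P(Test=positive | Disease); informative but noisy.
test_model = {"flu": 0.9, "cold": 0.2}

def best_eu(belief):
    """Expected utility of the best action under the given belief state."""
    return max(sum(p * utility[(d, a)] for d, p in belief.items()) for a in actions)

def posterior(result):
    """P(Disease | Test=result) via Bayes' rule."""
    like = test_model if result == "positive" else {d: 1 - q for d, q in test_model.items()}
    joint = {d: like[d] * p for d, p in belief.items()}
    z = sum(joint.values())
    return {d: p / z for d, p in joint.items()}

p_pos = sum(test_model[d] * p for d, p in belief.items())          # P(Test=positive)
eu_with_test = (p_pos * best_eu(posterior("positive"))
                + (1 - p_pos) * best_eu(posterior("negative")))
vpi = eu_with_test - best_eu(belief)
print(f"VPI of the test: {vpi:.2f}")                               # non-negative by construction
```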
You see this information-gathering behavior all the time. If you call an expert committee to decide something, say whether assisted suicide should be allowed in Germany or something like this, I think only 30% or 40% of the time will they actually say, well, we should do this and that. Probably 60 or 70% of the time, what they mostly do is say, ah, but we need more information first. Okay. So it's an important thing, and the more expert you become, the more often you say, well, but we need more information first. Doctors do it all the time. Before they come to a diagnosis, they first take your blood, send it in for analysis, and then they decide that you have a cough or whatever. So the idea here is that we extend the rational agent's action repertoire by information-gathering actions, and we would like to fit them in by giving information a utility. This is traditionally called information value theory, but we should probably...