Agents¶
our textbook takes an agent-oriented perspective on AI, which can be diagrammed like this:
Agent Environment
+----------------------------------------------+ +-------------------+
| | | |
| | Percepts | |
| Sensors <--------------------------------------------+ |
| | | | |
| | | | |
| | | | |
| | | | |
| +-------v--------+ | | |
| | | | | |
| | | | | |
| | | | | |
| | ? | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| +-------+--------+ | | |
| | | | |
| | | | |
| | | | |
| v | Actions | |
| Actuators +------------------------------------------> |
| | | |
| | | |
+----------------------------------------------+ +-------------------+
the idea is that the agent senses the environment, and then, after some thinking, uses its actuators to modify the environment
- on a robot, actuators would be things like its hands, arms, body — anything that could change the state of the environment
- for this course, we will usually have little to say about actuators, and will assume that changing the environment can be done accurately and easily
- but for real-life robots, where actuators are a significant part of the interaction, much more care would need to be taken, e.g. it’s possible that an actuator could fail and not change the environment correctly
a percept is an agent’s perceptual input at any point in time
- a percept could come from a camera, voice, sonar, a text file, a bit stream, etc.
an agent’s percept sequence is the sequence of everything it has perceived
- in general, an agent’s choice of actions can depend upon its entire percept history (but on no percepts from the future)
e.g. the two-location vacuum-cleaner world in Figure 2.2 of the text is useful
there are two adjacent locations, A and B
the agent can perform these actions: move left, move right, suck up dirt, or do nothing (which is not really an action, but it can be useful to have an explicit symbol meaning no action is done)
the agent can perceive two different things:
- its location, either A or B
- whether its current location is clean or dirty
exactly how it does this perceiving is something we often don’t need to worry about; maybe it uses a camera, or a map (for the location), or a special dirt sensor, etc.
[A, Clean] is an example of a percept for this agent: it perceives that it is in location A, and that A is clean
[B, Dirty] is another percept, meaning the agent perceives that it is in location B and that B is dirty
it is quite possible that the agent’s perceptions could be wrong — it might have faulty or inaccurate sensors
- this is an important problem, but we will often ignore it, since the problems we deal with are often difficult enough already
a percept sequence for this agent might be this: [[A, Dirty], [A, Clean], [B, Clean]]
- this percept sequence corresponds to the agent starting in location A, cleaning up the dirt in that location, and then moving to location B
a simple program for this agent might be something like this:
loop forever:
If most recent percept is [A, Clean], then move-right
If most recent percept is [A, Dirty], then suck-up-dirt
If most recent percept is [B, Clean], then move-left
If most recent percept is [B, Dirty], then suck-up-dirt
notice that this program only takes into account the most recent percept, and does not need to keep track of any history
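here is a minimal Python sketch of that program (the function name and the percept/action strings are just illustrative choices, not from the text):

# a rough sketch of the reflex vacuum agent above;
# a percept is a (location, status) pair such as ("A", "Dirty")
def reflex_vacuum_agent(percept):
    location, status = percept
    if status == "Dirty":
        return "Suck"       # [A, Dirty] or [B, Dirty]: suck up the dirt
    elif location == "A":
        return "Right"      # [A, Clean]: move right
    else:
        return "Left"       # [B, Clean]: move left

# e.g. running it on the percept sequence [[A, Dirty], [A, Clean], [B, Clean]]
for p in [("A", "Dirty"), ("A", "Clean"), ("B", "Clean")]:
    print(p, "->", reflex_vacuum_agent(p))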
Rational Agents¶
what does it mean for an agent to be rational?
intuitively, it means that the agent always does “the right thing”
what is the “right thing”?
we will generally use a performance measure on a sequence of environmental states
e.g. one possible performance measure for the vacuum-cleaning agent would be that there are as few dirty squares as possible
e.g. another possible performance measure for the vacuum-cleaning agent would be to maximize the amount of dirt it picks up over time
- but beware: if this was truly the performance measure, then a “clever” agent might realize that it could clean up all the dirt, then dump it out, clean it up again, dump it out, etc.
Definition of a Rational Agent¶
a rational agent needs:
- a performance measure that defines its criterion for success
- some prior knowledge of the environment
- to know the actions it can perform
- its percept sequence to date
so we can define a rational agent as follows:
For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept-sequence and whatever built-in knowledge the agent has
memorize this definition!
notice that this definition implies we can’t necessarily tell if an agent is rational just by looking at its program — we must take into account the relevant performance measure
also, different agents working on the same problem in the same environment with the same percept sequence could behave differently due to their built-in knowledge
also, the most rational thing to do might not turn out to be the best thing to do, since the agent may have incomplete information about the environment
- a poker-playing agent can’t win all the time, because it doesn’t know what cards it will get
- but it can still act rationally, by making the best decisions at each point
also, if its actions allow it to do so, an agent might do information-gathering actions to learn more about the environment
- a robot crossing the street ought to look both ways before crossing
- a poker-playing agent might purposely play a hand in a crazy-seeming way to see how other players react, remembering what it learned for later
also, we usually want our agents to be autonomous, i.e. that they can operate on their own without a lot of prior information from their designers
autonomy requires learning, e.g. we’d like to be able to put a vacuum-cleaning robot in any environment without also having to give it a map of the environment
- so we would expect an autonomous vacuum-cleaning robot to not only keep things clean, but intelligently navigate its way around pretty much any reasonable environment
- if you are familiar with Roomba vacuum-cleaning robots, then you know that it is possible to do such things!
in general, we usually expect rational agents to deal with partial and unknown information, and to learn new behaviors when necessary
Environments¶
the task environment in which an agent is working can have a big impact
we will use the PEAS model for describing task environments; a good description of a task environment must specify the:
- Performance measure
- Environment
- Actuators
- Sensors
e.g. a taxi-driving agent
- performance measure: safety, speed, legality, passenger comfort, profit
- environment: roads, including other traffic, pedestrians, weather conditions
- actuators: steering, brakes, accelerator, turn signals, horn, lights
- sensors: camera, sonar, speedometer, odometer, GPS, engine sensors
e.g. a poker-playing web agent
- performance measure: profit, fun for human players
- environment: online poker on the web
- actuators: API for placing bets, chatting, etc.
- sensors: API for checking state of game (e.g. size of pot, other players’ visible cards), etc.
e.g. a poker-playing robot
- performance measure: profit, fun for human players, energy consumption, safety, maintenance
- environment: home game table
- actuators: hand for holding cards, placing bets, facial expressions, sounds, voice for asking questions
- sensors: camera for seeing other cards, camera for facial recognition (to help guess if player has a weak/strong hand), voice recognition for listening to questions
e.g. medical diagnosis agent (Fig 2.5)
- performance measure: patient health; costs
- environment: patient, hospital, staff
- actuators: display of questions, tests, diagnoses, treatments, referrals
- sensors: keyboard entry of symptoms, findings, patient’s answers
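as a rough sketch, a PEAS description is just four lists, so it could be captured in a small data structure like this (the class and field names are made up for illustration):

from dataclasses import dataclass

@dataclass
class PEAS:
    performance_measure: list[str]
    environment: list[str]
    actuators: list[str]
    sensors: list[str]

# e.g. the taxi-driving agent described above
taxi = PEAS(
    performance_measure=["safety", "speed", "legality", "passenger comfort", "profit"],
    environment=["roads", "other traffic", "pedestrians", "weather conditions"],
    actuators=["steering", "brakes", "accelerator", "turn signals", "horn", "lights"],
    sensors=["camera", "sonar", "speedometer", "odometer", "GPS", "engine sensors"],
)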
Properties of Environments¶
fully observable vs. partially observable
- if an agent’s sensors give it access to the complete state of the environment at each point in time, we say the environment is fully observable
- in a fully observable environment, an agent need not store the state since its sensors tell it everything it needs to know
- if only a part of the environment can be sensed at each point in time, the
environment is said to be partially observable
- this is the more realistic case, since partial, incomplete, or inaccurate information is often an ordinary part of the environment
deterministic vs. stochastic
- if the next state of the environment is completely determined by the current state and the action of the agent, then we say the environment is deterministic; otherwise, we say the environment is stochastic
- note that a partially observable environment could appear to an agent to be stochastic, even though in fact it is deterministic
- most practical environments are so complex that they are treated as if they are stochastic
static vs. dynamic
- if the environment can change while the agent is thinking, then we say the environment is dynamic; otherwise, it is static
- dynamic environments are usually more challenging to deal with, because they
put time-pressure on the agent’s thinking: if a decision takes too long, it
may no longer be valid
- e.g. a self-driving car must decide whether or not to hit the brakes very quickly in some situations!
discrete vs. continuous
- if time is handled in discrete sequential steps (like the discrete ticking of a CPU), then the environment is said to be discrete
- if instead time is handled as a continuous stream (as in physics), then the environment is said to be continuous
- discrete math and continuous math are quite different!
known vs. unknown
- if the outcomes of all actions are known ahead of time by the agent, then we say the environment is known
- if instead the outcomes of some actions are not known ahead of time, meaning the agent must test things out to find out what will happen, then the environment is said to be unknown
The Structure of Agents¶
a large part of AI research is concerned with the structure of agent programs, and here we will consider a few high-level architectures for agents
Table-driven Agents¶
this is a look-up table agent that uses a (giant!) table of actions indexed by percept sequences
given any percept sequence, the agent simply “looks up” in the table the action that it ought to do
this can be quite effective in many simple situations, but is completely unreasonable in even moderately complicated situations
to see how unreasonable table-driven agents are, we can do some simple calculations
let P be the set of different possible percepts, and T the “lifetime” of the agent, i.e. the length of time it exists
then the total number of percept sequences is exponential, i.e. \(\sum_{t=1}^{T}|P|^{t}\)
- e.g. a self-driving taxi that gets 27 megabytes of input per second from a camera sensor would need a look-up table with \(10^{250,000,000,000}\) entries, for an hour’s driving
- there are only about \(10^{80}\) atoms in the observable universe, so such a large table could not even be stored
- but even if we could create such a huge table, how could the entries be filled in within a reasonable amount of time?
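to get a feel for how quickly \(\sum_{t=1}^{T}|P|^{t}\) grows, here is a quick back-of-the-envelope calculation in Python (the numbers below just use the toy vacuum world, which has only 4 distinct percepts):

# number of distinct percept sequences of length 1..T, given |P| possible percepts
def table_size(num_percepts, lifetime):
    return sum(num_percepts ** t for t in range(1, lifetime + 1))

for T in (5, 10, 20, 50):
    print(T, table_size(4, T))
# 5 -> 1,364   10 -> 1,398,100   20 -> about 1.5 * 10^12   50 -> about 1.7 * 10^30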
Simple Reflex Agents¶
a simple reflex agent decides what to do based solely on the current percept, and ignores all previous percepts
a simple reflex agent thus does not need to store any state — it just reacts to whatever it currently “sees”
typically, simple reflex agents are implemented as a set of condition-action rules
- also known as situation-action rules, or if-then rules, or productions
e.g. if the-car-in-front-is-braking, then apply-brakes
pro:
- simple programs!
con:
- single percepts are simply not enough in all situations to be able to do the
right thing
- e.g. the decision of whether or not to believe that the car driving in front of you is braking depends on more than just seeing if its brake lights are on
- what if the brake lights are broken?
- or what if some light other than the brake light also goes on near the brake light? (thus resulting in unnecessary braking)
- can get stuck in infinite loops (e.g. see vacuum cleaner example on top of
p. 50)
- randomness can be used to help break out of such loops
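as a rough illustration of that last point, suppose the agent’s percept contained only the dirt status and not its location; then a fixed movement rule could loop forever, but a coin flip eventually gets it to the other square (this sketch is purely illustrative, not from the text):

import random

# percept is just the dirt status of the current square, with no location component
def randomized_reflex_vacuum_agent(status):
    if status == "Dirty":
        return "Suck"
    # with no location sensor, always choosing "Left" (say) could loop forever,
    # so pick a direction at random to eventually escape
    return random.choice(["Left", "Right"])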
Model-based Reflex Agents¶
model-based reflex agents save information about the external world using internal state
e.g. a vacuuming agent might keep an internal map of the environment it has visited so far, even the parts it can’t currently sense
such agents can act far more intelligently, since they are not limited to what they can currently perceive
a model-based agent could use its internal model to simulate actions before doing them, to estimate what might happen
- this is usually quite hard, since the agent will normally have incomplete information about the environment
- also, the simulation they do may only be an approximation of reality
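here is a minimal sketch of the internal-state idea for the two-square vacuum world (the model is just a dictionary recording the last known status of each square; the names are illustrative):

class ModelBasedVacuumAgent:
    """Remembers the last known status of each square, even the one it can't currently sense."""

    def __init__(self):
        self.model = {"A": None, "B": None}    # internal state; None means unknown

    def act(self, percept):
        location, status = percept
        self.model[location] = status          # update the model from the current percept
        if status == "Dirty":
            return "Suck"
        if all(s == "Clean" for s in self.model.values()):
            return "NoOp"                      # the model says everything is clean
        return "Right" if location == "A" else "Left"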
Goal-based Agents¶
goals are descriptions of desirable states
e.g. if you come to a fork in the road where you can go either left or right, how do you choose which way to go? hopefully you have a goal about where you want to go that enables you to make a good decision
in general, achieving goals can be quite tricky, involving long sequences of actions
e.g. the goal of baking a cake; you need to at least:
- have a recipe for the cake
- have all the ingredients for the cake
- if you are missing some ingredients, then you have the sub-goal of getting those ingredients
- have all the pots, pans, measuring cups, etc. for the cake
- you must follow the recipe for mixing and baking the ingredients
- your recipe may not give exact measurements, and may include instructions
like “season to taste”
- which suggests the agent must be able to taste the cake
- at any point in this process, unexpected things might happen that the agent
should try to “recover” from, e.g.
- the power could go out
- someone could come to the door and interrupt you
in AI, the sub-fields of search and planning are about this sort of problem
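as a very small taste of what search looks like, here is a sketch that finds a shortest sequence of vacuum-world actions reaching the goal “both squares clean” using breadth-first search (the state encoding and function names are just illustrative):

from collections import deque

# a state is (location, status of A, status of B); the goal is both squares Clean
def successors(state):
    loc, a, b = state
    yield "Left", ("A", a, b)
    yield "Right", ("B", a, b)
    if loc == "A":
        yield "Suck", ("A", "Clean", b)
    else:
        yield "Suck", ("B", a, "Clean")

def plan_to_goal(start):
    """Breadth-first search for a shortest action sequence that reaches the goal."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, actions = frontier.popleft()
        if state[1] == "Clean" and state[2] == "Clean":
            return actions
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))

print(plan_to_goal(("A", "Dirty", "Dirty")))   # ['Suck', 'Right', 'Suck']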
Utility Agents¶
utility agents are goal-based agents that also take into account the quality of the action sequences used to achieve goals
e.g. there might be three ways to drive from your house to the airport: a fast but bumpy way, a slow but smooth way, and a way that is fast, smooth, but noisy
- all three ways achieve the goal of getting to the airport
- but some ways might be better than others, depending upon the situation
- so we often add a utility function that is a measure of the agent’s “happiness” with a particular sequence of actions
- an agent will usually want to achieve its goals in a way that maximizes its overall utility
- an agent’s utility function is usually an internal version of the environment’s performance measure
- if the environment contains randomness, then the agent will usually try to maximize its expected utility, since in a random environment it can’t always be sure of the outcomes of its actions
utility-maximizing agents are the goal of much AI research
it may sound simple, but it’s hard to come up with good utility functions in complex environments, and computational complexity concerns are usually an issue (i.e. what if computing maximum utility requires exponential time?)
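as a tiny sketch of the expected-utility idea, here is an agent choosing among the three airport routes described above (the probabilities and utilities are invented numbers, purely for illustration):

# each route has a list of (probability, utility) outcomes; the numbers are made up
routes = {
    "fast but bumpy":  [(0.9, 6), (0.1, 2)],
    "slow but smooth": [(1.0, 5)],
    "fast but noisy":  [(0.8, 7), (0.2, 3)],
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

best = max(routes, key=lambda r: expected_utility(routes[r]))
print(best, expected_utility(routes[best]))   # -> fast but noisy (expected utility of about 6.2)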
Learning Agents¶
while it is possible to program intelligent agents by hand, that can be extremely difficult for many problems of interest to AI
e.g. suppose you wanted to make a character-recognition system that could recognize hand-written numbers on paper
- there are thousands of small and subtle variations in how people write numbers, and it is extremely difficult to say exactly what the rules are for, say, distinguishing between the number 4 and the number 7
- so in practice, the most successful hand-writing recognition systems have been learning programs that learn how to recognize numbers by considering thousands and thousands of training examples
in general, a learning agent is able to modify its various components in ways that improve its overall performance
any of the above kinds of agents could also have a learning component, and in practice learning has been one of the most successful areas of AI