STAT 801 Lecture 24

Reading for Today's Lecture:

Goals of Today's Lecture:


Decision Theory and Bayesian Methods

Example: Decide between 4 modes of transportation to work:

C: drive my car
B: ride my bicycle
T: take the bus (public transit)
H: stay home

Costs depend on weather: R = Rain or S = Sun.

Ingredients of Decision Problem: No data case.

1. A set $\Theta$ of possible states of nature; here $\Theta = \{R, S\}$.
2. A set D of possible decisions or actions; here $D = \{C, B, T, H\}$.
3. A loss function $L(d,\theta)$: the loss incurred by taking decision d when the true state of nature is $\theta$.

In the example we might use the following table for L:

      C   B   T   H
  R   3   8   5  25
  S   5   0   2  25

Notice that if it rains I will be glad if I drove. If it is sunny I will be glad if I rode my bike. In any case staying at home is expensive.

In general we study this problem by comparing various functions of $\theta$. In this problem a function of $\theta$ has only two values, one for Rain and one for Sun, so we can plot any such function as a point in the plane. We do so to indicate the geometry of the problem before stating the general theory.

Example: The transport problem has no data, so the only possible (non-randomized) decisions are the four possible actions B, C, T, H. For B and T the worst case is Rain; for C the worst case is Sun; for H the two states give equal loss. We have the following table:

          C   B   T   H
  R       3   8   5  25
  S       5   0   2  25
  Maximum 5   8   5  25

Smallest maximum: take car or bus.

Minimax action: take car or public transit to minimize worst case loss.
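As a sanity check, here is a minimal Python sketch (my own, not part of the original notes) that recomputes the worst-case losses and the minimax pure actions from the loss table:

    # Loss table: for each action, the loss under Rain and under Sun.
    losses = {
        "C": {"R": 3, "S": 5},
        "B": {"R": 8, "S": 0},
        "T": {"R": 5, "S": 2},
        "H": {"R": 25, "S": 25},
    }

    # Worst-case (maximum over states) loss for each action.
    worst = {a: max(L.values()) for a, L in losses.items()}
    print(worst)  # {'C': 5, 'B': 8, 'T': 5, 'H': 25}

    # Minimax pure actions: those attaining the smallest worst-case loss.
    best = min(worst.values())
    print([a for a, w in worst.items() if w == best])  # ['C', 'T']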

Now imagine: toss a coin with probability $\lambda$ of getting Heads; take my car if Heads, otherwise take transit. The long run average daily loss would be $3\lambda + 5(1-\lambda)$ when it rains and $5\lambda + 2(1-\lambda)$ when it is sunny. Call this procedure $d_\lambda$; add it to the graph for each value of $\lambda$. Plotting the loss under Sun on the horizontal axis and the loss under Rain on the vertical axis, varying $\lambda$ from 0 to 1 gives a straight line running from (2,5) to (5,3). The two losses are equal when $\lambda = 3/5$. For smaller $\lambda$ the worst case loss is for Rain; for larger $\lambda$ the worst case loss is for Sun.
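Setting the two average losses equal pins down $\lambda$ (a worked step added here for clarity):

\begin{displaymath}3\lambda + 5(1-\lambda) = 5\lambda + 2(1-\lambda)
\iff 5 - 2\lambda = 2 + 3\lambda
\iff \lambda = 3/5,
\end{displaymath}

and the common value of the two losses is then $3(3/5) + 5(2/5) = 19/5 = 3.8$.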

Added to graph: loss functions for each $d_\lambda$ (a straight line) and the set of (x,y) pairs for which $\max(x,y) = 3.8$ -- the worst case loss for $d_\lambda$ when $\lambda = 3/5$.

The figure then shows that $d_{3/5}$ is actually the minimax procedure when randomized procedures are permitted.
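A similar sketch (again my own) scans a grid of $\lambda$ values and confirms that the worst case loss of $d_\lambda$ is minimized at $\lambda = 3/5$ with value 3.8:

    # d_lambda: take the car with probability lam, transit otherwise.
    def worst_case(lam):
        rain = 3 * lam + 5 * (1 - lam)
        sun = 5 * lam + 2 * (1 - lam)
        return max(rain, sun)

    grid = [i / 1000 for i in range(1001)]
    lam_star = min(grid, key=worst_case)
    print(lam_star, worst_case(lam_star))  # 0.6 3.8 (up to rounding)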

In general we might consider using a 4-sided coin where we take action B with probability $\lambda_B$, C with probability $\lambda_C$, and so on. The loss function of such a procedure is a convex combination of the losses of the four basic decisions, making the set of losses achievable with the aid of randomization look like the following:

Randomization permits assumption that set of possible loss functions is convex -- important technical conclusion used to prove many basic decision theory results.

SO: replace decision space D with

\begin{displaymath}{\cal D} = \{(\delta_C,\delta_B,\delta_T,\delta_H); \delta_i \ge 0, \sum
\delta_i=1\},
\end{displaymath}

the set of probability distributions on D.
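Concretely (an illustrative sketch of my own, not from the notes), an element of $\cal D$ is just a probability vector over the four actions, and its loss in each state is the corresponding convex combination:

    # Rows: states (Rain, Sun); columns: actions C, B, T, H.
    loss = [[3, 8, 5, 25],
            [5, 0, 2, 25]]

    def loss_of(delta):
        """Loss function of the randomized rule delta = (dC, dB, dT, dH)."""
        return [sum(L[j] * delta[j] for j in range(4)) for L in loss]

    print(loss_of((0, 0, 0, 1)))      # pure 'stay home': [25, 25]
    print(loss_of((0.6, 0, 0.4, 0)))  # d_{3/5}: [3.8, 3.8]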

The graph shows that many points in the picture correspond to bad decision procedures. Rain or shine, taking my car to work has lower loss than staying home; staying home is inadmissible.

Definition: A decision $\delta$ is inadmissible if there is a decision $\delta^*$ such that

\begin{displaymath}L( \theta, \delta^*) \le L(\theta,\delta)
\end{displaymath}

for all $\theta$ and there is at least one value of $\theta$ where the inequality is strict. A decision which is not inadmissible is called admissible.
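Here is a small sketch (my own illustration) that applies this definition to the four pure actions in the example, checking each pair for domination:

    losses = {
        "C": {"R": 3, "S": 5},
        "B": {"R": 8, "S": 0},
        "T": {"R": 5, "S": 2},
        "H": {"R": 25, "S": 25},
    }

    def dominates(a, b):
        """True if action a is at least as good as b in every state
        and strictly better in at least one (so b is inadmissible)."""
        la, lb = losses[a], losses[b]
        return (all(la[s] <= lb[s] for s in la)
                and any(la[s] < lb[s] for s in la))

    for b in losses:
        doms = [a for a in losses if a != b and dominates(a, b)]
        if doms:
            print(b, "is inadmissible; dominated by", doms)
    # Only H is printed: every other pure action beats staying home.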

Admissible decisions have risks on the lower left boundary of the graph; i.e., the line segments connecting B to T and T to C are the admissible decisions.

There is a connection between Bayes decisions and admissible decisions. A prior distribution in our example problem is specified by two probabilities, $\pi_R$ and $\pi_S$, which add up to 1. If $(L_S, L_R)$ is the loss function of some decision, plotted as before with the Sun loss on the horizontal axis and the Rain loss on the vertical axis, then the Bayes risk is

\begin{displaymath}r_\pi= \pi_R L_R + \pi_S L_S
\end{displaymath}

Consider the set of losses for which this Bayes risk equals some constant. On our picture this is a straight line with slope $-\pi_S/\pi_R$. Consider now three priors, written as $(\pi_S, \pi_R)$: $\pi_1 = (0.9,0.1)$, $\pi_2 = (0.5,0.5)$ and $\pi_3 = (0.1,0.9)$. For, say, $\pi_1$ imagine a line with slope $-9 = -0.9/0.1$ starting on the far left of the picture and sliding right until it bumps into the convex set of possible losses in the previous picture. It does so at the point B, as shown in the next graph. Sliding this line to the right corresponds to making the value of $r_\pi$ larger and larger, so that when it just touches the convex set we have found the Bayes decision.

Here is a picture showing the same lines for the three priors above.

Bayes decision for $\pi_1$ (you're pretty sure it will be sunny): ride your bike. If it's a toss-up between R and S you take the bus. If R is very likely ($\pi_3$) you take your car. The prior $(0.6,0.4)$ produces the line shown here:

Any point on line BT is Bayes for this prior.
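These Bayes calculations are easy to check numerically. A sketch of my own, writing the prior as the probability of Sun, per the $(\pi_S, \pi_R)$ convention above:

    losses = {"C": (3, 5), "B": (8, 0), "T": (5, 2), "H": (25, 25)}  # (Rain, Sun)

    def bayes_risk(p_sun, action):
        rain, sun = losses[action]
        return (1 - p_sun) * rain + p_sun * sun

    for p_sun in (0.9, 0.5, 0.1, 0.6):
        risks = {a: bayes_risk(p_sun, a) for a in losses}
        print(p_sun, risks, "Bayes:", min(risks, key=risks.get))
    # p_sun = 0.9: B is Bayes; 0.5: T; 0.1: C; 0.6: B and T tie at 3.2,
    # so every point on the segment BT is Bayes for the prior (0.6, 0.4).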


Statistical Decision Theory

Statistical problems have another ingredient: the data. We observe $X$, a random variable taking values in some set $\cal X$. We may make our decision $d$ depend on $X$. A decision rule is a function $\delta(X)$ from $\cal X$ to $D$. We want $L(\delta(X),\theta)$ to be small for all $\theta$. Since $X$ is random we quantify this by averaging over $X$ and compare procedures $\delta$ in terms of the risk function

\begin{displaymath}R_\delta(\theta) = E_\theta(L(\delta(X),\theta))
\end{displaymath}

To compare two procedures we must compare two functions of $\theta$ and pick ``the smaller one''. But typically the two functions will cross each other and there won't be a unique `smaller one'.

Example: In estimation theory to estimate a real parameter $\theta$ we used $D=\Theta$,

\begin{displaymath}L(d,\theta) = (d-\theta)^2
\end{displaymath}

and find that the risk of an estimator $\hat\theta(X)$ is

\begin{displaymath}R_{\hat\theta}(\theta) = E_\theta\left[(\hat\theta(X)-\theta)^2\right]
\end{displaymath}

which is just the Mean Squared Error of $\hat\theta$. We have already seen that there is no unique best estimator in the sense of MSE. How do we compare risk functions in general?
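For instance (a sketch of my own, not from the notes): for $X_1,\ldots,X_n$ iid $N(\theta,1)$, compare $\hat\theta_1 = \bar{X}$, whose risk is the constant $1/n$, with the shrunken estimator $\hat\theta_2 = \bar{X}/2$, whose risk is $1/(4n) + \theta^2/4$. The two risk functions cross, so neither estimator dominates:

    import math

    n = 10  # hypothetical sample size, for illustration

    def risk_mean(theta):
        # MSE of the sample mean Xbar: variance 1/n, zero bias.
        return 1 / n

    def risk_shrunk(theta):
        # MSE of Xbar/2: variance 1/(4n) plus squared bias (theta/2)^2.
        return 1 / (4 * n) + theta ** 2 / 4

    for theta in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(theta, risk_mean(theta), risk_shrunk(theta))
    print("risks cross at theta = +/-", math.sqrt(3 / n))
    # Xbar/2 wins for small |theta| and loses for large |theta|,
    # so neither risk function is uniformly smaller.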



Richard Lockhart
2000-03-27