|
This HTML version of is provided for convenience, but it is not the best format for the book. In particular, some of the symbols are not rendered correctly. You might prefer to read the PDF version. Chapter 9 Two dimensions9.1 Paintball
Paintball is a sport in which competing teams try to shoot each other with guns that fire paint-filled pellets that break on impact, leaving a colorful mark on the target. It is usually played in an arena decorated with barriers and other objects that can be used as cover. Suppose you are playing paintball in an indoor arena 30 feet wide and 50 feet long. You are standing near one of the 30 foot walls, and you suspect that one of your opponents has taken cover nearby. Along the wall, you see several paint spatters, all the same color, that you think your opponent fired recently. The spatters are at 15, 16, 18 and 21 feet, measured from the lower-left corner of the room. Based on these data, where do you think your opponent is hiding? Figure 9.1 shows a diagram of the arena. Using the lower-left corner of the room as the origin, I denote the unknown location of the shooter with coordinates α and β, or alpha and beta. The location of a paint spatter is labeled x. The angle the opponent shoots at is θ or theta. The paintball problem in this chapter is a slightly modified version of the lighthouse problem, a common example of Bayesian analysis. My notation follows the presentation of the problem in Sivia, Data Analysis: a Bayesian Tutorial, Second Edition. 9.2 Trigonometry
At least initially, let’s assume that the opponent is equally likely to shoot at any angle. If that’s the case, he is most likely to hit the wall at location alpha, and less likely to hit the wall far away from alpha. With a little trigonometry, we can compute the probability of hitting any spot along the wall. Imagine that the shooter fires a shot at angle θ then turns slightly and fires again at angle θ + Δθ. The first pellet would hit the wall at location x, where
The second pellet would hit the wall at location x′, where
The derivative dx/dθ is what I’ll call the “strafing speed”, which is the speed of the target location along the wall as θ increases. The probability of hitting a given point on the wall is inversely related to strafing speed. Taking the derivative of the first equation yields
And solving the same equation for θ yields
So if we know the coordinates of the shooter and a location along the wall, we can compute strafing speed: def StrafingSpeed(alpha, beta, x):
theta = math.atan2(x - alpha, beta)
speed = beta / math.cos(theta)**2
return speed
alpha and beta are the coordinates of the shooter; x is the location of a paint spatter. The result is the derivative of x with respect to theta. Now we can compute the probability of hitting any spot along the wall. MakeLocationPmf takes the coordinates of the shooter, alpha and beta, and a list of possible values of x, locations. def MakeLocationPmf(alpha, beta, locations):
pmf = thinkbayes.Pmf()
for x in locations:
prob = 1.0 / StrafingSpeed(alpha, beta, x)
pmf.Set(x, prob)
pmf.Normalize()
return pmf
For each location, the probability is inversely related to strafing speed. The result is a Pmf of locations and their probabilities. Figure 9.2 shows the Pmf of location with alpha = 10 and a range of values for beta. As expected, the most likely impact location is x = 10, and as beta increases, so does the spread of the Pmf. 9.3 The suite
Next we define a suite of hypotheses for the coordinates of the shooter. Each value in the suite is a pair of coordinates: (alpha, beta). Here is the function that initializes the suite: class Paintball(thinkbayes.Suite, thinkbayes.Joint):
def __init__(self, alphas, betas, locations):
self.locations = locations
pairs = [(alpha, beta)
for alpha in alphas
for beta in betas]
thinkbayes.Suite.__init__(self, pairs)
Paintball inherits from Suite, which we have seen before, and Joint, which I will explain soon. alphas is the list of possible values for alpha; betas is the list of values for beta. pairs is a list of all (alpha, beta) pairs. locations is a list of possible impact locations along the wall; it is stored for use in Likelihood. The room is 30 feet wide and 50 feet long, so here’s the code that creates the suite: alphas = range(0, 31)
betas = range(1, 51)
locations = range(0, 31)
suite = Paintball(alphas, betas, locations)
This prior distribution assumes that all locations in the room are equally likely. Given a map of the room, we might choose a more detailed prior, but we’ll start simple. 9.4 LikelihoodAll we need now is a likelihood function: def Likelihood(self, hypo, data):
alpha, beta = hypo
x = data
pmf = MakeLocationPmf(alpha, beta, self.locations)
like = pmf.Prob(x)
return like
hypo contains the hypothetical coordinates of the shooter. data contains an impact location along the wall. pmf contains the probability of each location, given the coordinates of the shooter. From this Pmf, we select the probability of the particular location that was observed. And we’re done. To update the suite, we can use UpdateSet, which is inherited from Suite. suite.UpdateSet([15, 16, 18, 21]) The result is a distribution that maps each (alpha, beta) pair to a posterior probability. Since each value in the posterior distribution is a pair of variables, this distribution is called a “joint distribution” of the two variables. 9.5 Joint distributionsGiven the joint distribution, we can compute the distributions of each variable independently, which are called the “marginal distributions.” thinkbayes.Joint provides a method that computes marginal distributions: # class Joint:
def Marginal(self, i):
pmf = Pmf()
for vs, prob in self.Items():
pmf.Incr(vs[i], prob)
return pmf
i is the index of the variable we want; in this example i=0 indicates the distribution of alpha, and i=1 indicates the distribution of beta. Here’s the code that extracts the marginal distributions: marginal_alpha = suite.Marginal(0)
marginal_beta = suite.Marginal(1)
Figure 9.3 shows the results (converted to CDFs). The median value for alpha is 18, near the center of mass of the observed spatters. For beta, the most likely values are close to the wall, but beyond 10 feet the distribution is almost uniform, which indicates that the data do not distinguish strongly between these possible locations. Given the posterior marginals, we can compute credible intervals for each coordinate independently: print 'alpha CI', marginal_alpha.CredibleInterval(50)
print 'beta CI', marginal_beta.CredibleInterval(50)
The 50% credible intervals are (14, 21) for alpha and (5, 31) for beta. So the data provide evidence that the shooter is in the near side of the room. But it is not strong evidence. The 90% credible interval covers most of the room! 9.6 Conditional distributions
The marginal distributions contain information about the variables independently, but they do not capture the dependence between variables, if any. One way to visualize dependence is by computing conditional distributions. thinkbayes.Joint provides a method that does just that: def Conditional(self, i, j, val):
pmf = Pmf()
for vs, prob in self.Items():
if vs[j] != val: continue
pmf.Incr(vs[i], prob)
pmf.Normalize()
return pmf
Again, i is the index of the variable we want; j is the index of the conditioning variable, and val is the conditional value. The result is the distribution of the ith variable under the condition that the jth variable is val. For example, the following code computes the conditional distributions of alpha for a range of values of beta: betas = [10, 20, 40]
for beta in betas:
cond = suite.Conditional(0, 1, beta)
thinkplot.Pmf(cond)
Figure 9.4 shows the results, which we could fully describe as “posterior conditional marginal distributions!” The shapes of these distributions are different, which indicates that the variables are not independent. If we know somehow that beta = 10, then we can be fairly confident about alpha. But as beta increases, so does the spread in alpha. 9.7 Credible intervals
Another way to visualize the posterior joint distribution is to compute credible intervals. Last time we looked at credible intervals, I skipped over an important point: for a given distribution, there are many intervals with the same level of credibility. For example, if you want a 50% credible interval, you could choose any set of values whose probability adds up to 50%. When the values are one-dimensional, it is most common to choose the “central credible interval;” for example, the central 50% credible interval contains all values between the 25th and 75th percentiles. In multiple dimensions it is less obvious what the right credible interval should be. The best choice might depend on context. But one common choice is the maximum likelihood credible interval, which contains the most likely values that add up to 50% (or some other percentage). thinkbayes.Joint provides a method that computes maximum likelihood credible intervals. # class Joint:
def MaxLikeInterval(self, percentage=90):
interval = []
total = 0
t = [(prob, val) for val, prob in self.Items()]
t.sort(reverse=True)
for prob, val in t:
interval.append(val)
total += prob
if total >= percentage/100.0:
break
return interval
The first step is to make a list of the values in the suite, sorted in descending order by probability. Next we traverse the list, adding each value to the interval, until the total probability exceeds percentage. The result is a list of values from the suite. To visualize these intervals, I wrote a function that “colors” each value according to how many intervals it appears in: def MakeCrediblePlot(suite):
d = dict((pair, 0) for pair in suite.Values())
percentages = [75, 50, 25]
for p in percentages:
interval = suite.MaxLikeInterval(p)
for pair in interval:
d[pair] += 1
thinkplot.Contour(d, contour=False, pcolor=True)
thinkplot.Show()
d is a dictionary that maps from each value in the suite to the number of intervals it appears in. The loop computes intervals for several percentages and modifies d. Then I use functions from thinkplot to generate a pseudocolor plot. Figure 9.5 shows the result. The 25% credible interval is the darkest region near the bottom wall. For higher percentages, the credible interval is bigger, of course, and skewed toward the right side of the room. |
Are you using one of our books in a class?We'd like to know about it. Please consider filling out this short survey.
|