Chapter 5 Odds and addends
5.1 Odds

One way to represent a probability is with a number between 0 and 1, but that’s not the only way. If you have ever bet on a football game or a horse race, you have probably encountered another representation of probability, called “odds.”
You might have heard expressions like “the odds are three to one against,” but you might not know what they mean. The odds in favor of an event are the ratio of the probability it will occur to the probability that it will not.
So if I think my team has a 75% chance of winning, I would say that the odds in their favor are three to one, because the chance of winning is three times the chance of losing.
You can write odds in decimal form, but it is most common to write them as a ratio of integers. So “three to one” is written 3:1.
When probabilities are low, it is more common to report the “odds against” rather than the odds in favor. For example, if I think my horse has a 10% chance of winning, I would say that the odds against are 9:1.
Probabilities and odds are different representations of the same information. Given a probability, you can compute the odds like this:
def Odds(p):
    return p / (1-p)
Given the odds in favor, in decimal form, you can convert to probability like this:
def Probability(o):
    return o / (o+1)
If you represent odds with a numerator and denominator, you can convert to probability like this:
def Probability2(yes, no):
    return yes / (yes + no)
When I work with odds in my head, I find it helpful to picture people at the track. If 20% of them think my horse will win, then 80% of them don’t, so the odds in favor are 20:80 or 1:4.
If the odds are 5:1 against my horse, then five out of six people think she will lose, so the probability of winning is 1/6.
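To check these conversions against the examples above, here is a standalone sketch (Python 3, independent of the thinkbayes module; lowercase names to avoid clashing with the book’s functions):

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def probability(o):
    """Convert decimal odds in favor to a probability."""
    return o / (o + 1)

def probability2(yes, no):
    """Convert odds given as a ratio yes:no to a probability."""
    return yes / (yes + no)

print(odds(0.75))          # 3.0: a 75% chance is odds of 3:1 in favor
print(probability2(1, 4))  # 0.2: odds of 1:4 in favor is a 20% chance
print(probability(0.2))    # 1/6, roughly 0.167: odds of 5:1 against
```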
5.2 The odds form of Bayes’s Theorem
In Chapter 1 I wrote Bayes’s Theorem in the “probability form”:

p(H|D) = p(H) p(D|H) / p(D)
If we define ¬H as the hypothesis that H is false, we can write the odds in favor of H, given data D, like this:

o(H|D) = p(H|D) / p(¬H|D) = [p(H) / p(¬H)] [p(D|H) / p(D|¬H)]
Or writing o(H) for the prior odds in favor of H:

o(H|D) = o(H) p(D|H) / p(D|¬H)
In words, this says that the posterior odds are the prior odds times the likelihood ratio. This is the “odds form” of Bayes’s Theorem.
This form is most convenient for computing a Bayesian update on paper, or in your head. For example, let’s go back to the cookie problem:
Suppose there are two bowls of cookies. Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. Bowl 2 contains 20 of each. You choose a bowl at random and draw a cookie at random; it turns out to be vanilla. What is the probability that it came from Bowl 1?
The prior probability is 50%, so the prior odds are 1:1, or just 1. The likelihood ratio is (3/4) / (1/2), or 3/2. So the posterior odds are 3:2, which corresponds to probability 3/5.
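As a quick check, the same update in plain Python (a standalone sketch, independent of thinkbayes):

```python
# Odds-form update for the cookie problem.
prior_odds = 0.5 / 0.5                # prior odds in favor of Bowl 1
likelihood_ratio = (3/4) / (1/2)      # p(vanilla | Bowl 1) / p(vanilla | Bowl 2)
posterior_odds = prior_odds * likelihood_ratio

# Convert the posterior odds back to a probability.
posterior_prob = posterior_odds / (posterior_odds + 1)

print(posterior_odds)  # 1.5, that is, odds of 3:2
print(posterior_prob)  # 0.6
```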
5.3 Oliver’s blood
Here is another problem from MacKay’s Information Theory, Inference, and Learning Algorithms:
Two people have left traces of their own blood at the scene of a crime. A suspect, Oliver, is tested and found to have type ’O’ blood. The blood groups of the two traces are found to be of type ’O’ (a common type in the local population, having frequency 60%) and of type ’AB’ (a rare type, with frequency 1%). Do these data [the traces found at the scene] give evidence in favor of the proposition that Oliver was one of the people [who left blood at the scene]?
To answer this question, we need to think about what it means for data to give evidence in favor of (or against) a hypothesis. Intuitively, we might say that data favor a hypothesis if the hypothesis is more likely in light of the data than it was before.
In the cookie problem, the prior odds are 1:1, or probability 50%. The posterior odds are 3:2, or probability 60%. So we could say that the vanilla cookie is evidence in favor of Bowl 1.
The odds form of Bayes’s Theorem provides a way to make this intuition more precise. Again:

o(H|D) = o(H) p(D|H) / p(D|¬H)
Or dividing through by o(H):

o(H|D) / o(H) = p(D|H) / p(D|¬H)
The term on the left is the ratio of the posterior and prior odds. The term on the right is the likelihood ratio, also called the “Bayes factor”.
If the Bayes factor is greater than 1, that means the data were more likely under H than under ¬H. And since the odds ratio is also greater than 1, that means the odds are greater, in light of the data, than they were before.

If the Bayes factor is less than 1, that means the data were less likely under H than under ¬H, so the odds in favor of H go down.
Finally, if the Bayes factor is exactly 1, the data are equally likely under either hypothesis, so the odds do not change.
Now we can get back to the Oliver’s blood problem. If Oliver is one of the people who left blood at the crime scene, then he accounts for the ’O’ sample, so the probability of the data is just the probability that a random member of the population has type ’AB’ blood, which is 1%.
If Oliver did not leave blood at the scene, then we have two samples to account for. If we choose two random people from the population, what is the chance of finding one with type ’O’ and one with type ’AB’? Well, there are two ways it might happen: the first person we choose might have type ’O’ and the second ’AB’, or the other way around. So the total probability is 2 × 0.6 × 0.01 = 1.2%.
The likelihood of the data is slightly higher if Oliver is not one of the people who left blood at the scene, so the blood data is actually evidence against Oliver’s guilt.
This example is a little contrived, but it is an example of the counterintuitive result that data consistent with a hypothesis are not necessarily in favor of the hypothesis.
If this result is so counterintuitive that it bothers you, this way of thinking might help: the data consist of a common event, type ’O’ blood, and a rare event, type ’AB’ blood. If Oliver accounts for the common event, that leaves the rare event still unexplained. If Oliver doesn’t account for the ’O’ blood, then we have two chances to find someone in the population with ’AB’ blood. And that factor of two makes the difference.
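The whole argument fits in a few lines of plain Python (a standalone sketch, using the frequencies from the problem statement):

```python
# Likelihood of the data if Oliver left blood at the scene:
# he accounts for the 'O' trace, so we only need a random 'AB' donor.
like_oliver = 0.01

# Likelihood if he did not: two random people, one 'O' and one 'AB',
# in either order.
like_not_oliver = 2 * 0.6 * 0.01

bayes_factor = like_oliver / like_not_oliver
print(bayes_factor < 1)  # True: the data are (weak) evidence against
```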
5.4 Addends

The fundamental operation of Bayesian statistics is Update, which takes a prior distribution and a set of data, and produces a posterior distribution. But solving real problems usually involves a number of other operations, including scaling, addition and other arithmetic operations, max and min, and mixtures.
This chapter presents addition and max; I will present other operations as we need them.
Dungeons and Dragons is a role-playing game where the results of players’ decisions are usually determined by rolling dice. In fact, before game play starts, players generate each attribute of their characters—strength, intelligence, wisdom, dexterity, constitution, and charisma—by rolling three six-sided dice and adding them up.
So you might be curious to know the distribution of this sum. There are two ways you might compute it:

Simulation: Given a Pmf that represents the distribution for a single die, you can draw random samples, add them up, and accumulate the distribution of simulated sums.

Enumeration: Given two Pmfs, you can enumerate all possible pairs of values and compute the distribution of the sums.

To implement the first approach, I start with a class that represents a single die as a Pmf:
class Die(thinkbayes.Pmf):

    def __init__(self, sides):
        d = dict((i, 1) for i in xrange(1, sides+1))
        thinkbayes.Pmf.__init__(self, d)
        self.Normalize()
Now I can create a six-sided die:
d6 = Die(6)
And simulate a sample of sums of three dice:

dice = [d6] * 3
three = thinkbayes.SampleSum(dice, 1000)
SampleSum takes a list of distributions and a sample size, n; it generates n random sums and returns their distribution as a Pmf:

def RandomSum(dists):
    total = sum(dist.Random() for dist in dists)
    return total

def SampleSum(dists, n):
    pmf = MakePmfFromList(RandomSum(dists) for i in xrange(n))
    return pmf
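Without the thinkbayes module, the same simulation can be sketched with the standard library (random.randint stands in for Pmf.Random, and the normalized Counter stands in for MakePmfFromList):

```python
import random
from collections import Counter

def sample_sum(n_dice, sides, n_samples):
    """Approximate the distribution of a sum of dice by random sampling."""
    counts = Counter(
        sum(random.randint(1, sides) for _ in range(n_dice))
        for _ in range(n_samples)
    )
    # Normalize the counts to get an estimated pmf.
    return {value: count / n_samples for value, count in counts.items()}

random.seed(17)
three_approx = sample_sum(3, 6, 1000)
print(sum(three_approx.values()))  # probabilities sum to 1 (up to float error)
```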
The drawback of estimating this distribution by simulation is that it is only approximately correct. As the sample size n increases, the approximation gets more accurate, but the run time grows, too.

The other approach is to enumerate all pairs of values and compute the sum and probability of each pair. This is implemented in Pmf.__add__:
# class Pmf

    def __add__(self, other):
        pmf = Pmf()
        for v1, p1 in self.Items():
            for v2, p2 in other.Items():
                pmf.Incr(v1+v2, p1*p2)
        return pmf
self and other can be Pmfs, or anything else that provides Items. The result is a new Pmf.
And here’s how it’s used:
three_exact = d6 + d6 + d6
When you apply the + operator to a Pmf, Python invokes __add__.
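To see the enumeration at work without the thinkbayes module, here is the same convolution with plain dictionaries (a standalone sketch):

```python
def add_pmfs(pmf1, pmf2):
    """Enumerate all pairs of values and accumulate the probability of each sum."""
    result = {}
    for v1, p1 in pmf1.items():
        for v2, p2 in pmf2.items():
            result[v1 + v2] = result.get(v1 + v2, 0) + p1 * p2
    return result

die = {i: 1/6 for i in range(1, 7)}
three_sum = add_pmfs(add_pmfs(die, die), die)

print(round(three_sum[3] * 216))   # 1: only one way to roll a 3
print(round(three_sum[10] * 216))  # 27: the most likely sums are 10 and 11
```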
The approximate and exact distributions are shown in Figure 5.1.
The code from this section is available from http://thinkbayes.com/dungeons.py.
5.5 Maxima

Having generated a Dungeons and Dragons character, I would be particularly interested in the character’s best attributes, so I might wonder what the chance is of getting an 18 in one or more attributes, or, more generally, what the distribution of the best attribute is.
There are three ways to compute the distribution of a maximum:

Simulation: Given a Pmf that represents the distribution for a single selection, you can generate random samples, find the maximum, and accumulate the distribution of simulated maxima.

Enumeration: Given two Pmfs, you can enumerate all possible pairs of values and compute the distribution of the maximum.

Exponentiation: If we convert a Pmf to a Cdf, there is a simple and efficient algorithm for finding the Cdf of the maximum.
The code to simulate maxima is almost identical to the code for simulating sums:
def RandomMax(dists):
    total = max(dist.Random() for dist in dists)
    return total

def SampleMax(dists, n):
    pmf = MakePmfFromList(RandomMax(dists) for i in xrange(n))
    return pmf
All I did was replace “sum” with “max”. And the code for enumeration is almost identical, too:
def PmfMax(pmf1, pmf2):
    res = thinkbayes.Pmf()
    for v1, p1 in pmf1.Items():
        for v2, p2 in pmf2.Items():
            res.Incr(max(v1, v2), p1*p2)
    return res
In fact, you could generalize this function by taking the appropriate operator as a parameter.
The only problem with this algorithm is that if each Pmf has n values, the run time is proportional to n².
If we convert the Pmfs to Cdfs, we can do the same calculation in linear time! The key is to remember the definition of the cumulative distribution function:

CDF(x) = p(X ≤ x)
where X is a random variable that means “a value chosen randomly from this distribution.” So, for example, CDF(5) is the probability that a value from this distribution is less than or equal to 5.
If I draw X from CDF1 and Y from CDF2, and compute the maximum Z = max(X, Y), what is the chance that Z is less than or equal to 5? Well, in that case both X and Y must be less than or equal to 5.
If the selections of X and Y are independent,

CDF3(5) = CDF1(5) CDF2(5)
where CDF3 is the distribution of Z. I chose the value 5 because I think it makes the formulas easy to read, but we can generalize for any value of z:

CDF3(z) = CDF1(z) CDF2(z)
In the special case where we draw n values from the same distribution,

CDFmax(z) = CDF(z)^n
So to find the distribution of the maximum of n values, we can enumerate the probabilities in the given Cdf and raise them to the nth power. Cdf provides a method that does just that:
# class Cdf

    def Max(self, n):
        cdf = self.Copy()
        cdf.ps = [p**n for p in cdf.ps]
        return cdf
Finally, here’s an example that computes the distribution of your character’s best attribute:
best_attr_cdf = three_exact.Max(6)
best_attr_pmf = best_attr_cdf.MakePmf()
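Putting the pieces together without thinkbayes, here is a sketch of the whole calculation: enumerate the exact distribution of three dice, build its Cdf, raise the probabilities to the sixth power, and read off the chance that the best of six attributes is an 18.

```python
from itertools import product

# Exact pmf of the sum of three six-sided dice, by enumeration.
pmf = {}
for roll in product(range(1, 7), repeat=3):
    total = sum(roll)
    pmf[total] = pmf.get(total, 0) + (1/6) ** 3

# Cumulative distribution: cdf[x] = p(sum <= x).
cdf = {}
running = 0.0
for value in sorted(pmf):
    running += pmf[value]
    cdf[value] = running

# Distribution of the maximum of 6 independent draws: CDF(x) ** 6.
cdf_best = {value: p ** 6 for value, p in cdf.items()}

# p(best attribute is 18) = CDF_best(18) - CDF_best(17)
p_best_is_18 = cdf_best[18] - cdf_best[17]
print(round(p_best_is_18, 4))  # 0.0275: about a 2.7% chance of an 18
```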