|
This HTML version of is provided for convenience, but it is not the best format for the book. In particular, some of the symbols are not rendered correctly. You might prefer to read the PDF version. Chapter 4 More estimation4.1 The Euro coin problemIn Information Theory, Inference, and Learning Algorithms, David MacKay poses this question: A statistical statement appeared in “The Guardian" on Friday January 4, 2002:When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110. ‘It looks very suspicious to me,’ said Barry Blight, a statistics lecturer at the London School of Economics. ‘If the coin were unbiased, the chance of getting a result as extreme as that would be less than 7%.’ To answer that question, we’ll proceed in two steps. The first is to estimate the probability that the coin lands face up. The second is to evaluate whether the data support the hypothesis that the coin is biased. Any given coin has some probability, x, of landing heads up when spun on edge. It seems reasonable to believe that the value of x depends on some physical characteristics of the coin, primarily the distribution of weight. If a coin is perfectly balanced, we expect x to be close to 50%, but for a lop-sided coin x might be substantially different. We can use Bayes’s Theorem and the observed data to estimate x. Let’s define 101 hypotheses, where Hi is the hypothesis that x is i%, for values of i from 0 to 100. I’ll start with a uniform prior where the probability of Hi is the same for all i. We’ll come back later to consider other priors.
The likelihood function is easy; in fact, it is the same as in the Four Urn problem. If Hi is true, then the probability of heads is i/100; the probability of tails is 1− i/100. class Euro(Suite):
def Likelihood(self, hypo, data):
x = hypo / 100.0
if data == 'H':
return x
else:
return 1-x
Here’s the code that makes the suite and updates it: suite = Euro(xrange(0, 101))
dataset = 'H' * 140 + 'T' * 110
for data in dataset:
suite.Update(data)
And then we can plot the posterior. thinkplot.Pmf(suite)
thinkplot.Show()
The result is in Figure 4.1. 4.2 Summarizing the posteriorAgain, there are several ways to summarize the posterior distribution.
One option is to find the most likely value in the posterior
distribution. def MaximumLikelihood(pmf):
"""Returns the value with the highest probability."""
prob, val = max((prob, val) for val, prob in pmf.Items())
return val
In this case the result is 56, which is also the observed percentage of heads, 140/250 = 0.56%. So that suggests (correctly) that the percentage in a sample is the maximum likelihood estimator for the population. We might also summarize the posterior by computing the mean and median: print 'Mean', suite.Mean()
print 'Median', thinkbayes.Percentile(suite, 50)
The mean is 55.95; the median is 56. Finally, we can compute a credible interval: print 'CI', thinkbayes.CredibleInterval(suite, 90) The result is (51, 61). Now, getting back to the original question, we would like to know whether the coin is fair. We observe that the posterior credible interval does not include 50%, which suggests that the coin is not fair. But that is not exactly the question we started with. MacKay asked, “ Do these data give evidence that the coin is biased rather than fair?” To answer that question, we will have to be more precise about what it means to say that data constitute evidence for a hypothesis. And that is the subject of the next chapter. But before we go, I want to address one possible source of confusion. Since we want to know whether the coin is fair, it might be tempting to ask for the probability that x is 50%: print suite.Prob(50) The result is 0.021, but that value is pretty much meaningless. The decision to evaluate 101 hypotheses was arbitrary; we could have divided the range into more or fewer pieces, and if we had, the probability for any given hypothesis would be greater or less. 4.3 Swamping the priors
Again we started with a uniform prior, and again we have to acknowledge that it was not a very good choice. I can believe that if a coin is lop-sided, x might deviate substantially from 50%, but it seems unlikely that the Belgian Euro coin is so imbalanced that x is 10% or 90%. So it might be more reasonable to choose a prior that gives higher probability to values of x near 50% and lower probability to extreme values. As an example, I constructed a triangular prior, shown in Figure 4.2. Here’s the code that constructs the prior: def TrianglePrior():
suite = Euro(xrange(0, 101))
for x in range(0, 51):
suite.Set(x, x)
for x in range(51, 101):
suite.Set(x, 100-x)
suite.Normalize()
Figure 4.2 shows the result. Updating this prior with the same dataset yields the posterior distribution shown in Figure 4.3. The point of this example is that even with substantially different priors, the posterior distributions are very similar. The medians and the credible intervals are identical; the means differ by less than 0.5%. This is an example of “swamping the priors:” with enough data, people who start with different priors will tend to converge on the same posterior. 4.4 OptimizationThe code I have shown so far is meant to be easy to read, but it is not very efficient. In general, I like to develop code that is demonstrably correct, then check whether it is fast enough for my purposes. If so, there is no need to optimize. For this example, if we care about run time, there are several ways we can speed it up. The first opportunity is to reduce the number of times we
normalize the suite.
In the original code, we call dataset = 'H' * heads + 'T' * tails
for data in dataset:
suite.Update(data)
And here’s what def Update(self, data):
for hypo in self.Values():
like = self.Likelihood(hypo, data)
self.Mult(hypo, like)
return self.Normalize()
Each update iterates through the hypotheses, then calls
def UpdateSet(self, dataset):
for data in dataset:
for hypo in self.Values():
like = self.Likelihood(hypo, data)
self.Mult(hypo, like)
return self.Normalize()
And here’s how we can invoke it: dataset = 'H' * heads + 'T' * tails
suite.UpdateSet(dataset)
This optimization speeds things up, but the run time is still
proportional to the amount of data. We can speed things up
even more by rewriting In the original version,
def Likelihood(self, hypo, data):
x = hypo / 100.0
if data == 'H':
return x
else:
return 1-x
As an alternative, we could encode the dataset as a tuple of
two integers: the number of heads and tails.
In that case def Likelihood(self, hypo, data):
x = hypo / 100.0
heads, tails = data
like = x**heads * (1-x)**tails
return like
And then we can call heads, tails = 140, 110
suite.Update((heads, tails))
Since we have replaced repeated multiplication with exponentiation, this version takes the same time for any number of spins. 4.5 The beta distributionThere is one more optimization that solves this problem even faster. So far we have used a Pmf object to represent a discrete set of possible values for x. Now we will use a continuous distribution, specifically the beta distribution (see http://en.wikipedia.org/wiki/Beta_distribution). The beta distribution is defined on the interval from 0 to 1 (including both), so it is a natural choice for describing proportions and probabilities. But wait, it gets better. It turns out that if you do a Bayesian update with a binomial likelihood function, as we did in the previous section, the beta distribution is a “conjugate prior.” That means that if the prior distribution for x is a beta distribution, the posterior is also a beta distribution. But wait, it gets even better. The shape of the beta distribution depends on two parameters, written α and β, or alpha and beta. If the prior is a beta distribution with parameters alpha and beta, and we see data with h heads and t tails, the posterior is a beta distribution with parameters alpha+h and beta+t. In other words, we can do an update with two additions. So that’s great, but it only works if the prior is beta. If beta distributions were unrealistic priors, they would not be very useful. But for many realistic priors there is a beta distribution that is at least a good approximation, and for a uniform prior there is a perfect match. The beta distribution with alpha=1 and beta=1 is uniform from 0 to 1. Let’s see how we can take advantage of all this. Here’s a class that represents a beta distribution: class Beta(object):
def __init__(self, alpha=1, beta=1):
self.alpha = alpha
self.beta = beta
By default def Update(self, data):
heads, tails = data
self.alpha += heads
self.beta += tails
data is a pair of integers representing the number of heads and tails, or in the language of Bernoulli trials: successes and failures. The mean of a beta distribution is a simple function of alpha and beta: def Mean(self):
return float(self.alpha) / (self.alpha + self.beta)
So we have yet another way to solve the Euro problem: beta = Beta()
beta.Update((140, 110))
print beta.Mean()
The posterior mean is 56% (again). To get the probability for any value of x, we can evaluate the PDF of the beta distribution: def EvalPdf(self, x):
return x**(self.alpha-1) * (1-x)**(self.beta-1)
This expression might look familiar. Here’s thinkbayes.EvalBinomialPmf def EvalBinomialPmf(x, yes, no):
return x**yes * (1-x)**no
It’s the same function, but in EvalPdf, we think of x as a random variable and alpha and beta as parameters; in EvalBinomialPmf, x is the parameter, and yes and no are random variables. Distributions like these that share the same PDF are called “conjugate distributions.” Beta is defined in http://thinkbayes.com/thinkbayes.py. 4.6 ExercisesExercise 1 Suppose that instead of observing coin tosses directly, you measure the outcome using an instrument that is not always correct. Specifically, suppose there is a probability y that an actual heads is reported as tails, or actual tails reported as heads. Write a class that estimates the bias of a coin given a series of outcomes and the value of y. How does the spread of the posterior distribution depend on y? Exercise 2 This exercise is inspired by a question posted by a “redditor” named dominosci on Reddit’s statistics “subreddit” at http://reddit.com/r/statistics. Reddit is an online forum with many interest groups called subreddits. Users, called redditors, post links to online content and other web pages. Other redditors vote on the links, giving an “upvote” to high-quality links and a “downvote” to links that are bad or irrelevant. A problem, identified by dominosci, is that some redditors are more reliable than others, and Reddit does not currently take this into account. So the challenge is to devise a system so that when a redditor casts a vote, the estimated quality of the link is updated in accordance with the reliability of the redditor, and the estimated reliability of the redditor is updated in accordance with the quality of the link. One approach is to model the quality of the link as the probability of garnering an upvote, and to model the reliability of the redditor as the probability of correctly giving an upvote to a high-quality item. Write class definitions for redditors and links and an update function that updates both objects whenever a redditor casts a vote. |
Are you using one of our books in a class?We'd like to know about it. Please consider filling out this short survey.
|