Chapter 0 Preface
This version of the book is a rough draft. I am making this draft available for comments, but it comes with the warning that it is probably full of errors.
If you find some of those errors, please let me know. If I make a change based on your suggestion, I will add you to the contributors list (unless you ask me not to).
My theory, which is mine
The premise of this book, and the other books in the Think series, is that if you know how to program, you can use that skill to help you learn other topics, including Bayesian statistics.
Most books on Bayesian statistics use mathematical notation and present ideas in terms of mathematical concepts like calculus. This book uses Python code instead of math, and discrete approximations instead of continuous mathematics. As a result, what would be an integral in a math book becomes a summation, and most operations on distributions are simple loops.
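As a minimal sketch of that idea (this is not code from the book; the function names, the grid size, and the choice of an exponential distribution are assumptions made for illustration), the mean of a continuous distribution, which a math book would write as an integral, can be approximated by a loop over a discrete grid:

```python
import math

def DiscreteMean(pdf, low, high, n=10000):
    """Approximates the mean of a continuous distribution with a summation.

    pdf: function that maps a value x to a probability density
    low, high: range of values to cover
    n: number of equally spaced grid cells (an arbitrary choice)
    """
    dx = (high - low) / n
    total_prob = 0.0
    total = 0.0
    for i in range(n):
        x = low + (i + 0.5) * dx   # midpoint of the i-th cell
        p = pdf(x) * dx            # approximate probability mass of the cell
        total_prob += p
        total += x * p
    # dividing by total_prob normalizes the discrete approximation
    return total / total_prob

def ExponentialPdf(x, lam=2.0):
    """Density of an exponential distribution with rate lam."""
    return lam * math.exp(-lam * x)

# the exact mean of this distribution is 1 / lam
mean = DiscreteMean(ExponentialPdf, 0.0, 20.0)
print(mean)
```

The loop plays the role of the integral sign, and the grid spacing controls how close the sum gets to the exact answer.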
This presentation is easier to understand, at least for people with programming skills. It is also more general, because when we make modeling decisions, we can choose the most appropriate model without worrying too much about whether the model lends itself to conventional analysis.
Also, it provides a smooth development path from simple examples to real-world problems. Chapter 3 is a good example. It starts with a simple example involving dice, one of the staples of basic probability. From there it proceeds in small steps to the Locomotive Problem, which I borrowed from Mosteller’s Fifty Challenging Problems in Probability, and from there to the German Tank Problem, a famously successful application of Bayesian methods during World War II.
0.0.1 Modeling and approximation
Most chapters in this book are motivated by a real-world problem, so most chapters involve some degree of modeling. Before we can apply Bayesian methods (or any other analysis), we have to make decisions about which parts of the real-world system we have to include in the model, and which details we can abstract away.
For example, in Chapter 7, the motivating problem is to predict the winner of a hockey game. I chose to model goal-scoring as a Poisson process, which implies that a goal is equally likely at any point in the game. That is not exactly true, but it is probably a good enough model for most purposes.
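To make the Poisson assumption concrete, here is a small sketch (not the code from Chapter 7; the function name and the rate of 2.9 goals per game are hypothetical values chosen for illustration). If goals arrive as a Poisson process with an average of lam goals per game, the number of goals in a game follows a Poisson distribution:

```python
import math

def PoissonPmf(k, lam):
    """Probability of exactly k events when the expected number is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 2.9   # hypothetical average goals per game, for illustration only

# probability of scoring 0, 1, ..., 10 goals in one game
probs = [PoissonPmf(k, lam) for k in range(11)]
for k, p in enumerate(probs):
    print(k, round(p, 4))
```

The probabilities for all possible goal counts add up to 1; the first eleven values shown here cover almost all of that total.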
In the chapter on evidence, the motivating problem is interpreting SAT scores (the SAT is a standardized test used for college admissions in the U.S.). I start with a simple model that assumes that all SAT questions are equally difficult, but in fact the designers of the SAT deliberately include some questions that are relatively easy and some that are relatively hard. I present a second model that includes this feature, and conclude that it doesn’t have a big effect on the results after all.
I think it is important to include modeling as an explicit part of problem-solving because it reminds us to think about modeling errors; that is, errors due to simplifications and assumptions of the model.
Many of the methods in this book are based on discrete distributions, which makes some people worry about numerical errors. But for real-world problems, numerical errors are almost always much smaller than modeling errors.
Furthermore, the discrete approach often allows better modeling decisions; in that case, I would rather have an approximate solution to a good model than an exact solution to a bad model.
On the other hand, continuous methods sometimes yield performance advantages, for example by replacing a linear- or quadratic-time computation with a constant-time solution.
So I recommend a general process with these steps:
One benefit of this process is that Step 1 tends to be fast, so you can explore several alternative models before investing heavily in any of them.
Another benefit is that if you get to Step 3, you will be starting with a reference implementation that is likely to be correct, which you can use for regression testing (that is, checking that the optimized code yields the same answer, at least approximately).
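The following sketch illustrates that kind of regression test. It is not taken from the book: the function names, the coin-flipping problem, and the counts are assumptions chosen for illustration. The reference implementation estimates the bias of a coin with a discrete update over a grid of hypotheses, which takes time proportional to the grid size. The optimized version uses the fact that a uniform prior combined with this likelihood yields a beta posterior, so its mean is available in constant time.

```python
def GridPosteriorMean(heads, tails, n=1001):
    """Reference implementation: discrete Bayesian update on a grid.

    Hypotheses are n equally spaced values of x, the probability of heads.
    """
    xs = [i / (n - 1) for i in range(n)]
    # uniform prior times likelihood; constant factors cancel when we normalize
    weights = [x**heads * (1 - x)**tails for x in xs]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total

def AnalyticPosteriorMean(heads, tails):
    """Optimized version: the posterior is Beta(heads+1, tails+1)."""
    return (heads + 1) / (heads + tails + 2)

# regression test: the two versions should agree, at least approximately
heads, tails = 140, 110   # made-up counts for illustration
reference = GridPosteriorMean(heads, tails)
optimized = AnalyticPosteriorMean(heads, tails)
assert abs(reference - optimized) < 1e-6
```

The tolerance in the final check reflects the "at least approximately" part: the grid version is a discrete approximation, so exact equality is not expected.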
Working with the code
Many of the code examples in this book use classes and functions defined in thinkbayes.py. You can download this module from http://thinkbayes.com/thinkbayes.py.
Most chapters contain references to code you can download from http://thinkbayes.com. Some of those files have dependencies you will also have to download. I suggest you keep all of these files in the same directory so they can import each other without changing the Python path.
You can download these files one at a time when you need them, or you can download them all at once in a zip file, http://thinkbayes.com/thinkbayes_code.zip. This file also contains the data files used by some of the programs. When you unzip it, it creates a directory.
One of the modules in that directory is thinkplot.py, which provides wrappers for some of the functions in pyplot, which is part of matplotlib. You can download matplotlib from http://matplotlib.org, if you don’t already have it installed.
Experienced Python programmers will notice that the code in this book does not comply with PEP 8, which is the most common style guide for Python (http://www.python.org/dev/peps/pep-0008/).
Specifically, PEP 8 calls for lowercase function names with underscores between words.
The reason I broke this rule is that I developed some of the code while I was a Visiting Scientist at Google in 2009-10. So I followed the Google style guide for Python, which deviates from PEP 8 in a few places. Also, once I got used to Google style, I found that I liked it. And at this point, it would be too much trouble to change.
Also on the topic of style, I choose to write “Bayes’s Theorem” with an s after the apostrophe, which is preferred in some style guides and deprecated in others. I don’t have a strong preference. I had to choose one, and I chose this one.
There are several excellent modules for doing Bayesian statistics in Python, including pymc and OpenBUGS. I chose not to use them for this book because you need a fair amount of background knowledge to get started with these modules, and I want to keep the prerequisites minimal. If you know Python and a little bit about probability, you are ready to start this book.
Chapter 1 is about probability and Bayes’s Theorem; it has no code. Chapter 2 introduces Pmf, a thinly disguised Python dictionary I use to represent a probability mass function (PMF). Then Chapter 3 introduces Suite, a kind of Pmf that provides a framework for doing Bayesian updates. And that’s just about all there is to it.
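To give a flavor of how little machinery is involved, here is a rough sketch of the idea using a plain dictionary. It is not the Pmf or Suite class from thinkbayes.py; the function names and the dice example are assumptions made for illustration. The dictionary maps each hypothesis to its probability, and a Bayesian update multiplies each probability by the likelihood of the data and then renormalizes.

```python
def Normalize(pmf):
    """Rescales the probabilities in place so they add up to 1."""
    total = sum(pmf.values())
    for hypo in pmf:
        pmf[hypo] /= total

def Update(pmf, data, likelihood):
    """Performs one Bayesian update in place.

    likelihood: function that takes (data, hypo) and returns the
    probability of the data under the hypothesis
    """
    for hypo in pmf:
        pmf[hypo] *= likelihood(data, hypo)
    Normalize(pmf)

def DiceLikelihood(roll, sides):
    """Probability of a given roll on a fair die with the given number of sides."""
    if roll > sides:
        return 0.0
    return 1.0 / sides

# hypotheses: the die I rolled has 4, 6, 8, 12, or 20 sides, equally likely
pmf = {sides: 1.0 for sides in [4, 6, 8, 12, 20]}
Normalize(pmf)

# data: I rolled a 6, which rules out the 4-sided die
Update(pmf, 6, DiceLikelihood)
for sides, prob in sorted(pmf.items()):
    print(sides, prob)
```

After the update, the 4-sided die has probability 0, and the 6-sided die is the most likely of the remaining hypotheses because it assigns the highest probability to the observed roll.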
Well, almost. In some of the later chapters, I use some common analytic distributions, including the Gaussian (normal) distribution, the exponential and Poisson distributions, and the beta distribution. In Chapter 14 I break out the less-common Dirichlet distribution, but I explain it as I go along. If you are not familiar with these distributions, you can read about them on Wikipedia. You could also read the companion to this book, Think Stats, or an introductory statistics book (although I’m afraid most of them take a mathematical approach that is not particularly helpful for practical purposes).
After you read this book, you should be ready to apply Bayesian methods to real-world problems. I have used this book in a college class with students who knew Python and a little bit of probability. After a few weeks, they were able to work on projects with real-world applications. As examples of the kind of work you can do with these methods, some of their reports are included as case studies at the end of this book.
I am always looking for examples that would lend themselves to additional case studies. If you use methods from this book on a real project, let me know. Maybe we can include a case study about your project in a future edition.
Allen B. Downey
If you include at least part of the sentence the error appears in, that makes it easy for me to search. Page and section numbers are fine, too, but not quite as easy to work with. Thanks!