
Chapter 9  Stochastic modeling

9.1  Monty Hall

The Monty Hall problem might be the most contentious question in the history of probability. The problem is simple, but the correct answer is so counterintuitive that many people just can’t accept it, and many smart people have embarrassed themselves not just by getting it wrong but by arguing the wrong side, aggressively, in public.

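Monty Hall was the original host of the game show Let’s Make a Deal. The Monty Hall problem is based on one of the regular games on the show. If you are on the show, here’s what happens:

  1. Monty shows you three closed doors and tells you that there is a prize behind each door: one prize is a car, and the other two are less valuable prizes. The prizes are arranged at random.
  2. The object of the game is to guess which door has the car. If you guess right, you get to keep the car.
  3. You pick a door, which we will call Door A. We’ll call the other doors B and C.
  4. Before opening the door you chose, Monty increases the suspense by opening either Door B or C, whichever does not have the car. (If the car is actually behind Door A, Monty can safely open B or C, so he chooses one at random.)
  5. Then Monty offers you the option to stick with your original choice or switch to the one remaining unopened door.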

The question is, should you “stick” or “switch” or does it make no difference?

Most people have the very strong intuition that it makes no difference. There are two doors left, they reason, so the chance that the car is behind Door A is 50%.

But that is wrong. In fact, the chance of winning if you stick with Door A is only 1/3; if you switch, your chances are 2/3. I will explain why, but I don’t expect you to believe me.

The key is to realize that there are three possible scenarios: the car is either behind Door A, B or C. Since the prizes are arranged at random, the probability of each scenario is 1/3.

If your strategy is to stick with Door A, then you will only win in Scenario A, which has probability 1/3.

If your strategy is to switch, you will win in either Scenario B or Scenario C, so the total probability of winning is 2/3.

If you are not completely convinced by this argument, you are in good company. When a friend presented this solution to Paul Erdős, he replied, “No, that is impossible. It should make no difference.”[1]

No amount of argument could convince him. In the end, it took a computer simulation to bring him around.

Exercise 1  

Write a program that simulates the Monty Hall problem and use it to estimate the probability of winning if you stick, and if you switch.

Then read the discussion of the problem at wikipedia.org/wiki/Monty_Hall_problem.

Which do you find more convincing, the simulation or the arguments, and why?
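A minimal simulation sketch for Exercise 1, assuming the rules described at the start of this section: Monty knows where the car is and always opens an empty door, choosing at random when both B and C are empty. The function names play and estimate are illustrative, not from any library.

```python
import random

DOORS = ("A", "B", "C")

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    car = rng.choice(DOORS)   # prizes are arranged at random
    pick = "A"                # the contestant always starts with Door A
    # Monty knows where the car is: he opens a door that is neither the
    # contestant's pick nor the car (random choice if both B and C are empty).
    opened = rng.choice([d for d in DOORS if d != pick and d != car])
    if switch:
        # move to the one remaining unopened door
        pick = next(d for d in DOORS if d != pick and d != opened)
    return pick == car

def estimate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stick: ", estimate(switch=False))
print("switch:", estimate(switch=True))
```

If the model matches the game, the two estimates should land near 1/3 and 2/3, in line with the scenario argument above.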

Exercise 2  

To understand the Monty Hall problem, it is important to realize that by deciding which door to open, Monty is giving you information. To see why this matters, imagine the case where Monty doesn’t know where the prizes are, so he chooses Door B or C at random.

Assuming Monty gets lucky and doesn’t open the door with the car, are you better off switching or sticking?
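One way to explore Exercise 2 is to change a single line of the simulation: Monty now opens B or C blindly, and rounds where he reveals the car are thrown away, which is how this sketch models “assuming Monty gets lucky.” The names play_ignorant and estimate are hypothetical.

```python
import random

def play_ignorant(switch, rng):
    """One round where Monty opens B or C at random, not knowing where the car is.

    Returns True/False for a win/loss, or None if Monty reveals the car
    (those rounds are discarded, since the exercise assumes he gets lucky).
    """
    car = rng.choice("ABC")
    opened = rng.choice("BC")   # contestant holds Door A; Monty guesses blindly
    if opened == car:
        return None
    pick = ("C" if opened == "B" else "B") if switch else "A"
    return pick == car

def estimate(switch, trials=100_000, seed=0):
    """Win rate among the rounds where Monty did not reveal the car."""
    rng = random.Random(seed)
    outcomes = [play_ignorant(switch, rng) for _ in range(trials)]
    kept = [won for won in outcomes if won is not None]
    return sum(kept) / len(kept)

print("stick: ", estimate(switch=False))
print("switch:", estimate(switch=True))
```

Under this model both estimates should come out near 1/2: once Monty’s choice carries no information about the car, switching no longer helps.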

Exercise 3  

The following questions[2] are similar to the Monty Hall problem: they are easy to get wrong, but also easy to simulate:

  1. If a family has two children, what is the chance that they have two girls?
  2. If a family has two children and we know that one of them is a girl, what is the chance that they have two girls?
  3. If a family has two children and we know that the older one is a girl, what is the chance that they have two girls?
  4. If a family has two children and we know that at least one of them is a girl named Florida, what is the chance that they have two girls?

You can use the assumptions that the probability that any child is a girl is 50%, and that the percentage of girls named Florida is small.
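A simulation sketch covering all four questions, assuming each child is independently a girl with probability 1/2 and that a girl is named Florida with probability p_florida, independently of her sibling. The default p_florida = 0.05 is a hypothetical value, deliberately larger than any realistic share, so that a modest sample contains enough Floridas; a smaller value needs proportionally more families. The helper names are illustrative.

```python
import random

def make_family(rng, p_florida):
    """Two children as (is_girl, is_named_florida); kids[0] is the older one."""
    kids = []
    for _ in range(2):
        girl = rng.random() < 0.5
        florida = girl and rng.random() < p_florida   # only girls get this name here
        kids.append((girl, florida))
    return kids

def conditional(families, condition):
    """P(two girls | condition), estimated from the simulated families."""
    selected = [f for f in families if condition(f)]
    both_girls = sum(1 for f in selected if f[0][0] and f[1][0])
    return both_girls / len(selected)

def simulate(n=200_000, p_florida=0.05, seed=0):
    rng = random.Random(seed)
    families = [make_family(rng, p_florida) for _ in range(n)]
    return {
        "no information": conditional(families, lambda f: True),
        "at least one girl": conditional(families, lambda f: f[0][0] or f[1][0]),
        "older one is a girl": conditional(families, lambda f: f[0][0]),
        "a girl named Florida": conditional(families, lambda f: f[0][1] or f[1][1]),
    }

for question, p in simulate().items():
    print(f"{question}: {p:.3f}")
```

The design choice here is to simulate once and then filter by each condition, so all four questions are answered from the same population of families.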

9.2  Poincaré

Henri Poincaré was a French mathematician who taught at the Sorbonne from 1881 until his death in 1912. The following anecdote about him is probably apocryphal, but it makes an interesting probability problem.

Supposedly Poincaré suspected that his local bakery was selling loaves of bread that were lighter than the advertised weight of 1 kg, so every day for a year he bought a loaf of bread, brought it home and weighed it. At the end of the year, he plotted the distribution of his measurements and showed that it fit a normal distribution with mean 950 g and standard deviation 50 g. He brought this evidence to the bread police, who gave the baker a warning.

For the next year, Poincaré continued the practice of weighing his bread every day. At the end of the year, he found that the average weight was 1000 g, just as it should be, but again he complained to the bread police, and this time they fined the baker.

Why? Because the shape of the distribution was asymmetric. Unlike the normal distribution, it was skewed to the right, which is consistent with the hypothesis that the baker was still making 950 g loaves, but deliberately giving Poincaré the heavier ones.

Exercise 4  

Write a program that simulates a baker who chooses n loaves from a distribution with mean 950 g and standard deviation 50 g, and gives the heaviest one to Poincaré. What value of n yields a distribution with mean 1000 g? What is the standard deviation?

Compare this distribution to a normal distribution with the same mean and the same standard deviation. Is the difference in the shape of the distribution big enough to convince the bread police?
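A sketch of Exercise 4, assuming loaf weights are independent draws from a normal distribution with mean 950 g and standard deviation 50 g. It reports the mean, standard deviation and sample skewness of the loaves Poincaré receives for several values of n; skewness is used here as a simple numeric stand-in for “shape,” since a normal distribution has skewness 0. The function names are hypothetical.

```python
import random
import statistics

def poincare_loaves(n, days, rng, mu=950.0, sigma=50.0):
    """Each day the baker bakes n loaves ~ Normal(mu, sigma) and hands over the heaviest."""
    return [max(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(days)]

def skewness(xs):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def summarize(n, days=50_000, seed=0):
    """Mean, standard deviation and skewness of the loaves Poincaré receives."""
    rng = random.Random(seed)
    weights = poincare_loaves(n, days, rng)
    return statistics.fmean(weights), statistics.stdev(weights), skewness(weights)

for n in range(1, 7):
    mean, sd, skew = summarize(n)
    print(f"n={n}: mean={mean:.1f} g  sd={sd:.1f} g  skew={skew:.3f}")
```

Under these assumptions the mean passes 1000 g at about n = 4. Note that the simulation uses many more days than Poincaré’s single year, to make the estimates stable; whether 365 real measurements would reveal the skew is exactly the question the exercise asks.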

Exercise 5  

If you go to a dance where partners are paired up randomly, what percentage of opposite-sex couples will you see where the woman is taller than the man?

You might have to look around to get the average and standard deviation of height for men and women.
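A sketch of Exercise 5, assuming heights are normally distributed and that random pairing makes a woman’s and a man’s heights independent. The height parameters below are placeholders, not published figures; replace them with the values you find. The closed form uses the fact that W − M is then normal with mean μ_W − μ_M and variance σ_W² + σ_M², which gives a check on the simulation.

```python
import math
import random

# Placeholder parameters (hypothetical; replace with real height data).
MEN_MEAN, MEN_SD = 175.0, 7.0      # cm
WOMEN_MEAN, WOMEN_SD = 162.0, 6.5  # cm

def p_woman_taller_sim(mu_m, sd_m, mu_w, sd_w, couples=100_000, seed=0):
    """Monte Carlo: fraction of random couples where the woman is taller."""
    rng = random.Random(seed)
    taller = sum(rng.gauss(mu_w, sd_w) > rng.gauss(mu_m, sd_m) for _ in range(couples))
    return taller / couples

def p_woman_taller_exact(mu_m, sd_m, mu_w, sd_w):
    """P(W - M > 0) where W - M ~ Normal(mu_w - mu_m, sd_w^2 + sd_m^2)."""
    z = (mu_w - mu_m) / math.hypot(sd_w, sd_m)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF at z

print("simulated:", p_woman_taller_sim(MEN_MEAN, MEN_SD, WOMEN_MEAN, WOMEN_SD))
print("exact:    ", p_woman_taller_exact(MEN_MEAN, MEN_SD, WOMEN_MEAN, WOMEN_SD))
```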

9.3  Streaks and hot spots

People do not have very good intuition for random processes. If you ask people to generate “random” numbers, they tend to generate sequences that are random-looking, but actually more ordered than real random sequences. Conversely, if you show them a real random sequence, they tend to see patterns where there are none.

A classic example of the second phenomenon is the prevalence of belief in “streaks” in sports: a player who has been successful recently is said to have a “hot hand”; a player who has been unsuccessful is “in a slump.”

Statisticians have tested these hypotheses in a number of sports, and the consistent result is that there is no such thing as a streak[3]. If you assume that each attempt is independent of previous attempts, you will see occasional long strings of successes or failures. These apparent streaks are not sufficient evidence that there is any correlation between successive attempts.

A related phenomenon is the clustering illusion, which is the tendency to see clusters in spatial patterns that are actually random (see wikipedia.org/wiki/Clustering_illusion). Monte Carlo simulations are a useful way to test whether an apparent cluster is likely to be meaningful.

Exercise 6  

If there are 10 players in a basketball game and each one takes 15 shots during the course of the game, and each shot has a 50% probability of going in, what is the probability that you will see, in a given game, at least one player who hits 10 shots in a row? If you watch a season of 82 games, what are the chances you will see at least one streak of 10 hits or misses?

This problem demonstrates some strengths and weaknesses of Monte Carlo simulation. A strength is that it is often easy and fast to write a simulation, and no great knowledge of probability is required. A weakness is that estimating the probability of rare events can take a long time! A little bit of analysis can save a lot of computing.
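A Monte Carlo sketch of Exercise 6, assuming every shot is independent with a 50% chance of going in and that games in a season are independent of each other. The function names and the either flag (which also counts streaks of misses) are illustrative choices, not from the text.

```python
import random

def longest_run(shots, value=True):
    """Length of the longest run of consecutive entries equal to value."""
    best = current = 0
    for s in shots:
        current = current + 1 if s == value else 0
        best = max(best, current)
    return best

def game_has_streak(rng, players=10, shots=15, p=0.5, k=10, either=False):
    """True if any player in one simulated game has k hits in a row
    (or, with either=True, k hits or k misses in a row)."""
    for _ in range(players):
        seq = [rng.random() < p for _ in range(shots)]
        if longest_run(seq, True) >= k or (either and longest_run(seq, False) >= k):
            return True
    return False

def estimate(trials=20_000, seed=0, **game_kwargs):
    """Monte Carlo estimate of the per-game probability of a streak."""
    rng = random.Random(seed)
    return sum(game_has_streak(rng, **game_kwargs) for _ in range(trials)) / trials

def season_prob(p_game, games=82):
    """Chance of at least one streak in a season, treating games as independent."""
    return 1 - (1 - p_game) ** games

p_game = estimate()
print("per game:", p_game, " per season:", season_prob(p_game))
```

Passing either=True answers the “hits or misses” version. As a check that needs no simulation at all, a single player’s chance of 10 straight hits in 15 fair shots can be counted exactly: it is 7/2048, which illustrates the point above about analysis saving computation.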

Exercise 7  

A cancer cluster is defined by the Centers for Disease Control (CDC) as “greater-than-expected number of cancer cases that occurs within a group of people in a geographic area over a period of time.”[4]

Many people interpret a cancer cluster as evidence of an environmental hazard, but many scientists and statisticians think that investigating cancer clusters is a waste of time[5]. Why? One reason (among several) is that identifying cancer clusters is a classic case of the Sharpshooter Fallacy (see wikipedia.org/wiki/Texas_sharpshooter_fallacy).

Nevertheless, when someone reports a cancer cluster, the CDC is obligated to investigate. According to their web page:

“Investigators develop a ‘case’ definition, a time period of concern, and the population at risk. They then calculate the expected number of cases and compare them to the observed number. A cluster is confirmed when the observed/expected ratio is greater than 1.0, and the difference is statistically significant.”

  1. Suppose that a particular cancer has an incidence of 1 case per thousand people per year. If you follow a particular cohort of 100 people for 10 years, you would expect to see about 1 case. If you saw two cases, that would not be very surprising, but more than two would be rare.

    Write a program that simulates a large number of cohorts over a 10-year period, and estimate the distribution of total cases.

  2. An observation is considered statistically significant if its probability by chance alone, called a p-value, is less than 5%. In a cohort of 100 people over 10 years, how many cases would you have to see to meet this criterion?
  3. Now imagine that you divide a population of 10000 people into 100 cohorts and follow them for 10 years. What is the chance that at least one of the cohorts will have a “statistically significant” cluster? What if we require a p-value of 1%?
  4. Now imagine that you arrange 10000 people in a 100 × 100 grid and follow them for 10 years. What is the chance that there will be at least one 10 × 10 block anywhere in the grid with a statistically significant cluster?
  5. Finally, imagine that you follow a grid of 10000 people for 30 years. What is the chance that there will be a 10-year interval at some point with a 10 × 10 block anywhere in the grid with a statistically significant cluster?
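A sketch for parts 1 to 3, under a simplifying assumption that is not in the text: each person either develops this cancer during the 10 years or not (at most one case per person), independently of everyone else, so a cohort’s case count is binomial. The simulation gives the distribution for part 1; the exact binomial tail gives the p-values for parts 2 and 3. The grid versions in parts 4 and 5 involve overlapping blocks and are left as the exercise intends. All names are illustrative.

```python
import math
import random
from collections import Counter

ANNUAL_RATE = 0.001   # 1 case per thousand people per year (from the exercise)
YEARS = 10
COHORT_SIZE = 100

# Assumption: probability that one person has a case at some point in the period.
P_PERSON = 1 - (1 - ANNUAL_RATE) ** YEARS

def simulate_cohorts(n_cohorts=20_000, seed=0):
    """Counter mapping number of cases -> number of simulated cohorts."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_cohorts):
        counts[sum(rng.random() < P_PERSON for _ in range(COHORT_SIZE))] += 1
    return counts

def p_value(k):
    """Exact binomial probability of k or more cases in one cohort by chance alone."""
    return sum(math.comb(COHORT_SIZE, j) * P_PERSON ** j * (1 - P_PERSON) ** (COHORT_SIZE - j)
               for j in range(k, COHORT_SIZE + 1))

def threshold(alpha):
    """Smallest case count whose p-value is below alpha."""
    k = 0
    while p_value(k) >= alpha:
        k += 1
    return k

def p_any_cluster(n_cohorts, alpha):
    """Chance that at least one of n independent cohorts looks 'significant'."""
    q = p_value(threshold(alpha))
    return 1 - (1 - q) ** n_cohorts

print(sorted(simulate_cohorts().items()))
print("5%:", threshold(0.05), p_any_cluster(100, 0.05))
print("1%:", threshold(0.01), p_any_cluster(100, 0.01))
```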

Exercise 8  

In 1941 Joe DiMaggio got at least one hit in 56 consecutive games[6]. Many baseball fans consider this streak the greatest achievement in any sport in history, because it was so unlikely.

Use a Monte Carlo simulation to estimate the probability that any player in major league baseball will have a hitting streak of 57 or more games in the next century.
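A streak this long is rare enough that brute-force simulation needs a very large number of trials, so this sketch swaps in an exact dynamic-programming calculation that can replace, or check, a Monte Carlo estimate. It assumes games are independent with a constant per-game hit probability, ignores streaks that span two seasons, and treats player-seasons as independent. The batting average, at-bats and number of player-seasons in the usage lines are hypothetical placeholders, not MLB data.

```python
def prob_run_at_least(n, k, p):
    """Exact probability of at least one run of k or more successes in n
    independent trials, each succeeding with probability p."""
    if k <= 0:
        return 1.0
    # state[j] = P(no run of length k yet, and the current run has length j)
    state = [0.0] * k
    state[0] = 1.0
    done = 0.0
    for _ in range(n):
        new = [0.0] * k
        new[0] = (1 - p) * sum(state)    # a failure resets the run
        for j in range(k - 1):
            new[j + 1] = p * state[j]    # a success extends the run
        done += p * state[k - 1]         # a success completes a run of length k
        state = new
    return done

def p_hit_in_game(batting_avg, at_bats):
    """Hypothetical model: at-bats are independent with a constant hit probability."""
    return 1 - (1 - batting_avg) ** at_bats

def p_any_streak(k, games, p_game, player_seasons):
    """Chance that at least one of many independent player-seasons has a k-game streak."""
    q = prob_run_at_least(games, k, p_game)
    return 1 - (1 - q) ** player_seasons

# Placeholder inputs (hypothetical): a .300 hitter with 4 at-bats per game,
# a 162-game season, and 100 seasons of 300 such players.
p_game = p_hit_in_game(0.300, 4)
print(p_any_streak(57, 162, p_game, player_seasons=100 * 300))
```

The state vector tracks only the length of the current run, so the cost is proportional to games × streak length, which is trivial even for k = 57.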

9.4  Bayes’ theorem

Bayes’ theorem is a relationship between the conditional probabilities of two events. A conditional probability, often written P(A|B), is the probability that Event A will occur given that we know that Event B has occurred. Bayes’ theorem states:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$

To see that this is true, it helps to write P(A and B), which is the probability that A and B both occur:

$$P(A \text{ and } B) = P(A)\,P(B|A)$$

But it is also true that

$$P(A \text{ and } B) = P(B)\,P(A|B)$$

So

$$P(B)\,P(A|B) = P(A)\,P(B|A)$$

Dividing through by P(B) yields Bayes’ theorem.

Bayes’ theorem is often interpreted as a statement about how a body of evidence, E, affects the probability of a hypothesis, H:

$$P(H|E) = P(H)\,\frac{P(E|H)}{P(E)}$$

In words, this equation says that the probability of H after you have seen E is the product of P(H), which is the probability of H before you saw the evidence, and the ratio of P(E|H), the probability of seeing the evidence assuming that H is true, to P(E), the probability of seeing the evidence under any circumstances (whether or not H is true).

This way of reading Bayes’ theorem is called the “diachronic” interpretation because it describes how the probability of a hypothesis gets updated over time in light of new evidence. In this context, P(H) is called the prior probability and P(H|E) is called the posterior. The update term P(E|H)/P(E) is called the likelihood ratio.

A classic use of Bayes’ Theorem is the interpretation of clinical tests. For example, routine testing for illegal drug use is increasingly common in workplaces and schools (see aclu.org/drugpolicy/testing/index.html). The companies that perform these tests maintain that the tests are sensitive, which means that they are likely to produce a positive result if there are drugs in a sample, and specific, which means that they are likely to yield a negative result if there are no drugs.

Studies from the Journal of the American Medical Association[7] estimate that the sensitivity of common drug tests is about 60% and the specificity is about 99%.

Now suppose these tests are applied to a workforce where the actual rate of drug use is 5%. Of the employees who test positive, how many of them actually use drugs?

In Bayesian terms, we want to compute the probability of drug use given a positive test, P(D|+). By Bayes’ Theorem:

$$P(D|+) = P(D)\,\frac{P(+|D)}{P(+)}$$

The prior, P(D), is the probability of drug use before we see the outcome of the test, which is 5%. The numerator of the likelihood ratio, P(+|D), is the probability of a positive test assuming drug use, which is the sensitivity.

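The denominator, P(+), is a little harder to evaluate. We have to consider two possibilities: either the subject uses drugs (D), or N, the hypothesis that the subject of the test does not use drugs. The probability of a positive test under each hypothesis, P(+|D) and P(+|N), is weighted by that hypothesis’s prior: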

$$P(+) = P(D)\,P(+|D) + P(N)\,P(+|N)$$

The probability of a false positive, P(+|N), is the complement of the specificity, or 1%.

Putting it together, we have

$$P(D|+) = \frac{P(D)\,P(+|D)}{P(D)\,P(+|D) + P(N)\,P(+|N)}$$

Plugging in the given values yields P(D|+) = 0.76, which means that of the people who test positive, about 1 in 4 is innocent.
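As a worked check of that number, with P(D) = 0.05, P(N) = 1 − P(D) = 0.95, sensitivity P(+|D) = 0.60 and false-positive rate P(+|N) = 0.01:

$$P(D|+) = \frac{0.05 \times 0.60}{0.05 \times 0.60 + 0.95 \times 0.01} = \frac{0.030}{0.0395} \approx 0.76$$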

Exercise 9  

Write a program that takes the actual rate of drug use, and the sensitivity and specificity of the test, and uses Bayes’ Theorem to compute P(D|+).

Suppose the same test is applied to a population where the actual rate of drug use is 1%. What is the probability that someone who tests positive is actually a drug user?
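A minimal sketch of the program Exercise 9 asks for; the function name p_user_given_positive is hypothetical. It applies the last formula above directly, with the false-positive rate taken as the complement of the specificity.

```python
def p_user_given_positive(rate, sensitivity, specificity):
    """P(D|+) by Bayes' theorem."""
    true_pos = rate * sensitivity                  # P(D) P(+|D)
    false_pos = (1 - rate) * (1 - specificity)     # P(N) P(+|N)
    return true_pos / (true_pos + false_pos)

print(p_user_given_positive(0.05, 0.60, 0.99))  # workforce from the text: about 0.76
print(p_user_given_positive(0.01, 0.60, 0.99))  # population in this exercise
```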

Exercise 10  

This exercise is from wikipedia.org/wiki/Bayesian_inference.

“Suppose there are two full bowls of cookies. Bowl 1 has 10 chocolate chip and 30 plain cookies, while Bowl 2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. The cookie turns out to be a plain one. How probable is it that Fred picked it out of Bowl 1?”
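One way to set up the cookie problem is as a table of hypotheses, each with a prior and a likelihood, normalized by their total, which plays the role of P(E). This is a sketch; the posterior helper is illustrative, and Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """P(H|E) for each hypothesis H, given P(H) and P(E|H)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())   # P(E), the probability of the evidence
    return {h: v / total for h, v in unnormalized.items()}

priors = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}       # Fred picks at random
likelihoods = {"Bowl 1": Fraction(30, 40), "Bowl 2": Fraction(20, 40)}  # P(plain | bowl)
print(posterior(priors, likelihoods))
```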


[1] See Hoffman, The Man Who Loved Only Numbers, page 83.
[2] These questions are adapted from Mlodinow, The Drunkard’s Walk.
[3] For example, see Gilovich, Vallone and Tversky, “The hot hand in basketball: On the misperception of random sequences,” 1985.
[4] From cdc.gov/nceh/clusters/about.htm.
[5] See Gawande, “The Cancer Cluster Myth,” New Yorker, Feb 8, 1999.
[6] See wikipedia.org/wiki/Hitting_streak.
[7] I got these numbers from Gleason and Barnum, “Predictive Probabilities In Employee Drug-Testing,” at piercelaw.edu/risk/vol2/winter/gleason.htm.
