Sharon Bertsch McGrayne is a talented science writer whose portraits of great scientists of the past are incisive and entertaining. However, she evidently believes that one must studiously avoid dealing with any serious scientific issues when entertaining a popular audience. For this reason, this book is a total failure. Why should a reader care about the history of an idea of which he or she has zero understanding? McGrayne turns the history of Bayes rule into a pitched battle between intransigent opponents, but we never find out what the real issues are.
In fact, Bayes rule is a mathematical tautology, being the definition of conditional probability. Suppose A is an event with probability P(A) and B is an event with probability P(B). Let C be the event "both A and B occur." Then the conditional probability P(A|B) of event A, given that we know that B has occurred, is just P(C)/P(B). Moreover, if a decision-maker knows P(A), P(B), and P(C), and discovers that B occurred, then he should revise the probability that A occurred to P(A|B) = P(C)/P(B). Why? Well, suppose we have a population of 1000 individuals, where the probability that an event E is true of an individual is P(E), where E is any one of A, B, and C. Then the expected number of individuals for which B is true is 1000*P(B). Of these, the number for which A is also true is 1000*P(C), since those are precisely the individuals for which both A and B are true. Therefore, the probability that an individual satisfies A, given that he satisfies B, is 1000*P(C)/(1000*P(B)) = P(C)/P(B) = P(A|B).
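The head-count argument above is easy to check numerically. The sketch below uses made-up probabilities of my own choosing (P(A) = 0.3, P(B) = 0.4, with A and B independent for simplicity, so that P(C) = 0.12 and P(A|B) = 0.3); it draws a population and confirms that the conditional frequency of A among the individuals satisfying B approximates P(C)/P(B):

```python
import random

random.seed(1)

# Hypothetical population of 1000 individuals; each independently
# satisfies A with probability 0.3 and B with probability 0.4.
population = [(random.random() < 0.3, random.random() < 0.4)
              for _ in range(1000)]

n_B = sum(1 for a, b in population if b)        # individuals satisfying B
n_C = sum(1 for a, b in population if a and b)  # individuals satisfying both A and B

# Conditional frequency of A among those satisfying B;
# this approximates P(C)/P(B) = 0.12/0.4 = 0.3.
print(n_C / n_B)
```

With a larger population the ratio n_C/n_B settles ever closer to P(C)/P(B), which is exactly the reviewer's 1000-individual argument.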
For instance, suppose 5% of the population uses drugs, and there is a drug test that is correct 95% of the time: it tests positive on a drug user 95% of the time, and it tests negative on a drug nonuser 95% of the time. If an individual tests positive, we can show using Bayes rule that the probability of his being a drug user is 50%. To see this, let A be the event "subject uses drugs," and let B be the event "subject tests positive for using drugs." First, what is the probability P(B) of event B? Well, take a random subject. With probability 1/20 he is a drug user, so with probability (1/20)(19/20) = 19/400 he is a drug user testing positive. With probability 19/20 he is not a drug user, so he is a non-user testing positive with probability (19/20)(1/20) = 19/400. Thus P(B) = 19/400 + 19/400 = 38/400. Let event C be "subject uses drugs and tests positive for using drugs." This probability is (1/20)(19/20) = 19/400. Thus P(A|B) = P(C)/P(B) = 1/2.
If this seems mystifying, consider the following interpretation. Suppose we test 10000 people. The expected number of drug users will be 500, and 95% of them, or 475, will test positive for drug use. But 9500 people will be non-drug users, and 5% of them will erroneously test positive for drug use, which is 475 people. Thus, 50% of those who test positive for drugs are actually drug users.
The real brilliance of Bayes rule lies in the fact that sometimes we want to find P(A|B) when we don't know either P(C) or P(B), but we do know P(B|A) and P(A). For instance, we want to know P(A|B), the probability that an individual who tests positive is actually a drug user, but we only know the frequency P(A) of drug use in the population (5%) and the accuracy of the test, which is P(B|A) = 95% (a drug user tests positive with probability 0.95). Then we can write P(A|B)P(B) = P(C) = P(B|A)P(A). From the first and third terms we get P(A|B) = P(B|A)P(A)/P(B). In our case, this gives P(A|B) = 0.95(0.05)/P(B) = 0.0475/P(B). But we can also calculate P(B) as follows.
Let N mean "A is false for the subject." Thus P(N) = 1-P(A) = 0.95. Then we have
P(B) = P(B|A)P(A) + P(B|N)P(N), as can be verified by splitting event B according to whether A or N holds. In our case we know that P(B|N) = 0.05 (the test mistakenly flags a non-user as positive with probability 0.05), so we have P(B) = 0.95(0.05) + 0.05(0.95) = 0.095. Thus P(A|B) = 0.0475/0.095 = 1/2.
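The two formulas combine into one line of arithmetic. Here is a minimal Python sketch, where `bayes` is simply a name of my own for the expression P(B|A)P(A)/[P(B|A)P(A) + P(B|N)P(N)]:

```python
def bayes(p_A, p_B_given_A, p_B_given_N):
    """Posterior P(A|B), computing P(B) by the law of total probability."""
    p_N = 1 - p_A
    p_B = p_B_given_A * p_A + p_B_given_N * p_N  # P(B) = P(B|A)P(A) + P(B|N)P(N)
    return p_B_given_A * p_A / p_B               # P(A|B) = P(B|A)P(A)/P(B)

# The drug test numbers from the text: P(A) = 0.05, P(B|A) = 0.95, P(B|N) = 0.05.
print(bayes(0.05, 0.95, 0.05))  # 0.5
```

Note that all three inputs are quantities one typically does know: the base rate and the two error rates of the test.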
Isn't this a simple and beautiful result? Only arithmetic and grade-school algebra are needed to arrive at it. By the way, for more on Bayes rule, see my textbook Game Theory Evolving (Princeton 2009).
Now, who could dispute this analysis? It is clearly correct. So where does all of the vehement opposition to Bayes rule come from? The answer is that when a group of individuals (e.g., professional scientists) does not agree on P(A), you cannot apply Bayes rule to settle the matter. You can, however, show that under many conditions, repeated observations can lead to mutually acceptable values for P(A). For instance, suppose the weight of a substance per ounce is variable and unknown, and each scientist i has his personal prior probability Pi that the weight is less than one gram per ounce. Suppose we take unbiased samples that are each about one ounce, and we take unbiased measurements of the weight. Then the long-run frequency with which the sample weights fall below one gram per ounce will be accepted by all scientists as the updated probability. This is Bayesian updating.
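As an analogous illustration (a coin of unknown bias standing in for the unknown weights; the specific priors and numbers here are assumptions of mine, not anything from the book), the Python sketch below shows two sharply different Beta priors being driven together by a shared stream of unbiased observations:

```python
import random

random.seed(0)

# Hypothetical setup: an unknown event probability plays the role of the
# unknown quantity; two scientists start from very different Beta priors.
true_p = 0.3
priors = [(1, 9), (9, 1)]   # (alpha, beta) parameters: one low prior, one high

# Shared, unbiased observations of the event.
n = 5000
heads = sum(random.random() < true_p for _ in range(n))
tails = n - heads

# Bayesian updating: a Beta(a, b) prior plus the data gives a
# Beta(a + heads, b + tails) posterior, with mean (a + heads)/(a + b + n).
posterior_means = [(a + heads) / (a + b + n) for a, b in priors]
print(posterior_means)  # both posterior means are now close to true_p
```

After 5,000 shared observations the two posterior means differ by only a fraction of a percent, even though the priors pointed in opposite directions.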
However, it is not true that Bayesian updating always leads to convergence to a common probability distribution. See, for instance, papers by Mordecai Kurz of Stanford University. Moreover, when observations are limited, the range of assessments of the probabilities can be quite wide. This is why Bayes rule is considered "subjective." However, when we really know the probabilities, as in the drug testing example, there is no controversy about the value of Bayes rule. It is extremely valuable, indeed indispensable, in such cases.
This book manages to obfuscate a very simple issue, turning science into a vast morality play. Now of course there are deep issues in the philosophy of probability that implicate Bayes rule, but one does not learn what they are from this book.