Today's handout is a tutorial on Bayesian networks written by Prof. Daphne Koller of Stanford.
If X and Y are discrete random variables then Pr(X|Y) is a two-dimensional table with one entry Pr(X=x|Y=y) for each pair of values <x,y>.
The "product rule" says that Pr(A & B) = Pr(A|B)Pr(B).
Note that Pr(A|B) = 0.8 does not mean "whenever B is true then Pr(A) = 0.8" because it may be the case, for example, that Pr(A|B,C) = 0.
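As a rough illustration of the last two points, the sketch below builds a small joint distribution over three events A, B and C and reads conditional probabilities off it. The joint table and the helper names `prob` and `cond` are made up for this example; the numbers are chosen so that Pr(A|B) = 0.8 while Pr(A|B,C) = 0.

```python
# Hypothetical joint distribution over (A, B, C); the entries sum to 1.
joint = {
    (True, True, True): 0.0,
    (False, True, True): 0.1,
    (True, True, False): 0.4,
    (False, True, False): 0.0,
    (True, False, True): 0.1,
    (False, False, True): 0.1,
    (True, False, False): 0.1,
    (False, False, False): 0.2,
}


def prob(event):
    """Pr(event): sum the joint over all outcomes where the event holds."""
    return sum(p for outcome, p in joint.items() if event(*outcome))


def cond(event, given):
    """Pr(event | given) = Pr(event & given) / Pr(given)."""
    both = prob(lambda a, b, c: event(a, b, c) and given(a, b, c))
    return both / prob(given)


p_b = prob(lambda a, b, c: b)
p_a_and_b = prob(lambda a, b, c: a and b)
p_a_given_b = cond(lambda a, b, c: a, lambda a, b, c: b)
p_a_given_bc = cond(lambda a, b, c: a, lambda a, b, c: b and c)

print(p_a_given_b)   # Pr(A|B)
print(p_a_given_bc)  # Pr(A|B,C)
```

Here Pr(A & B) equals Pr(A|B)Pr(B), as the product rule requires, and yet learning C in addition to B drives the probability of A to zero.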
By definition, the event A is independent of the event B if and only if Pr(A & B) = Pr(A) * Pr(B).
Notice that then Pr(A|B) = Pr(A & B) / Pr(B) = (Pr(A) * Pr(B)) / Pr(B) = Pr(A), provided Pr(B) > 0.
The statement "A is independent of B" is equivalent to Pr(A|B) = Pr(A) and also equivalent to Pr(B|A) = Pr(B).
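The definition of independence can be checked on a small example. The sketch below uses one roll of a fair die, which is an assumption made purely for illustration, as are the event names and the helpers `pr`, `cond` and `independent`. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

outcomes = range(1, 7)  # one roll of a fair six-sided die


def pr(event):
    """Probability of an event under the uniform distribution on the die."""
    return Fraction(sum(1 for o in outcomes if event(o)), 6)


def cond(event, given):
    """Pr(event | given) = Pr(event & given) / Pr(given)."""
    return pr(lambda o: event(o) and given(o)) / pr(given)


def independent(e1, e2):
    """True iff Pr(e1 & e2) = Pr(e1) * Pr(e2)."""
    return pr(lambda o: e1(o) and e2(o)) == pr(e1) * pr(e2)


def even(o):
    return o % 2 == 0


def at_most_4(o):
    return o <= 4


def at_most_3(o):
    return o <= 3


print(independent(even, at_most_4))  # these two events are independent
print(independent(even, at_most_3))  # these two are not
print(cond(even, at_most_4) == pr(even))
print(cond(at_most_4, even) == pr(at_most_4))
```

For the independent pair, both conditional forms agree with the unconditional probabilities, which matches the equivalences stated above.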
Supposedly Bayes did not publish his discovery because he thought it was hubris for humans to investigate the will of God. He discovered how to compute Pr(Y|X) based on knowledge of Pr(X|Y).
What Bayes discovered is this formula:
Pr(Y|X) = Pr(Y & X) / Pr(X) = (Pr(X|Y)Pr(Y)) / Pr(X)
Example:
The following article review (author unknown), available at http://www.stat.unipg.it/ncsu/info/jse/v5n3/resource.html, shows how difficult commonsense reasoning with probabilities is.
Review of The Psychology of Good Judgment by Gerd Gigerenzer (1996). Medical Decision Making, 16(3), 273-280.
Gigerenzer argues that physicians and their patients will better understand the chance of a false positive result if we replace the conventional conditional probability analysis by an equivalent frequency method. The success of this method is illustrated in terms of an experiment that Gigerenzer and his colleague Ulrich Hoffrage carried out. They asked 48 physicians in Munich to answer questions relating to four different medical-diagnosis problems. For the four questions given to each physician, two were given using the probability format and two using the frequency format.
Note: Joe Drish has pointed out that the numbers in the Bayes computation quoted below are wrong. 0.096 should simply be 0.10, and 0.078 should be 0.0748.
One of the four diagnostic problems was a mammography problem: To facilitate early detection of breast cancer, women are encouraged from a particular age on to participate at regular intervals in routine screening, even if they have no obvious symptoms. Imagine you conduct in a certain region such a breast cancer screening using mammography. For symptom-free women aged 40 to 50 who participate in screening using mammography, the following information is available for this region.
Probability format:
The probability that one of these women has breast cancer is 1%. If a woman has breast cancer, the probability is 80% that she will have a positive mammography test. If a woman does not have breast cancer, the probability is 10% that she will still have a positive mammography test. Imagine a woman (aged 40 to 50, no symptoms) who has a positive mammography test in your breast cancer screening. What is the probability that she actually has breast cancer? _____%
Frequency format:
Ten out of every 1,000 women have breast cancer. Of these 10 women with breast cancer, 8 will have a positive mammography test. Of the remaining 990 women without breast cancer, 99 will still have a positive mammography test. Imagine a sample of women (aged 40 to 50, no symptoms) who have positive mammography tests in your breast cancer screening. How many of these women do actually have breast cancer? _____ out of _____
In a classic study by D. M. Eddy (see Dowie J. Elstein (ed.) (1988), Professional Judgment: A Reader in Clinical Decision Making, Cambridge University Press, pp. 45-590), essentially this same question, with just the probability format, was given to 100 physicians. Ninety-five of the physicians gave the answer of approximately 75% instead of the correct answer, which, in this example, is 7.8%.
In the present study, Gigerenzer found that, when the information was presented in the probability format, only 10% reasoned with the Bayes computation
P(breast cancer | positive test) = (.01)(.80)/[(.01)(.80) + (.99)(.096)] = .078.
For the group given the frequency format, 46% computed the Bayes probability in the simpler form:
P(breast cancer | positive test) = 8/(8 + 99) = .078.
The article discusses some of the reactions of the physicians to even considering such problems. Here are some quotes:
"On such a basis one can't make a diagnosis. Statistical information is one big lie."
"I never inform my patients about statistical data. I would tell the patient that mammography is not so exact, and I would in any case perform a biopsy."
"Oh, what nonsense. I can't do it. You should test my daughter. She studies medicine."
"Statistics is alien to everyday concerns and of little use for judging individual persons."
Some doctors commented that getting the answer in the frequency form was simple. A more detailed analysis of this kind of study can be found in the article How to Improve Bayesian Reasoning Without Instruction: Frequency Formats by Gigerenzer and Hoffrage (1995), Psychological Review, 102, 684-704.
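The arithmetic in the mammography example can be checked with a few lines of code. The sketch below applies Bayes' rule to the probability format (1% prevalence, 80% true positive rate, 10% false positive rate) and compares it with the frequency format (8 out of 8 + 99). It also evaluates the review's computation with 0.096 in place of 0.10. The function name `bayes_posterior` and its parameter names are chosen for this illustration only.

```python
def bayes_posterior(prior, p_pos_given_cancer, p_pos_given_healthy):
    """Pr(cancer | positive) by Bayes' rule, expanding Pr(positive)
    as the sum over the cancer and no-cancer cases."""
    true_pos = prior * p_pos_given_cancer
    false_pos = (1 - prior) * p_pos_given_healthy
    return true_pos / (true_pos + false_pos)


# Probability format as stated in the problem.
probability_format = bayes_posterior(0.01, 0.80, 0.10)

# Frequency format: 8 true positives among 8 + 99 positive tests.
frequency_format = 8 / (8 + 99)

# The review's computation, which uses 0.096 as the false positive rate.
review_version = bayes_posterior(0.01, 0.80, 0.096)

print(round(probability_format, 4))
print(round(frequency_format, 4))
print(round(review_version, 3))
```

The two formats give the same answer, about 0.0748, because the frequency format is the same Bayes computation with every term multiplied by 1,000 women. The value 0.078 arises only when 0.096 is used as the false positive rate.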
Formally, a Bayesian network is a directed acyclic graph where the nodes are random variables. Intuitively, an edge in a Bayesian network goes from X to Y iff Y depends directly on X, in other words, iff X causally influences Y directly.
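One simple way to represent the graph structure is a mapping from each node to the list of its parents. The sketch below does this for a made-up three-node network (the variable names and edges are hypothetical) and checks the acyclicity requirement using the standard library's `graphlib` module, which needs Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical network: each node maps to the list of its parents,
# so there is an edge Cancer -> Mammogram, Cancer -> Biopsy, and so on.
parents = {
    "Cancer": [],
    "Mammogram": ["Cancer"],
    "Biopsy": ["Cancer", "Mammogram"],
}


def is_acyclic(parent_map):
    """True iff the directed graph given by parent_map has no cycle."""
    try:
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False


# A topological order lists every node after all of its parents.
order = list(TopologicalSorter(parents).static_order())
print(order)
print(is_acyclic(parents))
print(is_acyclic({"X": ["Y"], "Y": ["X"]}))  # X and Y depend on each other
```

A graph in which X and Y each list the other as a parent fails the check, so it could not serve as the structure of a Bayesian network.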
See these notes on Bayesian networks written by Prof. Daphne Koller of Stanford.
The independence assumptions made by a naive Bayesian classifier can be represented graphically using a very simple Bayesian network. This network has a random variable C for the class, and random variables A1 through Ak for the k attributes. There is an edge going from C to each Aj and no other edges.
The absence of an edge between Ai and Aj indicates that Pr(Ai | Aj, C) = Pr(Ai | C).
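Under these independence assumptions the posterior over the class factorizes as Pr(C | A1, ..., Ak) proportional to Pr(C) * Pr(A1 | C) * ... * Pr(Ak | C). The sketch below computes that posterior for a toy problem with two classes and two boolean attributes. The class names, the probability values and the function name `posterior` are all invented for illustration.

```python
# Hypothetical parameters for a network with edges C -> A1 and C -> A2.
prior = {"spam": 0.4, "ham": 0.6}  # Pr(C)
# Pr(Aj = True | C) for each attribute j, listed per class.
likelihood = {
    "spam": [0.8, 0.3],
    "ham": [0.1, 0.5],
}


def posterior(observed):
    """Pr(C | A1, ..., Ak) for a list of observed boolean attribute values."""
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for p_true, value in zip(likelihood[c], observed):
            # Multiply in Pr(Aj = value | C); no cross-attribute terms appear.
            score *= p_true if value else 1 - p_true
        scores[c] = score
    total = sum(scores.values())  # equals Pr(A1, ..., Ak)
    return {c: s / total for c, s in scores.items()}


result = posterior([True, False])
print(result)
```

Each attribute contributes one factor that depends only on the class, which is exactly what the missing edges between attributes express.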