CSE 250A LECTURE NOTES

March 12, 2001
 
 

ANNOUNCEMENTS

I'll return the third project on Wednesday.  I'll be emailing a few people to come in for ten minutes to discuss writing skills.  Feel free to email me directly for an appointment if you like.

Today's handout is a tutorial on Bayesian networks written by Prof. Daphne Koller of Stanford.
 
 

CONDITIONAL PROBABILITIES

We can talk about probabilities of propositional combinations of events.
By definition,  Pr(A|B) = Pr(A & B) / Pr(B),  where A and B are events.

If X and Y are discrete random variables, then Pr(X=x|Y=y), viewed as a function of x and y, is a two-dimensional table with one entry for each pair of values <x,y>.

The "product rule" says that  Pr(A & B) = Pr(A|B)Pr(B).
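To make the table view concrete, here is a minimal Python sketch that builds the table Pr(X=x|Y=y) from a joint distribution and then checks the product rule entry by entry. The joint table (a dictionary keyed by pairs (x, y)) and the helper names `marginal_y` and `conditional_table` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical joint distribution Pr(X=x, Y=y) with x in {0, 1} and y in {'a', 'b'}.
joint = {
    (0, 'a'): 0.10, (1, 'a'): 0.30,
    (0, 'b'): 0.45, (1, 'b'): 0.15,
}

def marginal_y(joint):
    """Pr(Y=y) = sum over x of Pr(X=x, Y=y)."""
    py = defaultdict(float)
    for (x, y), p in joint.items():
        py[y] += p
    return dict(py)

def conditional_table(joint):
    """Return {(x, y): Pr(X=x | Y=y)}, using Pr(X=x|Y=y) = Pr(X=x, Y=y) / Pr(Y=y)."""
    py = marginal_y(joint)
    return {(x, y): p / py[y] for (x, y), p in joint.items()}

cond = conditional_table(joint)
py = marginal_y(joint)

# For each fixed y, the entries Pr(X=x | Y=y) form a distribution over x.
for y in py:
    assert abs(sum(p for (x, yy), p in cond.items() if yy == y) - 1.0) < 1e-12

# Product rule: Pr(X=x, Y=y) = Pr(X=x | Y=y) Pr(Y=y).
for (x, y), p in joint.items():
    assert abs(cond[(x, y)] * py[y] - p) < 1e-12

print(cond[(1, 'a')])  # 0.30 / 0.40, up to floating-point rounding
```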

Note that Pr(A|B) = 0.8 does not mean "whenever B is true, Pr(A) = 0.8", because additional information can change the probability: it may be the case, for example, that  Pr(A|B,C) = 0.
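A small made-up joint distribution shows how this can happen. In the Python sketch below (the numbers and the helpers `prob` and `cond` are hypothetical), A occurs in 80% of the outcomes where B occurs, but never in the outcomes where both B and C occur.

```python
# Hypothetical joint distribution over three yes/no events, keyed by (a, b, c),
# where 1 means the event occurs.  The probabilities sum to 1.
joint = {
    (1, 1, 0): 0.40, (0, 1, 1): 0.10,  # when B occurs, A and C never co-occur
    (1, 1, 1): 0.00, (0, 1, 0): 0.00,
    (1, 0, 0): 0.20, (0, 0, 0): 0.10,
    (1, 0, 1): 0.10, (0, 0, 1): 0.10,
}

def prob(event):
    """Pr(event), where event is a predicate on an outcome (a, b, c)."""
    return sum(p for outcome, p in joint.items() if event(*outcome))

def cond(event, given):
    """Pr(event | given) = Pr(event & given) / Pr(given)."""
    both = prob(lambda a, b, c: event(a, b, c) and given(a, b, c))
    return both / prob(given)

p_a_given_b = cond(lambda a, b, c: a == 1, lambda a, b, c: b == 1)
p_a_given_bc = cond(lambda a, b, c: a == 1, lambda a, b, c: b == 1 and c == 1)
print(p_a_given_b, p_a_given_bc)  # about 0.8, and 0.0
```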

By definition, the event A is independent of the event B if and only if  Pr(A & B) = Pr(A) * Pr(B).
Notice that then, provided Pr(B) > 0,  Pr(A|B) = Pr(A & B) / Pr(B) = (Pr(A) * Pr(B)) / Pr(B) = Pr(A).

The statement "A is independent of B" is equivalent to Pr(A|B) = Pr(A), and also equivalent to Pr(B|A) = Pr(B) (assuming Pr(A) > 0 and Pr(B) > 0, so that both conditional probabilities are defined).
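As a quick numerical check of these definitions, here is a minimal Python sketch; the function name `independent` and the probabilities used are hypothetical. Because the inputs are floating-point numbers, equality is tested with `math.isclose` rather than `==`.

```python
import math

def independent(p_a_and_b, p_a, p_b):
    """True iff Pr(A & B) = Pr(A) * Pr(B), up to floating-point rounding."""
    return math.isclose(p_a_and_b, p_a * p_b, rel_tol=1e-9, abs_tol=1e-12)

# Hypothetical numbers: Pr(A) = 0.3 and Pr(B) = 0.5.
p_a, p_b = 0.3, 0.5
print(independent(0.15, p_a, p_b))  # True:  0.15 = 0.3 * 0.5
print(independent(0.20, p_a, p_b))  # False: A and B are dependent

# With independence (and nonzero marginals), conditioning changes nothing.
p_a_and_b = 0.15
print(math.isclose(p_a_and_b / p_b, p_a))  # Pr(A|B) = Pr(A)
print(math.isclose(p_a_and_b / p_a, p_b))  # Pr(B|A) = Pr(B)
```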
 
 

BAYES' RULE

Bayes' rule is actually a theorem, due to the Reverend Thomas Bayes of England, 1702-1761.  This discovery was published posthumously in 1763.

Supposedly Bayes did not publish his discovery because he thought it was hubris for humans to investigate the will of God.  He discovered how to compute Pr(Y|X) based on knowledge of Pr(X|Y).

What Bayes discovered is this formula:
        Pr(Y|X) = Pr(Y & X)/Pr(X) = (Pr(X|Y)Pr(Y)) / Pr(X)
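The formula translates directly into code. The Python sketch below is illustrative only: `bayes` and `bayes_binary` are hypothetical names, and the second function assumes Y is a yes/no variable, so that Pr(X) can be expanded as Pr(X|Y)Pr(Y) + Pr(X|not Y)Pr(not Y) when Pr(X) is not known directly.

```python
def bayes(p_x_given_y, p_y, p_x):
    """Bayes' rule: Pr(Y|X) = Pr(X|Y) Pr(Y) / Pr(X)."""
    if p_x <= 0:
        raise ValueError("Pr(X) must be positive to condition on X")
    return p_x_given_y * p_y / p_x

def bayes_binary(p_x_given_y, p_x_given_not_y, p_y):
    """Bayes' rule for a yes/no variable Y, with Pr(X) expanded as
    Pr(X|Y) Pr(Y) + Pr(X|not Y) Pr(not Y)."""
    p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)
    return bayes(p_x_given_y, p_y, p_x)

# Hypothetical numbers, chosen only to exercise the functions.
print(bayes(0.9, 0.2, 0.3))          # 0.9 * 0.2 / 0.3
print(bayes_binary(0.9, 0.15, 0.2))  # here Pr(X) = 0.18 + 0.12 = 0.3, so same answer
```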

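Example:

What is the probability that a patient with a stiff neck actually has meningitis?  Suppose that Pr(stiff neck | meningitis) = 0.5, Pr(meningitis) = 1/50000, and Pr(stiff neck) = 1/20.  Then

        Pr(meningitis | stiff neck) = (0.5)(1/50000) / (1/20) = 1/5000 = 0.0002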
 
 

INFORMAL PROBABILITY REASONING

The same facts can be stated as ecologically valid frequencies:  of 100000 patients, 2 have meningitis, one of those has a stiff neck, and overall 5000 have a stiff neck.  Reasoning with numbers in this format is much easier, presumably because experience gives us absolute counts directly, not frequency ratios.
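The two formats give the same answer. The Python sketch below starts from the counts in the frequency statement, computes Pr(meningitis | stiff neck) once by Bayes' rule and once by direct counting, and uses `fractions.Fraction` so that the comparison is exact; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the frequency statement.
patients = 100_000
meningitis = 2
meningitis_and_stiff = 1
stiff = 5_000

# Probability format: derive the probabilities, then apply Bayes' rule.
p_m = Fraction(meningitis, patients)                      # Pr(M)
p_s_given_m = Fraction(meningitis_and_stiff, meningitis)  # Pr(S|M)
p_s = Fraction(stiff, patients)                           # Pr(S)
via_bayes = p_s_given_m * p_m / p_s

# Frequency format: among stiff-neck patients, count those with meningitis.
via_counts = Fraction(meningitis_and_stiff, stiff)

print(via_bayes, via_counts)  # 1/5000 1/5000
```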

The following article review (author unknown), available at http://www.stat.unipg.it/ncsu/info/jse/v5n3/resource.html, shows how difficult commonsense reasoning with probabilities is.

Review of The Psychology of Good Judgment by Gerd Gigerenzer (1996). Medical Decision Making, 16(3), 273-280.

Gigerenzer argues that physicians and their patients will better understand the chance of a false positive result if we replace the conventional conditional probability analysis by an equivalent frequency method. The success of this method is illustrated by an experiment that Gigerenzer and his colleague Ulrich Hoffrage carried out. They asked 48 physicians in Munich to answer questions relating to four different medical-diagnosis problems. Of the four questions given to each physician, two were posed in the probability format and two in the frequency format.

One of the four diagnostic problems was a mammography problem: To facilitate early detection of breast cancer, women are encouraged from a particular age on to participate at regular intervals in routine screening, even if they have no obvious symptoms. Imagine you conduct in a certain region such a breast cancer screening using mammography. For symptom-free women aged 40 to 50 who participate in screening  using mammography, the following information is available for this region.

     Probability format:

The probability that one of these women has breast cancer is 1%. If a woman has breast cancer, the probability is 80% that she will have a positive mammography test. If a woman does not have breast cancer, the probability is 10% that she will still have a positive mammography test. Imagine a woman (aged 40 to 50, no symptoms) who has a positive mammography test in your breast cancer screening. What is the probability that she actually has breast cancer? _____%

     Frequency format:

Ten out of every 1,000 women have breast cancer. Of these 10 women with breast cancer, 8 will have a positive mammography test. Of the remaining 990 women without breast cancer, 99 will still have a positive mammography test. Imagine a sample of women (aged 40 to 50, no symptoms) who have positive mammography tests in your breast cancer screening. How many of these women do actually have breast cancer? _____ out of _____

In a classic study by D. M. Eddy (see J. Dowie and A. Elstein (eds.) (1988), Professional Judgment: A Reader in Clinical Decision Making, Cambridge University Press, pp. 45-590), essentially this same question, posed only in the probability format, was given to 100 physicians.  Ninety-five of them answered approximately 75% instead of giving the correct answer, which in this example is 7.8%.

In the present study, Gigerenzer found that, when the information was presented in the probability format, only 10% reasoned with the Bayes computation:

     P(breast cancer | positive test) = (.01)(.80)/[(.01)(.80) + (.99)(.096)] = .078.

For the group given the frequency format, 46% computed the Bayes probability in the simpler form:

     P(breast cancer | positive test) = 8/(8 + 99) = .078.
 

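Note: Joe Drish has pointed out that the numbers above are inconsistent with the problem as stated.  With a 10% false-positive rate, 0.096 should simply be 0.10, and 0.078 should be 0.0748, since (.01)(.80)/[(.01)(.80) + (.99)(.10)] = 8/(8 + 99) = 8/107, which is approximately 0.0748.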
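The correction in the note can be checked mechanically. This Python sketch (the function name `posterior` is illustrative) computes the answer in both formats with exact fractions, using the numbers stated in the problem, and also shows that the 0.096 appearing in the probability-format computation is what produces .078.

```python
from fractions import Fraction

def posterior(prior, sensitivity, false_positive_rate):
    """Pr(cancer | positive) by Bayes' rule, with
    Pr(positive) = Pr(pos|cancer) Pr(cancer) + Pr(pos|no cancer) Pr(no cancer)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Probability format, with the numbers stated in the problem.
p = posterior(Fraction("0.01"), Fraction("0.80"), Fraction("0.10"))

# Frequency format: 8 true positives and 99 false positives out of 1,000 women.
f = Fraction(8, 8 + 99)

print(p, f, round(float(p), 4))  # 8/107 8/107 0.0748

# Using 0.096 as the false-positive rate instead reproduces the .078 figure.
q = posterior(Fraction("0.01"), Fraction("0.80"), Fraction("0.096"))
print(round(float(q), 3))  # 0.078
```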
 
The article discusses some of the reactions of the physicians to even considering such problems. Here are some quotes:
 On such a basis one can't make a diagnosis. Statistical information is one big lie.

 I never inform my patients about statistical data. I would tell the patient that mammography is not so exact, and I would in any case perform a biopsy.

 Oh, what nonsense. I can't do it. You should test my daughter. She studies medicine.

 Statistics is alien to everyday concerns and of little use for judging individual persons.

Some doctors commented that getting the answer in the frequency form was simple. A more detailed analysis of this kind of study can be found in the article How to Improve Bayesian Reasoning Without Instruction: Frequency Formats by Gigerenzer and Hoffrage (1995), Psychological Review, 102, 684-704.

 

BAYESIAN NETWORKS

Independence relationships can be drawn graphically using a so-called Bayesian network (BN).  BNs are also called belief networks, probabilistic networks, and causal networks.

Formally, a Bayesian network is a directed acyclic graph whose nodes are random variables.  Intuitively, an edge in a Bayesian network goes from X to Y iff Y depends directly on X, that is, iff X directly causally influences Y.
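A minimal Python sketch of how a network defines a joint distribution, assuming the standard definition in which each node X also carries a conditional probability table Pr(X | parents of X) and the joint distribution is the product of these tables. The three-node network (Burglary -> Alarm <- Earthquake) and all of its numbers are hypothetical.

```python
from itertools import product

# Hypothetical network: Burglary -> Alarm <- Earthquake, all binary.
# cpt[node][parent values] = Pr(node = True | parents); the numbers are made up.
parents = {"B": (), "E": (), "A": ("B", "E")}
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
}

def joint(assignment):
    """Pr(assignment) = product over nodes X of Pr(X | parents(X))."""
    p = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

# All 8 joint assignments of (B, E, A).
worlds = [dict(zip("BEA", values)) for values in product([True, False], repeat=3)]
total = sum(joint(w) for w in worlds)  # should be 1

# B and E are both root nodes, so summing out A leaves Pr(B) * Pr(E).
p_b_and_e = sum(joint(w) for w in worlds if w["B"] and w["E"])
print(total, p_b_and_e)
```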

See these notes on Bayesian networks written by Prof. Daphne Koller of Stanford.

 

THE "NAIVE" BAYES NETWORK

Bayesian learning is an approach to supervised learning using the language of probability theory and Bayes' rule.

The independence assumptions made by a naive Bayesian classifier can be represented graphically using a very simple Bayesian network.  This network has a random variable C for the class, and random variables A1 through Ak for the k attributes.  There is an edge going from C to each Aj and no other edges.

The absence of an edge between two different attributes Ai and Aj indicates that they are conditionally independent given the class:  Pr(Ai | Aj, C) = Pr(Ai | C).
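To illustrate, here is a Python sketch with k = 2 binary attributes and made-up tables (all names and numbers are hypothetical). It builds the joint distribution the network defines, Pr(C, A1, ..., Ak) = Pr(C) * Pr(A1|C) * ... * Pr(Ak|C), checks that Pr(A1 | A2, C) = Pr(A1 | C), and classifies by normalizing the joint over the classes, which is the usual naive Bayes rule Pr(C | A1, ..., Ak) proportional to Pr(C) * Pr(A1|C) * ... * Pr(Ak|C).

```python
import math
from itertools import product

# Hypothetical naive Bayes network: class C -> attributes A1, A2 (all binary).
p_c = {True: 0.3, False: 0.7}   # Pr(C = c)
p_a_given_c = [                 # p_a_given_c[j][c] = Pr(A(j+1) = True | C = c)
    {True: 0.8, False: 0.1},
    {True: 0.6, False: 0.5},
]

def p_attr(j, value, c):
    """Pr(A(j+1) = value | C = c)."""
    p = p_a_given_c[j][c]
    return p if value else 1.0 - p

def joint(c, attrs):
    """Pr(C=c, A1..Ak) = Pr(C=c) * product over j of Pr(Aj | C=c)."""
    return p_c[c] * math.prod(p_attr(j, v, c) for j, v in enumerate(attrs))

def posterior(attrs):
    """Pr(C = c | attrs) for each class c, by normalizing the joint."""
    scores = {c: joint(c, attrs) for c in p_c}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Check Pr(A1 | A2, C) = Pr(A1 | C) for every setting of A2 and C.
for a2, c in product([True, False], repeat=2):
    denom = sum(joint(c, (a1, a2)) for a1 in (True, False))
    assert math.isclose(joint(c, (True, a2)) / denom, p_attr(0, True, c))

print(posterior((True, True)))
```

Note that `math.prod` requires Python 3.8 or later.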