CSE 150 LECTURE NOTES

October 27, 2004
 
    

CONDITIONAL PROBABILITIES

Let A and B be any two events.  By definition, the probability of A given B is

        Pr(A|B) = Pr(A & B) / Pr(B).

Note that Pr(A|B) = 0.8 does not mean "the probability of A is 0.8 whenever B is true."  Additional information can change the probability.  It may be the case, for example, that  Pr(A|B) = 0.8  and also  Pr(A|B & C) = 0.
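
A small finite example may make this concrete.  The sketch below assumes a made-up sample space of ten equally likely outcomes; the outcome numbering and the particular sets A, B, and C are illustrative choices, not part of the lecture.  It computes both conditional probabilities by counting.

```python
from fractions import Fraction

# Hypothetical sample space: ten equally likely outcomes, numbered 0..9.
omega = set(range(10))

# Illustrative events, chosen only to reproduce the numbers in the text.
B = {0, 1, 2, 3, 4}   # B holds for five outcomes
A = {0, 1, 2, 3, 7}   # A holds for four of the five outcomes in B
C = {4, 8}            # inside B, C holds only where A fails


def pr(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))


def pr_given(event, condition):
    """Pr(event | condition) = Pr(event & condition) / Pr(condition)."""
    if pr(condition) == 0:
        raise ValueError("conditioning event has probability zero")
    return pr(event & condition) / pr(condition)


p_a_given_b = pr_given(A, B)            # 4/5, that is 0.8
p_a_given_b_and_c = pr_given(A, B & C)  # 0

print(p_a_given_b, p_a_given_b_and_c)
```

Here B & C contains a single outcome, and A is false for that outcome, so learning C in addition to B drops the probability of A from 0.8 to 0.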

If the event A is independent of the event B, then  Pr(A|B) = Pr(A & B) / Pr(B) = (Pr(A) * Pr(B)) / Pr(B) = Pr(A), assuming that the denominators are non-zero.  When doing probability calculations, you always have to pay attention to the case of probabilities that are zero.
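
As a rough illustration of independence and of the zero-probability caveat, the following sketch assumes two fair six-sided dice (an example chosen here, not taken from the lecture).  The helper names pr and pr_given are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: all 36 ordered outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event, given as a predicate on outcomes."""
    favorable = sum(1 for outcome in omega if event(outcome))
    return Fraction(favorable, len(omega))


def pr_given(event, condition):
    """Pr(event | condition), or None when Pr(condition) is zero."""
    p_condition = pr(condition)
    if p_condition == 0:
        return None  # the conditional probability is undefined
    return pr(lambda o: event(o) and condition(o)) / p_condition


def first_is_even(outcome):
    return outcome[0] % 2 == 0


def second_is_six(outcome):
    return outcome[1] == 6


def total_is_13(outcome):
    return outcome[0] + outcome[1] == 13  # impossible with two dice


# Independence: Pr(A & B) = Pr(A) * Pr(B), so Pr(A|B) = Pr(A).
independent = (
    pr(lambda o: first_is_even(o) and second_is_six(o))
    == pr(first_is_even) * pr(second_is_six)
)
print(independent)                             # True
print(pr_given(first_is_even, second_is_six))  # 1/2, same as Pr(A)
print(pr_given(first_is_even, total_is_13))    # None: denominator is zero
```

Returning None for a zero denominator is one possible design choice; raising an error would be equally reasonable.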

 

PRODUCT RULE

The "product rule" says that  Pr(A & B) = Pr(A|B)Pr(B).  This is true by the definition of Pr(A|B).

Applying the product rule repeatedly, we have Pr(A1 & A2 & A3) = Pr(A1)Pr(A2|A1)Pr(A3|A1 & A2), and so on for any number of events.
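
The three-event case can be checked numerically.  The sketch below assumes an arbitrary joint distribution over three binary events; the weights are made-up numbers used only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over (A1, A2, A3), each True or False.
# The weights are arbitrary and are normalized so that they sum to 1.
weights = [3, 1, 4, 1, 5, 9, 2, 6]
total = sum(weights)
joint = {
    outcome: Fraction(w, total)
    for outcome, w in zip(product([False, True], repeat=3), weights)
}


def pr(predicate):
    """Sum the joint probabilities of the outcomes that satisfy predicate."""
    return sum((p for o, p in joint.items() if predicate(o)), Fraction(0))


p_a1 = pr(lambda o: o[0])
p_a1_a2 = pr(lambda o: o[0] and o[1])
p_a1_a2_a3 = pr(lambda o: o[0] and o[1] and o[2])

# Conditional probabilities, each by the definition Pr(A|B) = Pr(A & B)/Pr(B).
p_a2_given_a1 = p_a1_a2 / p_a1
p_a3_given_a1_a2 = p_a1_a2_a3 / p_a1_a2

# Product rule applied twice.
chain = p_a1 * p_a2_given_a1 * p_a3_given_a1_a2
print(chain == p_a1_a2_a3)  # True
```

The intermediate factors cancel, which is why the identity holds for any joint distribution whose conditioning events have non-zero probability.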


BAYES' RULE

Bayes' rule is actually a theorem, due to the Reverend Thomas Bayes (1702-1761), who was a clergyman in the town of Tunbridge Wells in England.  He discovered how to compute Pr(Y|X) based on knowledge of Pr(X|Y).  This discovery was published posthumously in 1763.  Supposedly Bayes did not publish his discovery because he thought it was hubris for humans to investigate the will of God.

What Bayes discovered is this formula:

        Pr(Y|X) = Pr(Y & X)/Pr(X) = (Pr(X|Y)Pr(Y)) / Pr(X)

Often we do not know Pr(X) directly, but we can calculate it easily using the fact that  Pr(X) = Pr(X & Y) + Pr(X & not Y) = Pr(X|Y)Pr(Y) + Pr(X|not Y)Pr(not Y).

Example (taken from the article review quoted below):  "... a mammography problem:  ...  For symptom-free women aged 40 to 50 who participate in screening using mammography, the following information is available ...  The probability that one of these women has breast cancer is 1%. If a woman has breast cancer, the probability is 80% that she will have a positive mammography test. If a woman does not have breast cancer, the probability is 10% that she will still have a positive mammography test. Imagine a woman (aged 40 to 50, no symptoms) who has a positive mammography test in your breast cancer screening. What is the probability that she actually has breast cancer?"

... The correct answer is     P(breast cancer | positive test) = (.01)(.80)/[(.01)(.80) + (.99)(.10)] = .0748.
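
The calculation above can be reproduced with a few lines of code.  This is a minimal sketch; the function name posterior and its parameter names are hypothetical, and the denominator is expanded as Pr(X) = Pr(X|Y)Pr(Y) + Pr(X|not Y)Pr(not Y).

```python
def posterior(prior, p_pos_given_y, p_pos_given_not_y):
    """Pr(Y | positive) by Bayes' rule.

    prior             -- Pr(Y)
    p_pos_given_y     -- Pr(positive | Y)
    p_pos_given_not_y -- Pr(positive | not Y)
    """
    # Pr(positive), summed over the two cases Y and not Y.
    evidence = p_pos_given_y * prior + p_pos_given_not_y * (1 - prior)
    if evidence == 0:
        raise ValueError("Pr(positive) is zero, so the posterior is undefined")
    return p_pos_given_y * prior / evidence


# Numbers from the mammography problem in the text.
p_cancer_given_positive = posterior(0.01, 0.80, 0.10)
print(round(p_cancer_given_positive, 4))  # 0.0748
```

The answer is small because the 99% of women without cancer contribute far more positive tests (0.099) than the 1% with cancer (0.008).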

 

INFORMAL PROBABILITY REASONING

The same facts can be stated as frequencies.  Reasoning with numbers in this format is much easier, presumably because experience gives us absolute counts directly, not frequency ratios.  The following article review (author unknown), available at http://www.stat.unipg.it/ncsu/info/jse/v5n3/resource.html shows how difficult commonsense reasoning with probabilities is.

Review of The Psychology of Good Judgment by Gerd Gigerenzer (1996). Medical Decision Making, 16(3), 273-280.

Gigerenzer argues that physicians and their patients will better understand the chance of a false positive result if we replace the conventional conditional probability analysis by an equivalent frequency method. 

... Frequency format:  Ten out of every 1,000 women have breast cancer. Of these 10 women with breast cancer, 8 will have a positive mammography test. Of the remaining 990 women without breast cancer, 99 will still have a positive mammography test. Imagine a sample of women (aged 40 to 50, no symptoms) who have positive mammography tests in your breast cancer screening. How many of these women do actually have breast cancer? _____ out of _____

In a classic study by D. M. Eddy (see J. Dowie and A. Elstein (eds.) (1988), Professional Judgment: A Reader in Clinical Decision Making, Cambridge University Press, pp. 45-590), essentially this same question, with just the probability format, was given to 100 physicians.  Ninety-five of the physicians gave an answer of approximately 75% instead of the correct answer, which, in this example, is 7.48%.

In the present study, Gigerenzer found that, when the information was presented in the probability format, only 10% reasoned with the Bayes computation  P(breast cancer | positive test) = (.01)(.80)/[(.01)(.80) + (.99)(.10)] = .0748.

For the group given the frequency format, 46% computed the Bayes probability in the simpler form:     P(breast cancer | positive test) = 8/(8 + 99) = .0748. 
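
The equivalence of the two formats can be checked directly.  The sketch below uses the counts from the frequency format and the percentages from the probability format; the variable names are illustrative.

```python
from fractions import Fraction

# Frequency format: absolute counts out of 1,000 women.
population = 1000
with_cancer = 10
without_cancer = population - with_cancer   # 990
true_positives = 8                          # positive tests among the 10
false_positives = 99                        # positive tests among the 990

# "_____ out of _____": 8 out of the 107 women with a positive test.
all_positives = true_positives + false_positives
by_counts = Fraction(true_positives, all_positives)

# Probability format: the same quantity by Bayes' rule, in exact arithmetic.
prior = Fraction(1, 100)
p_pos_given_cancer = Fraction(80, 100)
p_pos_given_no_cancer = Fraction(10, 100)
by_bayes = (p_pos_given_cancer * prior) / (
    p_pos_given_cancer * prior + p_pos_given_no_cancer * (1 - prior)
)

print(by_counts, by_counts == by_bayes)  # 8/107 True
print(round(float(by_counts), 4))        # 0.0748
```

The counts 8 and 99 are just the products (.01)(.80) and (.99)(.10) scaled up by the population size of 1,000, which is why the two computations agree.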

The article discusses some of the reactions of the physicians to even considering such problems. Here are some quotes:

        "On such a basis one can't make a diagnosis. Statistical information is one big lie."

        "I never inform my patients about statistical data. I would tell the patient that mammography is not so exact, and I would in any case perform a biopsy."

        "Oh, what nonsense. I can't do it. You should test my daughter. She studies medicine."

        "Statistics is alien to everyday concerns and of little use for judging individual persons."


FRAMEWORK FOR LEARNING A CLASSIFIER

With N training examples, the training data are a matrix with N rows and p columns, where each example is represented by values for p different features.  Let feature value j for example i be x_ij.  The label of example i is y_i, for example y_i = 1 if message i is spam and y_i = 0 if it is not spam.

Each test example is also represented as a row vector of length p.  The label y for a test example is unknown.  A classifier is a function whose input is this row vector, and whose output is a guess at what the value of y is.
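
To make the framework concrete, here is a minimal sketch of the data layout and of a classifier as a function.  The feature values and labels are made up, and the learning method shown (one-nearest-neighbor with squared Euclidean distance) is only one simple choice used for illustration; the lecture does not prescribe it.

```python
# Hypothetical training data: N = 4 examples (rows), p = 3 features (columns).
# x_train[i][j] is the value of feature j for example i.
x_train = [
    [1, 0, 3],
    [0, 0, 0],
    [1, 1, 5],
    [0, 1, 1],
]
# y_train[i] is the label of example i: 1 = spam, 0 = not spam.
y_train = [1, 0, 1, 0]


def learn_nearest_neighbor(x_rows, y_labels):
    """Return a classifier: a function from a row vector of length p to a label."""
    if len(x_rows) != len(y_labels):
        raise ValueError("need exactly one label per training example")
    p = len(x_rows[0])

    def classify(x):
        if len(x) != p:
            raise ValueError("test example must have p feature values")
        # Index of the training row closest to x (squared Euclidean distance).
        nearest = min(
            range(len(x_rows)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x_rows[i], x)),
        )
        return y_labels[nearest]  # guess: the label of the nearest example

    return classify


classifier = learn_nearest_neighbor(x_train, y_train)
print(classifier([1, 1, 4]))  # 1
print(classifier([0, 0, 1]))  # 0
```

The point of the sketch is the interface: learning takes the N-by-p matrix and the N labels, and produces a function that maps any new row vector of length p to a guessed label.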

Terminology:  the words "feature," "attribute," and "predictor" are synonyms, as are the words "class" and "label."