Mentioned in class 1 October, due before end of the quarter.

As part of your class participation, you must do a project; you can do more than one for extra credit. You may choose to work as part of a team of up to three people. You must prepare a project proposal, including the topic, approach, team members, their responsibilities, and a schedule; you must get this project proposal approved by me.

The following are some suggestions; you also can suggest your own topics.

- Research some non-trivial statistic of public interest in some detail,
and report your results; support your conclusions carefully; contact some
subject area experts (not just politicians!) by phone or in person; you
may also survey some non-experts, and report the results of your survey.
Here are some suggestions:
**Unemployment**. I read that during the Reagan administration, unemployment was kept down by changing its definition! Not once, but several times. (And also by putting political pressure on the agency responsible for computing the statistics.) Discover what changes (if any) were made in the computation of this statistic, when they were made, and what their likely effect was.

**US Census**. I read that changes in the way census data are collected may cause certain minority groups to be underrepresented. Find out what major changes have been made in how US census data are collected, and what their likely effects would be.

**US Income Spread**. It has been widely claimed that changes in US economic policy have led to greater income disparities, i.e., more poor and more rich; this has been characterized as unjust. Try to find out what measures of income dispersion are involved in the press reports, and also try to find out, e.g., sample fourth spread values. Find out how various statistics have changed over time, and try to correlate these changes with policy. Identify factors that need to be considered carefully.
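
For the income spread topic, the fourth spread mentioned above can be computed from a sample quite simply. Here is a hedged sketch of one common definition (upper fourth minus lower fourth, where the fourths are the medians of the upper and lower halves of the sorted data); the income figures are invented purely for illustration.

```python
# Sketch: fourth spread of a sample, under one common definition
# (median of upper half minus median of lower half; when n is odd,
# the sample median is included in both halves).

def fourth_spread(data):
    """Return the fourth spread of a numeric sample."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2              # share the middle value when n is odd
    lower, upper = xs[:half], xs[n - half:]

    def median(v):
        m = len(v)
        mid = m // 2
        return v[mid] if m % 2 else (v[mid - 1] + v[mid]) / 2

    return median(upper) - median(lower)

incomes = [18, 21, 24, 25, 27, 30, 33, 38, 44, 60]  # fabricated, in $1000s
print(fourth_spread(incomes))  # prints 14
```

A real study would apply this (or a textbook variant) to published income data for several years and compare the trend.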

**Zipf's Law** is a remarkable empirical observation: frequencies of word use in a large text are roughly inversely proportional to their ranks when the words are listed in decreasing order of frequency. More formally, f(r) = c / r^b, where r is the rank of a word, f(r) is its frequency, and c and b are constants, with b approximately 1.

For this project, you should write a computer program to count words in fairly large texts (at least 10,000 words) and see if the law holds. See if different texts have different words at different ranks, and by how much they differ; try to get some texts that intuitively seem similar, and others that intuitively seem different. Use at least 3 texts, each of which should be homogeneous (same author, same subject, same style). You should also look into the literature on Zipf's law: there is a book by Mandelbrot with a "proof" (that many people think is not convincing); I didn't find anything on the web, but it wasn't a thorough search; however, there was a paper at a recent conference with a nice proof, and a number of references.
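The word-counting part of the program could be sketched as follows; this is only a minimal starting point, and the tiny sample string stands in for the 10,000-word texts the project actually requires.

```python
# Sketch: compute (rank, frequency) pairs for the words of a text,
# for comparison against Zipf's law f(r) = c / r^b with b near 1.
import re
from collections import Counter

def rank_frequency(text):
    """Return a list of (rank, frequency) pairs, rank 1 = most frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(rank, freq) for rank, (_, freq)
            in enumerate(counts.most_common(), start=1)]

sample = "the cat sat on the mat and the dog sat on the log"
for rank, freq in rank_frequency(sample)[:3]:
    # under Zipf's law with b = 1, the product freq * rank stays near c
    print(rank, freq, freq * rank)
```

For the project itself, you would plot log f(r) against log r for each text and fit the slope, which estimates -b.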

**Statistics on the Web.** There seems to be a lot of information about statistics on the World Wide Web, including some Java applets that demonstrate statistical principles and procedures. For this project, you should explore what is available on the web, and make a homepage with links to some of the best (and maybe the worst) examples, with an evaluation for each. You could also copy over some of the best material to the local site, and illustrate, improve and/or extend it.

**Critical assessment of Control Chart and Taguchi methods.** Although their usefulness has not been (very much) questioned, there has been much discussion about *why* control charts and their extensions work, and what their limits are. It would be interesting to talk with both practitioners and theoreticians, and to read some of the critical literature, in order to discover what the issues are and assess their practical significance. This project should also carefully consider a small number of studies of real cases. Some relevant literature is cited on page 698 of Devore.
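As background for this project, the basic construction under discussion is simple to state. For an X-bar chart with a known process standard deviation, the usual 3-sigma limits are center ± 3σ/√n; a minimal sketch, with fabricated subgroup data:

```python
# Sketch: standard 3-sigma control limits for an X-bar chart,
# assuming subgroups of size n and a known process sigma.
# The subgroup means below are invented for illustration.

def xbar_limits(subgroup_means, sigma, n):
    """Return (LCL, center line, UCL) for an X-bar chart."""
    center = sum(subgroup_means) / len(subgroup_means)
    margin = 3 * sigma / n ** 0.5       # 3-sigma margin for means of n
    return center - margin, center, center + margin

lcl, center, ucl = xbar_limits([10.1, 9.8, 10.0, 10.3, 9.8], sigma=0.6, n=4)
print(round(lcl, 2), round(center, 2), round(ucl, 2))  # prints 9.1 10.0 10.9
```

The debates the project asks about concern why this recipe works as well as it does in practice, e.g., how robust the 3-sigma choice is to non-normality and dependence, not the arithmetic itself.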

**Bayesian Assessment of Authorship.** A fascinating book by Mosteller and Wallace, *Applied Bayesian and Classical Inference: the Case of The Federalist Papers* (Springer-Verlag, 1984), discusses methods for assessing authorship of disputed texts, and applies them to "The Federalist Papers," written by Hamilton, Madison and Jay to argue for adoption of the Constitution by the States. The book makes much use of the negative binomial and Poisson distributions, and is difficult, but an ambitious group of students could get a fascinating project out of it. In any case, you might want to look at this book some day.
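
To give a flavor of the Poisson side of the method: one models the count of a marker word per block of text as Poisson with an author-specific rate, and compares likelihoods for a disputed text. The sketch below uses invented rates and counts; the real book fits rates from known writings and combines evidence over many words.

```python
# Sketch: comparing Poisson log-likelihoods of a marker word's count
# under two candidate authors' rates. Rates and the observed count
# are fabricated for illustration; they are NOT from the book.
from math import factorial, log

def poisson_loglik(count, rate):
    """Log-likelihood of observing `count` under a Poisson(rate) model."""
    return count * log(rate) - rate - log(factorial(count))

# invented rates (occurrences per 1000-word block) for one marker word
rate_hamilton, rate_madison = 3.3, 0.2
count_in_disputed_block = 3

lh = poisson_loglik(count_in_disputed_block, rate_hamilton)
lm = poisson_loglik(count_in_disputed_block, rate_madison)
# a positive log-odds value favors Hamilton under these assumed rates
print(lh - lm)
```

Mosteller and Wallace found the negative binomial often fit word counts better than the Poisson, which is part of what makes the book challenging.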


11 November 1996.