Some Possible Projects for CSE 270
Mentioned in class 1 October, due before end of the quarter.
As part of your class participation, you must do a project; you can do more
than one for extra credit. You may choose to work as part of a team of up to
three people. You must prepare a project proposal, including the topic,
approach, team members, their responsibilities, and a schedule; you must get
this project proposal approved by me.
The following are some suggestions; you can also suggest your own topics.
- Research some non-trivial statistic of public interest in some detail,
and report your results; support your conclusions carefully; contact some
subject area experts (not just politicians!) by phone or in person; you
may also survey some non-experts, and report the results of your survey.
You can also choose a statistical problem of your own and do similar
research. Something on manufacturing would be great. It is not vital that
the area be politically sensitive, but it is vital that objective
information be available on which to base your discussion.
Here are some suggestions:
- Unemployment. I read that during the Reagan administration,
unemployment was kept down by changing its definition! Not once,
but several times. (And also by putting political pressure on the
agency responsible for computing the statistics.) Discover what
changes (if any) were made in the computation of this statistic,
when they were made, and what their likely effect was.
- US Census. I read that changes in the way census data are
collected may cause certain minority groups to be underrepresented.
Find out what major changes have been made in how US census data are
collected, and what their likely effects would be.
- US Income Spread. It has been widely claimed that changes in
US economic policy have led to greater income disparities, i.e.,
more poor and more rich; this has been characterized as unjust.
Try to find out what measures of income dispersion are involved in
the press reports, and also try to find out, e.g., sample fourth
spread values. Find out how various statistics have changed over
time, and try to correlate these changes with policy. Identify
factors that need to be considered carefully.
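For the income spread suggestion, the following is a minimal Python sketch of one way to compute a sample fourth spread. It assumes that "fourth spread" means the distance between Tukey's lower and upper fourths (hinges), where each fourth is the median of one half of the sorted data and the overall median is included in both halves when the sample size is odd. The function names are made up for illustration, and any numbers you feed it should come from a real source.

```python
from statistics import median


def fourths(values):
    """Return (lower fourth, upper fourth) of a sample.

    Assumption: Tukey's hinges. The data are sorted and split into a
    lower and an upper half; when the sample size is odd, the overall
    median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    lower_half = xs[: (n + 1) // 2]   # includes the median when n is odd
    upper_half = xs[n // 2:]          # includes the median when n is odd
    return median(lower_half), median(upper_half)


def fourth_spread(values):
    """Upper fourth minus lower fourth: a resistant measure of spread."""
    lower, upper = fourths(values)
    return upper - lower
```

For example, fourth_spread([10, 20, 30, 40, 50]) returns 20, because the lower fourth is 20 and the upper fourth is 40. Unlike the range or the standard deviation, this value does not change if the largest income in the sample is made much larger, which is one reason to ask which measure a press report is actually using.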
- Zipf's Law is a remarkable empirical observation: the frequencies of
words in a large text are roughly inversely proportional to their ranks
when the words are listed in order of decreasing frequency. More
formally, f(r) = c / (r ** b), where r is the rank of a word, f(r) is
its frequency, and c and b are constants, with b approximately 1.
For this project, you should write a
computer program to count words in fairly large texts (at least 10,000
words) and see if the law is true; see if different texts have different
words at different ranks, and by how much they differ; try to get some
texts that intuitively seem similar, and others that intuitively seem
different; use at least 3 texts; each text should be homogeneous (same
author, same subject, same style). You should also look into the
literature on Zipf's law: there is a book by Mandelbrot with a "proof"
(that many people think is not convincing). I didn't find anything on the
web, but it wasn't a thorough search; there was, however, a paper at a
recent conference with a nice proof and a number of references.
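As a possible starting point for the Zipf's Law project, here is a minimal Python sketch; it is one way to begin, not a required design. It counts words, ranks them by decreasing frequency, and estimates c and b in f(r) = c / (r ** b) by ordinary least squares on the equivalent line log f(r) = log c - b log r. The function names, the tokenizing regular expression, and the choice of a log-log fit are all assumptions made for illustration; a serious project should think about how to treat punctuation, capitalization, and the many words that occur only once.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count words, ignoring case; keeps internal apostrophes (e.g. "don't")."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


def rank_frequency(counts):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    ordered = counts.most_common()
    return [(rank, freq) for rank, (_word, freq) in enumerate(ordered, start=1)]


def fit_zipf(pairs):
    """Estimate (c, b) in f(r) = c / (r ** b).

    Assumption: a plain least-squares line through the points
    (log r, log f); the slope of that line is -b and its
    intercept is log c.
    """
    xs = [math.log(rank) for rank, _freq in pairs]
    ys = [math.log(freq) for _rank, freq in pairs]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct words")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

To use it, read each text into a string, call fit_zipf(rank_frequency(word_frequencies(text))), and compare the estimated b across your texts. To see how the texts differ at particular ranks, compare the first few entries of word_frequencies(text).most_common(20) for each one.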
- Statistics on the Web. There seems to be a lot of information
about statistics on the World Wide Web, including some Java applets that
demonstrate statistical principles and procedures. For this project, you
should explore what is available on the web, make a homepage with links
to some of the best (and maybe the worst) examples, with an evaluation
for each. You could also copy over some of the best material to the
local site, and illustrate, improve and/or extend it.
- Critical assessment of Control Chart and Taguchi methods.
Although their usefulness has not been (very much) questioned, there has
been much discussion about WHY control charts and their extensions
work, and what their limits are. It would be interesting to talk with
both practitioners and theoreticians and read some of the critical
literature to discover what the issues are and assess their practical
significance. This project should also carefully consider a small number
of studies of real cases. Some relevant literature is found on page 698
- Bayesian Assessment of Authorship. A fascinating book by
Mosteller and Wallace, Applied Bayesian and Classical Inference: the
Case of The Federalist Papers (Springer-Verlag, 1984), discusses
methods for assessing authorship of disputed texts, and applies them to
"The Federalist Papers," written by Hamilton, Madison and Jay to argue
for adoption of the Constitution by the States. The book makes much use
of the negative binomial and Poisson distributions, and is difficult, but
an ambitious group of students could get a fascinating project out of it.
In any case, you might want to look at this book some day.
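To give a feel for how word counts can bear on authorship, here is a small Python sketch. It is not the model used in the book; it is a much simpler illustration that assumes the count of a single marker word follows a Poisson distribution whose rate (occurrences per 1000 words) depends on the author. The function names and any rates you pass in are hypothetical; real rates would have to be estimated from texts of known authorship, and the negative binomial distribution mentioned above allows more variability than this Poisson sketch does.

```python
import math


def poisson_log_pmf(count, rate_per_1000, n_words):
    """Log-probability of seeing `count` occurrences in a text of n_words,
    assuming a Poisson model with the given rate per 1000 words."""
    mu = rate_per_1000 * n_words / 1000.0
    return count * math.log(mu) - mu - math.lgamma(count + 1)


def poisson_log_odds(count, n_words, rate_a, rate_b):
    """Log-likelihood ratio for author A versus author B.

    Positive values favor A; negative values favor B.
    """
    return (poisson_log_pmf(count, rate_a, n_words)
            - poisson_log_pmf(count, rate_b, n_words))


def posterior_a(log_odds, prior_a=0.5):
    """Posterior probability of author A by Bayes' rule, given a prior."""
    total = math.log(prior_a / (1.0 - prior_a)) + log_odds
    # Logistic function, written to avoid overflow for large |total|.
    if total >= 0:
        return 1.0 / (1.0 + math.exp(-total))
    e = math.exp(total)
    return e / (1.0 + e)
```

With several marker words treated as independent (a strong assumption), the log odds for each word would simply be added before calling posterior_a.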
11 November 1996.