DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO

CSE 291: Statistical Learning

Assignment 1

Due Thursday, January 29, in class.



GENERAL INSTRUCTIONS

You are encouraged to collaborate while solving the problems posed, and to use any books and other resources you wish.  However, you must write up your final solutions independently.  You are encouraged to share code, but you should have your own working implementation and you should write your own explanation of experimental results.  Your answers should be written in good, concise English with all necessary diagrams, plots, and explanations.  If necessary, you may make assumptions that are reasonable, and that do not make a question trivial.  If you do make an assumption, state it clearly.

These assignments are training for writing research papers.  Write up your answer to each question as if it were a piece of a research paper.  Polish your explanations, cite your sources, contribute something clear and definite (i.e. the result you are asked to show), but do not reinvent the wheel and do not get stuck on any single minor issue.

LaTeX is the intergalactic standard for writing research papers with mathematical content, so you should use it.  Only LaTeX can really typeset equations in a perfectly correct way.  (Mathematica, Word, and troff do not.)  Explain your work in full sentences and paragraphs, but make the answer to each question less than two pages single-spaced, unless it is really necessary to use more space.  Use BibTeX for citations.  Insert figures generated by Matlab into your LaTeX text at the appropriate places.  Make figures as simple and small as possible while still making them easy to read. On the due date, you should submit a stapled 8.5x11 printout in class.

Mathematical proofs should be clear and not contain unexplained leaps, but it is not necessary to go into technical detail about measure theory, etc.  The statistical ideas are what is most important.  In a proof, you have a lot of latitude to assume standard results, but if you do so, you should state each result precisely and cite a source.

For every problem below, you should create numerical examples in Matlab to check experimentally the correctness of the claim.  Depending on the claim, you may want to do the numerical verification before, after, or in parallel with the proof.  In your submission, you should describe your verification process concisely, usually with a relevant plot generated in Matlab.

The Matlab examples are important.  They are the computational, modern component of the course.  The lesson to learn is how to use computation to advance understanding, to confirm symbolic results and to provide new insights that can be the springboard for further mathematical results.


PROBLEMS

Please use http://www.quicktopic.com/25/H/AGpLyTktYkR6 to ask questions about these problems.


(1) Silvey, 1.1 parts (a) and (b).  

Note that just obtaining the mean and variance for either part is not sufficient.  You must also verify that the distribution actually is Gaussian in part (a), and chi-squared in part (b).  These results are basic, but proving them is non-trivial.  For this assignment, you should just verify them numerically; no proofs are required.

Make your verification careful, convincing, and easy to understand.  In general, this is what is needed for any result that you believe is true but that is too difficult to prove formally.  To be convincing, take advantage of fast computers and use sample sizes large enough to reduce unwanted noise.  To be easy to understand, a single two-dimensional figure showing two superimposed functions (e.g. an empirical histogram and a theoretical distribution) for visual comparison is often ideal.  Always explain precisely but concisely how the functions plotted are defined mathematically.
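
As a rough illustration of this kind of check, here is a minimal sketch in Python (standard library only) rather than Matlab.  The two claims tested are stand-ins chosen for illustration, not the statements of Silvey 1.1: (a) the mean of n independent N(0,1) draws is N(0, 1/n), and (b) the sum of n squared N(0,1) draws is chi-squared with n degrees of freedom.  The values of n and trials, the bin ranges, and all function names are assumptions.  Instead of plotting, the sketch reports the largest gap between the empirical histogram, scaled as counts / (trials * bin width), and the theoretical density evaluated at the bin centers.

```python
import math
import random


def histogram_density(samples, lo, hi, bins):
    """Return bin centers and counts / (n * width), an estimate of the density."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((s - lo) / width), bins - 1)] += 1
    n = len(samples)
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (n * width) for c in counts]
    return centers, density


def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom, for x > 0."""
    log_pdf = ((k / 2 - 1) * math.log(x) - x / 2
               - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_pdf)


def max_abs_gap(centers, density, pdf):
    """Largest absolute difference between histogram and density at bin centers."""
    return max(abs(d - pdf(c)) for c, d in zip(centers, density))


random.seed(0)
n, trials = 5, 200_000  # illustrative values, not taken from the assignment

# Stand-in claim (a): the mean of n iid N(0,1) draws is N(0, 1/n).
means = [sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
         for _ in range(trials)]
centers, density = histogram_density(means, -2.0, 2.0, 40)
gap_a = max_abs_gap(centers, density,
                    lambda x: normal_pdf(x, 0.0, 1.0 / math.sqrt(n)))

# Stand-in claim (b): the sum of n squared N(0,1) draws is chi-squared(n).
sums = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
        for _ in range(trials)]
centers, density = histogram_density(sums, 0.0, 20.0, 40)
gap_b = max_abs_gap(centers, density, lambda x: chi2_pdf(x, n))

print(f"max |histogram - density|: (a) {gap_a:.4f}, (b) {gap_b:.4f}")
```

In a write-up, the same two curves would normally be superimposed in one figure; the single number printed here is only a compact summary of how closely they agree.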


(2) Silvey, 2.4.  

Here, you should give a proof.  Some general notes on proving theorems:  

  1. Do not try to show a difficult inequality using one global series of inequalities.  Instead, decompose the problem into parts, and prove each part separately.
  2. When you have a series of equations or inequalities, explain the most important steps of reasoning in English.  Use full sentences, not verbless phrases.  Use words like "therefore" rather than informal mathematical symbols such as =>.
  3. It is easy to make mistakes in the algebraic manipulations required for this problem.  It is much easier to check a proof than to invent one, so you should never engage in wishful thinking and submit a proof with an algebraic error.


(3) Silvey, 1.3.  

Here also, you should give a proof.  Note that the answer is a well-known standard distribution--which one?  In general, it is vital to express mathematical results as simply as possible, and to relate new results to old concepts.  Doing this makes it much easier to build on the new result in further reasoning.


(4) Silvey, 2.8.
 
Here, you should clarify how you are following the informal algorithm for finding MVUEs explained in class.  In general, always make the high-level outline of a proof explicit.

Make clear what the sample space is, and what the family of probability distributions is.  You must do this explicitly as the foundation for finding any MVUE.

To illustrate numerically that you have found the MVUE, you should show that your estimator is unbiased, and that it has variance smaller than some other reasonable unbiased estimator.  When you present numerical results in a figure or table, always say explicitly in English what the important lesson(s) to be drawn are.
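
A minimal sketch of such a comparison follows, written in Python (standard library only) rather than Matlab.  It uses a stand-in family, Uniform(0, theta), and not the problem in Silvey 2.8.  For this family the estimator (n+1)/n times the sample maximum is the MVUE of theta, and twice the sample mean is another unbiased estimator.  The values of theta, n, and trials, and all variable names, are assumptions made for illustration.

```python
import random
import statistics

random.seed(1)
theta, n, trials = 3.0, 10, 100_000  # hypothetical values for illustration

mvue, moment = [], []
for _ in range(trials):
    xs = [random.uniform(0.0, theta) for _ in range(n)]
    mvue.append((n + 1) / n * max(xs))  # MVUE for the Uniform(0, theta) family
    moment.append(2.0 * sum(xs) / n)    # twice the sample mean, also unbiased

# Empirical mean (to check unbiasedness) and variance of each estimator.
mean_mvue, mean_mom = statistics.fmean(mvue), statistics.fmean(moment)
var_mvue, var_mom = statistics.variance(mvue), statistics.variance(moment)

print(f"true theta = {theta}")
print(f"MVUE:    mean {mean_mvue:.4f}, variance {var_mvue:.4f}")
print(f"2*mean:  mean {mean_mom:.4f}, variance {var_mom:.4f}")
```

The lesson to state in English alongside such output is that both empirical means are close to theta, while the variance of the first estimator is clearly smaller.  For this stand-in family the theoretical variances are theta^2 / (n(n+2)) and theta^2 / (3n) respectively.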

The answer M/x is incorrect, but for large M it is very close to the correct answer.  Be sure that your numerical experiments reveal that M/x is in fact incorrect.  This is a good example of how numerical experiments can never prove that a mathematical result is correct.  They can only cast doubt on it or suggest that it is approximately correct.
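
One way to make a small bias visible is to compare the estimated bias with its Monte Carlo standard error.  The sketch below, in Python (standard library only) rather than Matlab, uses a stand-in example and not the estimator from Silvey 2.8: for n independent exponential observations with rate lambda, the estimator n / sum is biased, with expectation n * lambda / (n - 1), while (n - 1) / sum is unbiased.  The values of rate, n, and the number of trials, and the function names, are assumptions.

```python
import math
import random
import statistics


def bias_z_score(estimates, truth):
    """Estimated bias divided by its Monte Carlo standard error."""
    bias = statistics.fmean(estimates) - truth
    se = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return bias / se


def simulate(rate, n, trials, rng):
    """Draw `trials` samples of size n and return both estimators of the rate."""
    naive, corrected = [], []
    for _ in range(trials):
        total = sum(rng.expovariate(rate) for _ in range(n))
        naive.append(n / total)            # biased: expectation n*rate/(n-1)
        corrected.append((n - 1) / total)  # unbiased
    return naive, corrected


rng = random.Random(2)
rate, n = 1.0, 20  # hypothetical values for illustration
naive, corrected = simulate(rate, n, 100_000, rng)

z_naive = bias_z_score(naive, rate)
z_corrected = bias_z_score(corrected, rate)
print(f"naive estimator:     bias / standard error = {z_naive:.1f}")
print(f"corrected estimator: bias / standard error = {z_corrected:.1f}")
```

A ratio far from zero is strong evidence of bias, whereas a ratio of order one only means that no bias was detected at this number of trials.  With too few trials the standard error can exceed the bias, and the incorrect estimator would then look acceptable.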

 

(5) Silvey, 2.1.