We have been discussing RelFbk from the individual user's point of view. We have focused on how this information might be collected and how it might be used, both in the short term, to modify a user's retrievals, and over the longer term, to change the documents' indices. Now we want to consider a third use for RelFbk information. When we have more than one retrieval system available and would like to evaluate which is doing the better job, users' assessments of the relevance of retrieved documents can be used as ``grades''. If, across a range of typical queries, one system consistently retrieves more documents that users mark as relevant, and fewer that they mark as irrelevant, then that system is doing the better job.
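
As a rough illustration of using relevance assessments as grades, the comparison might be sketched as follows. Everything here is an assumption made for the example rather than a method prescribed by the text: the data layout (one list of relevant/irrelevant marks per query), the names `fraction_relevant` and `compare_systems`, the choice of "fraction of retrieved documents marked relevant" as the per-query grade, and the judgments themselves, which are made up.

```python
from typing import Dict, List

# Hypothetical layout: for each query, the user's marks on the documents
# one system retrieved (True = marked relevant, False = marked irrelevant).
Judgments = Dict[str, List[bool]]


def fraction_relevant(marks: List[bool]) -> float:
    """Share of retrieved documents marked relevant (0.0 if nothing was retrieved)."""
    return sum(marks) / len(marks) if marks else 0.0


def compare_systems(a: Judgments, b: Judgments) -> Dict[str, int]:
    """Count, over the queries both systems answered, how often each grades higher."""
    tally = {"a_wins": 0, "b_wins": 0, "ties": 0}
    for query in sorted(a.keys() & b.keys()):
        score_a = fraction_relevant(a[query])
        score_b = fraction_relevant(b[query])
        if score_a > score_b:
            tally["a_wins"] += 1
        elif score_b > score_a:
            tally["b_wins"] += 1
        else:
            tally["ties"] += 1
    return tally


# Made-up judgments, for illustration only.
system_a = {
    "q1": [True, True, False],
    "q2": [True, False, False],
    "q3": [True, True, True],
}
system_b = {
    "q1": [True, False, False],
    "q2": [True, False, False],
    "q3": [False, True, True],
}

result = compare_systems(system_a, system_b)
print(result)
```

A per-query tally of this kind only hints at what "consistently" means; with few queries, a lead of one or two wins says little, so a real comparison would want many typical queries.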

