Query syntax

Keywords therefore have a special status in IR and as part of the FOA process. Not only must they be exhaustive enough to capture the entire topical scope reflected by the corpora's domain of discourse, but they must also be expressive enough to characterize any information needs the users might have.

Of course we need not restrict our users to only one of these keywords. It seems quite natural for queries to be composed of two or three, perhaps even dozens, of keywords. Recent empirical evidence suggests that many typical queries have only two or three keywords (cf. Section §8.1 ), but even this number provides a great combinatorial extension to the basic vocabulary of single keywords. Other applications, for example using a document itself as a query, (i.e. using it as an example - Give me more like this.'') can generate queries with hundreds of keywords. Regardless of size, queries defined only as sets of keywords will be called SIMPLE QUERIES . Many Web search engines support only simple queries. Often however, the search engines also provide more advanced interfaces including OPERATORS in the query language as well. Perhaps, because you have previously been warped by an exposure to computer science:), you may be thinking that sets of keywords might be especially useful if joined by Boolean operators. For example, if we have one set of documents about \term{NEURAL NETWORKS} and another set of documents about \term{SPEECH RECOGNITION}, we can expect the query: \smallskip \term{NEURAL NETWORKS AND SPEECH RECOGNITION} \smallskip to correspond to the intersection of these two sets, while \smallskip \term{NEURAL NETWORKS OR SPEECH RECOGNITION} \smallskip would correspond to their union.

The Boolean operator is a bit more of a problem. If users say they want things that are {\em not} about \term{NEURAL NETWORKS}, they are in fact referring to the vast majority of the corpus. That is, \term{NOT} is more appropriately considered a binary, subtraction operator. To make this distinction explicit we will call it \term{BUT\_NOT} .

There are other syntactic operators that are often included in a search engine's query language, but these will be put off until later. Even with these simple Boolean connectives and a keyword vocabulary of reasonable size, users are capable of constructing a vast number of potential queries in attempting to express their information need.

Subsections

FOA © R. K. Belew - 00-09-21