*********************************************************************** File ~/../public/A5/README Homework Assignment 5 CSE 150, UCSD Winter 1997 Due: Tuesday, March 11, midnight *********************************************************************** >>> SUMMARY: This assignment will explore natural language processing, using a very simple language. (In fact, it is *finite*!) Parsing means that we are trying to identify the phrase structure of a "sentence" of a particular language. A "valid" sentence is one that has an acceptable phrase structure. We will give you a parser, and you will modify it to give it additional coverage. Then you will write a "semantic checker" to reject certain sentences. Although Prolog has "Grammar Rules" that facilitate this, you are *NOT* to use this in the assignment. We are trying to emphasize the basics of Prolog, and while this could be useful later, taking the time to see the "underbelly" of parsing and Prolog is useful. ----------------------------------------------------------------------- Part I. Augment our Parser. We will define our language that is to be parsed in terms of a grammar. The set of rules that define the *terminals* in this grammar is called the lexicon. The building blocks are nouns, verbs, determiners, and proper nouns (names). For example, there are two determiners in our language: "a" and "the". The set of grammar rules that specify the correct structures for these elements is a simplified version of English. So, for example, a noun phrase consists of either a name or a determiner followed by a noun. The following is in standard BNF. Our lexicon: noun -> vase | lamp | sandwich | apple | hammer | rock verb -> breaks | loves | eats determiner -> the | a name -> billy | john | fiona | mary Our grammar: sentence -> noun_phrase verb_phrase noun_phrase -> name | determiner noun verb_phrase -> verb | verb noun_phrase Using these rules above, we have constructed a Prolog parser. Try to work through what this code is doing and see if you understand how it applies the above rules. This is called a "recursive descent" parser: For each rule in the grammar, there is a procedure that recognizes phrases of that type. In the following grammar, all of the predicates take a second argument that contains the "rest" of the sentence after the phrase has been recognized. Our parser: %% Note that a sentence is a noun phrase followed by %% a verb phrase, and after parsing the verb phrase, %%there should be nothing left to parse. Hence the empty list %%on the verb phrase call. sentence(S0) :- noun_phrase(S0,S1), verb_phrase(S1,[]). noun_phrase(S0,S) :- parse_name(S0,S). noun_phrase(S0,S) :- parse_determiner(S0,S1), parse_noun(S1,S). verb_phrase(S0,S) :- parse_verb(S0,S). verb_phrase(S0,S) :- parse_verb(S0,S1), noun_phrase(S1,S). parse_determiner([X|S],S) :- determiner(X). parse_noun([X|S],S):- noun(X). parse_verb([X|S],S) :- verb(X). parse_name([X|S],S) :- name(X). name(billy). name(john). name(fiona). name(mary). noun(vase). noun(lamp). noun(sandwich). noun(apple). noun(hammer). noun(rock). verb(breaks). verb(loves). verb(eats). determiner(the). determiner(a). The parser accepts sentences in list format. (i.e. sentences of the form: [fiona,eats,the,apple] or [billy,breaks,the,vase]). For example: ?- sentence([billy,breaks,the,vase]). yes ?- YOUR TURN: Using the same style as above, add prepositional phrases, making the following changes to the lexicon: prep -> with verb_phrase -> verb_phrase prep_phrase prep_phrase -> prep noun_phrase So there is now an additional word "with", which will require a "parse_prep" rule, and two new rules: a verb_phrase can be a verb_phrase followed by a prep_phrase, and a prep_phrase is just a prep followed by a noun_phrase. ----------------------------------------------------------------------- Part II. A "real" parser The solution given above is merely a "recognizer". Now alter the solution by defining a new predicate sentence(S0,Parse) that returns a parse tree in Parse, following the structure given in the lecture notes for sentences. I.e., for the following call, ?- sentence([john,loves,mary],P). P=sentence(noun_phrase(name(john)), verb_phrase(verb(loves),noun_phrase(name(mary)))) yes A simple technique for doing this is to add another argument to every parsing rule that describes the parse. Depending on how you do it, this could involve simply describing the result in the head of the procedure call, or adding a unification at the end of the rule that binds the result to "the answer". An example of the first method for the "determiner" class is: parse_determiner([X|S],S,determiner(X)) :- determiner(X). Notice that prolog doesn't care if a term (an argument to a predicate) "looks like" a predicate (that is, "determiner" in the above is also a predicate, but prolog doesn't care). Also notice that prolog doesn't care if I define this new rule for parse_determiner that takes three arguments -- prolog keeps track of the two argument version and the three argument version, and they are DIFFERENT predicates. After you have defined these rules, your parser should return answers such as: | ?- sentence([billy,breaks,the,vase],P). P = sentence(noun_phrase(name(billy)), verb_phrase(verb(breaks),noun_phrase(determiner(the),noun(vase)))); no | ?- sentence([billy,breaks,the,vase,with,a,hammer],P). P = sentence(noun_phrase(name(billy)), verb_phrase( verb_phrase(verb(breaks), noun_phrase(determiner(the),noun(vase))), prep_phrase(prep(with), noun_phrase(determiner(a),noun(hammer))))); no | ?- As far as we can tell, this problem does NOT require the use of "cut". However, you do have to do something slightly sneaky to not get an infinite recursion on the verb phrase in your verb phrase with a prepositional phrase rule. ----------------------------------------------------------------------- Part III. Putting in semantic constraints. The current version of our parser will happily recognize "silly" sentences like: "a lamp breaks the sandwich with fiona" We would like to treat such sentences as incorrect, by imposing semantic (meaning) constraints on our parses. Write a "semantic checker" that rejects such sentences. We will do this by noticing that verbs specify the things they "expect" in a sentence. For example, "likes" expects someone doing the liking (the AGENT), something or someone being liked (the OBJECT). "breaks" expects someone doing the breaking (the AGENT), something being broken (the OBJECT), and possibly something being used to do the breaking (the INSTRUMENT). So in the sentence, "John breaks the window with the hammer", John is the AGENT, the window is the OBJECT, and the hammer is the INSTRUMENT. This is called the "case frame" for the verb, and the roles (AGENT, OBJECT, INSTRUMENT) are called "cases." Notice that cases have semantic restrictions on them: AGENTs have to be ANIMATE. Some verbs OBJECTs have to be INANIMATE (e.g. "John breaks Mary" sounds weird), while others do not "John loves Mary" is ok. We can specify such semantic restrictions by stating facts like agent(X,likes,P) :- subject(X,P),animate(X). That is, X is the agent of the parse tree P if X is the subject of P and X is animate. Facts like "animate" can just be added to the database: animate(john). On the other hand, you have to devise a rule for "subject". X is the subject of P if X is the "head" of the noun phrase of P. The head of the noun phrase is the noun in the noun_phrase, or it is the name in the noun_phrase. You can define similar rules for object and instrument. Note you will need separate rules for every verb and for every case of that verb, the way we are doing it. [A more flexible semantic checker would make the generalization that there ard verb types and declaring rules for each type of verb. But we are only going to worry about our three verbs.] The predicate should be called: semantics_ok(P) where P is a parse tree generated by Part II. It will check the semantics by checking the rule for agent is satisfied, the rule for object is satisfied, etc. So we could do: ?- sentence([john,loves,mary],P),semantics_ok(P). P=sentence(noun_phrase(name(john)),verb_phrase(verb(loves), noun_phrase(name(mary)))) yes But: ?- sentence([lamp,loves,mary],P),semantics_ok(P). no. We will do this via the following rules: 1. Verbs that take an animate agent require an animate subject. "loves", "breaks", and "eats" are such verbs. 2. Verbs of destruction (e.g., "breaks") require an artifactual object: hammer, vase, and lamp are artifacts. 3. "eats" requires a food object. 4. "breaks" requires that if there is a "with" prepositional phrase, it should be something that is capable of being a "break" INSTRUMENT, e.g., rock or hammer. There are exceptions to the above rules, but we won't worry about them here. After this, your semantics_ok predicate should accept (for example) the following sentences: billy loves mary billy eats the sandwich fiona breaks the lamp. fiona breaks the lamp with a hammer. fiona breaks the hammer with the hammer. mary breaks the vase with the rock. But it should reject the following sentences: the rock loves mary. billy eats the rock. the rock breaks the lamp. % notice that this one is really % ok, but not by our rules! billy breaks the lamp with fiona. % ditto billy breaks the sandwich with the hammer. billy breaks fiona with the hammer. % we are advocating % peace by our rules.