**2.1** Stansifer's ideas on natural language seem to have come
mainly from formalists like Chomsky, rather than from linguists who study what
real people actually write and say. For example, it is easy to write a short
play about painters redoing a trading room in a bank, where desks are named
"one desk", "two desk", "FX desk", etc., and where one of the painters has the
line, "Painted two desk" in response to his boss's asking what he did. A
number of disgruntled empirical linguists have written little poems that end
with the line "colorless green ideas sleep furiously", meaning something like
"Chomsky's uninteresting untried theories do nothing much after a lot of
effort". (It is an interesting exercise to try this yourself.) Similarly, it
easy to imagine a Star Trek episode in which some creatures called "time
flies" have affection for a certain arrow. We may conclude that almost
anything can be made meaningful, given the right context.

The important point here is that in natural language, context determines whether or not something makes sense; formal syntax and semantics are very far from adequate, and indeed, the distinction among syntax, semantics, and pragmatics does not hold up under close examination. On the other hand, the formal linguists' way of looking at syntax and semantics works rather well for programming languages, because we can define things to work that way, and because traditionally, programming language designers have wanted to achieve as much independence from context as possible (though this might change in the future).

**2.1.1** These principles are really important; please think about
them, and the examples that are given. Also, notice that the situation for
natural language is very different.

**2.2** If we denote the empty set by "`{}`" and the empty string by the
empty character, then there will not be any way to tell the difference
between the empty set and the set containing the empty string. So this is a
bad idea. Instead, we can use the Greek letter epsilon for the empty string,
and the Danish letter "O-with-slash" for the empty set, as is usual in
mathematics. Sometimes I like to write "`[]`" for the empty string, while
Stansifer sometimes writes `""` for it, and some other people use the Greek
letter lambda! I am afraid that you will have to get used to there being a
lot of notational variation in mathematics, just as there is a lot of
notational variation for programming languages; in fact, I am afraid that
notational variation will be an ongoing part of the life of every computer
science professional for the foreseeable future, so that developing a
tolerance for it should be part of your professional training. But to help
out a bit, I will write the epsilon for the empty string and for set
membership in different ways; the set epsilon will be much bigger, while
the string epsilon will be (insofar as I can do it) slanted a bit.

Notice that the definition of "language" in section 2.2.1 is not suitable for natural language, where context determines whether something is acceptable or not, and where even then, there may be degrees of acceptability, and these may vary among different subcommunities, and even individuals, who speak a given langauge; morever, all of this changes (usually slowly) over time. For example, "yeah" is listed as a word in some dictionaries but not in others; perhaps it will gradually become more and more accepted, as have many other words over the course of time. At the level of syntax, English as spoken in black Harlem (a neighborhood of New York City) differs in some significant ways from "standard English" (if there is such a thing), in its lexicon, its syntax, and its pronunciation; this dialect has been studied under the name "Black English" by William Labov and others, who have shown that in some ways it is more coherent than so called "standard English".

It may be interesting to know that the first (known) grammar was given by Panini for Sanskrit, a sacred language of ancient India, more than 2,500 years ago. This work included very sophisticated components for phonetics, lexicon, and syntax, and was in many ways similar to modern context-free grammars. The motivation for this work was to preserve the ancient sacred texts of Hinduism.

If we let R be the set of regular expressions over a finite alphabet A,
then the "meaning" or **denotation** for R is given by a function

```
[[_]] : R -> 2 ** A*
```

where `**` indicates exponentiation (sorry about that - HTML is lousy for
formulae), so that `2 ** X` indicates the set of all subsets of `X`, and
`A*` is the set of all finite strings over `A`.

Although there are clever ways to give a compositional semantics the
property that the meaning of a part depends on the context within which it
occurs, in the sense of the other parts of which it is a part, a
mathematical semantics for a programming language is not going to give us
what we usually intuitively think of as the meaning of programs written in
it. For example, the fact that a certain FORTRAN program computes a certain
formula does not tell us that it gives the estimated yield of a certain
variety of corn under various weather conditions, let alone that this
figure is only accurate under certain assumptions about the soil, and even
then only to within about 5 percent. Yet all this is an important part of
the meaning of the program.

**2.3** Stansifer's exposition of attribute grammars can seem pretty
difficult to understand. However, the special case of grammars with just
synthesized attributes is much easier; it can be explained without too much
notation, and it can also be illustrated rather simply. For instance, here is
a context free grammar for a simple class of expressions:

```
S -> E
E -> 0
E -> 1
E -> (E + E)
E -> (E * E)
```

where `N = {S, E}` and `T = {0, 1, (, ), +, *}`. We can find the value of
any such expression by using an attribute grammar with just one
(synthesized) attribute, `val`, by associating the following equations
with the above rules:

```
S.val  = E.val
E.val  = 0
E.val  = 1
E1.val = E2.val + E3.val
E1.val = E2.val * E3.val
```

Then a parse tree for the expression `(1 + 1) * (1 + 0)` looks as follows,
where for simplicity the parentheses are left out:

```
            S
            |
            E
         /  |  \
        E   *   E
      / | \   / | \
     E  +  E E  +  E
     |     | |     |
     1     1 1     0
```

Then the synthesized attribute will percolate up the tree, by applying the
equations from the bottom up, producing the values that are shown here,
where we see that the final value is 2:

```
            2
            |
            2
         /  |  \
        2   *   1
      / | \   / | \
     1  +  1 1  +  0
```
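The bottom-up percolation of `val` is easy to program; here is a sketch of my own, using nested tuples as parse trees (an encoding chosen for illustration, not Stansifer's notation):

```python
# Sketch of how the synthesized attribute `val` percolates up a parse
# tree, bottom-up.  Trees are tuples: ("0",), ("1",),
# ("+", left, right), ("*", left, right).

def val(tree):
    """Apply the attribute equations from the leaves upward."""
    if tree[0] == "0":
        return 0                      # E.val = 0
    if tree[0] == "1":
        return 1                      # E.val = 1
    left, right = val(tree[1]), val(tree[2])
    if tree[0] == "+":
        return left + right           # E1.val = E2.val + E3.val
    return left * right               # E1.val = E2.val * E3.val

# the parse tree of (1 + 1) * (1 + 0), parentheses left out
t = ("*", ("+", ("1",), ("1",)), ("+", ("1",), ("0",)))
print(val(t))   # 2
```

Because every attribute here is synthesized, one recursive bottom-up pass suffices; inherited attributes would require information to flow down the tree as well.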

For another illustration, we can do the binary digits example in section 2.3.1 of Stansifer in a simpler way. The grammar here is

```
B -> D
B -> D B
D -> 0
D -> 1
```

and the equations associated with these four rules are

```
B.pos  = 0
B.val  = D.val
B1.pos = B2.pos + 1
B1.val = D.val * (2 ** B1.pos) + B2.val
D.val  = 0
D.val  = 1
```

It is now a good exercise to compute the value of the binary number `1010`
by first writing its parse tree and then computing how the values of the
two (synthesized) attributes percolate up the tree. We will see later that
the parse trees of a context-free grammar form a very nice little algebra,
such that the values of synthesized attributes are given by a (unique)
homomorphism into another algebra of all possible values.
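Here is a similar sketch of my own for the two attributes `pos` and `val`, representing a numeral simply as a string of digits (my encoding, not Stansifer's):

```python
# Sketch of the two synthesized attributes from section 2.3.1:
# pos counts the digits to the right, val accumulates the value.
# A numeral B is either a single digit (B -> D) or a digit followed
# by a shorter numeral (B -> D B).

def attributes(bits):
    """Return (pos, val) for the numeral, via the attribute equations."""
    d = int(bits[0])                  # D.val = 0 or D.val = 1
    if len(bits) == 1:
        return 0, d                   # B.pos = 0, B.val = D.val
    pos2, val2 = attributes(bits[1:])
    pos1 = pos2 + 1                   # B1.pos = B2.pos + 1
    val1 = d * 2 ** pos1 + val2       # B1.val = D.val * (2 ** B1.pos) + B2.val
    return pos1, val1

print(attributes("1010"))   # (3, 10)
```

Working the recursion by hand on `1010` reproduces the exercise: the rightmost digit gets `pos = 0`, and the values percolate up to `val = 10`.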

There is a slight inconsistency in Stansifer about whether In(S) should be
empty; my preference is that it need not be, because S can occur on the right
side of rules, as in S -> ASA on page 55. Stansifer is also not very
forthright about the evaluation of attributes; the fact is that it is possible
for a definition to be inconsistent, so that some (or even all) attributes do
not have values; however, it is rather complex to give a precise definition
for consistency. I also note that the diagrams for attribute evaluation are
*much* more effective if they are drawn in real time using several
colors; there are many other cases where live presentation works much better
than reading a book; this should be good motivation for your coming to class!

**2.4** The first two occurrences of the word "list" in section 2.4.1
(page 62) should be replaced by "set". Stansifer uses the phrase "tally
notation" for the notation that represents 0, 1, 2, 3, ... by the empty
string, |, ||, |||, ..., but it is a variant of what we will later call
Peano notation.
There is a typo on page 64, where it says that || + || = |||, i.e., 2 + 2 = 3!
For future reference, there is an OBJ version of
the Post system for the propositional calculus. Also, the production

```
xax
---
xxx
```

on page 65 is called an "axiom" but it isn't.
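To give the flavor of Post systems, here is a sketch of my own little Post-style system for addition in tally notation (it is not the system on page 65); theorems are derived forward from a single axiom by two rules:

```python
# Sketch of a Post-style system for tally-notation addition.
# Theorems are strings "x+y=z" over the alphabet {|, +, =}; the axiom
# is "+=" (empty plus empty equals empty), and the two rules are:
#   from x+y=z infer x|+y=z|      (add one tally on the left)
#   from x+y=z infer x+y|=z|      (add one tally on the right)

def theorems(steps):
    """Derive all theorems reachable in at most `steps` rule applications."""
    derived = {("", "", "")}          # the axiom: "" + "" = ""
    for _ in range(steps):
        new = set()
        for x, y, z in derived:
            new.add((x + "|", y, z + "|"))   # rule 1
            new.add((x, y + "|", z + "|"))   # rule 2
        derived |= new
    return {f"{x}+{y}={z}" for x, y, z in derived}

# after enough steps, "||+||=||||" (i.e., 2 + 2 = 4) is a theorem
print("||+||=||||" in theorems(4))   # True
print("||+||=|||" in theorems(4))    # False: 2 + 2 is not 3
```

Note that the derivable theorems are exactly the true addition facts, so the typo || + || = ||| mentioned above could never be derived in this system.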

The "proof" on page 68 does not really deserve to be called a proof,
because it only sketches one direction and it completely omits the other
direction, which turns out to be much harder than what is sketched. It is
remarkable that the term "theorem" appears at *three* different levels:
(1) a theorem of the predicate calculus, i.e., some `x` for which `Th x`
is provable; (2) a theorem of the Post system for the predicate calculus,
which means a derivable term of that system, which includes some terms of
the form `Th x`, others of the form `P x`, etc.; and (3) a theorem of
mathematics, the proof of which is discussed in the previous sentence.

**2.5** I don't know why Stansifer is so dismissive about issues of
concrete syntax in this section; he should be pleased that he did such a good
job discussing them, and motivating various formalisms earlier in the chapter.

In algebraic terms, destructors are **left inverses** of constructors.
For example,

```
FirstOfBlock(Block(W1, W2))  = W1
SecondOfBlock(Block(W1, W2)) = W2
```
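Here is a minimal sketch in Python (the names come from the equations above, but the tuple representation is my own assumption):

```python
# Sketch of destructors as left inverses of constructors:
# FirstOfBlock(Block(W1, W2)) = W1 and SecondOfBlock(Block(W1, W2)) = W2.

def Block(w1, w2):
    return ("Block", w1, w2)          # constructor: builds the tree node

def FirstOfBlock(b):
    return b[1]                       # destructor: recovers the first part

def SecondOfBlock(b):
    return b[2]                       # destructor: recovers the second part

# the left-inverse equations hold for any W1, W2:
print(FirstOfBlock(Block("x := 1", "y := 2")))    # x := 1
print(SecondOfBlock(Block("x := 1", "y := 2")))   # y := 2
```

These are only *left* inverses: each destructor undoes the constructor, but a destructor applied to an arbitrary value need not be undone by the constructor.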


Maintained by Joseph Goguen

© 2000, 2001, 2002 Joseph Goguen

Last modified: Wed Feb 13 21:47:46 PST 2002