CSE 130: Principles of Programming Languages
Notes for Chapter 2 of Sethi (syntax, grammar, expressions)

Sethi first defines context free grammars using the notation of Backus-Naur form (BNF), and then later also discusses extended BNF (EBNF) and syntax charts. These three formalisms are equivalent, in the sense that exactly the same languages can be defined with any one of them as with the others; moreover, there are several other formalisms that are equivalent to these, including context free grammars in the sense of Chomsky, which replace the symbol ::= by -> and do not use the | symbol. There are also several other kinds of grammar, including recursive grammars and Post systems, which are more expressive than context free grammars, in the sense that the set of languages that can be defined using them is strictly larger than the set that can be defined using context free grammars; and there are also formalisms that are less expressive than context free grammars, including regular expressions.

Perhaps it will help to be really formal about some of this, since Sethi is less formal. So here we go:-

Mathematically, a language is a set of (finite) strings, where a string is a sequence of letters, from another set that is usually called the alphabet of the language. Formally, we write A* for the set of all finite strings from the alphabet A, including the empty string, which we will usually write [] - but beware, because many other notations are used for it, including the Greek letter lambda and the Greek letter epsilon; Sethi uses the notation <empty>.

For programming, context free languages are especially important, and as remarked above, BNF is just one kind of grammar for defining these languages. Here is Chomsky's original version:

A context free grammar or CFG is a 4-tuple (N,T,P,S), where:
     N is a finite set of symbols, called nonterminals;
     T is a finite set of symbols, called terminals, disjoint from N;
     P is a finite set of productions (or rules), each of the form X -> r,
          where X is in N and r is a string in (N U T)*; and
     S is a distinguished element of N, called the start symbol.

The language defined by (N,T,P,S) is the set of all terminal strings that can be derived using the rules, starting from S. The formal definition of derivation is a little complex, and is given in the next paragraph. If G is a grammar, then its language is usually denoted L(G).

Given a CFG G = (N,T,P,S), let X be a nonterminal, let p be a production X -> r in P, and let e, e' be strings in (N U T)*. Then we write

     e => e'  with p 
to indicate that e' is obtained from e by replacing some single instance of X in e by r, and we may also omit "with p" when the above holds for some p in P. Next, we write
     e =*=> e' 
when e => e1 => ... => en = e' for some n; such a sequence is called a derivation. Finally, we can now define
     L(G) = { t | S =*=> t and t in T* } 
and we can make an important definition: A grammar G is ambiguous if there is an element of L(G) that has more than one parse tree, or equivalently, more than one leftmost derivation (a derivation that always rewrites the leftmost nonterminal). Note that this is a syntactic notion of ambiguity, and that not all syntactic ambiguities lead to semantic ambiguities in programming languages (e.g., the syntactic ambiguity of a+b+c), though some do (e.g., the infamous dangling else ambiguity).
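To illustrate the definition, consider the small grammar with just the two productions E -> E + E and E -> a (this little grammar is only an illustration, not one of Sethi's). The string a + a + a has two different parse trees, corresponding to the two leftmost derivations
     E => E + E => a + E => a + E + E => a + a + E => a + a + a
     E => E + E => E + E + E => a + E + E => a + a + E => a + a + a 
so this grammar is ambiguous; the first derivation groups the string as a + (a + a), and the second groups it as (a + a) + a.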

Sethi seems a bit confused about the notion of abstract syntax and in particular, the discussion of 4-2-1 on page 44 seems misleading, because it is actually impossible to get the subexpression 4-2 in the right associative grammar R. For our purposes, we can just define the abstract syntax of an expression to be just the intended parse tree for that expression. (It is possible to be more precise about this, but not worth the trouble for this course.)

Here is a "recipe" for eliminating ambiguities in infix operations: If there are N levels of precedence, use N+1 nonterminals, one for each level plus one for atoms (i.e., the smallest subterms); if an operation @ is left associative, use the production L -> L @ K, and if it is right associative, use the production R -> K @ R, where @ is any (binary infix) operator and K is the nonterminal for expressions having the next lower precedence from @.


Regular Expressions were invented by the logician Stephen Kleene to describe a certain simple class of languages, the regular languages, which we now know are also exactly those defined by regular grammars and accepted by finite state automata; the "*" operation in regular expressions is called the "Kleene star" in his honor. We can give the following BNF grammar for regular expressions:
    <rexp> ::= Ø
    <rexp> ::= (<rexp> <rexp>)
    <rexp> ::= (<rexp> + <rexp>)
    <rexp> ::= <rexp>*
    <rexp> ::= a                  for each a in A
where <rexp> is the nonterminal for regular expressions. In the first line, "Ø" is a Danish letter, standing for the empty set, as usual in mathematics. The second line has an implicit binary infix operator, which is usually called juxtaposition or concatenation. The last line is really an abbreviation for a finite number of lines, one for each "letter" (or terminal symbol) in the alphabet A. It is usual to leave out the parentheses when they can be inferred from the associativity of + or of juxtaposition, or by assuming that juxtaposition binds more strongly than +. For a simple example of a regular expression,
    (a b)(a b)* 
defines the language consisting of all nonempty repetitions of ab, that is, the strings ab, abab, ababab, and so on. Here is another example,
    (a* b)(a* b)* + (b* a)(b* a)* 
defining the language of nonempty strings of b's with some number (possibly zero) of a's in front of each b, and of nonempty strings of a's with some number (possibly zero) of b's in front of each a.

Many computer scientists' ideas about natural language seem to have been influenced by concepts like the above, which take a formal view along the lines of Noam Chomsky, rather than that of linguists who study what real people actually write and say. An important point is that in natural language, context determines whether or not something makes sense; formal syntax and semantics are very far from adequate, and indeed, the distinction among syntax, semantics and pragmatics does not hold up under close examination. On the other hand, the formal linguists' way of looking at syntax and semantics works well for programming languages, because we can define things to work that way, and because traditionally, programming language designers have wanted to achieve as much independence from context as possible (though this might change for future computer systems!).

It follows that the above definition of "language" is not suitable for natural language, where context determines whether something is acceptable or not, and where even then, there may be degrees of acceptability, and these may vary among different subcommunities, and even individuals, who speak a given language; moreover, all of this changes (usually slowly) over time. For example, "yeah" is listed as a word in some dictionaries but not in others; perhaps it will gradually become more and more accepted, as have many other words over the course of time. English as spoken in black Harlem (a neighborhood of New York City) differs in some significant ways from "standard English" (if there is such a thing), in its lexicon, its syntax, and its pronunciation; this dialect has been studied under the name "Black English" by William Labov and others, who have shown that in some ways it is more coherent than so called "standard English".

It may be interesting to note that the first (known) grammar was given by Panini for Sanskrit, a sacred language of ancient India, more than 2,500 years ago. This work included very sophisticated components for phonetics, lexicon, and syntax, and was in many ways similar to modern context free grammars. The motivation for this work was to preserve the ancient sacred texts of Hinduism.

If we denote the empty set by "{}" and the empty string by writing nothing at all, then there will be no way to tell the difference between the empty set and the set containing the empty string. So this is a bad idea, and it helps to explain why we use [] or maybe the Greek letter epsilon for the empty string, and Ø for the empty set. I am afraid you will have to get used to there being a lot of notational variation in mathematics, just as there is a lot of notational variation for programming languages; in fact, I am afraid that notational variation will be an ongoing part of the life of every computer science professional for the foreseeable future, so that developing a tolerance for it should be part of your professional training.


A favorite question of philosophers is "What is meaning?" and a favorite way to try to answer it is to provide a denotation for some class of sentences; this approach is called denotational semantics. As one simple example, we can define the "meanings" or denotations of regular expressions with a denotation function, which maps each expression to the language that it defines. If we let R be the set of regular expressions over a finite alphabet A, then the denotation for R is given by a function
     [[_]] : R -> P(A*) , 
where A* indicates the set of all finite strings over A and P(A*) indicates the "power set" of A*, i.e., the set of all subsets of A* (another notation uses exponentiation, so that 2 ** X indicates the set of all subsets of X). Then we can actually define this denotation function recursively as follows:
     [[Ø]]      =  Ø
     [[E E']]   = [[E]] º [[E']]
     [[E + E']] = [[E]] U [[E']]
     [[E*]]     = [[E]]*
     [[a]]      = {a}              for each a in A
where the operations º and * on sets of strings are defined as follows:
      X º Y = { xy | x in X  and  y in Y }  
      X* = { [] } U { x1 x2 ... xn | each xi in X, n > 0 }  
Using all these equations, we can compute the denotation of any regular expression, just by applying them recursively until all [[_]] brackets have been eliminated. Notice that this semantics is compositional, in the sense that the denotation of each expression is "composed from" (i.e., computed from) the denotations of its parts. For example,
      [[ (a b) (a b)* ]] = [[ (a b) ]] º [[ (a b)* ]]
                         = ([[ a ]] º [[ b ]]) º [[ (a b) ]]* 
                         = ({ a } º { b }) º ({ a } º { b })*
                         = { ab } º { ab }* 
                         = { (ab)^n | n > 0 } 
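Since the denotation of a regular expression is in general an infinite set, a program can only enumerate it up to some bound. Here is a minimal Python sketch along these lines (the tuple representation of regular expressions and the name denote are assumptions made just for this illustration; none of it comes from Sethi or these notes):

    # Regular expressions as nested tuples; the denotation [[e]] is approximated
    # by computing only the strings of length <= bound, since [[e]] may be infinite.
    def denote(e, bound):
        """Return { s in [[e]] : len(s) <= bound } as a Python set of strings."""
        if e == "empty":                        # the regular expression Ø
            return set()
        if isinstance(e, str):                  # a single letter a of the alphabet
            return {e} if len(e) <= bound else set()
        op = e[0]
        if op == "cat":                         # (E E') : concatenation
            left, right = denote(e[1], bound), denote(e[2], bound)
            return {s + t for s in left for t in right if len(s + t) <= bound}
        if op == "plus":                        # (E + E') : union
            return denote(e[1], bound) | denote(e[2], bound)
        if op == "star":                        # E* : Kleene star
            base = denote(e[1], bound)
            result, frontier = {""}, {""}       # "" plays the role of []
            while frontier:                     # add longer concatenations until none fit
                frontier = {s + t for s in frontier for t in base
                            if len(s + t) <= bound} - result
                result |= frontier
            return result
        raise ValueError("unknown regular expression: %r" % (e,))

    # (a b)(a b)* : the nonempty repetitions of ab, up to length 6
    ab = ("cat", "a", "b")
    print(sorted(denote(("cat", ab, ("star", ab)), 6)))   # ['ab', 'abab', 'ababab']
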

Although there are clever ways to give a compositional semantics the property that the meaning of a part depends on the context within which it occurs, that is, on the larger phrases of which it is a part, a mathematical semantics for a programming language is not going to give us what we usually intuitively think of as the meaning of programs written in it. For example, the fact that a certain FORTRAN program computes a certain formula does not tell us that it gives the estimated yield of a certain variety of corn under various weather conditions, let alone that this figure is only accurate under certain assumptions about the soil, and even then only to within about 5 percent. Yet all this is an important part of the meaning of the program.

For another example, just a little more complex, here is a context free grammar G for a very simple class of expressions:

     E -> 0
     E -> 1
     E -> (E + E)
     E -> (E * E) 
where N = {E} with E as the start symbol, and T = {0, 1, (, ), +, *}. Note that without the parentheses this grammar would be ambiguous. We can define a denotation function
     [[_]] : L(G) -> NAT 
for these expressions, where NAT is the set of natural numbers, with the following:
     [[ 0 ]] = 0
     [[ 1 ]] = 1
     [[ (E + E') ]] = [[ E ]] + [[ E' ]]
     [[ (E * E') ]] = [[ E ]] * [[ E' ]]
From this, we can compute, for example, that
     [[ ((1 + 1) * (1 + 1)) ]] = 4  
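Spelling that computation out, by applying the equations from the outside in:
     [[ ((1 + 1) * (1 + 1)) ]] = [[ (1 + 1) ]] * [[ (1 + 1) ]]
                               = ([[ 1 ]] + [[ 1 ]]) * ([[ 1 ]] + [[ 1 ]])
                               = (1 + 1) * (1 + 1)
                               = 4 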


Attribute grammars were invented by Don Knuth. They provide another way to give meanings to expressions. Although attribute grammars in general can be pretty difficult to understand, the special case of grammars with just "synthesized attributes" is much easier: it can be explained without too much notation, and illustrated rather simply. For example, we can define the values of expressions over the above grammar. But first, it is convenient to add to the grammar one non-terminal, S, so that now N = {S, E}, and one more production, S -> E. The attribute grammar has just one (synthesized) attribute, val, and associates the following equations with the productions of the grammar (shown at the right of each equation):
     S.val  = E.val                    (for S -> E)
     E.val  = 0                        (for E -> 0)
     E.val  = 1                        (for E -> 1)
     E1.val = E2.val + E3.val          (for E1 -> (E2 + E3))
     E1.val = E2.val * E3.val          (for E1 -> (E2 * E3)) 
Then a parse tree for the expression (1 + 1) * (1 + 0) looks as follows
              S
              |
              E
            / | \
           /  |  \
          /   *   \
         /         \
        E           E
      / | \       / | \
     E  +  E     E  +  E
     |     |     |     |
     1     1     1     0  
where for simplicity the parentheses are left out. Then the synthesized attribute will percolate up the tree, by applying the equations from the bottom up, producing the values that are shown here
              2
              |
              2
            / | \
           /  |  \
          /   *   \
         /         \
        2           1
      / | \       / | \
     1  +  1     1  +  0  
where we see that the final value is 2.
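The same bottom-up percolation can also be written as a short program. Here is a minimal Python sketch (the tuple representation of parse trees and the name val are assumptions made just for this illustration, not anything from Sethi or these notes):

    # Bottom-up evaluation of the single synthesized attribute val over parse
    # trees of the expression grammar above, represented as nested tuples.
    def val(tree):
        """Percolate the synthesized attribute val up the tree, leaves first."""
        if tree == 0 or tree == 1:          # productions E -> 0 and E -> 1
            return tree
        op, left, right = tree              # productions E -> (E + E), E -> (E * E)
        if op == "+":
            return val(left) + val(right)   # E1.val = E2.val + E3.val
        if op == "*":
            return val(left) * val(right)   # E1.val = E2.val * E3.val
        raise ValueError("unknown operator: %r" % (op,))

    # the parse tree drawn above, for (1 + 1) * (1 + 0)
    print(val(("*", ("+", 1, 1), ("+", 1, 0))))   # prints 2
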

For another illustration, we can do the semantics of binary numerals; the grammar here is

    B -> D
    B -> DB
    D -> 0
    D -> 1 
and the equations associated with these four rules are (writing B1 -> D B2 for the second rule, to distinguish its two occurrences of B)
    B.pos = 0              B.val = D.val                            (for B -> D)
    B1.pos = B2.pos + 1    B1.val = D.val * (2 ** B1.pos) + B2.val  (for B1 -> D B2)
    D.val = 0                                                       (for D -> 0)
    D.val = 1                                                       (for D -> 1) 
It is now a good exercise to compute the value of the binary number 1010 by first writing its parse tree and then computing how the values of the two (synthesized) attributes percolate up the tree.
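As a warm-up, here is the same kind of computation for the shorter numeral 10, whose parse tree has B1 -> D B2 at the root, with D -> 1, and with B2 -> D where D -> 0:
     B2.pos = 0
     B2.val = D.val = 0
     B1.pos = B2.pos + 1 = 1
     B1.val = D.val * (2 ** B1.pos) + B2.val = 1 * 2 + 0 = 2 
so the value of the binary numeral 10 is 2; the numeral 1010 works the same way, with two more levels in its parse tree.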


Maintained by Joseph Goguen
© 2000 - 2006 Joseph Goguen
Last modified: Thu Jan 19 16:08:35 PST 2006