Spring 2006 CSE 141L
Project/Computer Architecture
Though the description is written for GNU C/C++ flex tool, the Java version jflex is almost identical, and you should not have any trouble adapting the description for Java.
definitions%%
rules
%%
user code
Definitions allow one to assign regular expressions to some name, for use in the rules section. One can also place regular expression in the rules too. The following are examples of definitions, where the name DIGIT is assigned to a regular expression matching one number digit, and the name ID is assigned to a regular expression that matches variables that start with a letter followed by any combination of letters and numbers. There is a description of regular expressions in the following section.
|
DIGIT |
[0-9] |
|
ID |
[a-z][a-z0-9]* |
Rules have the form
|
Pattern |
Action |
where pattern is a regular
expression or name, followed by the action which is some code to execute if the
pattern matches. The following example on the name ID match, prints what it
finds, and returns a number 5. This return value tells the caller to the lexer
that it found an ID token. If several patterns match a string input, flex takes
the longest match. If they are still equivalent, it uses the first matching
rule in the rules specification.
|
{ID} |
{
printf("found a ID token %s\n", yytext); return 5; } |
is equivalent to
|
[a-z][a-z0-9]* |
{
printf("found a ID token %s\n", yytext); return 5; } |
The lexer is called up from
another module via the interface "int yylex();". Yylex contains the code
for rules. It assumes that the input text file to read is passed through
"FILE* yyin". When the lexer reaches the end-of-file of yyin, yylex
returns a 0.
The rules section has other features such as supporting predicating the string matching, input stream look ahead, and manipulating the input stream. We will not cover this.
When flex runs on the source file, it generates C code (compatible with C++) to perform the parsing. Code in the user code section is copied verbatim to the output file. Similarly code found in the definitions and rules section surrounded "{%" and "%}" or as indented text is copied verbatim to the output file. This generated output of flex is by default called lex.yy.c, though this can be specified from the command line. In our example file, another module calls up the scanner in lex.yy.c, to perform analysis (dump out strings found).
Regular Expression
x
match the character `x'
.
any character (byte) except newline
[xyz]
a "character class"; in this case, the pattern matches either an `x', a `y', or a `z'
[abj-oZ]
a "character class" with a range in it; matches an `a', a `b', any letter from `j' through `o', or a `Z'
[^A-Z]
a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter.
[^A-Z\n]
any character EXCEPT an uppercase letter or a newline
r*
zero or more r's, where r is any regular expression
r+
one or more r's
{name}
the expansion of the "name" definition (see above)
r|s
either an r or an s
The source code is located
on the following pages:
C/C++
Makefile,
sample_lex.l,
scanner.C,
token.h
Java uses jflex
Makefile,
sample_flex.flex,
Jscanner.java,
Token_type.java
Test input: test.s
The java flex compiler jflex
is not installed on the ACS computers. You either have to add the path
/home/linux/ieng6/cs141s/public/jflex to your PATH, or use the full path
/home/linux/ieng6/cs141s/public/jflex/jflex.
Download code to a
directory. To build, run make on the respective Makefile.
> make
To run C/C++, run as
> scanner <input text
file>
To run Java, run as
> java Jscanner <input
text file>