W. G. Griswold, D. C. Atkinson, C. McCurdy, ``Fast, Flexible Syntactic Pattern Matching and Processing,'' Proceeedings of the IEEE 1996 Workshop on Program Comprehension, March, 1996.

Copyright 1996 IEEE. Published in the Proceedings of the IEEE Fourth Workshop on Program Comprehension (WPC-96), March 29-31, 1996, Berlin, Germany. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE.


Abstract

Program understanding can be assisted by tools that match patterns in the program source. Lexical pattern matchers provide excellent performance and ease of use, but have a limited vocabulary. Syntactic matchers provide more precision, but may sacrifice performance, retargetability, ease of use, or generality.

To achieve more of the benefits of both models, we extend the pattern syntax of awk to support matching of abstract syntax trees, as demonstrated in a tool called tawk. Its pattern syntax is language-independent, based on abstract tree patterns. As in awk, patterns can have associated actions, which in tawk are written in C for generality, familiarity, and performance. The use of C is simplified by high-level libraries and dynamic linking. To allow processing of program files containing non-syntactic constructs, mechanisms have been designed that allow transparent matching in a syntactic fashion. So far, tawk has been retargeted to the MUMPS and C programming languages.

We survey and apply prototypical approaches to concretely demonstrate the tradeoffs. Our results indicate that tawk can be used to quickly and easily perform a variety of common software engineering tasks, and the extensions to accommodate non-syntactic features significantly extend the generality of syntactic matchers.