FINAL REVIEW - Exam will cover material covered in class + Some recall + Problem solving: both plug-and-chug and open-ended engineering questions - Compilers are just translators + A problem that can be stated as translating an input stream to an output stream might be amenable to compiler techniques: - lexing input - parsing input - semantic integrity (type) checking of input - output (code) generation + Examples: word processor, html processor, network packet manipulation - Compilers as an engineering discipline + Engineering creates things that are of use to people + Cost/benefit analysis is inherent - ``perfect'' artifact has infinite cost - many variables/dimensions to optimize + ``Failures'' drive engineering process: actual, predicted, imagined + Failure is inherent in ``improvement'' + Creative step is seeing how existing technology can be applied + Risks and responsibilities - Modularity + Hide changeable details within a module to localize future change + Common detail to hide is data structure: ADT's and classes + Modularity is easier said than done; requires diligence, experience + Example: Symbol table entries, variables, types - all entries of a symbol table have two things in common: name, scope - otherwise might be a type, a variable, function, etc. - one possibility is to put all the fields (members) for these into the symbol table entries. - but then changing type stuff means you're digging around variable stuff; changing lookup method requires digging around variables, etc. - symbol table entry is an association between a name and value, but what type of value is irrelevant, so keep value separate (point to it) - also, entities don't always have names (anonymous types), and names don't always have entities - can get the best of both worlds with OO (sort of) + basic STO class is a name and no value + subclasses get values of all different sorts + thus the special cases are all separated out, but all share common STO root - Notion of type + Set of values and operations on those values + Full notion of type includes behavior of those operations + ``Static'' notion of type only articulates required input-output rules of operations, not operation's (run-time) behavior. + Types are organized in a hierarchy (e.g., subrange < int < real) - coercion: automatically make lower value into higher value - overloading: reuse names of operations - assignability: just coercion, except when rules not consistent + Need a representation of type that permits: - equality comparison (structure vs. name) - retrieving types of subcomponents: element type, parameter, field + parameters accessed by position + fields accessed by name: like ST - subtyping, etc. - Static semantic checking (actions) + Lexer tokenizes, and thereby checks that all tokens are legal + Parser organizes tokens into structured form (tree), thereby checking that all structures are legal + Actions check syntactic (``static semantics'') issues not captured by compiler - relation between variables and operations - initializers are constants - exits in loops + Could do more checking with parser, but would be complicated and large - Attribute management + Two sources of attributes (values) during parsing - synthesized: values computed from children (e.g., :e1, :e2, etc.); values are propagated upward (via assignment to RESULT during parsing. - inherited: values computed from ancestors; values are passed downwards during parsing. + Main issue is that information at different places in the parse need to be brought together to be combined or compared for consistency. - Propagate ``loop'' to ``exit''; propagate IDList to TypeDecl. - Sometimes have choice of bringing one to other, or vice versa - Can also reorganize grammar to change I/S relationship; icky + LALR parsers don't support I.A. because there is no parent to inherit from, due to bottom-up nature: parent is LHS, which is created by reduction rule. + Easy to simulate, though, because most constructs have a tell-tale lead-off tag such as FUNCTION, LOOP, etc. - After seeing tell-tale construct, push attribute on a user-defined attribute stack especially designed for that attribute - While in the construct, can check attribute via top of stack - When exit construct, pop off attribute - Runtime systems + Advanced features available to compiler or programmer to simplify - stack - call sequence - memory management (malloc, free, garbage collection) - debugging - libraries (string manipulation, math) + Help bridge gap between source and target language - link in library code - added to program with emit - Syntax-Directed Code Generation + Recognition of construct during parse results in emitting code for that construct + Generate code on a rule-by-rule basis + A single emit cannot span two rules + Ordering issues + Language must be designed so that parse order and execution order are similar + All needed information must be available at time of parse - constructs must be defined before use - Compiler writing as model matching + We have two languages (machines), a program in high-level language + Want to create program in other (low-level) language + Languages are often radically different, and at different levels + Figure out how target language constructs can mimic source constructs - ask questions about source lang to identify its complexities + recursion, first class functions, backtracking, etc. - stack and call frames for procedure call - base + offset for variables names, etc. - load-load-compute-store for memory/register mapping - as idioms, these represent a higher-level version of target, sometimes called ``compiler writer's virtual machine'' + Need to respect tools available for the translation - parsing technology, grammar, symbol table, attributes, conventions - i.e., want source and target structures to match, but also need to be consistent with existing solution structure (or fix) + Our specific approach: feature/grammar driven - write source problem - write target solution - target solution needs to respect target's high-level model + call frame, stack, memory layout, base+offset - match source and target portions to grammar, STO's - redo target solution to fit grammar, if necessary - generalize example to emit routines - Architecture module + Architecture might change, independent of language, low-level + Module consists of - type information (sizes of objects) - names of entities (stack pointer, global pointer, frame pointer) - instruction names, probably emit routines - register names, maybe whole allocator - Optimization + Want to take programmer's elegant, inefficiently coded solution and make it really fast - but can't change algorithm - remove instructions - find cheaper implementations of instructions + Options (must try all to get most benefit) - find idioms in programmer's code and remove inefficiencies + e.g., repeated expressions - make good use of registers by keeping frequently used variables there - reorder instructions to avoid memory access stalls - exploit odd features of architecture + The desire for high-performance of translated code adds complexities - optimizing in several ways - need global information about program: many passes over code - pointers, function calls make more difficult + New concepts - intermediate languages: e.g., expression trees and three-address code + support multiple passes + good for opt: exposes optimization possibilities + independent of source and target, so can reuse + BB's and CFGs - data flow analysis + need to know relationships amongst statements to optimize - e.g., redundant computations + gather facts by propagating facts through graph - e.g., ``b * c is available in t1'' + improve performance - limiting to one function at a time (lose information, though) - collapsing graph - propagating all facts at once - using bit sets + Optimization process is translation, too, typical IL-to-IL - atomic optimizations combinable in any order - pattern --> application structure, where pattern is often "semantic" (pattern matching on derived data flow, dependences, etc.) - Complexity of engineering + 80/20 rules - The perfect solution is not the best solution, because of cost - Must figure out what is really important and do well on that - 80% of the benefit comes at 20% of possible cost - Last 20% of benefit costs remaining 80% of possible cost - In fact, perfect solution has infinite cost - Compromise different than poor work: tradeoffs are conciously managed + Solutions to new problems as they arise is not formulaic - Fitting new solutions into existing solution may cause problems + inconsistency, complexity, performance - Must look to all aspects of solution and problem to find ``best'' sol'n + change load/store model (addresses) + change grammar + more inherited attributes + extend symbol table