Syntax-Directed Type Checking (with example using check 1) - So how do we turn these declarative type rules into code that checks an actual program? - Basic idea is to augment declarative syntax rules with executable semantic checks LeftHandSide ::= RightHandSide1:rhs1 RightHandSide2:rhs2 ... {: RESULT = obj.StaticSemanticCheck(rhs1,rhs2,...); :} or more specifically Expr ::= Expr:e1 Op:op Expr:e2 {: RESULT = BinaryExpr.action(e1, op, e2); :} + note that details of action are hidden in separate class/method; give check a good name, so can tell what it does at a high level. + invoking off of class BinaryExpr directly is a "design pattern" called "singleton": there is just one BinaryExpr "object"; no object is allocated; we cheat by declaring all fields and methods static. It is slightly preferable stylistically to allocate a single BinaryExpr object in main and invoke off a real object. Why? + can anyone imagine a different class structure for invoking action? + the singleton class BinaryExpr also exhibits the design pattern "facade"; loosely speaking it is a layer of abstraction that provides a convenient interface to a complex set of abstractions (in our case, type elaboration, type checking, error reporting, and code gen). - Challenge 1: relating syntax to semantics + generic problem-solving process: structure matching - identify problem (domain) structure + paraphrase check in language of infrastructure (CUP + code) - match to solution infrastructure + CUP rules and actions + information available from ST, VarSTO's, types - build on solution infrastructure + add or extend CUP actions + extend STO's, types + additions should be consistent with existing code - regularity, conceptual integrity + Note that domain and infrastructure match nearly identically with respect to syntax-directed concept: (a) syntax implies semantics in spec, (b) syntax implies actions in CUP grammar + specific method: syntax-directed type checking (and later, code gen); 1. relate operational grammar (E12 + E13) to abstract syntax (E + E) 2. for each piece of syntax, establish semantic rules 3. write a class connecting syntax to semantics 4. write class/method invocation into parser actions - the less helpful (more ambiguous) syntax is, the more tricky the check is (e.g., function call and array ref look the same) + analogous structure of problem and solution eases coding, testing, debugging, and understanding + Note this is one of many instances of ``engineering principles'' that have wide applicability, but gain power from adding domain-specific knowledge in applying the principle - By taking advantage of syntax-directed type-checking, we can treat operations on values to be operations on types. Thus, ``y[i]'' in program execution is a way of retrieving a value from an array. But during parsing (next execution), in terms of types, the subscript accesses the first component of the type ``array(int, range(0,19))'', thus yielding ``int'' (this is an application of the mimickry principle described in the first-day handouts). runtime parse-time program syntax-directed execution typing and type checking ------------- ------------------------ y Ty = typeof(y) i Ti = typeof(i) y[i] RTy = rangeof(Ty) assignable?(RTy, Ti) (* type check, not range check! *) elemtypeof(Ty) - recall that we cannot check if i is in the actual range of y until runtime - it is this view of things that has us saying things like ``int plus real equals real''. - Challenge 2: So many things to check, so many rules to add code to! + Even if we do a ``small language'' like this, it's a lot! P ::= D ';' S D ::= D ';' D D ::= ID ':' T T ::= integer | real | boolean E ::= ID | INTLIT | REALLIT | BOOLLIT E ::= E '+' E E ::= E '<' E S ::= S ';' S S ::= ID ':=' E S ::= IF E THEN S + Approach part A: INCREMENTAL syntax-directed software development - One type check at a time - One syntax rule at a time - One type at a time - Experience with early pieces lends insight that makes later pieces easier. + Approach part B: ``Coding to the specifiction'' - Write code that looks/sounds like the spec - Also incremental by writing from ``top to bottom'' - Example: check 1, plus operator, integers only 1. Read the spec 2. Identify relevant rule and abstract behind facade: Expr ::= Expr:e1 AddOp:op Expr:e2 {: RESULT = BinaryExpr.action(e1, op, e2); :} 3. Paraphrase action in pseudo-code if type of e1 and e1 is numeric then type of Expr is smallest numeric type including the types of both operands else error (numeric expected) + A lot going on here, so let's break down into pieces, top-down. Each function should do just one thing: separate checking from rest of action (like code generation, to come later) // In BinaryExpr VarSTO action(STO a, int op, STO b) { VarSTO result = new VarSTO(); /* don't know type yet, give fake name */ if (!BinaryExpr.elaborate(result, a, op, b)) oberon.stopCodeGeneration(); return result; } boolean elaborate(VarSTO result, STO a, int op, STO b) { if (a.isNumeric && b.isNumeric()) { result.type = a.smallestNumericType(b); return true; } else { oberon.reportError(ErrorMsg.error1n_Expr, a.isNumeric() ? b.type() : a.type()); result.type = ErrorType.computeTypeOnError(TypeValues.intType); return false; } } + So what we've done here is write the ``top level'' code to look like the spec 1. We don't have implementations for any of these methods, so we have to go implement them, but ONLY TO WORK FOR INTEGERS // first method delegates to the second boolean VarSTO::isNumeric() { return m_type.isInt(); } // type field boolean IntType::isInt() { return true; } // type itself - Why not just store a type flag in VarSTO? Types are complex. Types are things. There are many variables, but only one type called integer. Things should be made into classes. This will let us handle all sorts of Oberon types in the future. 2. Whenever I get "stuck", I call a high-level method and then figure out how to implement it later. If I change my mind later, I change the method implementation later. EXAMPLE: must compute type when there's an error so compiler can keep executing. So, compute a fake type. I have no idea how to do this (spec is silent on this), so I delegate to something that should know, a new class called ErrorType. I pass it a hint to help it out. 3. What is the TypeValues.intType thing? It's a way of doing globals in Java: // class full of static/final (constant) fields class TypeValues { public static final IntegerType intType = new IntegerType(); // more to come! } - These also come in handy in passing around types in grammar: BasicType ::= T_INTEGER {: RESULT = TypeValues.intType; :} | ... - Elsewhere, I use statics to get access to the symbol table and error printer. + Why don't we put elaborate in Type? - Consider now adding *, -, REAL type - Operators *, - come for free. (Same rules.) Yay! - REAL is fairly easy, too: boolean IntType::isReal() { return false; } boolean RealType::isReal() { return true; } boolean RealType::isInt() { return false; } But to use the two interchangeably, we need some superclasses: class Type { /* all the above methods and more */ } class BasicType extends Type { /* maybe isBasicType method */ } class IntType extends BasicType { ... } - Now consider "/" + Result type is now always REAL! + Similar problems will arise for DIV and MOD (isInt() not isNumeric()) - Abstractions need enhancing (result of working incrementally) + Will require lots of change later for handling of other operators + Not reusable, either + Type computations should be hidden in the type module in operations + Use the language of the Oberon manual to guide us. Here is a snippet from the manual (note bold words): ---------------------------------------------------------------------- Expression COMPATIBLE: operator first operand second operand RESULT TYPE + - * numeric numeric smallest numeric type including both operands / numeric numeric REAL DIV MOD integer integer integer ---------------------------------------------------------------------- + What we see here is that compatibility and return type are triggered off of what operator is being used. - Since each operator has unique rules (*, /, MOD each different), might as well encode these differences in an object-oriented style. - Whereas we began by using terms like "numeric" and "smallest numeric", we see that those don't work for /, DIV, and MOD. - ``expression compatible'' and ``result type'' are key terms; these *category* names do work. That leads to this design: VarSTO action(STO a, String op, STO b) { // op NOW STRING!!! VarSTO result = new VarSTO(); OpSTO opSTO = (OpSTO) symTab.lookup(op); if (!BinaryExpr.elaborate(result, a, opSTO, b)) { ... } ... } boolean elaborate(VarSTO result, STO a, OpSTO op, STO b) { if (opSTO.isCompatible(a, b)) // a, b in ParamList later return opSTO.resultType(a, b); else { opSTO.reportError(a, b); // Error also dependent on operator ... } } + Why do we put the op lookup in action rather than elaborate? - Look at how much cleaner the code is! It reads like the spec! - Recall that in the discussion of types that we observed that operators are no different *semantically* from functions/procedures; once we have parsed the input, we can take advantage of this; we need only insert all the predefined operators in the symbol table and attach their types to them so that they are accessible. - This requires changes in other parts of the compiler: + Grammar: AddOp ::= T_PLUS {: RESULT = "+"; :} | ... + Symbol table addition (I did this in OberonParser): m_symtab.insert(new OpSTO("+", plusOpType)); + Or bypass the string/lookup and just return the proper opSTO: AddOp ::= T_PLUS {: RESULT = TypeValues.plusOpSTO; :} | T_MINUS {: RESULT = TypeValues.minusOpSTO; :} ... + Types and STO's for operators!!! class OpSTO extends STO { public boolean isCompatible(STO a, STO b) { return ((OpType)m_type).isCompatible(a.type(), b.type()); } ... class PlusOpType extends OpType { public boolean isCompatible(Type a, Type b) { return a.isNumeric() && b.isNumeric(); // Moved old code! } ... + The way these "specficiation level" operations get designed and implemented is by delegation, as before. In particular, we have OpSTO's that are subclass of ProcSTO, and OpType's that are a subclass of ProcType. // note that the body is just code from before boolean OpSTO::compatible(STO a, STO b) { return this.type().compatible(a.type(),b.type()); } Type OpSTO::resultType(STO a, STO b) { return this.type().resultType(a.type(),b.type()); } // note that the body is just code from before boolean PlusOpType::compatible(Type a, Type b) { return (a.isNumeric() && b.isNumeric()); } Type PlusOpType::resultType(Type a, Type b) { return a.smallestNumericType(b); } + Note now that we've come up with a *design* that is fairly sound, but with an incomplete implementation (ints, reals, a few operators only). That's OK: - The design becomes the vessle into which we pour our implementation. - We're incrementally accruing points on the project. A Note on Handling Type Errors (Three approaches) - NOTE: to make type names easy to print, overload toString(), which is Java's way of treating objects like strings in string operations: class IntType { ... public String toString() { return "INTEGER"; } ... - Two checking issues: avoiding spurious error messages and not "pulluting" compiler code with lots of checks for error conditions. + don't have a result type for expr when its type check fails, so consuming expressions barf + don't want to be checking all over your compiler for a bad-type situation. It's hard to maintain - Recommendation: make an "error" type called ErrorType and have an instance of it returned from elaborate: result.putType(ErrorType.computeTypeOnError(Hint)); which means defining computeTypeOnError to operate as follows: class Error { ... public static Type computeTypeOnError(Type hint) { return TypeValues.errorType; } ... } NOTE: this use of a static method is called a "factory". It makes things, and controls what it makes according to what parameters are passed. It is like a constructor, but can make a variety of things. - Question: How should your compiler treat a variable with type ErrorType? + Always treat as a bad type, and continue reporting errors? This is bad because you get a cascade of errors, each reporting a bad type "ERROR", which the compiler user has no idea about! + Always treat as a "good" type and hence suppress further errors. This is generally good. We can get this effect by how we write the ErrorType class: class ErrorType extends Type { boolean isError() { return true; } boolean isInt() { return true; } boolean isReal() { return false; } // dicey, explained below boolean isBool() { return true; } ... String toString() { return "ERROR"; } } By returning TRUE for most type queries, when an ErrorType is returned from the resultType() query or the compatible() check, ErrorType behaves "nicely" in type queries without having to add "Error" code all over the place. Note that this is very hard to implement if one is using instanceof or type codes to check types! + Possible catch: Because arithmetic operations compute result type of "smallest numeric type including the operand types", it returns FALSE for isReal() to ensure the smallest type is really returned, rather than forcing it to the larger REAL type and then causing a problem with assignment to an integer: VAR x, y : INTEGER; VAR b : BOOLEAN; BEGIN x := (x + b) + y END. Here, (x + b) causes an error to be reported and ErrorType to be returned. Then INTEGER type is computed for the overall expression, etc. + So note that: 1. ErrorType gets "morphed" into an actual type in compound expressions. This is OK as long as it doesn't get morphed to a "bigger" type like REAL. 2. Although there is no cascade of "bad type error" messages, more than one error can be reported in an expression if there is more than one real error: x := (x + b) + y + b Using this approach, the second "b" reference will trigger a second error, since the originally computed ErrorType will be morphed into an INTEGER by the time the second b is encountered.