TYPE CHECKING (STATIC SEMANTIC CHECKING, Chapter 6) Type checking is one kind of ``static'' (compile time) check int x, y; float z; z = x + y; /* + takes ints, int assignable to real */ x = z + y; /* + takes float/int, but float not assignable to int */ - Others include checking "break" appears in loop, redeclaration of variable - Static checks are important because the checking guarantees that such errors will not occur at runtime, avoiding catastrophic failures. - For the above additions, the grammar rule looks something like: Expr ::= Expr Op Expr {: /* Type check goes here */ :} What do we put in the braces to do the type check? But, what is a *type*, anyway? - A representation (an integer is a contiguous sequence of 32 bits) + this definition requires an interpretation of the bits, such as value(I) = sum_i I[i]*2^i - But what does a representation *mean*? - Ultimately, type denotes BEHAVIOR + humans, cars, cats, dogs, foxes, class objects - Divides into *static* and *dynamic* behavior, for purposes of type checking and compilation + static is what can (generally) be done at compile time + dynamic is what can (only) be done at run-time + can statically check x/y is type-correct, but cannot check for divide-by-0 until runtime. - If we know the types of variables and operations, then we can check them for consistency and generate (efficient) code for them - Of course, to compute on values, we need a representation for them, and we need operations that operate on that representation such that they are consistent with our concepts of the values and the operations on them: 2 + 3 --> 10 binary+ 11 | | | | V V 5 <-- 101 A type system, then is a set of rules about how values, variables, and operators can be legally combined. - Consistency (no contradictions): x: int = TRUE; /* set x to 1? */ - ``integer variables can only be assigned integer values'' - Lack of ambiguity (exactly one interpretation) x + 1.0 /* int or real? */ - ``if one operand integer, and the other is real, then result is real'' - A program written using a ``sound'' type system can be completely type checked at compile time (i.e., ``static''), but such a type system is not very useful: class ASCIITable { public static void main(String args[]) throws IOException { String asciiTable[256] = { ... "a", "b", "c" ... }; int charNumber = System.in.read(); String c = asciiTable[charNumber]; /* charNumber in bounds? */ System.out.println(c); } } This question can only be answered by a dynamic type check. Most languages check these bounds for you and kill the program if the answer is out of bounds. C and C++ don't even check! A language with a sound type system is called strongly typed, although we often use the term to mean ``as strongly typed as is reasonable given the current technology.'' Type expressions: atomic (basic) or compound (constructed) - Since we have a need for a large variety of types, need type constructors - Basic C/C++ types: int, float, char - Constructed C/C++ types: int *px, int a[10] + Also known as composite types or compound types - Constructed types are created from type expressions + atomic types + composition operators: *, [], struct - declarations int x; int a[10]; float f(int a, int b); struct { int a; int b; }; int *px; - even some executable code f(1, x); /* argument list to function call */ - Unlike arithmetic expressions, these are usually evaluated at compile time + actually varies widely depending on language (C and Pascal conservative) - These declarations are statements about sets of values and the operations on them: + an array "int A[2]" stands for the set of all possible pairs of integers: { ... (0,0), (0,1), (1, 0), (1, 1), ...} + any array has the subscript operation: A[1] + there are rules about the use of subscripting: A can only be subscripted by 0 and 1. + and of course there is an underlying representation: 2 consecutive 32 bit integers; the left one is the first integer, the right one is the second integer. - To check typing issues, we need to represent type exprs (in our compiler). How? + How about an operational representation, since we have type operators and type constants: array(int, range(0,9)) function(paramlist(int, int), float) - this type denotes the set of all functions that take two int's and return a float record(int a, int b) pointer(int) paramlist(int, int) + Operators are *values* that have types, too: integer +'s is type: function(paramlist(int, int), int) real +'s is type: function(paramlist(real, real), real) What is the type of integer / (divide)? + These type representations give us a formal basis for comparing types: two types are equivalent if they have the same structure. - for some definition of ``same'' - this is useful for type checking, when the type of an argument to an operation has to be equivalent to the type of the parameter declaration + Another way to think of type expressions are as trees: array / \ int range / \ 0 9 But this is just another way of drawing the parenthesis stuff above, where a downward edge means "argument of" or "part of": array(int, range(0,9)) + The tree view suggests representing a type expression as a class object (tree edges are pointers to children/fields). The root of the tree is a composite type object, and the children are pointers to other type subexpression objects. - Don't necessarily need subclasses for all the different kinds of types (e.g., int a subtype of real) - It's probably OK to have subclasses for basic and composite. - What kind of operations (methods) will be defined for this type? (type field access, type root query, equivalence, assignability) class ArrayType : CompositeType { // the pointers to children private Type elementType; private RangeType range; // accessors public } + The type of an identifier or expression is often called a *signature*. There is also sometimes a special syntax, especially for function symbols: f: int X int --> float +: int X int --> int + OK, what are the types (i.e., type signatures) of the following expressions, based on the program: [1] int i, x, y[20]; [2] float z; [3] int f(int a, int b); [4] z = (x + y[i]) * f(i, x); [5] x = z + y[i]; i y[i] f + (line 4) + (line 5) (More on this later) Specific hints for what you can do now: - Separate actions from parser + Grammar is one design decision, actions are another + Improves readability and speeds compilation Expr ::= Expr:e1 Op:op Expr:e2 {: RESULT = BinaryExpr.action(e1, op, e2); :} - Extend parser to allow passing STO's as values + modify type on "non terminal" Expr, Designator, and Factor declarations to be STO + Have expression rules, etc. pass around STO's - This STO will often be a variable with a type + for expressions, pass an ExprSTO that is a ``fake'' variable with a type, because later it will be a variable: x := y + w * z t1 := w * z t2 := y + t1 x := t2 + for now, just think of passing a variable as an ``easy'' way to pass a type - Create a way to represent types and encapsulate it into a class or module + sufficient for integer, real, boolean + extensible to composites + methods for constructing objects of each type + methods for creating, accessing components, and comparing (equal and assignable) - Extend STO's to support types + add type field, so variables can have a type + add ops: get type (Type type()) and set type (void setType()) - Create a unique instance of each basic type and make it easily accessible + what you do here actually depends on your grammar; if your grammar looks like: TypeSpec ::= INTEGER | REAL | ... you can just store the type objects for int and real in global variables: TypeSpec ::= INTEGER {: RESULT = TypeValues.IntegerType; :} | REAL ... (a public static class field is essentially a global) + or include type objects in symbol table as top-level pervasive; this allows all types to be treated the same (no special cases): TypeSpec ::= ID:id {: TypeSTO typeSTO = symTab.lookup(id.string); // "INTEGER" in symTab if (typeSTO != nil) RESULT = typeSTO.type(); else error... :} + or do both, to make structure and contents of symbol table match definition of symbol space in language, and still accessible by globals - Write test cases + Syntax-directed type checking: one or more tests per rule in handout Type rules for Oberon integers, reals, and booleans (and trivially, strings) Numeric types The INTEGER and REAL types Same types Two variables a and b with types Ta and Tb are of the same type if 1. Ta and Tb are both denoted by the same type identifier Type inclusion Numeric types include (the values of) smaller numeric types according to the following hierarchy: REAL >= INTEGER Assignment compatible An expression e of type Te is assignment compatible with a variable v of type Tv if one of the following conditions hold: 1. Te and Tv are the same type; 2. Te and Tv are numeric types and Tv includes Te; Expression compatible For a given operator, the types of its operands are expression compatible if they conform to the following table (which shows also the result type of the expression). Character arrays that are to be compared must contain 0X as a terminator. operator| first operand second operand result type --------+--------------------------------------------------------- + - * | numeric numeric smallest numeric | type including | both operands / | numeric numeric REAL DIV MOD | integer integer integer OR & ~ | BOOLEAN BOOLEAN BOOLEAN = # < | numeric numeric BOOLEAN = # | BOOLEAN BOOLEAN BOOLEAN - There is a difference between type sameness and assignment compatibility: integer can be assigned to real, but integer is NOT type-equivalent to real + Check for assignment compatibility: - When doing "statement -> Ref ASSIGN Expr" - When doing pass-by-value parameters - When checking the initialization types of variables and constants + Check for type equivalence/sameness (more on these later): - When checking var parameters - Rules for individual operators in grammar Syntax-Directed Type Checking (with example using check 1) - So how do we turn these declarative type rules into code that checks an actual program? - Basic idea is to augment declarative syntax rules with executable semantic checks LeftHandSide ::= RightHandSide1:rhs1 RightHandSide2:rhs2 ... {: RESULT = obj.StaticSemanticCheck(rhs1,rhs2,...); :} or more specifically Expr ::= Expr:e1 Op:op Expr:e2 {: RESULT = BinaryExpr.action(e1, op, e2); :} + note that details of action are hidden in separate class/method; give check a good name, so can tell what it does at a high level. + invoking off of class BinaryExpr directly is a "Design Pattern"* called "Singleton": there is just one BinaryExpr "object"; no object is allocated; we cheat by declaring all fields and methods static. It is slightly preferable stylistically to allocate a single BinaryExpr object in main and invoke off a real object. Why? *``Design Patterns: Elements of Reusable Object-Oriented Software'' by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, published by Addison-Wesley, 1994. (An excellent reference) + can anyone imagine a different class structure for invoking action? + the singleton class BinaryExpr also exhibits the design pattern "facade"; loosely speaking it is a layer of abstraction that provides a convenient interface to a complex set of abstractions (in our case, type elaboration, type checking, error reporting, and code gen). - Challenge 1: relating syntax to semantics + generic problem-solving process: structure matching - identify problem (domain) structure + paraphrase check in language of infrastructure (CUP + code) - match to solution infrastructure + CUP rules and actions + information available from symbol table, STO's, types - build on solution infrastructure + add or extend CUP actions + extend STO's, types + additions should be consistent with existing code - regularity, conceptual integrity + Note that domain and infrastructure match nearly identically with respect to syntax-directed concept: (a) syntax implies semantics in spec, (b) syntax implies actions in CUP grammar + specific method: syntax-directed type checking (and later, code gen) 1. relate operational grammar (E12 + E13) to abstract syntax (E + E) 2. for each piece of syntax, establish semantic rules 3. write a class connecting syntax to semantics 4. write class/method invocation into parser actions - the less helpful (more ambiguous) syntax is, the more tricky the check is (e.g., function call and array ref look the same) + analogous structure of problem and solution eases coding, testing, debugging, and understanding + Note this is one of many instances of ``engineering principles'' that have wide applicability, but gain power from adding domain-specific knowledge in applying the principle - By taking advantage of syntax-directed type-checking, we can treat operations on values to be operations on types. Thus, ``y[i]'' in program execution is a way of retrieving a value from an array. But during parsing (next execution), in terms of types, the subscript accesses the first component of the type ``array(int, range(0,19))'', thus yielding ``int'' (this is an application of the mimickry principle described in the first-day handouts). runtime parse-time program syntax-directed execution typing and type checking ------------- ------------------------ y Ty = typeof(y) i Ti = typeof(i) y[i] RTy = rangeof(Ty) assignable?(RTy, Ti) (* type check, not range check! *) elemtypeof(Ty) - recall that we cannot check if i is in the actual range of y until runtime - it is this view of things that has us saying things like ``int plus real equals real''. - Challenge 2: So many things to check, so many rules to add code to! + Even if we do a ``small language'' like this, it's a lot! P ::= D ';' S D ::= D ';' D D ::= ID ':' T T ::= integer | real | boolean E ::= ID | INTLIT | REALLIT | BOOLLIT E ::= E '+' E E ::= E '<' E S ::= S ';' S S ::= ID ':=' E S ::= IF E THEN S + Approach part A: INCREMENTAL syntax-directed software development - One type check at a time - One syntax rule at a time - One type at a time - Experience with early pieces lends insight that makes later pieces easier. + Approach part B: ``Coding to the specifiction'' - Write code that looks/sounds like the spec - Also incremental by writing from ``top to bottom'' - Example: check 1, plus operator, integers only 1. Read the spec 2. Identify relevant rule and abstract behind facade: Expr ::= Expr:e1 AddOp:op Expr:e2 {: RESULT = BinaryExpr.action(e1, op, e2); :} 3. Paraphrase action in pseudo-code if type of e1 and e1 is numeric then type of Expr is smallest numeric type including the types of both operands else error (numeric expected) + A lot going on here, so let's break down into pieces, top-down. Each function should do just one thing: separate checking from rest of action (like code generation, to come later) // In BinaryExpr ExprSTO action(STO a, int op, STO b) { ExprSTO result = new ExprSTO(); /* don't know type yet, give fake name */ if (!BinaryExpr.elaborate(result, a, op, b)) oberon.stopCodeGeneration(); return result; } boolean elaborate(ExprSTO result, STO a, int op, STO b) { if (a.isNumeric && b.isNumeric()) { result.setType(a.smallestNumericType(b)); return true; } else { oberon.reportError(ErrorMsg.error1n_Expr, a.isNumeric() ? b.type() : a.type()); result.setType(ErrorType.computeTypeOnError(TypeValues.integerType)); return false; } } + What did we need to do? 0. allocate and initialize space for RESULT 1. retrieve types of e1, e2 2. check compatible with respect to operation + check type checking specification handout 3. compute result type + check type checking specification handout 4. store result in RESULT 5. report error, if necessary Consider this a check list you need to follow for all type checks + So what we've done here is write the ``top level'' code to look like the spec 1. We don't have implementations for any of these methods, so we have to go implement them, but initially ONLY TO WORK FOR INTEGERS // first method delegates to the second, m_type "protected" boolean STO::isNumeric() { return m_type.isInt(); } // type field boolean IntegerType::isInt() { return true; } // type itself - Why not just store a type flag in STO? Types are complex. Types are things. There are many variables, but only one type called integer. Things should be made into classes. This will let us handle all sorts of Oberon types in the future. 2. Whenever I get "stuck", I call a high-level method and then figure out how to implement it later. If I change my mind later, I change the method implementation later. EXAMPLE: must compute type when there's an error so compiler can keep executing. So, compute a fake type. I have no idea how to do this (spec is silent on this), so I delegate to something that should know, a new class called ErrorType. I pass it a hint to help it out. 3. What is the TypeValues.integerType thing? It's a way of doing globals in Java: // class full of static/final (constant) fields class TypeValues { public static final IntegerType integerType = new IntegerType(); // more to come! } - These also come in handy in passing around types in grammar: BasicType ::= T_INTEGER {: RESULT = TypeValues.integerType; :} | ... - Elsewhere, I use statics to get access to the symbol table and error printer. + Why don't we put elaborate in Type? Interlude on two complications: Coercion & Overloading (dealing with subtypes) - Coercion is when a value is implicitly converted from one type to another: int x; float y; y = x + y; /* the value retrieved from x is coerced to float so it can be added to y */ + This is done because ints are similar to floats but represented very differently for performance reasons. Also, explicit coercion is inconvenient. Downside is ``hidden'' cost of coercion, which is significant compared to actual add. // x above is implicitly converted; this is equivalent: y = ((float) x) + y; // coercion actually produces a new (hidden) variable: float t1; t1 = (float) x; y = t1 + y; - Overloading is having one name for two different operators. x = x + x; /* + for ints */ y = y + y; /* + for reals */ + This is done because the similarities outweigh the differences with regard to convenience vs. confusion. - Coercion and overloading interact powerfully but sometimes confusingly - Subtypes are the key to understanding + Review: A type is - A set of values - A set of operations on those values - Rules for their combination + Intuitively, type S is a ``subtype'' of T if an S always behaves like a T, but sometimes T could behave unlike an S. - Cats are animals, but not all animals are cats - Ints are like reals, but all reals are not like ints - Think of this as an ``is-a'' relationship + *Statically*, in a programming language, type S is a ``subtype'' of T if - S contains only values of T - S supports all operations of T (but perhaps more) * The ints contain only reals (1, 2, ...), * The ints support all real operations (+, *, ...) * Ints support extra operations like DIV, MOD Also, subrange or enum of ints is theoretically a subtype of the ints + Subtypes can be modelled by a structure called lattice, but many programming languages don't bother, and only let you describe trees of types. real / int / \ 0..5 1..6 \ / 1..5 - although Oberon doesn't have explicit integer subranges, array declarations imply these: VAR arr : ARRAY 10 of INTEGER; (* subrange 0..9 for subscript *) - With classes + a tree of types corresponds to single inheritance, + a lattice of types corresponds to multiple inheritance, base / \ list sorted \ / sorted_list + Subtype behavior is what allows automatic coercion (assignability) rules - Automatic coercion int to real on assignment is allowed because (representation anomalies are ignored) int real - Automatic coercion real to int on assignment is not allowed because real ! int - C++ enums and ints assign both ways, but legality not checked at runtime int x; enum { ONE=1, TWO, THREE, FOUR, FIVE } s; . . . x = s; // no problem, needn't be checked s = x; // might be error, not checked! // proper code if (x >= ONE && x <= FIVE) s = x; else runtimeError("x's value out of range of s's in assignment"); + Subtype behavior is what allows automatic operator overloading x = x + x; /* + for ints +: int x int --> int */ y = y + y; /* + for reals +: float x float --> float */ y = y + x; /* + for reals, x is coerced to float first */ + Both are done because the similarities outweigh the differences with regard to convenience vs. confusion. Syntax-Directed Type Checking: Adding *, -, REAL type (etc.) - Operators *, - come for free. (Same rules.) Yay! - REAL is fairly easy, too: boolean IntegerType::isReal() { return false; } boolean RealType::isReal() { return true; } boolean RealType::isInt() { return false; } But to use the two interchangeably, we need some superclasses: class Type { /* all the above methods and more */ } class BasicType extends Type { /* maybe isBasicType method */ } class IntegerType extends BasicType { ... } // elaborate's code is unchanged! if (opSTO.isCompatible(a, b)) return opSTO.resultType(a, b); else // error ... // only code in type class changes: clients are stable! boolean OpType::compatible(Type a, Type b) { return (a.isNumeric() && b.isNumeric()); } Type OpType::resultType(Type a, Type b) { return smallestInclusiveType(a, b); // does not hold for divide! } - Note that I'm using the Oberon language definition and structure of the handout as much as possible to make visual verification easier. Because I've also designed my objects around the language definition, my changes are isolated to the Type classes. + As an engineering rule, designing for the future is important because change is pervasive. Without designing for change, your product will be too soon obsolete. One way to design for change is to isolate each likely change in a (hopefully) small class. - Now consider "/" + Result type is now always REAL! + Similar problems will arise for DIV and MOD (isInt() not isNumeric()) - Abstractions need enhancing (result of working incrementally) + Will require lots of change later for handling of other operators + Not reusable, either + Type computations should be hidden in the type module in operations + Use the language of the Oberon manual to guide us. Here is a snippet from the manual (note bold words): Expression compatible ===================== operator| first operand second operand result type --------+--------------------------------------------------------- + - * | numeric numeric smallest numeric | type including | both operands / | numeric numeric REAL DIV MOD | integer integer integer + What we see here is that compatibility and return type are triggered off of what operator is being used. - Since each operator has unique rules (*, /, MOD each different), might as well encode these differences in an object-oriented style. - Whereas we began by using terms like "numeric" and "smallest numeric", we see that those don't work for /, DIV, and MOD. - ``expression compatible'' and ``result type'' are key terms; these *category* names do work. That leads to this design: ExprSTO action(STO a, String op, STO b) { // op NOW STRING!!! ExprSTO result = new ExprSTO(); OpSTO opSTO = (OpSTO) symTab.lookup(op); if (!BinaryExpr.elaborate(result, a, opSTO, b)) { ... } ... } boolean elaborate(ExprSTO result, STO a, OpSTO op, STO b) { if (opSTO.isCompatible(a, b)) // a, b in ParamList later return opSTO.resultType(a, b); else { opSTO.reportError(a, b); // Error also dependent on operator ... } } + Why do we put the op lookup in action rather than elaborate? - Look at how much cleaner the code is! It reads like the spec! - Recall that in the discussion of types that we observed that operators are no different *semantically* from functions/procedures; once we have parsed the input, we can take advantage of this; we need only insert all the predefined operators in the symbol table and attach their types to them so that they are accessible. - This requires changes in other parts of the compiler: + Grammar: AddOp ::= T_PLUS {: RESULT = "+"; :} | ... + Symbol table addition (I did this in OberonParser): m_symtab.insert(new OpSTO("+", plusOpType)); + Or bypass the string/lookup and just return the proper opSTO: AddOp ::= T_PLUS {: RESULT = TypeValues.plusOpSTO; :} | T_MINUS {: RESULT = TypeValues.minusOpSTO; :} ... + Types and STO's for operators!!! class OpSTO extends STO { public boolean isCompatible(STO a, STO b) { return ((OpType)m_type).isCompatible(a.type(), b.type()); } ... class PlusOpType extends OpType { public boolean isCompatible(Type a, Type b) { return a.isNumeric() && b.isNumeric(); // Moved old code! } ... + The way these "specficiation-level" operations get designed and implemented is by delegation, as before. In particular, we have OpSTO's that are subclass of ProcSTO, and OpType's that are a subclass of ProcType. // note that the body is just code from before boolean OpSTO::compatible(STO a, STO b) { return this.type().compatible(a.type(),b.type()); } Type OpSTO::resultType(STO a, STO b) { return a.type().resultType(a.type(),b.type()); } // note that the body is just code from before boolean PlusOpType::compatible(Type a, Type b) { return (a.isNumeric() && b.isNumeric()); } Type PlusOpType::resultType(Type a, Type b) { return a.smallestNumericType(b); } + Note now that we've come up with a *design* that is fairly sound, but with an incomplete implementation (ints, reals, a few operators only). That's OK: - The design becomes the vessle into which we pour our implementation. - We're incrementally accruing points on the project. - Need to be more clever with a languages like C or C++: int, short, long, char, unsigned *, float, double (9^2 combinations). Ideas? Types and your Type Classes - We've derived some interfaces for your type objects, but have not said anything about the design of your type class hierarchy. - We want to design a hierarchy that reflects the semantics of the language and uses the terminology of the language's specification. In other words, ``Coding to the specification.'' - Key facts + All types (integer, real, etc.) are types (e.g., they are not symbols) This means that you probably want to have a class "Type" from which all Type classes are subclassed (extended). + A type is unique. For example, there is only one type "integer" This means that you probably want to have just *one* Type object of class Integer. - A recommended design Type / BasicType / | \ ... IntegerType RealType BooleanType + Other subtypes will come later + Put all type comparison operations in Type and define them to return false. class Type { boolean isInt() { return false; } boolean isReal() { return false; } ... } + In your subclasses, redefine these as necessary. In IntegerType: class IntegerType extends BasicType { boolean isInt() { return true; } } + Create a "singleton" class with several static public fields (data members) to hold an instance of each atomic type. In main(), when you create your symbol table, create an instance of each Integer, Real... as well and save them in the public fields (integerType, realType, ...) of the class: class TypeValues { public static IntegerType integerType; ... } main(...) { ... TypeValues.integerType = new IntegerType(); ... } + Define your STO's to have a type field (STO* in C++) + See below for rest How the Other Rules Fit In - So we were pretending that "E" already had a ExprSTO assigned to it. Where did it come from? - E ::= ID /* from prev. assignment, lookup VarSTO, check if var */ + But where does ID's VarSTO come from? - D ::= ID ':' T /* ``lookup'' type, make/insert VarSTO for ID */ + But where does T's type come from? - T ::= integer | real | boolean + rather than treating type names specially, why not put types in ST, too? + types are in same namespace as other values, anyway. + more important when have user types, too. - E ::= INTLIT | REALLIT | BOOLLIT /* make LitSTO, ConstSTO, or VarSTO to hold type (AND value) */ - S ::= ID ':=' E /* lookup VarSTO for ID, check assignability, not const */ - S ::= IF E THEN S /* E must be boolean expression (ExprSTO type is bool) */ So, now you can do rest of phase I static semantic checking - Add remaining type class operations (isInt, typeEqual, assignable) - Extend STO's to query type properties of entity/var (this is called *delegation*) + can replicate the operations on the type to be operations on the VarSTO: boolean VarSTO::assignmentCompatible(STO other) { return this.type().assignable(other.type()); /* yucky syntax */ } + another approach would be to have assignment operator in symbol table: ... assignOp = symTab.lookup(":="); // left param/operand is var, right is val return assignOp.isCompatible(a, b); ... - Add actions to grammar - Test, test, test (instrument, instrument, instrument) + remember that syntax-directed can help you write test cases - Anticipate future additions that could affect present design decisions + Composite type (represent infinite number of types) + Alias types (type can have many names) + Constant initialization + Code generation - Don't forget to ``program to the interface'' like in plus rule above, but also plan for the future as best you can Handling Type Errors - NOTE: to make type names easy to print, overload toString(), which is Java's way of treating objects like strings in string operations: class IntegerType { ... public String toString() { return "INTEGER"; } ... - Two checking issues: avoiding spurious error messages and not "pulluting" compiler code with lots of checks for error conditions. + don't have a result type for expr when its type check fails, so consuming expressions barf + don't want to be checking all over your compiler for a bad-type situation. It's hard to maintain - Recommendation: make an "error" type called ErrorType and have an instance of it returned from elaborate: result.setType(ErrorType.computeTypeOnError(hint)); which means defining computeTypeOnError to operate as follows: class Error { ... public static Type computeTypeOnError(Type hint) { return TypeValues.errorType; } ... } NOTE: this use of a static method is called a "factory". It makes things, and controls what it makes according to what parameters are passed. It is like a constructor, but can make a variety of things. - Question: How should your compiler treat a variable with type ErrorType? + Always treat as a bad type, and continue reporting errors? This is bad because you get a cascade of errors, each reporting a bad type "ERROR", which the compiler user has no idea about! + Always treat as a "good" type and hence suppress further errors. This is generally good. We can get this effect by how we write the ErrorType class: class ErrorType extends Type { boolean isError() { return true; } boolean isInt() { return true; } boolean isReal() { return false; } // dicey, explained below boolean isBool() { return true; } ... String toString() { return "ERROR"; } } By returning TRUE for most type queries, when an ErrorType is returned from the resultType() query or the compatible() check, ErrorType behaves "nicely" in type queries without having to add "Error" code all over the place. Note that this is very hard to implement if one is using instanceof or type codes to check types! + Possible catch: Because arithmetic operations compute result type of "smallest numeric type including the operand types", it returns FALSE for isReal() to ensure the smallest type is really returned, rather than forcing it to the larger REAL type and then causing a problem with assignment to an integer: VAR x, y : INTEGER; VAR b : BOOLEAN; BEGIN x := (x + b) + y END. Here, (x + b) causes an error to be reported and ErrorType to be returned. Then INTEGER type is computed for the overall expression, etc. + So note that: 1. ErrorType gets "morphed" into an actual type in compound expressions. This is OK as long as it doesn't get morphed to a "bigger" type like REAL. 2. Although there is no cascade of "bad type error" messages, more than one error can be reported in an expression if there is more than one real error: x := (x + b) + y + b Using this approach, the second "b" reference will trigger a second error, since the originally computed ErrorType will be morphed into an INTEGER by the time the second b is encountered. A Note on Layering + Layering is a software design methodology that avoids circular references amongst objects of certain types. Although perfect layering is hard to achieve, striving for it will make your system design a lot easier to understand. + Loosely, layering means if A calls B, then B shouldn't call A: you get a hierarchy. More strictly, it means that all the operations within a "layer" are involved with one kind of object or one kind of system property. However, this is different than a class. There may be many classes in a layer. However, classes, packages, or namespaces can be used to implement layers. + In your compiler there are currently three obvious layers from top to bottom, with the upper layer calling the lower layer, but not vice versa: (1) Parsing (calls) (2) Syntax-Directed Checking (e.g., BinaryExpr.elaborate) (calls) (3) Symbol Table (Scoping, STO's, etc.) (calls) (4) Types (Type classes, compatibility checks) If layering is enforced, then the Symbol Table layer won't call the Syntax-Directed Checking layer. + Example: STO's (e.g., STE's, SymbolObjects) - STO's should refer to Type objects, but not the other way around: ///////////////////////////////////////////////////////////////////// ///// these two are good //////////////////////////////////////////// boolean ExprSTO::isInt() { return this.type().isInt(); } boolean IntegerType::isInt() { return true; } // no sym's here ///////////////////////////////////////////////////////////////////// ///// these two are bad ///////////////////////////////////////////// boolean OpSTO::compatible(STO a, STO b) { return this.isCompatible(a, b); } boolean OpType::isCompatible(STO a, STO b) { // type refers to sym! Type aType = a.type(); Type bType = b.type(); ... } ///////////////////////////////////////////////////////////////////// ///// these two are good //////////////////////////////////////////// boolean OpSTO::compatible(STO a, STO b) { return this.compatible(a.type(), b.type()); // types exposed here } boolean OpType::compatible(Type a, Type b) { // no sym's ref'd here! ... } + Unfortunately, layering can't always be perfect. For example, we'll see in the next section that the RECORD type contains a symbol table of fields. That is, the type really does contain variables (and hence sym's).