- Names, Variables (Sections 5.1, 5.2, 5.3)
- Expressions and assignments (Chapter 7)
- Storage binding (Section 5.4.3)

=====================================================================
* Names:
=====================================================================

  - Names are associated with program constructs (labels, subprograms,
    variables, formal parameters, etc.)

  - some languages impose a length limit:
      - earlier languages: 1 character
      - FORTRAN I: 6 characters
      - Ada: 200 characters
      - C89: no limit, but only the first 31 characters were significant
      - C++: implementation dependent

  - case-sensitive names are by now considered a good thing by most

  - Special words (for, while, ..)
      - keywords: special only in certain contexts
          FORTRAN:  REAL INTEGER     (declares a REAL variable named INTEGER)
                    INTEGER REAL     (declares an INTEGER variable named REAL)
      - reserved words: cannot be used as names (known to be better by now)

=====================================================================
* Variables:
=====================================================================

  - a variable is a 6-tuple:
      - Name
      - Address (l-value: LHS of an assignment)
      - Value (r-value: RHS of an assignment)
      - Type
      - Lifetime
      - Scope (will discuss in future lectures)

=============================================================
* Expression and Assignments
=============================================================

* Operand evaluation order matters

  - Example:

      #include <stdio.h>

      int f1(int *x) { (*x)++; return *x; }
      int f2(int *x) { (*x) *= (*x); return *x; }
      int main() {
        int a = 1;
        printf("%d\n", f2(&a) + f1(&a));
      }

      left-to-right:
        f2: set a to 1; return 1
        f1: set a to 2; return 2
        print 3

      right-to-left:
        f1: set a to 2; return 2
        f2: set a to 4; return 4
        print 6

  - This happens only because functions f1 and f2 have side effects
      side effect: a function changes a global variable or its parameters

  - Solutions:
      1. Disallow side effects
           - eliminates a lot of convenient flexibility
           - removing global variables causes major speed problems
      2.
         Strict evaluation order
           - precludes some compiler optimizations

  - No perfect solution
  - Pascal, Ada, Scheme: language implementors can do whatever they want

  - Other example:

      x = 1;
      x += (x=2);

      3 possibilities:
        x = 2; oldx = x; x = oldx + 2;    --> 4
        oldx = x; x = 2; x = oldx + 2;    --> 3
        oldx = x; x = oldx + 2; x = 2;    --> 2

      - Java: semantics enforce the second order --> 3
      - C: undefined behavior (x is written twice with no ordering
        between the two writes), so the result depends on the compiler
        (on my laptop: 4)
      - Rule of thumb: When in doubt, use temporary variables!

* Assignment statements

  - The code in the last example above worked because "assignment" is
    an operator:
      x = 1: returns 1 and, as a "side effect", assigns value 1 to
             variable x.

  - makes it possible to write compact loops:

      while ((ch = getchar()) != EOF) { }   // reads input until end of file

      void strcpy(char *q, char *p) {       // copies string p into q
        while (*q++ = *p++);
      }

  - makes it possible to define associativity
      a = b = 0;   (right associativity)
      (a = (b = 0)): means something
      ((a = b) = 0): means nothing

  - cause of error in C/C++: if (x = y) instead of if (x == y)
    (lint can be used to detect this)
  - Java solves this: the condition of an if must be a boolean
    expression, so if (x = y) is rejected unless x and y are boolean.

* Short-circuit evaluation

  - An expression in which the result is determined without evaluating
    all of the operands
  - goal: efficiency
  - Example:
      (x - 4) * (b * 2)
          if x = 4, one already knows that the expression evaluates to 0
      (a >= 0) and (b < 10)
          if a < 0, evaluates to false
      (a >= 0) or (b < 10)
          if a >= 0, evaluates to true

  - Java example:

      index = 0;
      while ((index < listlen) && (list[index] != key))
        index++;

    - if no short-circuit, both operands are evaluated, which causes an
      exception when key is not in the list, because list[listlen] is
      evaluated!

  - Short-circuit evaluation exposes the problem of side effects:
      (a > b) || (b++ / 3)
    - if (a>b) then b++ is never evaluated and b's value does not change
    - if (a<=b) then b++ is evaluated and b's value changes
    - a programmer may forget and assume that b will always be
      incremented!
    - Can be worse: instead of b++ one has a function call that modifies
      some "unrelated" global variable.

  - Ada makes it explicit: "and then", "or else"
  - In C, Java: && and || are short-circuited
  - Common Lisp does short-circuit

=====================================================================
* Binding:
=====================================================================

  - Binding: general notion of an association between two entities

  - In the context of PLs, binding can take place at:
      - language design time
      - language implementation time (or, compiler design time)
      - compile time
      - link time
      - load time
      - run time

  - Examples:
      - '*' is bound to multiplication at language design time
      - 'INTEGER' type in FORTRAN is bound to a range of possible values
        at language implementation time
      - A Java variable is bound to a type at compile time
      - A call to a function is bound to the function code at link time
      - A variable can be bound to storage at load time
      - A value is bound to a variable at run time

  - Example from the book:

      int count;
      count = count + 5;

      - set of possible types for count: bound at language design time
      - type of count: bound at compile time
      - set of possible values of count: bound at language
        implementation time
      - value of count: bound at execution time
      - set of possible meanings for the operator +: bound at language
        design time
      - meaning of the operator symbol '+' in this statement: bound at
        compile time
      - internal representation of literal 5: bound at language
        implementation time

  - Static binding: occurs before run time and remains unchanged
    throughout program execution
  - Dynamic binding: occurs during run time or can change throughout
    program execution.

==========================================================================
* Storage binding and lifetime
==========================================================================

* Introduction

  - How does one associate memory locations with variables?
  - Fundamental for a PL, and it's important to really understand:
      - to detect bugs
      - to understand performance issues

  - Three definitions:
      1. "Allocation": the process of binding a variable to a memory
         cell that is taken from a pool of available memory.
      2. "Deallocation": the process of unbinding a variable and placing
         its memory cell back in the pool of available memory.
      3. "Lifetime of a variable": the time during which the variable is
         bound to a specific memory location.

  - Four categories of variables:
      1. static
      2. stack-dynamic
      3. explicit heap-dynamic variables
      4. implicit heap-dynamic variables

* The runtime memory

  - The logical organization of the memory that is used by a running
    program
  - 3 main areas for variables:
      - the global or static area
      - the stack (contains "activation records")
      - the heap
      + the registers for temporary data
      + read-only memory pages for constant data and code

      +-----------------+
      |   static area   |
      +-----------------+
      |      stack      |
      +-----------------+
      |        |        |
      |        v        |
      |                 |
      |        ^        |
      |        |        |
      +-----------------+
      |      heap       |
      +-----------------+

* Static variables

  - bound to memory cells before program execution begins, and remain
    bound to those same memory cells until termination.
  - lifetime = entire program execution

  - Global variables:
      - advantage: convenient and efficient
      - disadvantage: detrimental to modularity

  - History-sensitive variables

      void f(int x) {
        static int a = 0;
        a += 2;
        printf("%d\n",a);
      }

      f(2) --> prints 2
      f(4) --> prints 4

  - advantage: efficient (no indirection + no alloc/de-alloc)
  - disadvantage: a language with only static variables does not allow
    recursion (FORTRAN I, II, IV)

* Stack-dynamic Variables:

  - bound to storage when their declaration statements are "elaborated",
    i.e. when execution reaches the code to which the declaration is
    attached, at run time.

      int f(int x) {
        int a = 0;
        ...
        f(y+a);
        ...
      }

  - lifetime = while the subprogram is active
  - stack-dynamic variables are allocated from the run-time stack.

                          |              |
      Stack pointer ----> +--------------+
                          |              |
                          +- - - - - - - +   Activation Record for f()
                          |              |
                          +--------------+
                          |              |   Activation Record for g()
                          +--------------+
                          |              |   Activation Record for main()
                          +--------------+
                            Runtime Stack

  - advantage: enables recursion, as each active copy of the subprogram
    can have its own local storage.
  - disadvantage: overhead (bookkeeping + indirection)
  - we'll talk about the stack in more detail when we talk about scoping

* Explicit Heap-dynamic Variables:

  - the heap is an unstructured pool of memory cells.
  - variables are bound to memory cells that are allocated and
    deallocated by explicit run-time instructions specified by the
    programmer.
  - lifetime = from explicit allocation to explicit deallocation
  - the variables can only be referenced through pointers or references.

      C:  int *intptr;
          intptr = (int *) malloc(1 * sizeof(int));
          free(intptr);

  - advantage: high flexibility, convenient to implement data structures
  - disadvantage: low reliability (dangling pointers, memory leaks)
  - Solutions in Java:
      - remove pointers (only references)
      - array bound checking
      - garbage collection

* Implicit heap-dynamic variables:

  - bound to heap storage only when assigned values, with implicit
    deallocation.
  - lifetime = from implicit allocation to implicit deallocation

  - In C:
      char *x;
      x = "foo";
      x = "foobar";
      // this is what the book says, we'll see that
      // it's actually trickier than this

  - In JavaScript:
      x = 2;
      x = "foo";
      x = [1,2,3,4,5];
      - in this last example storage is bound at each assignment, as
        well as the type (see our discussion of type binding)

* Can be tricky:

  - Example in C

      // implicit                 // explicit
      int main() {                int main() {
        char *a="foo";              char *a;
        a[0] = 'F';                 a = (char *)malloc(4*sizeof(char));
      }                             strcpy(a,"foo");
                                    a[0] ='F';
                                  }

  - The code on the left causes a "Bus Error" (a segmentation fault on
    some other systems) but the code on the right works!
  - Explanation:
      - In the code on the left the compiler treats "foo" as a constant
        and allocates it in the static area at compile time, so that
        overwriting any piece of it is illegal
      - So it would seem that this is not really heap-dynamic

  - Even weirder perhaps: the following code works perfectly:

      int main() {
        char a[4]="foo\0";
        a[0]='F';
      }

      - This works because the array 'a' is an ordinary local
        (stack-dynamic) variable: the characters of "foo" are copied
        into the array's own writable storage, so this is not
        heap-dynamic either.
      - The compiler I used must make a distinction between string
        literals reached through a pointer and character arrays in terms
        of storage binding when the binding is implicit.
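
* Sketch: the storage categories side by side

  - The categories discussed above can be observed in a few lines of C.
    This is only an illustrative sketch, not code from the book: the
    function names (next_even, factorial, heap_copy,
    capitalize_local_copy) are hypothetical and chosen for this example.
    C has no implicit heap-dynamic variables, so that category is not
    shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Static variable: bound to one memory cell before execution begins,
   so its value survives between calls (history sensitive). */
int next_even(void) {
    static int a = 0;
    a += 2;
    return a;
}

/* Stack-dynamic variables: every active call has its own n and result
   in its own activation record, which is what makes recursion work. */
int factorial(int n) {
    int result = 1;
    if (n > 1) {
        result = n * factorial(n - 1);
    }
    return result;
}

/* Explicit heap-dynamic variable: the cell is allocated by an explicit
   malloc() and lives until the caller explicitly calls free().
   Returns NULL if the allocation fails. */
char *heap_copy(const char *s) {
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

/* A local array initialized from a string literal holds its own
   writable copy of the characters, so modifying it is legal
   (unlike writing through a pointer to the literal itself). */
char capitalize_local_copy(void) {
    char a[] = "foo";
    a[0] = 'F';
    assert(strcmp(a, "Foo") == 0);
    return a[0];
}
```

  - Usage: two successive calls to next_even() return 2 and then 4;
    factorial(5) returns 120; the pointer returned by heap_copy("foo")
    may be modified freely and must be released with free().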