Advanced Code Gen for Classes
Bill Griswold
Adapted by Beth Simon
NOTE: The content of this outline is no longer maintained
Review from last couple classes
- Addresses are more powerful than values!
For both VAR parameters and arrays, we realized that sometimes a reference
to a variable meant that we wanted the value of that variable and sometimes it meant we wanted the address of that variable:
-
consider Example 1: x := A[i]
we want to assign x whatever value is in the ith element of the array A
-
consider Example 2: A[j] := y
we want the *value* of y to be placed in the location in memory
(ie *address*) where we are storing A[j]
This lead us to change our emitload and emitstore routines to be more
discriminating. We now have three different versions: emitloadaddress, emitloadvalue, and emitstorevalue.
When do you call what? Well, you might be encouraged to call emitloadvalue
for Example 1 above. But, you'd be wrong. See lecture notes from last class!
(Hint, emitloadvalue is called in the Statement parse rule!)
What got changed in these routines? That can be aswered with another question:
What do both VAR parameters and arrays have in common?
Answer: Both are of Mode "ref"!
void emitloadaddress(STE *var, char *reg) {
if (var->isReference()) // includes array elem and class field refs
eprint("ld [%s+%d], %s", var->baseName(), var->offset(), reg);
else if (var->isVariable()) // for arrays and passing arg by "var"
eprint("add %s, %d, %s", var->baseName(), var->offset(), reg);
else
/* error */
}
void emitloadvalue (STE *var, char *reg) {
if (var->isConstant())
eprint("set %d, %s", var->intValue(), reg);
else if (var->isReference()) {
char *r = getreg();
eprint("ld [%s+%d], %s", var->baseName(), var->offset(), r); // load addr
eprint("ld [%s], %s", r, reg); // deref
freereg(r);
}
else if (var->isVariable())
emitload(var, reg); // maybe should appear in-line
//eprint("ld [%s + %d], %s\n", var->baseName(), var->offset(), reg);
else
/* error */
}
Compiling Classes: Records + Functions = Objects
Overview:
- What does a "class" look like?
- Record assignment
- Record field reference
- x := mystack.size;
- mystack.size := y;
- S.size := S.size -1; (from inside pop)
- Method Invocation
- Keeping "hold" of mystack inside pop
- mystack.push(mystack2.top());
- Distinguishing between mystack.size and mystack.pop
- Dynamic Allocation
What does a "class" look like?
RECORD Stack
stk : ARRAY 100 OF INTEGER;
size : INTEGER;
...
END Stack;
PROCEDURE (S : POINTER TO Stack) pop();
S.size := S.size - 1;
END pop;
PROCEDURE (S : POINTER TO Stack) init();
S.size := 0;
END init;
VAR mystack : POINTER TO Stack;
VAR mystack2 : POINTER TO Stack;
VAR x,y : INTEGER;
BEGIN
new(mystack);
new(mystack2);
mystack.init();
mystack2.init();
mystack2 := mystack;
x := mystack.size;
mystack.size := y;
mystack.push(3);
mystack.push(7);
IF (mystack.size > 5) & mystack.top() < 10 THEN
mystack.pop();
END IF;
END.
Note:
- objects are stored in dynamic area, since can be allocated or
freed at any time
- pointer to object points to object in dynamic area
- in this picture, we see variable mystack in the global area pointing
to some storage (the sequences of o's) (similar for mystack2)
--------------------
+mystack mystack2 +
+ +-+ +-+ +
+-----+-| |-+-------+ global
| + +-+ +-+ + |
| -------------------- |
| + + |
| + -------------+ text
| + | +
| ---------V----------
| + oooooooo +
+---->ooooooo + dynamic
+ +
--------------------
+ +
--------------------
+ +
+ + stack
+ +
--------------------
STE STE
+-------------+ +--------------+
|mystack | |mystack2 |
| | | |
|gp 16 | |gp 48 |
+-------------+ +--------------+
Record Assignment: mystack2 := mystack;
Gee, the STE of a record holds an address where the record starts, kind of
like an array STE does. Great, maybe the code I need for assignment
for records is the same I need for arrays! But I control emitloadaddress
and emitloadvalue by asking about isReference() for STEs and mystack
is not of mode isReference()!
What do I need to have done in this case?
- I need the value (an address) in mystack2 (gp+48) to get the value
(also an address this time gp+16) in mystack.
- This is pointer assignment!
What does the Statement Parse Rule already do?
Answer: Exactly what we want it to do!
Note: Even though records and arrays are both have STEs that
keep the
addresses of the start of their structures, the fact that records
are stored as pointers and arrays are marked as Mode:Reference helps us a
great deal here! We, didn't have to make any changes to our code to get
this to work!
Record Field Reference
Conceptually: What do I have to do to "access" mystack.size?
Answer: Something similar to arrays but this time "base" is the address stored in "mystack" STE and offset is sum of sizes of object
components before "size"
obj_ptr = base(mystack) + offset(mystack)
obj_base = *obj_ptr
&(mystack.size) = obj_base + offset(size)
Parsing: The address is more powerful than the value!
No matter if mystack.size is on the right or the left of an assignment
Grammar rule is Des : Des . ID
This will look very similar to emitarrayref.
$$ = newtemp();
emitComputeFieldAddress($3, $1, $$);
void emitComputeFieldAddress(STE *field, *record, *result) {
char *base_address_reg = get_reg();
char *final_address_reg = get_reg();
//Recall, emitloadvalue uses the base and offset of the STE
//it is passed and stores the value at that address in the register
//passed to it
//For a pointer to a record -- this result is the address of
//the start (or base) of the record pointed to
emitloadvalue(record, base_address_reg);
eprint("add %s, %s, %s", base_address_reg,
field->offset(),
final_address_reg);
emitstorevalue(final_address_reg, result);
//set type/mode of result so that emitstore, etc., knows what to do
result->setType(field->type()->elemType());
result->setMode(REFERENCE);
freereg(base_address_reg);
freereg(final_address_reg);
}
This will make the assembly code for any access to mystack.size
(on left or right side of assignment statement) look like:
; value of mystack (pointer to Stack) is
; in gp+16
; Des : Des . ID emitted by
; emitComputeFieldAddress
ld [gp + 16], r1 ; load pointer to mystack (emitloadvalue)
add r1, 400, r2 ; point to size within mystack (skip array)
st r2, [fp - 64] ; store to T1 (emitstorevalue) (64 is made up)
Note: Record Offsets are slightly different from Array Offsets
Method Invocation
There are three problems to be discussed:
1) Keeping "hold" of a record STE inside a method call
Motivation: In order to use emitComputeFieldAddress to emit code for
S.size := S.size - 1;
we need to
have an STE that is returned for the lookup on S.
- invoking a method is like procedure call, except that there is
an implicit parameter ``mystack'':
mystack.pop() --> pop(mystack)
------------- <- sp
+ locals +
-------------
+ saved r's +
------------- Pop's frame
+ ... +
+ 1st arg +
+ mystack + (aka S)
------------- <- fp
+ +
+ + Main's frame
+ +
-------------
- This allows the "S.size" reference to be really treated as
"mystack.size" in the body of pop by accessing the first
arg of pop (which contains the pointer to mystack)
- The assembly that should be produced is:
; value of mystack (pointer to Stack) has
; been pushed on stack as a parameter to
; pop
; Des : Des.ID
; (handled by emitComputFieldAddress)
ld [fp + 48], r1 ; load pointer to mystack (48 is made up)
add r1, 400, r2 ; point to size within mystack (skip array)
st r2, [fp - 64] ; store to T1 (64 is made up)
; Des : Des.ID (and then E : Des)
; (handled by emitComputFieldAddress)
ld [fp + 48], r1 ; load pointer to mystack
add r1, 400, r2 ; point to size within mystack (skip array)
st r2, [fp - 68] ; store to T2
; E : E - E
ld [fp - 68], r1 ; load address size from T2
ld [r1], r2 ; load value size
set 1, r3 ; load constant
sub r2, r3, r4 ; subtract
st r4, [fp - 72] ; store to T3
; S : Des := E
ld [fp - 64], r1 ; load address size from T1
ld [fp - 72], r2 ; load T3
st r2, [r1] ; store T3 to size
- How to do this? Let's just pass another argument to pop!
Des : Des . ID // Des = mystack ste, ID = pop ste
| Des ( AL ) // Des = pop ste, how to access mystack ste?
AL : AL E
| E
Unfortunately, the grammar and our single STE result per rule
convention does not allow passing both STE's to the method invocation;
moreover, we can't put mystack on the call stack early, because...
2) Nested/recursive method calls
At the time that we'd want to go ahead and put "mystack" as an extra
argument to pop (or push), the
compiler hasn't generated the function calls that are likely to appear
in the argument list. In the case of
mystack.push(mystack2.top());
the mystack2 call as to be handled before the compiler can start
emiting code for the mystack->push call.
As usual, we can use a stack to handle the nested/recursive nature
of the constructs.
Note:
The offsets stored for method STE's need to account for the
fact that an extra argument has been stuck at the front. You could
put it at the end, too, to avoid the problem.
3) Distinguishing between method calls and field references
Now that we feel we've created solutions to both our record field access problem
(even inside a method call) and our method invocation problem
we're done.
Except how do we
know when to use what solution?
The "Des . ID" syntax does not distinguish variable
reference and type-bound procedure call, which are handled differently.
You need to have an explicit test in the rule to distinguish the two
cases and generate different code for them:
Des : Des . ID { field_si := $1->lookup($3); if (field_si->isProc()) ... }
Type-bound Procedure Naming
- Because of type-bound procedures, there can be several procedures with
the same name.
- In Oberon the type-bound procedure is in the scope of the record,
so it can't be confused with a global procedure of the same name.
You just look it up in the record's symbol table.
- In assembler, however, there is only one, global scope. What to do?
- Simple solution is to prepend the record name on the type-bound
procedure's name.
- For example init might become Stack__init. (There is
a convention that says that programmers should not use double-underscore
in their C/C++ programs; the compiler uses such names. You could also
put some strange symbol in the name like % or $, I think, to avoid
name clashes.)
Dynamic Allocation
- For us, this is easy: call C's cmalloc, don't do init
- write C program that does a cmalloc, mimic the code
- pass to it the size of your object in bytes (ste->type()->size())
- store result in object variable pointer
Lesson: Matching Solution Structure to Problem and Conceptual Integrity are
Not Formulaic
- Each problem we have encountered has had a different approach to solution
- Force to stay in model: new operators, large constants
- Tweak model: arrays, complex conditionals, small constants
- Change grammar: method invocation
- Replicate code in different context: setting parameters in call
- Came from complex management of tradeoffs
- Different problem structures are not mutually consistent in view
of new additions, so must conflict in final solution (at least for
reasonable cost: 80/20 rules)
- emitstore/emitputarg versus complex base calculation
- Conceptual integrity is relative
- ease of understanding what is there
- ease of adding what hasn't been added yet
- how do you decide which to take care of?
- spend more time to achieve both, but then deadline can slip
- In the end, our choices have to be made based on estimates of
complexity and cost
- can't know (exact) cost of future changes
- can't even cost-effectively enumerate all present tradeoffs
quantitatively