# CSE 30 -- Lecture 12 -- Nov 10

Assignment 4: due to disk quota problems earlier today, the deadline is extended to Nov 11 4:40pm. You will also have 4 late days instead of 2, still at 10% penalty per late day.

Assignment 5: the Model 200 Calculator

You must implement the model 200 version of the original calculator from assignment 4. It must have the following new-and-improved features:

• 32-deep stack in new mode; retain the compatibility mode with a 4-deep stack. When the calculator is first turned on, it runs in compatibility mode, where the stack is 4 deep and behaves as in the previous model.
• new "m" command to switch calculator modes. This mode command checks the top of the stack to make sure that the value is either 0 or 1. If not, an error message is generated and the stack is not affected. If the value is 0, it is popped off in the current mode (this determines which element is replicated), and then the calculator is switched into compatibility mode; if the value is 1, it is also popped off in the current mode, and then the calculator is switched into extended mode, where the stack is 32 elements deep.
• new "t" command to pop the top element of the stack off (if it is greater than or equal to zero) and push the number of moves required to solve the Tower of Hanoi problem with the number of disks equal to the popped-off number. The Tower of Hanoi move function is:
```
int	tower_moves(int	n)
{
	if (n == 0) return 0;
	else return 1 + 2 * tower_moves(n-1);
}
```
If the top element of the stack was less than zero, an error message should be generated and the stack should be unaffected.
• new "c" change-sign function to change the sign of the topmost element of the stack.
• If you are uncertain about how this works, use the provided ~/../public/calc200 binary to see how it should work.
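
As a quick sanity check on the "t" command, the recursion above has the closed form 2^n - 1. The sketch below compares the two; `tower_moves_closed` is a hypothetical helper name, not part of the assignment, and the range limit of 30 disks assumes a 32-bit `int`.

```c
#include <assert.h>

/* Recursive definition, as given in the assignment text. */
int tower_moves(int n)
{
	assert(n >= 0);			/* the "t" command rejects negative input */
	if (n == 0) return 0;
	else return 1 + 2 * tower_moves(n - 1);
}

/* Hypothetical helper (not in the handout): closed form 2^n - 1.
 * Assumes 0 <= n <= 30 so the result fits in a 32-bit int. */
int tower_moves_closed(int n)
{
	assert(n >= 0 && n <= 30);
	return (int)((1u << n) - 1u);
}
```

For example, 3 disks need 7 moves and 10 disks need 1023. On the calculator, the result for large inputs is limited by the width of a stack element.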

## Loop unrolling

We consider the following table initialization code example.
```
	int	i;

	for (i = 0; i < N; i++)
		tbl[i] = i;
```
It assembles into
```
	li $t0, 0
	b test
loop:	sll $t1,$t0,2		# t1 = 4*i, the byte offset of tbl[i]
	sw $t0,tbl($t1)		# tbl[i] = i
	addiu $t0,$t0,1		# i++
test:	blt $t0,$a1, loop	# assume a1 has N
				# blt is a pseudo-instruction for:
				#   slt $at,$t0,$a1
				#   bne $at,$zero,loop
```
which is really
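```
	li $t0, 0
	b test
loop:	sll $t1,$t0,2
	lui $at, UPPER(tbl)
	addu $at,$at,$t1	# add the byte offset to the upper half of tbl's address
	sw $t0,LOWER(tbl)($at)
	addiu $t0,$t0,1		# i++
test:	slt $at,$t0,$a1
	bne $at,$zero,loop
```
Each iteration executes 7 instructions (5 in the body, 2 in the test), so this loop uses about 7N + 2 instructions to initialize a table of N entries, counting the two setup instructions and ignoring the final failing test.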

First, assume N is a multiple of 4. We write the code as

```
	int	i, *tblp;

	for (i = 0, tblp = tbl; i < N; tblp += 4) {
		tblp[0] = i++;
		tblp[1] = i++;
		tblp[2] = i++;
		tblp[3] = i++;
	}
```
which assembles into
```
	li $t0, 0
	la $t1, tbl		# lui $t1,UPPER(tbl); ori $t1,$t1,LOWER(tbl)
	b test
loop:	sw $t0,0($t1)
	addiu $t0,$t0,1		# i++; addiu raises no exception on overflow
	sw $t0,4($t1)
	addiu $t0,$t0,1
	sw $t0,8($t1)
	addiu $t0,$t0,1
	sw $t0,12($t1)
	addiu $t0,$t0,1
	addiu $t1,$t1,16	# tblp += 4 (four 4-byte words)
test:	slt $at,$t0,$a1
	bne $at,$zero,loop
```

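The second loop runs N/4 times, each iteration costing 11 instructions: four stores, four increments of i, one pointer advance, and the two-instruction loop test. With 4 setup instructions (li, the two-instruction la, and b), the run time is 11N/4 + 4, or 2.75N + 4. For sufficiently large N, this is more than twice as fast as the original loop.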
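
To check that the unrolled version fills the table exactly like the simple loop, both can be wrapped in functions and compared. This is a minimal sketch: the function names `init_simple`, `init_unrolled4`, and `unrolled_cost` are made up for illustration, and `unrolled_cost` simply restates the 11N/4 + 4 estimate above.

```c
#include <assert.h>

/* Straightforward loop: one store per iteration. */
void init_simple(int *tbl, int n)
{
	int i;

	for (i = 0; i < n; i++)
		tbl[i] = i;
}

/* Unrolled by 4, as in the lecture; n must be a multiple of 4. */
void init_unrolled4(int *tbl, int n)
{
	int i, *tblp;

	assert(n >= 0 && n % 4 == 0);
	for (i = 0, tblp = tbl; i < n; tblp += 4) {
		tblp[0] = i++;
		tblp[1] = i++;
		tblp[2] = i++;
		tblp[3] = i++;
	}
}

/* Instruction-count estimate from the text: 11 per trip plus 4 setup. */
long unrolled_cost(long n)
{
	return 11 * (n / 4) + 4;
}
```

For N = 400 the estimate is 11 * 100 + 4 = 1104 instructions. The saving comes from amortizing the two-instruction loop test and the address computation over four stores.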

To handle an N that is not known to be a multiple of 4, we first store the N rem 4 leftover entries with a fall-through switch, then run the unrolled loop on the rest:

```
	int	i, *tblp, N0;

	i = 0; tblp = tbl;
	N0 = N >> 2;	/* N div 4 */
	switch (N&3) {	/* N rem 4 */
	case 3:	*tblp++ = i++;
	case 2:	*tblp++ = i++;
	case 1:	*tblp++ = i++;
	}
	for (; i < N; tblp += 4) {
		tblp[0] = i++;
		tblp[1] = i++;
		tblp[2] = i++;
		tblp[3] = i++;
	}
```
which assembles into
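```
	li $t0, 0
	la $t1, tbl	# lui $t1,UPPER(tbl); ori $t1,$t1,LOWER(tbl)
	sra $t2,$a1,2	# assume a1 has N
	andi $t3,$a1,3
	sll $t3,$t3,2	# scale the case index to a byte offset into case_tbl
	lw $t3,case_tbl($t3)
	jr $t3
c3:	sw $t0,0($t1)
	addiu $t0,$t0,1	# i++
	addiu $t1,$t1,4	# tblp++
c2:	sw $t0,0($t1)
	addiu $t0,$t0,1
	addiu $t1,$t1,4
c1:	sw $t0,0($t1)
	addiu $t0,$t0,1
	addiu $t1,$t1,4
	.data
case_tbl:
	.word test,c1,c2,c3
	.text
	b test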
loop:	sw $t0,0($t1)
	sw $t0,4($t1)