for (i = 0; i < N; i++) {
    dst[i] = src[i];
}
is translated into MIPS as
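        li      $t0, 0
        b       test
bod:    sll     $t1,$t0,2
        add     $t2,$t1,$a0
        add     $t3,$t1,$a1
        lw      $t4,0($t3)
        sw      $t4,0($t2)
        add     $t0,$t0,1
test:   blt     $t0,$a2,bod
with the obvious register assignments ($a0 = dst, $a1 = src, $a2 = N, $t0 = i). Counting one cycle per instruction, the runtime of this code is 3 + 7N cycles: two instructions before the loop, seven per iteration, and one final test that falls through.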
To unroll this loop, first assume that N is divisible by 4:
for (i = 0, sp = src, dp = dst; i < N; i += 4) {
    dp[0] = sp[0];
    dp[1] = sp[1];
    dp[2] = sp[2];
    dp[3] = sp[3];
    dp += 4; sp += 4;
}
which would be translated into MIPS code as
        li      $t0, 0
        move    $t8,$a0
        move    $t9,$a1
        b       test
bod:    lw      $t1,0($t9)
        lw      $t2,4($t9)
        lw      $t3,8($t9)
        lw      $t4,12($t9)
        sw      $t1,0($t8)
        sw      $t2,4($t8)
        sw      $t3,8($t8)
        sw      $t4,12($t8)
        add     $t0,$t0,4
        add     $t9,$t9,16
        add     $t8,$t8,16
test:   blt     $t0,$a2,bod
which has a run time of 5 + 12(N/4) = 5 + 3N cycles: four setup instructions, twelve per iteration, and one final test. This can be improved a little further without unrolling any more:
        move    $t8,$a0
        move    $t9,$a1
        sll     $t1,$a2,2
        add     $t0,$t9,$t1
        b       test
bod:    lw      $t1,0($t9)
        lw      $t2,4($t9)
        lw      $t3,8($t9)
        lw      $t4,12($t9)
        sw      $t1,0($t8)
        sw      $t2,4($t8)
        sw      $t3,8($t8)
        sw      $t4,12($t8)
        add     $t9,$t9,16
        add     $t8,$t8,16
test:   blt     $t9,$t0,bod
What did I do there?
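To handle values of N that are not multiples of 4, we first copy the N mod 4 leftover elements one at a time, dispatching through a jump table, and then run the unrolled loop on the rest: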
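        move    $t8,$a0
        move    $t9,$a1
        and     $t1,$a2,3           # N % 4
        sll     $t1,$t1,2           # scale to a byte offset into jtbl
        lw      $t1,jtbl($t1)
        jr      $t1
L3:     lw      $t1,0($t9)
        sw      $t1,0($t8)
        add     $t9,$t9,4
        add     $t8,$t8,4
L2:     lw      $t1,0($t9)
        sw      $t1,0($t8)
        add     $t9,$t9,4
        add     $t8,$t8,4
L1:     lw      $t1,0($t9)
        sw      $t1,0($t8)
        add     $t9,$t9,4
        add     $t8,$t8,4
L0:     and     $t1,$a2,~3          # N & ~3 (0xfffffffc)
        sll     $t1,$t1,2
        add     $t0,$t9,$t1         # end pointer for the unrolled part
        b       test                # skip the body when N < 4
        .data
jtbl:   .word   L0, L1, L2, L3
        .text
bod:    lw      $t1,0($t9)
        lw      $t2,4($t9)
        lw      $t3,8($t9)
        lw      $t4,12($t9)
        sw      $t1,0($t8)
        sw      $t2,4($t8)
        sw      $t3,8($t8)
        sw      $t4,12($t8)
        add     $t9,$t9,16
        add     $t8,$t8,16
test:   blt     $t9,$t0,bod
The jump-table entry for N % 4 == 0 must point at L0 rather than test, so that the end pointer $t0 is always computed, and the explicit b test keeps the body from running once when fewer than four elements remain. This is roughly the C code: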
sp = src; dp = dst;
switch (N % 4) {            /* copy the N % 4 leftover elements */
case 3: *dp++ = *sp++;      /* fall through */
case 2: *dp++ = *sp++;      /* fall through */
case 1: *dp++ = *sp++;
}
N = N & ~3;
for (endptr = sp + N; sp < endptr; ) {
    dp[0] = sp[0];
    dp[1] = sp[1];
    dp[2] = sp[2];
    dp[3] = sp[3];
    dp += 4; sp += 4;
}
Multitasking is the ability to run several (usually unrelated) programs at once; the programs typically have separate address spaces. Multithreading is having several virtual CPUs, typically sharing the same address space.

bsy+www@cs.ucsd.edu, last updated