Question 1.

- Suppose that when Program A is run, the user CPU time is 3 seconds, the elapsed wallclock time is 4 seconds, and the system performance is 10 MFLOP/sec. Assume that there are no other processes taking any significant amount of time, and the computer is either doing calculations in the CPU, or doing I/O, but it can't do both at the same time. We now replace the processor with one that runs six times faster, but doesn't affect the I/O speed. What will the user CPU time, the wallclock time, and the MFLOP/sec performance be now?
- You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The table under Question 2 below gives instruction frequencies for Benchmark B, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor executes only one instruction at a time.
  - Calculate the CPI for Benchmark B.
  - The CPU execution time on the benchmark is exactly 11 seconds. What is the "native MIPS" processor speed for the benchmark in millions of instructions per second?
  - The hardware expert says that if you double the number of registers, the cycle time must be increased by 20%. What would the new clock speed be (in MHz)?
  - The compiler expert says that if you double the number of registers, then the compiler will generate code that requires only half the number of Loads & Stores. What would the new CPI be on the benchmark?
  - How many CPU seconds will the benchmark take if we double the number of registers (taking into account both changes described above)?

CPU performance_{new} / CPU performance_{old} = CPU time_{old} / CPU time_{new}

6 = 3 / CPU time_{new}

User CPU Time = 3 / 6 = 0.5 seconds

The original I/O time is the wallclock time minus the CPU time: 4 - 3 = 1 second. Since the I/O speed is unaffected by the new processor, I/O still takes 1 second. Therefore it takes 1 + 0.5 = 1.5 seconds to run Program A on the faster CPU.

Wallclock Time = 1.5 seconds
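The time calculation above can be checked with a minimal Python sketch. It assumes, as the problem states, that CPU work and I/O never overlap, so wallclock time is simply CPU time plus I/O time. The function name `speed_up_cpu` is an illustrative choice, not part of the problem.

```python
def speed_up_cpu(cpu_time_s, wallclock_s, speedup):
    """Return (new CPU time, new wallclock time) when only the CPU gets faster.

    Assumes CPU work and I/O never overlap, so wallclock = CPU time + I/O time.
    """
    io_time_s = wallclock_s - cpu_time_s   # I/O portion is unaffected by the new CPU
    new_cpu_s = cpu_time_s / speedup       # CPU portion shrinks by the speedup factor
    return new_cpu_s, new_cpu_s + io_time_s


new_cpu, new_wall = speed_up_cpu(cpu_time_s=3.0, wallclock_s=4.0, speedup=6.0)
print(f"user CPU time = {new_cpu} s, wallclock = {new_wall} s")
```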

System Performance in MFLOP/sec = Number of Floating Point Operations / (Wallclock Time * 10^{6})

Here the rate is measured against wallclock time, so for the original run:

Old System Performance: 10 = #FLOP / (4 * 10^{6})

#FLOP = 40 * 10^{6}

New System Performance = (40 * 10^{6}) / (1.5 * 10^{6})

MFLOP/sec = 26.667
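A short Python sketch of the MFLOP/sec step follows. Like the solution above, it assumes the 10 MFLOP/sec rating was measured against the 4-second wallclock time; `mflops` is a hypothetical helper name.

```python
def mflops(flop_count, seconds):
    """MFLOP/sec = floating-point operations / (seconds * 10^6)."""
    return flop_count / (seconds * 1e6)


# Recover the FLOP count from the original run: 10 MFLOP/sec over 4 s of wallclock time.
flop_count = 10 * 1e6 * 4.0

# The same work now finishes in 1.5 s of wallclock time.
new_rate = mflops(flop_count, 1.5)
print(f"{flop_count:.0f} FLOPs -> {new_rate:.3f} MFLOP/sec")
```

Because the FLOP count is fixed, the rate rises only by 4 / 1.5, not by the full CPU speedup of 6, since the I/O time does not shrink.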

Question 2.

Instruction Type | Frequency | Cycles
--- | --- | ---
Loads & Stores | 30% | 6 cycles
Arithmetic Instructions | 50% | 4 cycles
All Others | 20% | 3 cycles

Assume a sample of 100 instructions. Then:

30 of them will be loads and stores.

50 of them will be arithmetic instructions.

20 of them will be all others.

(30 * 6) + (50 * 4) + (20 * 3) = 180 + 200 + 60 = 440 cycles per 100 instructions

Therefore, CPI = 440 / 100 = 4.4 cycles per instruction.
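The weighted-count CPI calculation can be sketched in Python as below, using the same 100-instruction sample. The dictionary keys and the `cpi_from_counts` name are illustrative, not part of the problem. Using integer counts rather than fractional frequencies keeps the arithmetic exact.

```python
def cpi_from_counts(mix):
    """mix maps class -> (instruction count, cycles each); returns (total cycles, CPI)."""
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles, total_cycles / total_instructions


# Benchmark B mix, scaled to 100 instructions as in the text
MIX_B = {"loads_stores": (30, 6), "arithmetic": (50, 4), "other": (20, 3)}

total_cycles, cpi = cpi_from_counts(MIX_B)
print(f"{total_cycles} cycles / 100 instructions -> CPI = {cpi}")
```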

The formula for calculating MIPS is:

MIPS = Clock rate/(CPI * 10^{6})

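The clock rate is 200 MHz, so:

MIPS = (200 * 10^{6}) / (4.4 * 10^{6}) ≈ 45.45

The 11-second execution time is not needed for this part; it is used below to find the total instruction count.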
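A minimal Python sketch of the native MIPS formula, with a cross-check against the 11-second run time. The helper name `native_mips` is hypothetical.

```python
def native_mips(clock_hz, cpi):
    """Native MIPS = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


mips = native_mips(200e6, 4.4)
print(f"{mips:.2f} MIPS")

# Cross-check: instructions executed = MIPS * 10^6 * execution time (11 s)
instructions_11s = mips * 1e6 * 11
print(f"instructions in 11 s: {instructions_11s:.3e}")
```

The cross-check gives the 500 * 10^{6} instruction count that the last part of this question relies on.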

Clock rate = 1 / Cycle time, so

Cycle time = 1 / Clock rate

Cycle time = 1 / (200 * 10^{6}) = 5 * 10^{-9} seconds (5 ns)

The cycle time is then increased by 20%:

(5 * 10^{-9}) * 1.2 = 6 * 10^{-9} seconds (6 ns)

The new clock rate is thus:

1 / (6 * 10^{-9}) = 166.667 * 10^{6} Hz, or 166.667 MHz
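The cycle-time adjustment can be sketched in Python as follows. The function name `clock_after_cycle_stretch` is illustrative; it simply converts the clock rate to a cycle time, stretches it, and converts back.

```python
def clock_after_cycle_stretch(clock_hz, cycle_increase):
    """Stretch the cycle time by a fraction and return the resulting clock rate in Hz."""
    cycle_time_s = 1.0 / clock_hz                         # 200 MHz -> 5 ns
    new_cycle_s = cycle_time_s * (1.0 + cycle_increase)   # +20% -> 6 ns
    return 1.0 / new_cycle_s


new_clock_hz = clock_after_cycle_stretch(200e6, 0.20)
print(f"{new_clock_hz / 1e6:.3f} MHz")
```

Note that a 20% longer cycle lowers the clock rate by 1/1.2 (about 16.7%), not by 20%.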

In the CPI calculation we assumed 100 instructions. Halving the number of loads and stores also reduces the total number of instructions, so the new instruction mix is:

15 Loads and Stores

50 Arithmetic Instructions

20 All Others

The total number of instructions is now 85, so:

CPI = ((15 * 6) + (50 * 4) + (20 * 3)) / 85 = 350 cycles / 85 instructions ≈ 4.12
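A minimal Python sketch of the new-mix CPI, starting from the same 100-instruction sample. The names `mix_totals` and `doubled_regs` are hypothetical; only the load/store count changes, and the cycles per class stay as in the table.

```python
def mix_totals(mix):
    """mix maps class -> (instruction count, cycles each); returns (cycles, instructions, CPI)."""
    cycles = sum(count * cpi for count, cpi in mix.values())
    instructions = sum(count for count, _ in mix.values())
    return cycles, instructions, cycles / instructions


# Original 100-instruction sample of Benchmark B
original = {"loads_stores": (30, 6), "arithmetic": (50, 4), "other": (20, 3)}

# Doubling the registers halves the loads & stores; the other classes are unchanged
ls_count, ls_cycles = original["loads_stores"]
doubled_regs = {**original, "loads_stores": (ls_count // 2, ls_cycles)}

new_cycles, new_instructions, new_cpi = mix_totals(doubled_regs)
print(f"{new_cycles} cycles / {new_instructions} instructions -> CPI = {new_cpi:.2f}")
```

The CPI falls because the removed instructions were the most expensive class (6 cycles each).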

CPU seconds = (Number of instructions * Clock cycles per instruction) / Clock rate

First, we need the number of instructions in the new benchmark, the one with half the number of loads and stores.

To do this, we first figure out how many instructions execute on the original benchmark in its 11 seconds. Since we know the MIPS (*Millions of Instructions Per Second*) for the original benchmark:

(45.45 * 10^{6}) * 11 = 500 * 10^{6} instructions in 11 seconds

Now we find how many of those are loads and stores. The table says that 30% of all instructions are loads and stores, so:

(500 * 10^{6}) * 0.3 = 150 * 10^{6} load and store instructions

Halving this leaves 75 * 10^{6} loads and stores. (The 20% longer cycle time does not change the instruction count; it is accounted for by the new clock rate.) The total instruction count therefore drops by 75 * 10^{6}:

(500 * 10^{6}) - (75 * 10^{6}) = 425 * 10^{6} total instructions

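The final solution, using the exact new CPI of 350/85 (about 4.12) and the new clock rate of 166.667 MHz, is:

((425 * 10^{6}) * (350/85)) / (166.667 * 10^{6}) = (1750 * 10^{6} cycles) / (166.667 * 10^{6} Hz) = 10.5 seconds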
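As a cross-check, the sketch below redoes the final calculation in Python with exact fractions, so no intermediate rounding of the CPI or clock rate creeps in. It takes the 500 * 10^{6} original instruction count from the text and also computes the answer a second way: start from the 11-second run (11 s * 200 MHz = 2200 * 10^{6} cycles) and subtract the 75 * 10^{6} * 6 cycles saved by removing half the loads and stores. All constant and variable names are illustrative.

```python
from fractions import Fraction

# Benchmark B: class -> (fraction of instructions, cycles per instruction)
MIX = {
    "loads_stores": (Fraction(3, 10), 6),
    "arithmetic": (Fraction(5, 10), 4),
    "other": (Fraction(2, 10), 3),
}
ORIGINAL_INSTRUCTIONS = 500_000_000                    # 45.45 MIPS * 11 s, from the text
NEW_CLOCK_HZ = Fraction(200_000_000) / Fraction(6, 5)  # cycle time +20% -> ~166.667 MHz

# Instruction counts per class after halving the loads & stores
counts = {name: ORIGINAL_INSTRUCTIONS * freq for name, (freq, _) in MIX.items()}
counts["loads_stores"] /= 2

new_instructions = sum(counts.values())
new_cycles = sum(counts[name] * cycles for name, (_, cycles) in MIX.items())
cpu_seconds = new_cycles / NEW_CLOCK_HZ

# Second route: original cycles (11 s at 200 MHz) minus the cycles saved on loads & stores
freq_ls, cycles_ls = MIX["loads_stores"]
saved_cycles = (ORIGINAL_INSTRUCTIONS * freq_ls / 2) * cycles_ls
check_seconds = (11 * 200_000_000 - saved_cycles) / NEW_CLOCK_HZ

print(f"{float(new_instructions):.0f} instructions, {float(new_cycles):.0f} cycles")
print(f"CPU time = {float(cpu_seconds)} s (check: {float(check_seconds)} s)")
```

Both routes agree because the cycle count (1750 * 10^{6}) is exact; only the clock rate changes the conversion to seconds.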