Solutions for HW#1: Questions 1 and 2

Question 1.

Suppose that when Program A is run, the user CPU time is 3 seconds, the elapsed wallclock time is 4 seconds, and the system performance is 10 MFLOP/sec. Assume that there are no other processes taking any significant amount of time, and the computer is either doing calculations in the CPU, or doing I/O, but it can't do both at the same time. We now replace the processor with one that runs six times faster, but doesn't affect the I/O speed. What will the user CPU time, the wallclock time, and the MFLOP/sec performance be now?

CPU performance_B/CPU performance_A = CPU time_A/CPU time_B

6 = 3/CPU time_B

User CPU Time = 0.5 seconds

The original run spent 4 - 3 = 1 second of wallclock time outside the CPU, doing I/O. Since the I/O time is unaffected by the faster processor, it still takes 1 second. Therefore it takes 1 + 0.5 = 1.5 seconds to run Program A on the faster CPU.

Wallclock Time = 1.5 seconds
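As a quick check, the same bookkeeping fits in a few lines of Python. This is a minimal sketch that assumes, as the problem states, that CPU work and I/O never overlap, so wallclock time is simply CPU time plus I/O time. The function name `speedup_times` is hypothetical.

```python
def speedup_times(cpu_time, wall_time, cpu_speedup):
    """Return (new CPU time, new wallclock time) when only the CPU gets faster.

    Assumes CPU work and I/O never overlap, so wallclock = CPU time + I/O time.
    """
    io_time = wall_time - cpu_time          # I/O time is unaffected by the new CPU
    new_cpu = cpu_time / cpu_speedup        # CPU time shrinks by the speedup factor
    return new_cpu, new_cpu + io_time


new_cpu, new_wall = speedup_times(cpu_time=3.0, wall_time=4.0, cpu_speedup=6.0)
print(new_cpu, new_wall)  # 0.5 1.5
```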

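System Performance in MFLOP/sec = Number of Floating Point Operations/(Wallclock Time * 10⁶)

Old System Performance: 10 = #FLOP/(4 * 10⁶)

#FLOP = 40 * 10⁶

Program A still performs the same 40 * 10⁶ floating-point operations on the new processor, so:

New System Performance = (40 * 10⁶)/(1.5 * 10⁶)

MFLOP/sec = 26.667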
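A similar sketch for the MFLOP/sec figure, assuming (as the solution does) that the rate is measured against wallclock time and that Program A performs the same number of floating-point operations on either processor. The helper name `mflops` is hypothetical.

```python
def mflops(flop_count, wall_time):
    """MFLOP/sec measured against elapsed wallclock time (seconds)."""
    return flop_count / (wall_time * 1e6)


old_rate = 10.0                            # MFLOP/sec, given
old_wall = 4.0                             # seconds, given
flop_count = old_rate * 1e6 * old_wall     # 40 * 10^6 floating-point operations
new_rate = mflops(flop_count, wall_time=1.5)
print(round(new_rate, 3))  # 26.667
```

Because the 1 second of I/O does not shrink, the MFLOP/sec rate improves by only 4/1.5, about 2.67x, even though the CPU itself is 6x faster.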

Question 2.

You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The following table gives instruction frequencies for Benchmark B, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor only executes one instruction at a time.

Instruction Type	Frequency	Cycles
Loads & Stores	30%	6 cycles
Arithmetic Instructions	50%	4 cycles
All Others	20%	3 cycles

Calculate the CPI for Benchmark B.

If we say that there are 100 instructions, then:

30 of them will be loads and stores.

50 of them will be arithmetic instructions.

20 of them will be other instructions.

(30 * 6) + (50 * 4) + (20 * 3) = 180 + 200 + 60 = 440 cycles per 100 instructions

Therefore, the CPI is 440/100 = 4.4 cycles per instruction.
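The CPI is a frequency-weighted average of the per-class cycle counts. A minimal Python sketch that mirrors the 100-instruction approach; the `cpi` helper and the class keys in the dictionary are illustrative names, not part of the problem.

```python
def cpi(mix):
    """Weighted-average CPI from (instruction count, cycles per instruction) pairs."""
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles / total_instructions


# Benchmark B, scaled to 100 instructions as in the text: (count, cycles each)
benchmark_b = {
    "loads_stores": (30, 6),
    "arithmetic": (50, 4),
    "other": (20, 3),
}
print(cpi(benchmark_b))  # 4.4
```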

The CPU execution time on the benchmark is exactly 11 seconds. What is the "native MIPS" processor speed for the benchmark in millions of instructions per second?

The formula for calculating MIPS is:

MIPS = Clock rate/(CPI * 10⁶)

The clock rate is 200 MHz, so:

MIPS = (200 * 10⁶)/(4.4 * 10⁶) = 45.45
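The native MIPS formula translates directly into code. A hedged sketch: `native_mips` is a hypothetical name and the clock rate is passed in hertz. The second printed value multiplies the rating by the 11-second CPU time to get the instruction count, which the last part of this problem needs.

```python
def native_mips(clock_hz, cpi):
    """Native MIPS = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


mips = native_mips(clock_hz=200e6, cpi=4.4)
instructions_in_11s = mips * 1e6 * 11       # about 500 * 10^6 instructions
print(round(mips, 2), round(instructions_in_11s))  # 45.45 500000000
```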

The hardware expert says that if you double the number of registers, the cycle time must be increased by 20%. What would the new clock speed be (in MHz)?

Clock rate = 1/Cycle time

Cycle time = 1/Clock rate

Cycle time = 1/(200 * 10⁶) = 5 * 10⁻⁹ seconds (5 ns)

The cycle time is then increased by 20%:

(5 * 10⁻⁹) * 1.2 = 6 * 10⁻⁹ seconds (6 ns)

The new clock rate is thus:

1/(6 * 10⁻⁹) = 166.667 * 10⁶ Hz, or 166.667 MHz
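The cycle-time arithmetic as a short Python sketch (variable names are illustrative). Note that a 20% longer cycle lowers the clock rate by a factor of 1/1.2, a drop of about 16.7%, not 20%.

```python
old_clock_hz = 200e6
old_cycle_s = 1 / old_clock_hz        # 5 * 10^-9 s (5 ns)
new_cycle_s = old_cycle_s * 1.2       # 20% longer cycle: 6 * 10^-9 s
new_clock_mhz = 1 / new_cycle_s / 1e6  # convert Hz to MHz

print(round(new_cycle_s * 1e9, 3), round(new_clock_mhz, 3))  # 6.0 166.667
```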

The compiler expert says that if you double the number of registers, then the compiler will generate code that requires only half the number of Loads & Stores. What would the new CPI be on the benchmark?

There were 100 instructions in the CPI calculation in the first part, so we reduce the number of loads and stores by half, which also reduces the total number of instructions. The new instruction mix is:

15 Loads and Stores

50 Arithmetic Instructions

20 All Others

The total number of instructions is now 85, so the answer is:

((15 * 6) + (50 * 4) + (20 * 3))/85 = 350 cycles/85 instructions ≈ 4.12 CPI
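The same weighted-average idea applies to the halved-load/store mix. In this sketch the `cpi` helper is redefined so the snippet stands alone; the names are illustrative, and the mix assumes, as the compiler expert claims, that only the load/store count changes.

```python
def cpi(mix):
    """Weighted-average CPI from (instruction count, cycles per instruction) pairs."""
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles / total_instructions


original = {"loads_stores": (30, 6), "arithmetic": (50, 4), "other": (20, 3)}

# Halve the loads & stores; the other classes are unchanged
ls_count, ls_cycles = original["loads_stores"]
new_mix = dict(original, loads_stores=(ls_count // 2, ls_cycles))

print(sum(c for c, _ in new_mix.values()), round(cpi(new_mix), 2))  # 85 4.12
```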

How many CPU seconds will the benchmark take if we double the number of registers (taking into account both changes described above)?

CPU seconds = (Number of instructions * Clocks per instruction)/Clock rate

First we need the number of instructions in the new benchmark, the one with half the number of loads and stores.

To get it, we first figure out how many instructions the original benchmark executes in its 11 seconds. Since we know the native MIPS rating (millions of instructions per second) of the original benchmark:

(45.45 * 10⁶) * 11 = 500 * 10⁶ instructions in 11 seconds

Now we need to figure out how many of those are loads and stores:

(500 * 10⁶) * 0.3 = 150 * 10⁶ load and store instructions, because the table says that 30% of all instructions are loads and stores. Doubling the registers halves this number, so there are only 75 * 10⁶ loads and stores. (The 20% longer cycle time does not change the instruction count; it is accounted for by the new clock rate.) This also means there are now fewer total instructions: 500 * 10⁶ - 75 * 10⁶ = 425 * 10⁶.

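The final solution, using the unrounded CPI of 350/85 and the new clock rate of 166.667 MHz, is:

((425 * 10⁶) * (350/85))/(166.667 * 10⁶) = (1750 * 10⁶ cycles)/(166.667 * 10⁶ cycles/second) = 10.5 seconds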
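Putting the pieces together: CPU time = instruction count * CPI / clock rate. A minimal Python sketch under the problem's assumptions (variable names are illustrative). It keeps the CPI unrounded as 350/85, since rounding it to 4.12 introduces a small error in the final answer.

```python
clock_old_hz = 200e6      # original clock rate
cpi_old = 4.4             # original CPI
cpu_time_old_s = 11.0     # measured CPU time of the original benchmark

# Instruction count of the original benchmark: native MIPS * 10^6 * seconds
mips_old = clock_old_hz / (cpi_old * 1e6)
ic_old = mips_old * 1e6 * cpu_time_old_s          # about 500 * 10^6

# Doubling the registers halves the loads & stores (30% of the original instructions)
ic_new = ic_old - 0.5 * (0.30 * ic_old)           # about 425 * 10^6

cpi_new = 350 / 85                                # unrounded new CPI
clock_new_hz = clock_old_hz / 1.2                 # cycle time is 20% longer

cpu_time_new_s = ic_new * cpi_new / clock_new_hz
print(round(cpu_time_new_s, 3))  # 10.5
```

Equivalently, the new benchmark needs 1750 * 10⁶ cycles of 6 ns each, which is 10.5 seconds, slightly faster than the original 11 seconds.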