don't let amdahl's law scare you - you already know it. consider this math problem:
we're baking a cake. it takes one person a total of 80 minutes: there are 50 minutes of baking time; the remaining minutes are preparation time. if you have a team of assistants to help you, you can finish preparations in one sixth of the time. how long will it take you to bake the cake with a team of assistants?
this problem should be easy [if not, we need to talk]. but if we take this problem and switch some phrases around, we end up with the same problem that appeared on quiz 1. you know how to do this! don't let the big words scare you. :)
let's try a more difficult problem:
suppose 50% of all instructions executed are memory instructions, and you have some crazy ideas that will result in speedups for memory instructions. but your advisor tells you that working on these ideas won't be worthwhile unless you can get an overall speedup of at least 1.5. how much speedup must you get from memory instructions to make your ideas worthwhile?
don't forget that speedup is old_execution_time / new_execution_time.
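if you want to check your answers to problems like these, amdahl's law fits in a few lines of python. this is only a sketch: the function name `overall_speedup` and its parameter names are made up for illustration, and the numbers in the example are hypothetical, not taken from any of the problems here.

```python
def overall_speedup(fraction_enhanced, enhancement_speedup):
    """amdahl's law: overall speedup when only part of the work gets faster.

    fraction_enhanced:   fraction of the ORIGINAL execution time that is sped up (0..1)
    enhancement_speedup: how many times faster that part becomes
    """
    # new time as a fraction of old time = untouched part + shrunken part
    new_time = (1 - fraction_enhanced) + fraction_enhanced / enhancement_speedup
    # speedup = old_execution_time / new_execution_time, with the old time normalized to 1
    return 1 / new_time


# hypothetical example: 40% of the time is sped up by 2x
print(overall_speedup(0.4, 2))
```

notice that no matter how large `enhancement_speedup` gets, the result can never exceed 1 / (1 - fraction_enhanced), because the untouched part still takes just as long as before.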
here's another problem:
you've purchased a math coprocessor for your computer ["i remember when..."]. the box says that installing it will improve the performance of floating point instructions by 10x, and that it will improve overall performance by 5x for "most programs". what percentage of "most programs" must be floating point instructions if we want 5x overall speedup?
it's a good idea to review your si prefixes [giga=10^9, mega=10^6, micro=10^-6, nano=10^-9, etc].
it's also a good idea to remember what the following units are:
| name | measured in... |
|---|---|
| execution time | seconds |
| CPI | cycles / instruction |
| IPC | instructions / cycle |
| clock rate = frequency | cycles / second = hertz |
| cycle time | seconds / cycle |
if you remember these things, solving problems involving execution time becomes very much like those annoying conversion problems you had to do back in high school. for example:
how many seconds in an hour? well, there are 60 minutes in an hour, and 60 seconds in a minute. writing this out, we see that:
1 hour * (60 minutes / 1 hour) * (60 seconds / 1 minute) = 3600 seconds
notice how the units cancel out nicely: the word 'hour' appears once in the numerator and once in the denominator; the same is true of the word 'minute'. if we cancel out these units, we are left with just the word 'seconds' in the numerator. this means we're doing it right. :)
back to architecture. how long will it take to execute one billion instructions on a 100 MHz processor with an average of 2 CPI?
1e9 instructions * (2 cycles / 1 instruction) * (1 second / 1e8 cycles) = 20 seconds
note how the units cancel out nicely.
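the same calculation as a python sketch, in case you'd like to plug in other numbers. the function name `execution_time` and its parameter names are assumptions chosen for this example; the comment tracks the units so you can see them cancel.

```python
def execution_time(instructions, cpi, clock_rate_hz):
    """return execution time in seconds."""
    # instructions * (cycles / instruction) / (cycles / second) = seconds
    return instructions * cpi / clock_rate_hz


# the example above: one billion instructions, 2 CPI, 100 MHz
print(execution_time(1e9, 2, 100e6))  # 20.0
```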
try this one:
how many instructions per second can we execute with our 2 CPI, 100 MHz processor?
if we increase clock rate to 200 MHz, but we also increase CPI to 3, what is the overall speedup?
stack: instructions pop operands off the stack, operate on them, and push the results back on the stack. the jvm [java virtual machine] is a stack machine. it's very easy to generate code for a stack machine - but it's tricky to design the hardware.
accumulator: a machine with one register. operations operate on the value in the accumulator register, and write their results back to the accumulator register. this results in a very simple machine - but it will take lots of instructions to get anything done.
register-memory: operations read operands from registers or memory, and write the results to registers or memory. with a machine like this, a lot can be done in a few instructions. but implementing these complex instructions in hardware can be difficult [which tends to increase CPI].
load/store: operations read and write only to registers. explicit load and store instructions are needed to read and write to memory. it takes more instructions to get things done, but implementing simple instructions in hardware is easier [which tends to decrease CPI].
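to make the first two styles concrete, here is a toy python sketch that computes c = a + b on a stack machine and on an accumulator machine. the opcode names (`push`, `pop`, `load`, `store`, `add`) and both interpreter functions are invented for this illustration; they are not the instructions of the jvm or of any real machine.

```python
def run_stack_machine(program, memory):
    """run a list of (opcode, operand) pairs on a toy stack machine."""
    stack = []
    for op, arg in program:
        if op == "push":        # copy memory[arg] onto the stack
            stack.append(memory[arg])
        elif op == "add":       # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":       # pop the top of the stack into memory[arg]
            memory[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


def run_accumulator_machine(program, memory):
    """run a list of (opcode, operand) pairs on a toy one-register machine."""
    acc = 0
    for op, arg in program:
        if op == "load":        # acc = memory[arg]
            acc = memory[arg]
        elif op == "add":       # acc = acc + memory[arg]
            acc = acc + memory[arg]
        elif op == "store":     # memory[arg] = acc
            memory[arg] = acc
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# c = a + b, stack style: operands are implicit (top of stack)
stack_program = [("push", "a"), ("push", "b"), ("add", None), ("pop", "c")]
print(run_stack_machine(stack_program, {"a": 2, "b": 3, "c": 0})["c"])  # 5

# c = a + b, accumulator style: one operand is always the accumulator
acc_program = [("load", "a"), ("add", "b"), ("store", "c")]
print(run_accumulator_machine(acc_program, {"a": 2, "b": 3, "c": 0})["c"])  # 5
```

in both toy machines the `add` instruction names at most one operand; the rest are implied by the machine's style, which is the main difference between these classes.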
questions:
we increase pc by 4 when we move to the next instruction. the target of a branch is pc+offset*4. what's the significance of these 4's?
jumps are direct, but the jump target is only 26 bits. addresses are 32 bits - where do the remaining 6 bits come from?
what's wrong with the following instruction, and how do we fix it?
addi $1, $0, 1048576
what is sign extension, and when do we need it?
floating point numbers are represented as +/- 1.m * 2^(e-127), where m is the mantissa, and e is the exponent. the first bit is the sign bit. 8 bits are used for the exponent, and 23 bits for the mantissa.
floating point numbers are always normalized to 1.something * ..., so the leading one is assumed and is not represented.
to decode a floating point number:
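one way to sketch decoding in python, following the +/- 1.m * 2^(e-127) form described above. the function name `decode_single` is made up for this example, and the sketch assumes a normalized number: the all-zeros and all-ones exponent fields are special cases that these notes don't cover, so the code simply rejects them. the example bit pattern is not one of the problems below.

```python
def decode_single(bits):
    """decode a 32-bit pattern as a normalized single-precision number."""
    sign = (bits >> 31) & 0x1         # first bit: the sign
    exponent = (bits >> 23) & 0xFF    # next 8 bits: the exponent e
    mantissa = bits & 0x7FFFFF        # last 23 bits: the m in 1.m

    # assumption: only normalized numbers are handled in this sketch
    if exponent == 0 or exponent == 255:
        raise ValueError("special case not handled in this sketch")

    significand = 1 + mantissa / 2**23             # put the assumed leading one back
    value = significand * 2.0 ** (exponent - 127)  # undo the 127 bias
    return -value if sign else value


print(decode_single(0xC0A00000))  # -5.0
```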
to encode a floating point number:
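a matching python sketch for encoding, again assuming a nonzero value that fits in the normalized +/- 1.m * 2^(e-127) form. the function name `encode_single` is hypothetical; `math.frexp` is a standard library call that splits a number into a fraction in [0.5, 1) and a power of two, which the code rescales to the 1.m form. the example value is not one of the problems below.

```python
import math


def encode_single(x):
    """encode a nonzero number as a normalized single-precision bit pattern."""
    # assumption: zero, infinity and NaN are special cases left out of this sketch
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("special case not handled in this sketch")

    sign = 1 if x < 0 else 0

    # frexp gives abs(x) = frac * 2**exp with 0.5 <= frac < 1;
    # rescale so that abs(x) = significand * 2**e with 1 <= significand < 2
    frac, exp = math.frexp(abs(x))
    significand, e = frac * 2, exp - 1

    mantissa = round((significand - 1) * 2**23)  # drop the leading one, keep 23 bits
    if mantissa == 2**23:                        # rounding carried into the leading one
        mantissa, e = 0, e + 1

    biased = e + 127                             # add the 127 bias
    if not 1 <= biased <= 254:
        raise ValueError("outside the normalized single-precision range")

    return (sign << 31) | (biased << 23) | mantissa


print(hex(encode_single(0.15625)))  # 0x3e200000
```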
problems:
decode the following numbers from ieee fp representation: 0x40550000 and 0xc328a000
encode the following decimal numbers in ieee fp representation: 9.5 and -0.1875