instructions over the first 8 ns of that program’s execution than it would have
had the pipeline been full for the entire 8 ns.
When the processor is executing programs that consist of thousands of
instructions, then as the number of nanoseconds stretches into the thousands,
the impact on program execution time of those four initial nanoseconds,
during which only one instruction was completed, begins to vanish and the
pipelined processor’s advantage begins to approach the fourfold mark. For
example, after 1,000 ns, the non-pipelined processor will have completed 250 instructions (1,000 ns × 0.25 instructions/ns = 250 instructions), while the pipelined processor will have completed 996 instructions [(1,000 ns – 4 ns) × 1 instruction/ns = 996 instructions]—a 3.984-fold improvement.
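If it helps to see that arithmetic spelled out, here is a minimal sketch in C. The rates and time span are the ones from the example above; the variable names and the choice of C are just for illustration.

```c
#include <stdio.h>

int main(void) {
    double elapsed_ns = 1000.0;   /* total execution time considered         */
    double fill_ns = 4.0;         /* time spent filling the 4-stage pipeline */

    /* Non-pipelined processor: one instruction every 4 ns = 0.25 instructions/ns. */
    double non_pipelined = elapsed_ns * 0.25;

    /* Pipelined processor: nothing completes while the pipeline fills,
       then one instruction completes every nanosecond. */
    double pipelined = (elapsed_ns - fill_ns) * 1.0;

    printf("non-pipelined: %.0f instructions\n", non_pipelined);      /* 250   */
    printf("pipelined:     %.0f instructions\n", pipelined);          /* 996   */
    printf("speedup:       %.3f-fold\n", pipelined / non_pipelined);  /* 3.984 */
    return 0;
}
```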
What I’ve just described using this concrete example is the difference between a pipeline’s maximum theoretical completion rate and its real-world average completion rate. In the previous example, the four-stage processor’s maximum theoretical completion rate, i.e., its completion rate on cycles when its entire pipeline is full, is one instruction/ns. However, the processor’s average completion rate during its first 8 ns is 5 instructions/8 ns = 0.625 instructions/ns. The processor’s average completion rate improves as it passes more clock cycles with its pipeline full, until at 1,000 ns, its average completion rate is 996 instructions/1,000 ns = 0.996 instructions/ns.
At this point, it might help to look at a graph of the four-stage pipeline’s
average completion rate as the number of nanoseconds increases, illustrated
in Figure 3-9.
[Graph: average instruction throughput (instructions/clock) on the y-axis, from 0 to 1, plotted against clock cycles on the x-axis, from 0 to 100]
Figure 3-9: Average completion rate of a four-stage pipeline
You can see how the processor’s average completion rate stays at zero until
the 4 ns mark, after which point the pipeline is full and the processor can
begin completing a new instruction on each nanosecond, causing the average
completion rate for the entire program to curve upward and eventually to
approach the maximum completion rate of one instruction/ns.
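To make the shape of that curve concrete, here is a small sketch that tabulates the four-stage pipeline’s average completion rate at a few points along the x-axis of Figure 3-9, assuming one clock cycle per nanosecond and that the first instruction completes at the end of cycle 4 (so five instructions have completed by cycle 8, as in the example above):

```c
#include <stdio.h>

/* Average completion rate of a four-stage pipeline, one stage per cycle.
   The first instruction completes at the end of cycle 4; after that, one
   instruction completes every cycle as long as the pipeline stays full. */
static double avg_completion_rate(int cycles) {
    int completed = (cycles >= 4) ? (cycles - 3) : 0;
    return (double)completed / cycles;
}

int main(void) {
    int points[] = {4, 8, 20, 40, 60, 80, 100};
    for (int i = 0; i < 7; i++) {
        int c = points[i];
        printf("%3d cycles: %.3f instructions/cycle\n", c, avg_completion_rate(c));
    }
    return 0;
}
```

The printed values start at 0.25, hit 0.625 at cycle 8, and climb toward 1 instruction/cycle, which is the curve Figure 3-9 shows.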
So in conclusion, a pipelined processor can only approach its ideal
completion rate if it can go for long stretches with its pipeline full on every
clock cycle.
Instruction Throughput and Pipeline Stalls
Pipelining isn’t totally “free,” however. Pipelining adds some complexity to
the microprocessor’s control logic, because all of these stages have to be kept
in sync. Even more important for the present discussion, though, is the fact
that pipelining adds some complexity to the ways in which you assess the
processor’s performance.
Instruction Throughput
Up until now, we’ve talked about microprocessor performance mainly in terms of instruction completion rate, or the number of instructions that the processor’s pipeline can complete each nanosecond. A more common performance metric in the real world is a pipeline’s instruction throughput, or the number of instructions that the processor completes each clock cycle. You might be thinking that a pipeline’s instruction throughput should always be one instruction/clock, because I stated previously that a pipelined processor completes a new instruction at the end of each clock cycle in which the write stage has been active. But notice how the emphasized part of that definition qualifies it a bit; you’ve already seen that the write stage is inactive during
clock cycles in which the pipeline is being filled, so on those clock cycles, the processor’s instruction throughput is 0 instructions/clock. In contrast, when
the pipeline is full and the write stage is active, the pipelined
processor has an instruction throughput of 1 instruction/clock.
So just like there was a difference between a processor’s maximum
theoretical completion rate and its average completion rate, there’s also
a difference between a processor’s maximum theoretical instruction
throughput and its average instruction throughput:
Instruction throughput
The number of instructions that the processor finishes executing on
each clock cycle. You’ll also see instruction throughput referred to as
instructions per clock (IPC).
Maximum theoretical instruction throughput
The theoretical maximum number of instructions that the processor can
finish executing on each clock cycle. For the simple kinds of pipelined
and non-pipelined processors described so far, this number is always one
instruction per cycle (one instruction/clock or one IPC).
Average instruction throughput
The average number of instructions per clock (IPC) that the processor
has actually completed over a certain number of cycles.
A processor’s instruction throughput is closely tied to its instruction
completion rate—the more instructions that the processor completes each
clock cycle (instructions/clock), the more instructions it also completes over
a given period of time (instructions/ns).
We’ll talk more about the relationship between these two metrics in a
moment, but for now just remember that a higher instruction throughput
translates into a higher instruction completion rate, and hence better
performance.
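In quantitative terms, the completion rate is just the throughput multiplied by the clock rate. A tiny sketch of that relationship, using assumed example numbers (a 1 GHz clock and the 0.996 IPC average from earlier) rather than figures for any particular processor:

```c
#include <stdio.h>

int main(void) {
    /* Example figures, assumed for illustration only. */
    double avg_ipc = 0.996;   /* average instructions per clock              */
    double clock_ghz = 1.0;   /* a 1 GHz clock ticks once per nanosecond     */

    /* completion rate (instructions/ns) =
       throughput (instructions/clock) x clock rate (clocks/ns) */
    double completion_rate = avg_ipc * clock_ghz;
    printf("average completion rate: %.3f instructions/ns\n", completion_rate);
    return 0;
}
```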
Pipeline Stalls
In the real world, a processor’s pipeline can be found in more conditions
than just the two described so far: a full pipeline or a pipeline that’s being
filled. Sometimes, instructions get hung up in one pipeline stage for multiple
cycles. There are a number of reasons why this might happen—we’ll discuss
many of them throughout this book—but when it happens, the pipeline is said to stall. When the pipeline stalls, or gets hung in a certain stage, all of the instructions in the stages below the one where the stall happened continue advancing normally, while the stalled instruction just sits in its stage, and all the instructions behind it back up.
In Figure 3-10, the orange instruction is stalled for two extra cycles in the
fetch stage. Because the instruction is stalled, a new gap opens ahead of it in
the pipeline for each cycle that it stalls. Once the instruction starts advancing through the pipeline again, the gaps in the pipeline that were created by the
stall—gaps that are commonly called “pipeline bubbles”—travel down the
pipeline ahead of the formerly stalled instruction until they eventually leave
the pipeline.
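The behavior in Figure 3-10 is easy to model. Below is a rough, self-contained sketch (not code from this book) of a four-stage pipeline in which one instruction, instruction 3 here, stalls in the fetch stage for two extra cycles; the printout shows the bubbles opening up ahead of it and then traveling down the pipeline until they leave it.

```c
#include <stdio.h>

#define STAGES 4
#define BUBBLE 0   /* an empty stage, i.e., a pipeline bubble */

int main(void) {
    const char *names[STAGES] = {"fetch", "decode", "execute", "write"};
    int pipe[STAGES] = {BUBBLE, BUBBLE, BUBBLE, BUBBLE};
    int next_instr = 1;   /* next instruction number to fetch       */
    int stall_left = 0;   /* extra cycles the fetch stage is stuck  */

    for (int cycle = 1; cycle <= 10; cycle++) {
        /* Stages below the stall keep advancing every cycle. */
        pipe[3] = pipe[2];
        pipe[2] = pipe[1];

        if (stall_left > 0) {
            /* The stalled instruction sits in fetch; a bubble enters decode. */
            pipe[1] = BUBBLE;
            stall_left--;
        } else {
            pipe[1] = pipe[0];
            pipe[0] = next_instr++;
            /* For illustration, instruction 3 stalls in fetch for two extra cycles. */
            if (pipe[0] == 3)
                stall_left = 2;
        }

        printf("cycle %2d:", cycle);
        for (int s = 0; s < STAGES; s++) {
            if (pipe[s] == BUBBLE)
                printf("  %s=bubble", names[s]);
            else
                printf("  %s=i%d", names[s], pipe[s]);
        }
        printf("\n");
    }
    return 0;
}
```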