Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture (68 page)

Authors: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors


to decode. The P6’s complex/slow decoder works in conjunction with the

microcode ROM to handle the really complex legacy instructions, which are

translated into sequences of micro-ops that are read directly from the ROM.

The Cost of x86 Legacy Support on the P6

All of this decoding and translation hardware takes up a lot of transistors.

MDR estimates that close to 40 percent of the P6’s transistor budget is spent

on x86 legacy support. If correct, that’s even higher than the astonishing 30 percent estimate for the original Pentium, and even if it’s incorrect, it

still suggests that the cost of legacy support is quite high.

At this point, you’re probably thinking back to the conclusion of the first

part of this chapter, in which I suggested that the relative cost of x86 support has decreased with successive generations of the Pentium. This is still true,

but the trend didn’t hold for the first instantiation of the P6 microarchitecture: the original 150 MHz Pentium Pro. The Pentium Pro’s L1 cache was a

modest 16KB, which was small even by 1995 standards. The chip’s designers

had to skimp on on-die cache, because they’d spent so much of their transistor budget on the decoding and translation hardware. Comparable RISC

processors had two to four times that amount of cache, because less of the

die was taken up with front-end logic, so they could use the space for cache.

When the P6 microarchitecture was originally launched in its Pentium

Pro incarnation, transistor counts were still relatively low by today’s standards.

But as Moore’s Curves marched on, microprocessor designers went from

thinking, “How do we squeeze all the hardware that we’d like to put on the

chip into our transistor budget?” to “Now our generous transistor budget will

let us do some really nice things!” to “How on earth do we get this many

transistors to do useful, performance-enhancing work?”

What really drove the decrease in subsequent generations’ costs for x86 support was the increase in L1 cache sizes and the L2 cache’s move onto the

die, because the answer to that last question has—until recently—been,

“Let’s add cache.”
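
The arithmetic behind this trend can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the decoding and translation hardware costs a roughly fixed number of transistors while the total budget grows as cache is added. The 40 percent starting share is the MDR estimate quoted above; the budget growth factors are hypothetical and are not measurements of any real chip.

```python
def legacy_share(legacy_transistors: float, total_transistors: float) -> float:
    """Fraction of the transistor budget spent on x86 legacy support."""
    return legacy_transistors / total_transistors


# Assumption: measure everything in units of the first chip's total budget
# (= 1.0) and take the MDR estimate of roughly 40 percent for legacy support.
BASE_BUDGET = 1.0
LEGACY_COST = 0.40 * BASE_BUDGET  # held fixed across generations (assumption)

# Hypothetical growth factors: later generations spend the extra
# transistors mostly on cache, so the legacy hardware does not grow.
for growth in (1, 2, 4, 8):
    total = BASE_BUDGET * growth
    share = legacy_share(LEGACY_COST, total)
    print(f"budget x{growth}: legacy share = {share:.0%}")
```

Under these assumptions, each doubling of the budget halves the relative cost of legacy support, even though its absolute cost never changes.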

Summary: The P6 Microarchitecture in Historical Context

This concluding section provides an overview of the P6 microarchitecture in

its various incarnations. The main focus here is on fitting everything together

and giving you a sense of the big picture of how the P6 evolved. The historical

narrative outlined in this section seems, in retrospect, to have unfolded over

a much longer length of time than the seven years that it actually took to go

from the Pentium Pro to the Pentium 4, but seven years is an eternity in

computer time.

The Intel Pentium and Pentium Pro

The Pentium Pro

The processor described in the preceding section under the name P6 is the original 150 MHz Pentium Pro. As you can see from the processor comparison in Table 5-2, the Pentium Pro was relatively short on transistors, short

on cache, and short on features. In fact, the original Pentium eventually

got rudimentary SIMD computing support in the form of Intel’s MMX

(Multimedia Extensions), but the Pentium Pro didn’t have enough room

for that, so SIMD got jettisoned in favor of all that fancy decoding logic

described earlier.

In spite of all its shortcomings, though, the Pentium Pro did manage

to raise the x86 performance bar significantly. Its out-of-order execution engine, dual integer pipelines, and improved floating-point unit gave it

enough oomph to get the x86 ISA into the commodity server market.

The Pentium II

MMX didn’t make a return to the Intel product line until the Pentium II.

Introduced in 1997, this next generation of the P6 microarchitecture

debuted at speeds ranging from 233 to 300 MHz and sported a number of

performance-enhancing improvements over its predecessor.

First among these improvements was an on-die, split L1 cache that was

doubled in size to 32KB. This larger L1 helped boost performance across the

board by keeping the PII’s lengthy pipeline full of code and data.

The P6’s basic pipeline stayed the same in the PII, but Intel widened the

back end as depicted in Figure 5-11 by adding the aforementioned MMX

support in the form of two new MMX execution units: one on issue port 0

and the other on issue port 1. MMX provided vector support for integers

only, though. It wasn’t until the introduction of Streaming SIMD Extensions

(SSE) with the PIII that the P6 microarchitecture got support for floating-point vector processing.
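
To make the integer-versus-floating-point distinction concrete, here is a minimal Python sketch of what one packed instruction does to each lane of a register. It models a 64-bit MMX register as eight unsigned bytes with wraparound addition, and a 128-bit SSE register as four single-precision floats. The function names are hypothetical, and this is only a software model of the lane-wise behavior, not a description of how the P6 hardware implements it.

```python
import struct


def mmx_packed_add_bytes(a: bytes, b: bytes) -> bytes:
    """Lane-wise add of eight unsigned 8-bit integers, wrapping modulo 256."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("an MMX register holds 64 bits (8 bytes)")
    return bytes((x + y) & 0xFF for x, y in zip(a, b))


def sse_packed_add_floats(a: bytes, b: bytes) -> bytes:
    """Lane-wise add of four 32-bit floats packed into 128 bits."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("an SSE register holds 128 bits (16 bytes)")
    lanes_a = struct.unpack("<4f", a)
    lanes_b = struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(lanes_a, lanes_b)))


# Integer lanes: the last lane wraps around (250 + 10 = 260 -> 4).
int_result = mmx_packed_add_bytes(bytes([1, 2, 3, 4, 5, 6, 7, 250]),
                                  bytes([10] * 8))
print(list(int_result))

# Floating-point lanes: four independent single-precision additions.
float_result = sse_packed_add_floats(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0),
                                     struct.pack("<4f", 0.5, 0.5, 0.5, 0.5))
print(struct.unpack("<4f", float_result))
```

In both cases a single operation updates every lane at once; the difference is only in how the bits of each lane are interpreted.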

[Diagram: the reservation station (RS) feeds five issue ports in the back end. Port 0: MMX0, FPU, and CIU. Port 1: MMX1, SIU, and BU. Port 2: load address. Port 3: store address. Port 4: store data. The units are grouped as the MMX unit (vector ALUs), the floating-point and integer units (scalar ALUs), the load-store unit (memory access units), and the branch unit.]

Figure 5-11: The Pentium II’s back end

Chapter 5

The Pentium II’s integer and floating-point performance was relatively

good compared to its CISC competitors, and it helped further the trend,

started by the Pentium Pro, of x86 commodity hardware’s migration into

the server and workstation realms. However, the PII still couldn’t stand up

to RISC designs built on the same process with similar transistor counts.

Its main advantage was in bang for the buck, whereas the more expensive

RISC chips specialized in pure bang.

The Pentium III

Intel introduced its next P6 derivative, the Pentium III (PIII), in 1999 at

450 MHz on a 0.25 micron manufacturing process. The first version of the

Pentium III, code-named Katmai, had a 512KB off-die L2 cache that shared a small piece of circuit board (called a daughtercard) with the PIII. While this design offered fair performance, the PIII didn’t really begin to take off

from a performance standpoint until the introduction of the next version

of the PIII, code-named Coppermine, in late 1999.

Coppermine was produced on a 0.18 micron manufacturing process,

which means that Intel could pack more transistors onto the processor die.

Intel took advantage of this capability by reducing the PIII’s L2 cache size to

256KB and moving the cache onto the CPU die itself. Having the L2 on the

same die as both the CPU and the L1 cache dramatically reduced the L2

cache’s access time, a fact that more than made up for the reduction in cache

size. Coppermine’s performance scaled well with increases in clock speed,

eventually passing the 1 GHz milestone shortly after AMD’s Athlon.

The Pentium III processor introduced two significant additions to

the x86 ISA, the most important of which was a set of floating-point SIMD

extensions to the x86 architecture called Streaming SIMD Extensions

(SSE). With the addition of SSE’s 70 new instructions, the x86 architecture filled in much of what had been lacking in its support for vector

computing, making it more attractive for applications like games and

digital signal processing. I’ll cover the MMX and SSE extensions in more

detail in Chapter 8, but for now it’s necessary to say a word about how the

extensions were implemented in hardware.

The Pentium III’s designers added the majority of the new SSE hardware

on issue port 1 (see the back end in Figure 5-12). The new SSE units attached

to port 1 handle vector SIMD addition, shuffle, and reciprocal arithmetic

functions. Intel also modified the FPU on port 0 to handle SSE multiplies.

Thus the Pentium III’s main FPU functional block is responsible for both

scalar and vector operations.
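
The port assignments described above can be summarized as a small lookup table. The sketch below is a hypothetical Python model, not Intel documentation: the unit names follow the labels used in Figure 5-12, and the operation names (such as "sse_add") are invented here for illustration.

```python
# Issue port for each vector-capable unit named in the text and the figure.
PIII_VECTOR_UNITS = {
    "MMX0": 0,
    "MMX1": 1,
    "FPU/VFMUL": 0,  # main FPU, modified to handle SSE multiplies
    "VFADD": 1,      # SSE vector addition
    "VSHUFF": 1,     # SSE shuffle
    "VRECIP": 1,     # SSE reciprocal arithmetic
}

# Hypothetical operation names mapped to the unit that executes them.
SSE_OP_TO_UNIT = {
    "sse_add": "VFADD",
    "sse_shuffle": "VSHUFF",
    "sse_reciprocal": "VRECIP",
    "sse_multiply": "FPU/VFMUL",
}


def issue_port(op: str) -> int:
    """Return the issue port that an SSE operation is dispatched through."""
    return PIII_VECTOR_UNITS[SSE_OP_TO_UNIT[op]]


for op in SSE_OP_TO_UNIT:
    print(f"{op:15s} -> port {issue_port(op)}")
```

In this model an SSE multiply and an SSE add leave the reservation station through different ports, so they do not compete for the same issue slot.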

The PIII also introduced the infamous processor serial number (PSN), along with new x86 instructions aimed at reading the number. The PSN was a unique serial number that marked each processor, and it was intended for use in

securing online commercial transactions. However, due to concerns from

privacy advocates, the PSN was eventually dropped from the Pentium line.

[Diagram: the reservation station (RS) feeds five issue ports in the back end. Port 0: MMX0, the FPU and VFMUL block, and CIU. Port 1: MMX1, the VFADD, VSHUFF, and VRECIP units, SIU, and BU. Port 2: load address. Port 3: store address. Port 4: store data. The units are grouped as the MMX/SSE unit (vector ALUs), the FP/SSE and integer units (scalar ALUs), the load-store unit (memory access units), and the branch unit.]

Figure 5-12: The Pentium III’s back end

Conclusion

The Pentium may not have outperformed its RISC contemporaries, but it was

superior enough to its x86-based competition to keep Intel comfortably in command of the commodity PC market. Indeed, prior to the rise of Advanced

Micro Devices (AMD) as a serious competitor, Intel had the luxury of setting

the pace of progress in the x86 PC space. Products were released when Intel was ready to release them, and clock speeds climbed when Intel was ready for

them to climb. Intel’s competitors were left to respond to what the larger

chipmaker was doing, with their own x86 products always lagging significantly behind Intel’s in performance and popularity.

AMD’s Athlon was the first x86 processor to pose any sort of threat to

Intel’s technical dominance, and by the time the PIII made its debut in

1999, it was clear that Intel and AMD were locked in a “gigahertz race” to

see who would be the first to introduce a processor with a 1 GHz clock speed.

The P6 microarchitecture in its PIII incarnation was Intel’s horse in this race, and that basic design eventually reached the 1 GHz mark shortly after AMD’s

Athlon. Thus a microarchitecture that started out at 150 MHz eventually

carried x86 beyond 1 GHz and into the lucrative server and workstation

markets that RISC architectures had traditionally dominated.

The gigahertz race had a profound effect not only on the commodity PC

market but also on the Pentium line itself, insofar as the next chip to bear

the Pentium name—the Pentium 4—bore the marks of the gigahertz race

stamped into its very architecture. If Intel learned anything in those last few

years of the P6’s life, it learned that clock speed sells, and it kept that lesson foremost in its mind when it designed the Pentium 4’s NetBurst microarchitecture. (For more on the Pentium 4, see Chapters 7 and 8.)

PowerPC Processors: 600 Series, 700 Series, and 7400

Now that you’ve been introduced to the first half

of Intel’s Pentium line in the previous chapter, this

chapter will focus on the origins and development

of another popular family of microprocessors: the

PowerPC (or PPC) line of processors produced from

the joint efforts of Apple, IBM, and Motorola. Because

the PowerPC family of processors is extremely large and can be found in an

array of applications that ranges from mainframes to desktop PCs to routers

to game consoles, this chapter’s coverage of PowerPC will present only a

small and limited sample of the processors that implement the PowerPC

ISA. Specifically, this chapter will focus exclusively on a subset of the PowerPC

chips that have been shipped in Apple products, because these chips are

the most directly comparable to the Pentium line in that they’re aimed

at the “personal computer” market.

A Brief History of PowerPC

The PowerPC architecture has its roots in two separate architectures. The first

of these is an architecture called POWER (Performance Optimization With

Enhanced RISC), IBM’s RISC architecture developed for use in mainframes

and servers. The second is Motorola’s 68000 (aka the 68K) processor, which,

prior to PowerPC, formed the core of Apple’s desktop computing line.

To make a long story very short, IBM needed a way to turn POWER

into a wider range of computing products for use outside the server closet,

Motorola needed a high-end RISC microprocessor in order to compete in the

RISC workstation market, and Apple needed a CPU for its personal computers

that would be both cutting-edge and backward compatible with the 68K.

Thus the AIM (Apple, IBM, Motorola) alliance was born, and with it
