Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture

Authors: Jon Stokes

Tags: #Computers, #Systems Architecture, #General, #Microprocessors

relationship with program execution time, 52–53

extensions, 70

hardware implementation of, 70

decoding unit, on Pentium Pro, 106, 106–107

history of, 71–73, 72

microarchitecture and, 69–74

INDEX

instruction set translation, on Pentium Pro, 103, 105

microarchitecture, 255

pipeline, 258

integer ALUs, on Pentium, 87–88

vector processing, 262–264

integer execution units (IUs), 68, 163–165

Intel Core Duo/Solo, 235, 247–254

back end, 259

on Core 2 Duo, 260

features, 247

on G4e, 163–164

fetch buffer, 239

issue queue for, 210, 211

floating-point control word (FPCW), 252

on Pentium, 69

on PowerPC 601, 115

floor plan of, 250

on PowerPC 604, 125

in historical context, 254

integer instructions

integer division, 253

and default integer size, 191

loop detector, 252

PowerPC 970 performance of, 203

micro-ops fusion in, 251–252

integer pipeline, 84

multi-core processor on, 247–250

integers, 66

Streaming SIMD Extensions (SSE), 252

on 32-bit vs. 64-bit processors, 183

division on Core Duo, 253

Virtualization Technology, 253

Intel

Intel P6 microarchitecture, 93. See also Intel Pentium Pro

code names and brand names, 236–237

fetch and decode hardware, 240

Itanium Processor family, 180

Intel Pentium, 62, 68, 80–93

MMX (Multimedia Extensions), 70, 108, 174

back end, 87–91

floating-point ALUs, 88–91

project names, 138

integer ALUs, 87–88

relative frequencies of processors, 139

basic microarchitecture, 80

branch unit and branch prediction, 85–87

Intel 4004, 62

Intel 8080, 62

caches, 81–82

Intel 8086, 74, 187

features, 80

Intel Celeron processor, 222

floating-point unit in, 69

Intel Core 2 Duo, 254–258

in historical context, 92–93

back end, 258–270, 260

integer unit in, 69

floating-point execution units (FPUs), 260–262

level 1 cache, 80

pipeline, 82–85, 83

integer execution units (IUs), 260

stages, 84–85

static scheduling, 95

decode phase of instruction, 256, 257–258

x86 overhead on, 91–92

Intel Pentium II

double-precision floating-point operations on, 261

back end, 108

features, 93

features, 254

Intel Pentium III, 109

fetch phase of instruction, 256, 256–257

back end, 110, 259

bottleneck potential, 258

floating-point instruction throughput, 261

features, 93

floating-point instruction throughput, 261

in historical context, 270

memory disambiguation, 264–270

floating-point vector processing in, 108

Intel Pentium 4, 74, 110, 138–140

reorder buffer, 99–100

approach to performance, 143

reservation station (RS), 98–99, 100

architecture, 148, 148–154

branch prediction, 147–148

Intel Technology Journal, 175

critical execution path, 152

Intel x86 hardware, 70

features, 138

inter-element arithmetic and non-arithmetic operations, 172–173

floating-point execution unit (FPU) for, 167–168

floating-point instruction throughput, 261

internal operations (IOPs), 196

intra-element arithmetic and non-arithmetic operations, 171–172

vs. G4e, 137

general approaches and design philosophy, 141–144

I/O (input-output) unit, 26

instruction window, 159

IOPs (internal operations), 196

integer execution units (IUs), 163, 164–165

ISA. See instruction set architecture (ISA)

internal instruction format, 197

issue buffer, 96

pipeline, 155–159

issue buffer/execution unit availability rule, 127

vector unit, 176

Intel Pentium D, 249

issue phase

Intel Pentium M, 235, 239–246

on Pentium Pro, 96–98

back end, 259

on PowerPC 604, 126–127

branch prediction, 244–245

issue ports, on Pentium 4, 157–158

decode phase, 240–244, 241, 242

issue queues

features, 239

for branch unit and condition register, 210–211

fetch phase, 239–240

floating-point instruction throughput, 261

for G4e, 146

for integer and load-store execution units, 210, 211

floor plan of, 248

pipeline and back end, 246

on PowerPC 970, 199

stack execution unit on, 246

vector, 211, 212

versions, 236

vector logical, 207

Intel Pentium Pro, 93–109, 94

issue stage

back end, 102–103, 103, 259

for G4e, 146

branch prediction in, 102

for Pentium 4, 157–158

cost of legacy x86 support on, 107

issuing, 96

decoupling front end from back end, 94–100

Itanium Processor family, 180

IU. See integer execution units (IUs)

features, 93

J

floating-point unit in, 103

in historical context, 107–109

jumpn instruction, 32

instruction set translation, 103

jumpo instruction, 32

instruction window, 100

jumpz instruction, 31

issue phase, 96–98

level 1 cache, 107

K

microarchitecture’s instruction decoding unit, 106, 106–107

Katmai, 109

processor, 175

pipeline, 100–102

kernels, 221

L

steps to execute, 265, 266

L1 cache. See level 1 cache

translating into fused micro-ops, 242, 243

L2 cache. See level 2 cache

load port, on Pentium 4, 157

L3 cache, 81

load-store units (LSUs), 17

labels, and branch instructions, 33–34

issue queue for, 210, 211

on Pentium, 69

laminated micro-op, 251

on PowerPC 603 and 603e, 121

laptop (portable) computers, 237

on PowerPC 604, 125

latency of instruction

on PowerPC 970, 203–205

for Pentium 4 SIMD instructions, 176

locality of reference, 220–223

for floating-point code, 166

pipeline stalls and, 57–58

for integer-intensive applications, 165

for PowerPC 970 integer unit, 202

for string instructions, 105

logical issue queues, 207, 210

on superscalar processors, 117–118

logical operations, 12, 67

as intra-element non-arithmetic operations, 171

tag RAM and, 224

leakage current, from idle transistor, 238

long mode, in x86-64, 190, 191

loop detector

least recently used (LRU) block, and cache replacement policy, 230

on Core Duo, 252

on Pentium M, 244–245

LRU (least recently used) block, and cache replacement policy, 230

legacy mode, in x86-64, 189, 190, 191

level 1 cache, 81, 217–218

vs. other data storage, 217

LSU. See load-store units (LSUs)

on Pentium, 80

on Pentium II, 108

M

on Pentium 4, 149

on PowerPC 601, 115

machine instructions, 72

splitting, 223

machine language, 19–25

level 2 cache, 81, 218

on DLW-1, 20–21

vs. other data storage, 217

translating program into, 25

for Pentium III, 109

use in early computing, 26

level 3 cache, 81

machine language format, for register-relative load, 24

lines, 1

load address unit, on P6 back end, 102

machine language instruction, 20

macro-fusion, 255, 257

load balancing, on PowerPC 970, 212–213

main memory, 9

mapping

load hoisting, 268

direct, 225–226, 226

loading, operating system, 34

fully associative, 224, 225

n-way set associative, 226–230, 227

load instruction, 11, 15, 23–24

branch instruction as special type, 32–33

Mark I (Harvard), 81

Mauchly, John, 6n

micro-ops for, 267

maximum theoretical completion rate, 52–53

programmer and control of, 104

register-relative address, 24

maximum theoretical instruction throughput, 54

media applications

micro-ops. See micro-operations

cache pollution by, 222

microprocessor, 1

vector computing for, 168

clock cycle, 44

memory

instruction completion per, 53–54

access to contents, 14

address, storage by memory cell, 16

vs. memory and bus clock cycles, 216

aliasing, 265, 266–267

in pipelined processor, 47

bus, 9

core microarchitecture of, 248

disambiguation on Core 2 Duo, 264–270

errors from exceeding dynamic range, 184–185

for floating-point performance, 168

hard-wired to fetch first instruction, 34

hierarchy on computer, 82

increasing number of instructions per time period, 43

instruction format, 13

lifecycle of access instruction, 265, 266

interface to, 26

microarchitecture vs. implementations, 248

micro-op queue, 156

vs. other data storage, 217

non-pipelined, 43–45, 44

ports, on Pentium 4, 157

pipelined, 45–48

RAM, 8–10

three-step sequence, 8

reorder buffer, 265

millicoded instruction, 197, 198

rules for, 268

MIPS, 73

scheduler, on Pentium 4, 156

MMX (Multimedia Extensions), 70, 108, 174

speed of, 81

memory-access instructions, 11, 12

mnemonics, 20

binary encoding, 23–25

mode bit, 21

memory-access units, 69

Moore’s Curves, 92, 93

branch unit as, 85

μops. See micro-operations

memory-to-memory format arithmetic instructions, 103–104

Motorola. See PowerPC (PPC)

Motorola 68000 processor, 112

Merom, 255

Motorola AltiVec, 70, 135, 169–170

micro-operations (micro-ops; μops; uops), 106, 149

development, 207

G4e units, 206–207

fusion, 240–244

vector operations, 170–173

in Core Duo, 251–252

inter-element arithmetic and non-arithmetic operations, 172–173

in Pentium 4, 150

queue, 106

memory, 156

intra-element arithmetic and non-arithmetic operations, 171–172

on Pentium 4, 155

schedulers for, 156–157

microarchitecture, and ISA, 69–74

Motorola G4, vector instruction latencies on, 208

microcode engine, 72, 73

microcode programs, 72

Motorola G4e

microcode ROM, 72

architecture and pipeline, 144–147

in Pentium, 85, 92

in Pentium 4, 154

branch prediction, 147–148

caches, 194

dedicated integer hardware for address calculations, 204

O

offset, 17

floating-point execution unit (FPU) for, 166–167

on-die cache, 82

opcodes, 19–25

general approaches and design philosophy, 141–144

operands, 12

formats, 161–162

integer execution units (IUs), 163–164

operating system, loading, 34

operations, 11

integer hardware, 203

out-of-order execution, 96

microarchitecture, 144

output, 2

vs. Pentium 4, 137

overflow of register, 32, 184

performance approach, 142

overhead cost, for pipeline, 60

pipeline stages, 145–147

overheating, 238

vector instruction latencies on, 208

P

Motorola G5. See PowerPC (PPC) 970 (G5)

packed floating-point addition (PFADD), 260

Motorola MPC7400. See PowerPC (PPC) 7400 (G4)

page file, on hard drive, 218

Motorola MPC7450, 138
