Ch1 - Lecture2

Contents

Slide 2

Chapter 1 — Computer Abstractions and Technology —

Defining Performance

Which airplane has the best performance?

Slide 3

Response Time and Throughput

Response time (PC user)
How long it takes to do a task
Throughput (Datacenter manager)
Total work done per unit time
e.g., tasks/transactions/… per hour
How are response time and throughput affected by
Replacing the processor with a faster version?
Adding more processors?
We’ll focus on response time for now…

Slide 4

Understanding Performance

Algorithm
Determines number of operations executed
Programming language, compiler, architecture
Determine number of machine instructions executed per operation
Processor and memory system
Determine how fast instructions are executed
I/O system (including OS)
Determines how fast I/O operations are executed

Slide 5

Relative Performance

Define Performance = 1/Execution Time
“X is n times faster than Y” means Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n

Example: time taken to run a program
10s on A, 15s on B
Execution Time_B / Execution Time_A = 15s / 10s = 1.5
So A is 1.5 times faster than B
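The arithmetic above can be checked with a minimal sketch. The helper name `relative_performance` is hypothetical, introduced here only for illustration.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")  # prints "A is 1.5 times faster than B"
```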

Slide 6

Measuring Execution Time

Elapsed time (wall clock time, response time)
Total response time, including all aspects
Processing, I/O, OS overhead, idle time
Determines system performance
CPU time
Time spent processing a given job
Discounts I/O time, other jobs’ shares
Comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
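One way to see the difference between elapsed time and CPU time is to measure both for a job that spends most of its time idle. This is a rough sketch using Python's standard `time` module. The names `measure` and `sleepy_job` are hypothetical helpers, and the exact numbers depend on the machine.

```python
import time


def measure(job):
    """Run job() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    job()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def sleepy_job():
    # Sleeping is idle time: it adds to elapsed time but uses almost no CPU.
    time.sleep(0.2)
    sum(i * i for i in range(100_000))  # a little real computation


elapsed, cpu = measure(sleepy_job)
print(f"elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s")
```

On a typical system the elapsed time comes out a little above 0.2 s while the CPU time is much smaller, because the sleep is not charged to the process.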

Slide 7

CPU Clocking

Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram with labels Clock (cycles), Data transfer and computation, Update state, Clock period]

Clock period: duration of a clock cycle
e.g., 250ps = 0.25ns = 250×10^-12 s
Clock frequency (rate): cycles per second
e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
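The two examples on this slide are reciprocals of each other, since clock rate = 1 / clock period. A small sketch of the conversion follows. The function names are hypothetical.

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock frequency in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock frequency in Hz."""
    return 1.0 / rate_hz


period = 250e-12  # 250 ps, the slide's example period
rate = clock_rate_hz(period)
print(f"{period * 1e12:.0f} ps period -> {rate / 1e9:.1f} GHz")
```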

Slide 8

CPU Time

Performance improved by
Reducing number of clock cycles
Increasing clock rate
Hardware designer must often trade off clock rate against cycle count

A program takes 2500 clock cycles to run on a computer with a 2.5 GHz processor. What is the CPU time?
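The question can be worked through with the standard relation CPU time = clock cycles × clock cycle time = clock cycles / clock rate. The formula itself does not survive in this transcript, so it is stated here as an assumption. The function name is hypothetical.

```python
def cpu_time_s(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


# Slide question: 2500 cycles on a 2.5 GHz processor.
t = cpu_time_s(2500, 2.5e9)
print(f"CPU time = {t * 1e6:.1f} microseconds")
```

Under that relation the answer is 1 microsecond: 2500 cycles divided by 2.5×10^9 cycles per second.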

Slide 9

CPU Time Example

Computer A: 2GHz clock, 10s CPU time
Designing Computer B
Aim for 6s CPU time
Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?
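A possible way to work the design question, assuming CPU time = clock cycles / clock rate (the worked solution is not in this transcript). The function and parameter names are hypothetical.

```python
def required_clock_rate_hz(rate_a_hz: float, time_a_s: float,
                           cycle_factor: float, target_time_s: float) -> float:
    """Clock rate B needs to hit target_time_s, given A's rate and CPU time."""
    cycles_a = time_a_s * rate_a_hz      # cycles A executes: 10 s x 2 GHz
    cycles_b = cycle_factor * cycles_a   # B needs 1.2x as many cycles
    return cycles_b / target_time_s      # rate = cycles / time


rate_b = required_clock_rate_hz(rate_a_hz=2e9, time_a_s=10.0,
                                cycle_factor=1.2, target_time_s=6.0)
print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```

Computer A executes 20×10^9 cycles, so B must execute 24×10^9 cycles in 6 s, which works out to 4 GHz.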

Slide 10

Instruction Count and CPI

Instruction Count for a program
Determined by program, ISA and compiler
Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
Average CPI affected by instruction mix

Slide 11

CPI Example

Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?

A is faster…

…by this much
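The comparison can be checked numerically, assuming CPU time = instruction count × CPI × cycle time and that both computers execute the same number of instructions (same ISA, same program). Because the instruction count cancels, the sketch compares time per instruction. The function name is hypothetical.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds = CPI x cycle time."""
    return cpi * cycle_time_ps


a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)
ratio = b / a  # CPU Time_B / CPU Time_A, since the instruction count cancels

print(f"A: {a:.0f} ps per instruction, B: {b:.0f} ps per instruction")
print(f"A is {ratio:.1f} times faster than B")
```

With these numbers A spends 500 ps per instruction and B spends 600 ps, so A is faster by a factor of 1.2.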

Slide 12

CPI in More Detail

If different instruction classes take different numbers of cycles

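Clock Cycles = Σ (CPI_i × Instruction Count_i), summed over instruction classes i
Weighted average CPI = Clock Cycles / Instruction Count = Σ (CPI_i × Instruction Count_i / Instruction Count)
Instruction Count_i / Instruction Count is the relative frequency of class i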

Slide 13

CPI Example

Alternative compiled code sequences using instructions in classes A, B, C

Class              A  B  C
CPI for class      1  2  3
IC in sequence 1   2  1  2
IC in sequence 2   4  1  1

Which code sequence executes the most instructions?
Which will be faster?
What is the CPI for each sequence?

Slide 14

CPI Example

Alternative compiled code sequences using instructions in classes A, B, C

Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
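The two results can be reproduced with a short sketch. The per-class CPI values (A = 1, B = 2, C = 3) and the per-class instruction counts are read off the arithmetic on this slide, taking the first factor of each product as the count and the second as the CPI. The names `CPI_PER_CLASS` and `cycles_and_cpi` are hypothetical.

```python
# Cycles per instruction for each class, as implied by the slide arithmetic.
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def cycles_and_cpi(instruction_counts: dict) -> tuple:
    """Return (instruction count, clock cycles, average CPI) for a sequence."""
    ic = sum(instruction_counts.values())
    cycles = sum(n * CPI_PER_CLASS[cls] for cls, n in instruction_counts.items())
    return ic, cycles, cycles / ic


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    ic, cycles, cpi = cycles_and_cpi(seq)
    print(f"{name}: IC = {ic}, clock cycles = {cycles}, average CPI = {cpi}")
```

Sequence 2 executes more instructions (6 against 5) but needs fewer clock cycles (9 against 10), so at the same clock rate it is the faster of the two.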

Slide 15

Performance Summary

Performance depends on
Algorithm: affects IC, possibly CPI (e.g., floating point)
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc

Slide 16

Power Trends

In CMOS IC technology

Power = Capacitive load × Voltage^2 × Frequency
Power: ×30
Voltage: 5V → 1V
Frequency: ×1000

Slide 17

Reducing Power

Suppose a new CPU has
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction
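The effect of these reductions can be estimated with a short sketch. It assumes the usual CMOS dynamic power relation, power proportional to capacitive load × voltage² × frequency. That formula is not legible in this transcript, so it is treated here as an assumption. The function name is hypothetical.

```python
def relative_power(cap_ratio: float, voltage_ratio: float,
                   freq_ratio: float) -> float:
    """New power / old power, assuming P is proportional to C x V^2 x f."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


# 85% of the capacitive load, with voltage and frequency each reduced by 15%.
ratio = relative_power(cap_ratio=0.85, voltage_ratio=0.85, freq_ratio=0.85)
print(f"new power / old power = {ratio:.2f}")
```

Under that assumption the ratio is 0.85^4, roughly 0.52, so the new CPU would draw about half the power of the old one.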

The power wall
We can’t reduce voltage further
We can’t remove more heat
How else can we improve performance?

Slide 18

Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency

Slide 19

Multiprocessors

Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
Hardware executes multiple instructions at once
Hidden from the programmer
Hard to do
Programming for performance
Load balancing
Optimizing communication and synchronization

Slide 20

Manufacturing ICs

Yield: proportion of working dies per wafer

Slide 21

AMD Opteron X2 Wafer

X2: 300mm wafer, 117 chips, 90nm technology
X4: 45nm technology