

# **Architecture & Organization**

- Architecture is those attributes visible to the programmer
  - Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
    - e.g. Is there a multiply instruction?
- Organization is how features are implemented
  - Control signals, interfaces, memory technology.
    - e.g. Is there a hardware multiply unit or is it done by repeated addition?
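The multiply example can be made concrete. Below is a minimal sketch in Python, standing in for hardware, of two organizations that could sit behind the same architectural multiply operation: repeated addition, and a shift-and-add loop similar in spirit to a simple sequential hardware multiplier. Both function names are hypothetical. Both return the same product, which is why the choice is invisible at the architecture level. Only the speed differs.

```python
# Hypothetical helper names, for illustration only.

def multiply_repeated_addition(a: int, b: int) -> int:
    """Multiply non-negative integers by adding `a` to itself `b` times."""
    result = 0
    for _ in range(b):
        result += a
    return result


def multiply_shift_add(a: int, b: int) -> int:
    """Multiply non-negative integers one bit of `b` at a time (shift-and-add)."""
    result = 0
    while b:
        if b & 1:      # low bit of the multiplier is set: add the shifted multiplicand
            result += a
        a <<= 1        # shift the multiplicand left (multiply by 2)
        b >>= 1        # move on to the next multiplier bit
    return result


# Same architectural result, different organization and cost:
# repeated addition loops b times, shift-and-add loops once per bit of b.
print(multiply_repeated_addition(13, 11), multiply_shift_add(13, 11))  # 143 143
```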







# **Architecture & Organization**

- All members of the Intel x86 family share the same basic architecture
- All members of the IBM System/370 family share the same basic architecture
- This gives code compatibility
  - At least backward compatibility
- Organization differs between versions







# **Structure & Function**

- Structure is the way in which components relate to each other
- Function is the operation of individual components as part of the structure







## Function

- All computer functions are:
  - Data processing
  - Data storage
  - Data movement
  - Control







## **Functional view**










## **Structure - The CPU**



# **Structure - The Control Unit**




# **ENIAC - background**

- Electronic Numerical Integrator And Computer
- University of Pennsylvania
- Trajectory tables for weapons
- Started 1943 and finished 1946
  - Too late for the war effort
- Used until 1955







## **ENIAC - details**

- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes and 30 tons
- About 1,500 sq. ft and 140 kW power consumption
- **5,000 additions per second**
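ENIAC's accumulators worked in decimal, so carries propagate between base-10 digits rather than between bits. Below is a simplified model of one 10-digit accumulator, assuming non-negative values. ENIAC's sign handling and its actual circuitry are not modeled, and the helper names are hypothetical.

```python
# Hypothetical model of one 10-digit decimal accumulator (signs not modeled).
DIGITS = 10


def to_digits(n: int) -> list[int]:
    """Split a non-negative integer into 10 decimal digits, least significant first."""
    return [(n // 10**i) % 10 for i in range(DIGITS)]


def from_digits(digits: list[int]) -> int:
    """Rebuild the integer value from decimal digits (least significant first)."""
    return sum(d * 10**i for i, d in enumerate(digits))


def accumulate(acc: list[int], addend: list[int]) -> tuple[list[int], bool]:
    """Add two 10-digit decimal numbers digit by digit with a decimal carry.

    Returns the new digits and whether a carry fell off the top digit (overflow).
    """
    result, carry = [], 0
    for a, b in zip(acc, addend):
        s = a + b + carry
        result.append(s % 10)   # keep one decimal digit
        carry = s // 10         # carry 0 or 1 into the next digit
    return result, carry == 1


acc, overflow = accumulate(to_digits(9_999_999_999), to_digits(2))
print(from_digits(acc), overflow)  # 1 True
```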







# von Neumann/Turing

- Stored-program concept
- Main memory storing programs and data
- ALU operating on binary data
- Control unit interpreting instructions from memory and executing them
- Input and output equipment operated by the control unit
- Princeton Institute for Advanced Study (IAS) computer
  - Completed 1952
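Here is a minimal sketch of the stored-program idea. It assumes a made-up accumulator machine: the opcodes, the `opcode * 100 + address` encoding, and the `run` function are all hypothetical and are not the IAS instruction set. Instructions and data live in the same memory list, and the loop performs the fetch, decode, and execute steps that the control unit carries out.

```python
# Hypothetical opcodes for a toy accumulator machine (not the IAS instruction set).
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory: list[int]) -> list[int]:
    """Execute a program stored in the same memory as its data.

    Each instruction word is encoded as opcode * 100 + address.
    """
    pc, acc = 0, 0                                  # program counter and accumulator
    while True:
        instruction = memory[pc]                    # fetch the instruction from main memory
        pc += 1
        opcode, address = divmod(instruction, 100)  # control unit decodes it
        if opcode == HALT:
            return memory
        elif opcode == LOAD:
            acc = memory[address]                   # data comes from the same memory
        elif opcode == ADD:
            acc += memory[address]                  # ALU operation
        elif opcode == STORE:
            memory[address] = acc                   # write the result back to memory
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 1}")


# Addresses 0-3 hold the program, 4-6 hold the data: compute mem[6] = mem[4] + mem[5].
program_and_data = [LOAD * 100 + 4, ADD * 100 + 5, STORE * 100 + 6, HALT * 100, 20, 22, 0]
print(run(program_and_data)[6])  # 42
```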







# Structure of von Neumann machine









## **Transistors**

- Replaced vacuum tubes
- **Smaller and cheaper**
- Less heat dissipation
- Solid-state device made from silicon (sand)
- Invented 1947 at Bell Labs
- William Shockley et al.







# **Transistor Based Computers**

- Second-generation machines
- NCR & RCA produced small transistor machines
- IBM 7000 series
- DEC (founded 1957)
  - Produced the PDP-1







## **Microelectronics**

- Literally "small electronics"
- A computer is made up of gates, memory cells and interconnections
- These can be manufactured on a semiconductor
  - e.g. silicon wafer







## **Generations of Computer**

- Vacuum tube 1946-1957
- **Transistor 1958-1964**
- Small scale integration 1965 on
  - Up to 100 devices on a chip
- Medium scale integration to 1971
  - 100-3,000 devices on a chip
- Large scale integration 1971-1977
  - **3,000-100,000 devices on a chip**
- Very large scale integration 1978 to date
  - 100,000-100,000,000 devices on a chip
- **Ultra large scale integration**
  - Over 100,000,000 devices on a chip
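The device-count ranges can be read as a lookup table. Here is a small sketch of that lookup. The function name is hypothetical. Each upper bound is treated as inclusive, since adjacent ranges in the list share their endpoints.

```python
# Hypothetical lookup built from the generation list (upper bounds treated as inclusive).
SCALES = [
    (100, "SSI (small scale integration)"),
    (3_000, "MSI (medium scale integration)"),
    (100_000, "LSI (large scale integration)"),
    (100_000_000, "VLSI (very large scale integration)"),
]


def integration_scale(devices: int) -> str:
    """Classify a chip by its number of devices."""
    for upper_bound, name in SCALES:
        if devices <= upper_bound:
            return name
    return "ULSI (ultra large scale integration)"  # over 100,000,000 devices


print(integration_scale(5_000))  # LSI (large scale integration)
```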

## Growth in CPU Transistor Count



## **CPU Structure**

- **CPU must:** 
  - Fetch instructions
  - Interpret instructions
  - Fetch data
  - Process data
  - Write data







## **CPU With Systems Bus**








## **CPU Internal Structure**











- CPU must have some working space (temporary storage)
- Called registers
- Number and function vary between processor designs
- One of the major design decisions
- **Top level of memory hierarchy**







# **User Visible Registers**

- General Purpose
- Address
- Condition Codes







## **General Purpose Registers (1)**

- May be true general purpose
- May be restricted
- May be used for data or addressing
- Data
  - Accumulator
- Addressing
  - Segment







## **General Purpose Registers (2)**

- Make them general purpose
  - Increase flexibility and programmer options
  - Increase instruction size & complexity
- Make them specialized
  - Smaller (faster) instructions
  - Less flexibility
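One source of the extra instruction size is register naming. When any register may be an operand, each instruction needs an explicit field naming it. A specialized register, by contrast, is often implied by the opcode. Here is a sketch of that cost, assuming plain binary register fields. The helper names are hypothetical.

```python
# Hypothetical helpers: size of explicit register-specifier fields.

def register_field_bits(num_registers: int) -> int:
    """Bits needed in an instruction to name one of `num_registers` registers."""
    return (num_registers - 1).bit_length()


def operand_bits(num_registers: int, operands: int) -> int:
    """Total register-specifier bits for an instruction with `operands` register operands."""
    return operands * register_field_bits(num_registers)


for n in (8, 16, 32):
    print(n, "registers:", register_field_bits(n), "bits each,",
          operand_bits(n, 3), "bits for three register operands")
# 8 registers: 3 bits each, 9 bits for three register operands
# 16 registers: 4 bits each, 12 bits for three register operands
# 32 registers: 5 bits each, 15 bits for three register operands
```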







# **How Many GP Registers?**

- Between 8 and 32
- Fewer registers mean more memory references







# How big?

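- Large enough to hold a full address
- Large enough to hold a full word
- Often possible to combine two data registers
  - C programming
    - `long int a;` (may occupy two registers on a 16-bit machine)
    - `long long int a;` (may occupy two registers on a 32-bit machine)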
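Here is a sketch of how two 32-bit data registers can together hold one 64-bit value such as a `long long int`. It assumes a high:low pairing. x86, for example, returns the 64-bit product of a 32-bit `MUL` in EDX:EAX. The helper names are hypothetical. Python integers are unbounded, so masks stand in for the fixed register width.

```python
# Hypothetical helpers: a 64-bit value held in a high:low pair of 32-bit registers.
MASK32 = 0xFFFF_FFFF  # width of one 32-bit register


def split_to_pair(value: int) -> tuple[int, int]:
    """Split a 64-bit value into (high, low) 32-bit register halves."""
    return (value >> 32) & MASK32, value & MASK32


def combine_pair(high: int, low: int) -> int:
    """Rebuild the 64-bit value held in a high:low register pair."""
    return ((high & MASK32) << 32) | (low & MASK32)


high, low = split_to_pair(0x0123_4567_89AB_CDEF)
print(hex(high), hex(low))            # 0x1234567 0x89abcdef
print(hex(combine_pair(high, low)))   # 0x123456789abcdef
```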







# **Condition Code Registers**

- Sets of individual bits
  - e.g. result of last operation was zero
- Can be read (implicitly) by programs
  - e.g. Jump if zero
- Cannot (usually) be set by programs







# **Control & Status Registers**

- Program Counter
- Instruction Decoding Register
- Memory Address Register
- Memory Buffer Register







# **Program Status Word**

- A set of bits
- Includes Condition Codes
  - Sign of last result
  - Zero
  - Carry
  - Equal
  - Overflow
- Interrupt enable/disable
- Supervisor
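Here is a sketch of how the sign, zero, carry, and overflow bits could be derived from an 8-bit addition. The 8-bit width and the `add_with_flags` name are assumptions for illustration. The overflow rule shown is the usual two's-complement test: both operands share a sign that the result does not. The Equal flag and the interrupt/supervisor bits are not modeled.

```python
# Hypothetical helper; the 8-bit width is chosen for illustration.
WIDTH = 8
MASK = (1 << WIDTH) - 1
SIGN_BIT = 1 << (WIDTH - 1)


def add_with_flags(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """Add two 8-bit values and derive condition codes from the result."""
    a, b = a & MASK, b & MASK
    raw = a + b
    result = raw & MASK
    flags = {
        "zero": result == 0,
        "sign": bool(result & SIGN_BIT),   # top bit of the result
        "carry": raw > MASK,               # unsigned carry out of the top bit
        # signed overflow: operands share a sign that differs from the result's sign
        "overflow": bool(~(a ^ b) & (a ^ result) & SIGN_BIT),
    }
    return result, flags


print(add_with_flags(0x7F, 0x01))
# (128, {'zero': False, 'sign': True, 'carry': False, 'overflow': True})
```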







# **Example Register Organizations**

#### **Data and Address Registers**

| Data Registers | Address Registers |
|----------------|-------------------|
| D0             | A0                |
| D1             | A1                |
| D2             | A2                |
| D3             | A3                |
| D4             | A4                |
| D5             | A5                |
| D6             | A6                |
| D7             | A7                |
|                | A7'               |

#### **Program Status**

- Program Counter
- Status Register

#### (a) MC68000
### **General Registers**

| Register | Purpose     |
|----------|-------------|
| AX       | Accumulator |
| BX       | Base        |
| CX       | Count       |
| DX       | Data        |

#### **Pointer & Index**

| Register | Purpose       |
|----------|---------------|
| SP       | Stack Pointer |
| BP       | Base Pointer  |
| SI       | Source Index  |
| DI       | Dest Index    |

#### **Segment**

| Register | Purpose |
|----------|---------|
| CS       | Code    |
| DS       | Data    |
| SS       | Stack   |
| ES       | Extra   |

#### **Program Status**

- Instr Ptr
- Flags

#### (b) 8086
### **General Registers**

| 32-bit | 16-bit |
|--------|--------|
| EAX    | AX     |
| EBX    | BX     |
| ECX    | CX     |
| EDX    | DX     |
| ESP    | SP     |
| EBP    | BP     |
| ESI    | SI     |
| EDI    | DI     |

#### **Program Status**

- FLAGS Register
- Instruction Pointer

#### (c) 80386 - Pentium II
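In the 80386 layout, each 16-bit register is the low half of the corresponding 32-bit register. Here is a minimal sketch using masks, assuming a 32-bit value for EAX. The helper names are hypothetical. It also shows AH and AL, the 8-bit halves of AX inherited from the 8086, and the fact that a 16-bit write leaves the upper half of EAX unchanged.

```python
# Hypothetical helpers: views of EAX as AX, AH and AL.
MASK16, MASK8 = 0xFFFF, 0xFF


def sub_registers(eax: int) -> dict[str, int]:
    """Views of a 32-bit EAX value as its 16-bit AX and 8-bit AH/AL parts."""
    ax = eax & MASK16  # AX is the low 16 bits of EAX
    return {"EAX": eax & 0xFFFF_FFFF, "AX": ax, "AH": (ax >> 8) & MASK8, "AL": ax & MASK8}


def write_ax(eax: int, value: int) -> int:
    """Writing AX replaces only the low 16 bits; the upper half of EAX is preserved."""
    return (eax & 0xFFFF_0000) | (value & MASK16)


regs = sub_registers(0x1234_5678)
print({name: hex(v) for name, v in regs.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_ax(0x1234_5678, 0xBEEF)))  # 0x1234beef
```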





# Intel

- **1971 4004**
  - First microprocessor
  - All CPU components on a single chip
  - 4 bit
- **Followed in 1972 by 8008**
  - 8 bit
- Both designed for specific applications
- **1974 8080**
  - Intel's first general purpose microprocessor







# **Performance Mismatch**

- Processor speed increased
- Memory capacity increased
- Memory speed lags behind processor speed







# DRAM and Processor Characteristics



## **Solutions**

- Increase number of bits retrieved at one time
  - Make DRAM "wider" rather than "deeper"
- Change DRAM interface
  - Cache
- Reduce frequency of memory access
  - More complex cache and cache on chip
- Increase interconnection bandwidth
  - High speed buses







# **Pentium Evolution (1)**

- **8080**
  - First general purpose microprocessor
  - 8 bit data path
  - Used in the first personal computer, the Altair
- **8086**
  - Much more powerful
  - 16 bit
  - Instruction cache: prefetches a few instructions
  - 8088 (8 bit external bus) used in the first IBM PC
- **80286**
  - 16 Mbyte memory addressable
- **80386**
  - 32 bit
  - Support for multitasking

# **Pentium Evolution (2)**

- **80486** 
  - Sophisticated, powerful cache and instruction pipelining
  - Built-in math co-processor
- Pentium
  - Superscalar
  - Multiple instructions executed in parallel
- Pentium Pro
  - Increased superscalar organization
  - Aggressive register renaming
  - branch prediction
  - data flow analysis
  - speculative execution







## Speeding it up

- Pipelining
- On board L1 & L2 cache
- Branch prediction
- Data flow analysis
- **Speculative execution**







#### Cache

- Small amount of fast memory
- **Sits between normal main memory and CPU**
- May be located on CPU chip or module









### Two Stage Instruction Pipeline









## **Timing of Pipeline**









## **Pentium Evolution (3)**

- Pentium II
  - MMX technology
  - graphics, video & audio processing
- Pentium III
  - Additional floating point instructions for 3D graphics
- Pentium 4
  - Note Arabic rather than Roman numerals
  - Further floating point and multimedia enhancements
- Itanium
  - **64 bit**







#### **Pentium 4 Cache**

- **80386: no on-chip cache**
- 80486: 8 KB using 16-byte lines and four-way set associative organization
- Pentium (all versions): two on-chip L1 caches
  - Data & instructions
- Pentium 4 L1 caches
  - 8 KB
  - 64-byte lines
  - Four-way set associative
- L2 cache
  - Feeding both L1 caches
  - 256 KB
  - 128-byte lines
  - 8-way set associative
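The cache parameters above determine how an address is split into fields. Here is a sketch that derives the number of sets and the offset/index/tag field widths. It assumes byte addressing, power-of-two sizes, and 32-bit addresses, and the function name is hypothetical. The three calls use the 80486 L1, Pentium 4 L1, and Pentium 4 L2 figures listed above.

```python
# Hypothetical helper: set-associative cache geometry (assumes 32-bit byte addresses).

def cache_geometry(size_bytes: int, line_bytes: int, ways: int, address_bits: int = 32) -> dict[str, int]:
    """Number of sets and address-field widths for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return {"sets": sets, "offset_bits": offset_bits, "index_bits": index_bits, "tag_bits": tag_bits}


print("80486 L1:    ", cache_geometry(8 * 1024, 16, 4))
# {'sets': 128, 'offset_bits': 4, 'index_bits': 7, 'tag_bits': 21}
print("Pentium 4 L1:", cache_geometry(8 * 1024, 64, 4))
# {'sets': 32, 'offset_bits': 6, 'index_bits': 5, 'tag_bits': 21}
print("Pentium 4 L2:", cache_geometry(256 * 1024, 128, 8))
# {'sets': 256, 'offset_bits': 7, 'index_bits': 8, 'tag_bits': 17}
```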

## Pentium 4 Diagram (Simplified)









#### **Background to IA-64**

- **Pentium 4 appears to be last in x86 line**
- **Intel & Hewlett-Packard (HP) jointly developed**
- New architecture
  - 64 bit architecture
  - Not an extension of x86
  - Not an adaptation of HP's 64-bit RISC architecture
- **Exploits vast circuitry and high speeds**
- Systematic use of parallelism







#### **Motivation**

- Instruction level parallelism
  - Implicit in machine instruction
  - Not determined at run time by processor
- Long or very long instruction words (LIW/VLIW)
- Branch predication (not the same as branch prediction)
- **Speculative loading**
- Intel & HP call this Explicit Parallel Instruction Computing (EPIC)
- IA-64 is an instruction set architecture intended for implementation on EPIC
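Branch predication replaces a branch with predicated instructions. Both outcomes are computed, and a 1-bit predicate decides which result is kept, so there is no branch direction to predict. Branch prediction, by contrast, guesses which way a branch will go. Below is a Python analogy under that reading. Both function names are hypothetical. The interpreter itself still branches internally; on IA-64 the selection happens in hardware through predicate registers.

```python
# Hypothetical functions: a branching version and an if-converted (predicated) version.

def max_with_branch(a: int, b: int) -> int:
    """Branching version: a processor would have to predict which way this goes."""
    if a > b:
        return a
    return b


def max_predicated(a: int, b: int) -> int:
    """If-converted version: both candidates exist and a predicate selects one.

    `p` plays the role of a 1-bit predicate register; no control-flow branch is needed.
    """
    p = int(a > b)               # the compare sets the predicate (1 or 0)
    return p * a + (1 - p) * b   # both 'paths' contribute; the predicate selects


print(max_predicated(3, 7), max_predicated(9, 2))  # 7 9
```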

### Superscalar v IA-64

| Superscalar                                                                                | IA-64                                                                                        |
|--------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| RISC-like instructions, one per word | RISC-like instructions bundled into groups of three |
| Multiple parallel execution units | Multiple parallel execution units |
| Reorders and optimizes instruction stream at run time | Reorders and optimizes instruction stream at compile time |
| Branch prediction with speculative execution of one path | Speculative execution along both paths of a branch |
| Loads data from memory only when needed, and tries to find the data in the caches first | Speculatively loads data before it's needed, and still tries to find data in the caches first |
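On IA-64, the "groups of three" are 128-bit bundles. Each bundle holds a 5-bit template followed by three 41-bit instruction slots (5 + 3 × 41 = 128). The template tells the hardware which execution-unit types the slots need. Here is a sketch that splits and rebuilds a bundle held as a Python integer. It assumes bit 0 is the least significant bit of the bundle, and the function names are hypothetical.

```python
# Hypothetical helpers: IA-64 bundle = 5-bit template + three 41-bit slots.
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def decode_bundle(bundle: int) -> tuple[int, list[int]]:
    """Split a 128-bit bundle: template = bits 0-4, slots at bits 5-45, 46-86, 87-127."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return template, slots


def encode_bundle(template: int, slots: list[int]) -> int:
    """Inverse of decode_bundle, useful for building test bundles."""
    bundle = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


assert TEMPLATE_BITS + 3 * SLOT_BITS == 128  # the fields fill the bundle exactly
print(decode_bundle(encode_bundle(0x10, [1, 2, 3])))  # (16, [1, 2, 3])
```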







## **Why New Architecture?**

- Not hardware compatible with x86
- Now have tens of millions of transistors available on chip
- Could build bigger cache
  - Diminishing returns
- Add more execution units
  - Increase superscaling
  - More units makes processor "wider"
  - More logic needed to orchestrate
  - Improved branch prediction required
  - Longer pipelines required
  - At most six instructions per cycle







# **TCAS: Closest Point of Approach (CPA)**

December 2005 version, U.S. Paul Russel









#### **Proximity Intruder**









#### **Traffic Advisory**









#### **Resolution Advisory**



















Unknown areas



## CVR/DFDR


















#### **CVR AND DFDR**









PRESENTATION BY U.S. PAUL RUSSEL, INDIAN AIRLINES LIMITED, CENTRAL TRAINING ESTABLISHMENT, HYDERABAD.