Contents
- 2. Background Required to Understand this Chapter: Chapter 4. Advanced Computer Architecture. Smruti R. Sarangi
- 3. Contents: Simpler Version of an OOO Processor, Compiler based Techniques
- 4. Aggressive Speculation: Branch prediction is one form of speculation. If we detect that a branch has
- 5. Types of Aggressive Speculation
- 6. Address Speculation: Predict the memory address of a load or store. Predict-last-address scheme: Use
- 7. Stride based Address Pattern
- 8. Predicting the Stride: Last address (A): The memory address computed the last time the instruction with
- 9. Load-Store Dependence Speculation: Predict a collision (same memory address) between
- 10. Collision History Table: Loads show consistent behavior. They are either colliding or non-colliding.
- 11. Using the CHT: When we compute the address of a load, we access the CHT. If
- 12. Store Sets: Explicitly remember load-store dependences. PC → Store set
- 13. Basic Idea: For every load, we have an associated store set. Stores that have forwarded values
- 14. Load Latency Speculation: A load might hit in the L1 cache (2 cycles) or might go
- 15. Make a guess: For load instructions, predict if it will
- 16. Value Prediction: Why are values predictable? Constants
- 17. Value Predictor
- 18. Using an additional predictor for confidence: First, use the confidence table to find out if it
- 19. Contents: Simpler Version of an OOO Processor, Compiler based Techniques
- 20. Replay: Flushing the pipeline for every misspeculation is not a wise thing. Instead, flush a part
- 21. Forward Slice of Instruction I0: A forward slice contains an
- 22. Non-Selective Replay: Trivial solution: flush the pipeline between the dispatch and execute stages. Smarter solution: It
- 23. Example: Let us say that instructions 2, 3, and 4 had one operand waking up in
- 24. Instruction Window Entry: When an operand becomes ready, we set its timer to n. Every cycle
- 25. More about Non-Selective Replay: We attach the expected latency to each instruction packet as it flows
- 26. Two methods of replaying: Method 1: Keep instructions that have been issued in the issue queue
- 27. Two methods of replaying - II: Move the instructions to a dedicated replay queue after issue
- 28. Orphan Instructions: Assume that the load instruction misses in the L1 cache. The add, sub, and
- 29. Orphan Instructions - II: Keep track of squashed instructions. Re-broadcast tags of orphan instructions. → We
- 30. Delayed Selective Replay: Let us now propose an idea to replay only those instructions that are
- 31. Delayed Selective Replay - II: When an instruction finishes execution → Check if its poison bit
- 32. Orphan Instructions: We can always wait for the instruction to reach the head of the ROB.
- 33. Token Based Selective Replay: Let us use a pattern found in most programs: Most of the
- 34. After Predicting a d-cache Miss: Instructions that are predicted to miss will have a non-deterministic execution
- 35. Structure of the Rename Table: If an instruction is a token head, we save the id
- 36. While reading the rename table: Read the tokenVecs of the source operands. Merge the tokenVecs
- 37. After execution: After the token head instruction completes execution, see if it took additional cycles (verification
- 38. Instructions in S2: Assume an instruction that was not predicted to miss actually misses. No token
- 39. Contents: Simpler Version of an OOO Processor, Compiler based Techniques
- 40. A Simpler Design: Physical Register File (PRF) based design. Fast
- 41. Let us now look at a different kind of OOO processor. Instead of having a physical
- 42. Changes to renaming: Entry in the RAT: ROB id, ROB/RF bit. ROB/RF bit → 1
- 43. Changes to Dispatch and Wakeup: Each entry in the IW now stores the values of the
- 44. Changes to Wakeup, Bypass, Reg. Write and Commit: We can follow the same speculative wakeup strategy
- 45. PRF based design vs ARF based design: points in the PRF based design. A value resides
- 46. Contents: Simpler Version of an OOO Processor, Compiler based Techniques
- 47. Compiler based Optimizations: Can the compiler optimize the code?
- 48. Constant Folding: We can directly replace a with 10, b
- 49. Strength Reduction: slow vs. fast
- 50. Common Subexpression Elimination: Each line in the second example corresponds to one line of assembly code.
- 51. Dead Code Elimination: Dead code
- 52. Silent Stores: Silent stores write the same value that is already present.
- 53. Loop Based Optimizations
- 54. Loop Invariant based Code Motion: There is no point setting (val = 5) repeatedly.
- 55. Induction Variable based Optimization: Original; Induction variable; Replace a multiply
- 56. Loop Fusion: Original; Optimized; Fuse the loops. Loop fusion reduces
- 57. Loop Unrolling - I: Original loop; Assembly code
- 58. Loop Unrolling - II: Advantage: fewer total instructions and specifically
- 59. Software Pipelining
- 60. L S I
- 61. Visualization of the Execution Process: We can create our loops
- 62. Can we execute instructions in this order? I0 → S1
- 63. Advantages of Software Pipelining: Consider this order: I0 → S1 → L2 → I1 → S2
- 64. Different Loop Iterators: Group of 3 iterations
- 65. Code with Different Loop Iterators: Unroll the loop 3 times
- 66. If we had 32 registers, we could do this very
- 67. Epilogue and Prologue
- 68. Solution without Unrolling: i = -1; t = B[0]; .loop
- 69. Unrolling and Mixing
- 70. Contents: Simpler Version of an OOO Processor, Compiler based Techniques
- 71. Sounds like a promising idea … Less hardware → less power, less complexity. Modern software
- 72. VLIW Processors: VLIW (Very Long Instruction Word) processors were the first designs in this space. Bundle
- 73. If Statements: Predicated Execution. Use predicated execution (remember GPUs). If
- 74. Curious Case of Memory Instructions: We can have multiple memory instructions in a bundle. The addresses
- 75. VLIW vs EPIC: Given that VLIW processors do not necessarily
- 76. Intel Itanium Processor: Unique collaboration between Intel and HP. Aim: EPIC processor. Designed to leverage the
- 77. Fetch Stage: Each bundle contains 3 instructions. The decoupling buffer can hold 8 such bundles.
- 78. Branch Predictors: Itanium has four types of branch predictors. Compiler directed: Four special registers: Target Address
- 79. Branch Predictors - II: Multi-way Branches: Compilers ensure that (typically) the last instruction in a bundle
- 80. This part of the pipeline: Itanium has 9 issue ports: 2 for memory, 2 for integer,
- 81. Register Remapping Stage: Large 128-entry register file. 32 static registers
- 82. Example: Function foo calls function bar. We deliberately create an
- 83. Register Stack Frame: The in and local registers are preserved across function calls. The out registers
- 84. Binary Search: No processing done after receiving the return value.
- 85. Register Stack Frame: The in and local registers are preserved across function calls. The out registers
- 86. Support for Software Pipelining and Overflows: Main problem: We run out of registers. Itanium has a
- 87. High Performance Execution Engine: Scoreboard: Simple mechanism for OOO execution
- 88. Conditions: Instruction I: WAW Hazards: Check all the earlier entries
- 89. Conditions - II: Instructions wait in the scoreboard until they are safe (no hazards).
- 90. Predication: If we flush the pipeline upon a branch misprediction, it would be quite unfair. Let
- 91. Code without Predication: Count the number of branch instructions. /*
- 92. Predicated Instructions: The comparison generates predicates (flags): po → number is odd, pe → number is
- 93. Pipeline
- 94. Load Boosting: Boost a load and some instructions that use its value to a point before
- 95. A host of compiler optimizations can be used to speed
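
Slides 6 to 8 predict a load or store address from the last address (A) and the stride. A minimal Python sketch, assuming a hypothetical unbounded table indexed by the instruction's PC (real designs use a small, finite table); the class and field names are illustrative. The prediction is A + stride, and a stride of 0 reduces to the predict-last-address scheme.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StrideEntry:
    last_addr: int = 0   # A: address computed the last time this PC executed
    stride: int = 0      # difference between the last two addresses
    valid: bool = False  # True once at least two addresses have been seen


class StridePredictor:
    """Per-PC stride predictor: predicted address = last address + stride."""

    def __init__(self) -> None:
        self.table: Dict[int, StrideEntry] = {}  # hypothetical unbounded PC-indexed table

    def predict(self, pc: int) -> Optional[int]:
        entry = self.table.get(pc)
        if entry is None or not entry.valid:
            return None  # not enough history to predict yet
        return entry.last_addr + entry.stride

    def update(self, pc: int, actual_addr: int) -> None:
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=actual_addr)  # stride still unknown
            return
        entry.stride = actual_addr - entry.last_addr
        entry.last_addr = actual_addr
        entry.valid = True
```

After the addresses 1000 and 1008 are seen for one PC, the predictor returns 1016 for its next execution.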
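
Slides 10 and 11 rely on loads behaving consistently as either colliding or non-colliding. One plausible way to capture that, sketched here under the assumption of a PC-indexed table of 2-bit saturating counters (the counter width, table size, and method names are illustrative, not taken from the slides):

```python
from typing import List


class CollisionHistoryTable:
    """PC-indexed table of 2-bit saturating counters (illustrative sizes)."""

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        self.counters: List[int] = [0] * entries  # 0-1: non-colliding, 2-3: colliding

    def _index(self, pc: int) -> int:
        return pc % self.entries  # hypothetical direct-mapped indexing, no tags

    def predicts_collision(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, collided: bool) -> None:
        i = self._index(pc)
        if collided:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

A load predicted to collide would wait for earlier stores; a load predicted not to collide can be issued speculatively. The 2-bit counter keeps one unusual outcome from flipping the prediction.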
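
Slides 12 and 13 map a load's PC to the set of stores it has depended on. This sketch assumes a plain dictionary from load PC to a set of store PCs, filled in whenever a memory-order violation is detected; the class and method names are hypothetical, and the separate hardware tables a real implementation would use are not modeled.

```python
from typing import Dict, Iterable, List, Set


class StoreSets:
    def __init__(self) -> None:
        self.store_set: Dict[int, Set[int]] = {}  # load PC -> PCs of stores it depended on

    def record_violation(self, load_pc: int, store_pc: int) -> None:
        """Called when a load executed before an earlier store to the same address."""
        self.store_set.setdefault(load_pc, set()).add(store_pc)

    def stores_to_wait_for(self, load_pc: int, inflight_store_pcs: Iterable[int]) -> List[int]:
        """In-flight stores (in program order) that this load should wait for."""
        deps = self.store_set.get(load_pc, set())
        return [pc for pc in inflight_store_pcs if pc in deps]
```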
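
For value prediction with a confidence table (slides 16 to 18), a last-value predictor is the simplest case: constants repeat, so predicting the previous value works well for them. This sketch assumes a saturating confidence counter per PC that resets on a misprediction; the threshold of 2 and the maximum of 3 are arbitrary illustrative values.

```python
from typing import Dict, Optional


class LastValuePredictor:
    def __init__(self, threshold: int = 2, max_conf: int = 3) -> None:
        self.threshold = threshold
        self.max_conf = max_conf
        self.last_value: Dict[int, int] = {}  # value table: PC -> last produced value
        self.confidence: Dict[int, int] = {}  # confidence table: PC -> saturating counter

    def predict(self, pc: int) -> Optional[int]:
        # Consult the confidence table first; predict only when confident enough.
        if self.confidence.get(pc, 0) >= self.threshold:
            return self.last_value.get(pc)
        return None

    def update(self, pc: int, actual: int) -> None:
        conf = self.confidence.get(pc, 0)
        if self.last_value.get(pc) == actual:
            conf = min(self.max_conf, conf + 1)  # same value again: more confident
        else:
            conf = 0                             # value changed: reset confidence
        self.confidence[pc] = conf
        self.last_value[pc] = actual
```

Resetting to 0 on a mismatch is a deliberately cautious choice: a wrong value prediction forces a replay, so the predictor stays quiet until the value repeats again.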
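
Slides 33 to 37 describe token-based selective replay: a token head receives a token, each instruction's tokenVec is the merge of its source operands' tokenVecs, and if the head takes additional cycles, only the instructions carrying its token are replayed. A minimal sketch, assuming token heads are the loads predicted to miss, tokenVecs are bitmasks merged with bitwise OR, and token release and the no-token case of slide 38 are not modeled; all names are hypothetical.

```python
from typing import Dict, List, Optional


class TokenTracker:
    """Tracks which token heads each in-flight instruction depends on (sketch)."""

    def __init__(self, num_tokens: int = 8) -> None:
        self.free_tokens: List[int] = list(range(num_tokens))
        self.reg_vec: Dict[str, int] = {}     # rename-table field: tokenVec per register
        self.inst_vec: Dict[int, int] = {}    # tokenVec of each in-flight instruction
        self.head_token: Dict[int, int] = {}  # token id owned by each token head

    def rename(self, inst_id: int, dest: Optional[str], srcs: List[str],
               token_head: bool = False) -> int:
        vec = 0
        for src in srcs:
            vec |= self.reg_vec.get(src, 0)  # merge (OR) the sources' tokenVecs
        if token_head and self.free_tokens:  # no free token: not a token head
            token = self.free_tokens.pop(0)
            self.head_token[inst_id] = token
            vec |= 1 << token  # the head's own token propagates to its consumers
        self.inst_vec[inst_id] = vec
        if dest is not None:
            self.reg_vec[dest] = vec
        return vec

    def to_replay(self, head_id: int) -> List[int]:
        """Instructions to replay if token head `head_id` took extra cycles."""
        bit = 1 << self.head_token[head_id]
        return sorted(i for i, vec in self.inst_vec.items()
                      if (vec & bit) and i != head_id)
```

Instructions that do not depend, directly or transitively, on the slow load keep their results; only its forward slice is replayed.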
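
Constant folding (slide 48) and strength reduction (slide 49) can be shown on a toy expression tree. The sketch assumes a hypothetical tuple encoding `(op, left, right)` with integer constants and string variable names; it folds `+` and `*` on constants and rewrites `x * 2**k` as `x << k`, but only when the constant is the right operand.

```python
from typing import Union

Expr = Union[int, str, tuple]  # int constant, variable name, or (op, left, right)


def optimize(e: Expr) -> Expr:
    """Constant folding, then strength reduction of x * 2**k into x << k."""
    if not isinstance(e, tuple):
        return e                      # leaf: constant or variable name
    op, a, b = e
    a, b = optimize(a), optimize(b)   # optimize children first (bottom-up)
    if isinstance(a, int) and isinstance(b, int):
        if op == "+":
            return a + b              # constant folding
        if op == "*":
            return a * b
    if op == "*" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
        return ("<<", a, b.bit_length() - 1)  # multiply by a power of two -> shift
    return (op, a, b)
```

Folding runs before strength reduction, so `x * (3 + 5)` first becomes `x * 8` and then `x << 3`.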
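
The loop-invariant assignment `val = 5` (slide 54) and the multiply on an induction variable (slide 55) can be removed together. The loop body in this sketch is a hypothetical example, not the one on the slides; both functions compute the same sum.

```python
from typing import List


def original(A: List[int], n: int) -> int:
    total = 0
    for i in range(n):
        val = 5                    # loop-invariant assignment repeated every iteration
        total += A[i * 4] + val    # multiply recomputed from i every iteration
    return total


def optimized(A: List[int], n: int) -> int:
    total = 0
    val = 5                        # hoisted out of the loop (loop-invariant code motion)
    j = 0                          # derived induction variable that tracks i * 4
    for _ in range(n):
        total += A[j] + val
        j += 4                     # the multiply is replaced by an add
    return total
```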
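
Loop unrolling (slides 57 and 58) trades code size for fewer loop-overhead instructions. A sketch of 4-way unrolling of an array sum, assuming the array length is not necessarily a multiple of 4, so a cleanup loop handles the leftover elements. In Python this only illustrates the transformation; the saving in increment, compare, and branch instructions matters in compiled code.

```python
from typing import List


def sum_rolled(A: List[int]) -> int:
    total = 0
    for x in A:
        total += x
    return total


def sum_unrolled4(A: List[int]) -> int:
    total = 0
    n = len(A)
    i = 0
    while i + 4 <= n:              # main body: 4 elements per iteration
        total += A[i] + A[i + 1] + A[i + 2] + A[i + 3]
        i += 4                     # one increment/compare/branch per 4 elements
    while i < n:                   # cleanup loop for the remaining n % 4 elements
        total += A[i]
        i += 1
    return total
```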
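
Slides 59 to 68 overlap the load of a later iteration with the work of the current one, with a prologue before the loop and an epilogue after it. The sketch assumes a hypothetical loop body `A[i] = B[i] + 1`; the pipelined version loads `B[i + 1]` while finishing iteration `i`, in the spirit of the `t = B[0]` prologue on slide 68, so in hardware the load latency is hidden behind independent work.

```python
from typing import List


def simple_loop(B: List[int]) -> List[int]:
    A = [0] * len(B)
    for i in range(len(B)):
        t = B[i]        # load
        A[i] = t + 1    # compute and store; must wait for the load above
    return A


def pipelined_loop(B: List[int]) -> List[int]:
    n = len(B)
    A = [0] * n
    if n == 0:
        return A
    t = B[0]                 # prologue: issue the first load
    for i in range(n - 1):
        nxt = B[i + 1]       # load for iteration i + 1 ...
        A[i] = t + 1         # ... overlaps with the compute/store of iteration i
        t = nxt
    A[n - 1] = t + 1         # epilogue: finish the last iteration
    return A
```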
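
Slides 91 and 92 count odd and even numbers with the predicates `po` and `pe` instead of branches. Python has no predicated instructions, so this sketch only emulates the idea: both updates execute on every iteration, and the predicate value (0 or 1) decides whether each one has an effect.

```python
from typing import List, Tuple


def count_with_branches(nums: List[int]) -> Tuple[int, int]:
    odd = even = 0
    for x in nums:
        if x % 2 == 1:   # data-dependent branch that may be mispredicted
            odd += 1
        else:
            even += 1
    return odd, even


def count_predicated(nums: List[int]) -> Tuple[int, int]:
    odd = even = 0
    for x in nums:
        po = x & 1        # predicate "number is odd"
        pe = po ^ 1       # predicate "number is even"
        odd += po         # both updates always execute ...
        even += pe        # ... but only the one whose predicate is 1 changes a count
    return odd, even
```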
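
Slides 87 to 89 describe a scoreboard in which an instruction waits until it has no hazards with earlier entries. A sketch of the issue check, assuming the window holds only unfinished instructions in program order, some of which may not have read their operands yet; it conservatively checks RAW and WAR as well as the WAW condition named on slide 88, and the `Entry` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def can_issue(idx: int, window: List[Entry]) -> bool:
    """True if window[idx] has no hazard with any earlier unfinished entry."""
    me = window[idx]
    for earlier in window[:idx]:
        if earlier.dest == me.dest:   # WAW: an earlier write to the same register
            return False
        if earlier.dest in me.srcs:   # RAW: a source value is not produced yet
            return False
        if me.dest in earlier.srcs:   # WAR: an earlier reader may not have read yet
            return False
    return True
```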
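
Slides 81 to 85 describe Itanium's register stack: 32 static registers plus stacked registers whose frames overlap, so that a caller's out registers become the callee's first registers while the in and local registers are preserved. A simplified model, assuming 96 stacked registers (r32 to r127), frame sizes set by a hypothetical `alloc`-like method, and modulo wraparound standing in for spilling frames to memory, which is not modeled.

```python
from typing import List, Tuple

NUM_STATIC = 32    # r0-r31: never remapped
NUM_STACKED = 96   # r32-r127: stacked registers in a 128-entry file


class RegisterStack:
    def __init__(self) -> None:
        self.bof = 0   # physical offset of the current frame in the stacked region
        self.sol = 0   # size of locals = ins + locals
        self.sof = 0   # size of frame = ins + locals + outs
        self.saved: List[Tuple[int, int, int]] = []

    def alloc(self, ins: int, locals_: int, outs: int) -> None:
        self.sol = ins + locals_
        self.sof = ins + locals_ + outs

    def call(self) -> None:
        self.saved.append((self.bof, self.sol, self.sof))
        self.bof = (self.bof + self.sol) % NUM_STACKED  # callee frame starts at caller's outs
        self.sof -= self.sol                             # callee initially sees only the outs
        self.sol = 0

    def ret(self) -> None:
        self.bof, self.sol, self.sof = self.saved.pop()  # restore the caller's frame

    def physical(self, reg: int) -> int:
        if reg < NUM_STATIC:
            return reg
        return NUM_STATIC + (self.bof + reg - NUM_STATIC) % NUM_STACKED
```

With 2 in, 3 local, and 2 out registers, the caller's r37 (its first out register) is the same physical register as the callee's r32, so arguments are passed without copying.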