Wednesday, May 1, 2013

PIPELINING & RISCs

Pipelining, what is it?

Pipelining and parallelism are two methods used to achieve concurrency.
Pipelining increases concurrency by dividing a computation into a number of steps.
Pipelining is a key implementation technique used to build fast processors and can be seen in RISC architectures. It allows the execution of multiple instructions to overlap in time. In non-pipelined processing, by contrast, the next data/instruction is processed only after the processing of the previous data/instruction is entirely complete.

Parallelism, by contrast, is the use of multiple resources to increase concurrency.

A two-stage instruction pipeline (fetch stage and execute stage) would look as follows:

-If we assume that the fetch and execute stages require the same amount of time, and
-if the computer has two hardware units, one for fetching instructions and the other for executing them (what is the implication?),
then the fetch and execute operations can each be completed in one clock cycle.
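To make the overlap concrete, here is a minimal sketch (not part of the original notes) that prints the timing of such a two-stage pipeline, assuming one clock cycle per stage and separate fetch and execute units:

def two_stage_timeline(n):
    """Show which instruction is in the Fetch and Execute stage each cycle."""
    total_cycles = 2 + (n - 1)                       # k + (n - 1) cycles with k = 2
    for cycle in range(1, total_cycles + 1):
        fetching = f"F(I{cycle})" if cycle <= n else "-"
        executing = f"E(I{cycle - 1})" if cycle >= 2 else "-"
        print(f"cycle {cycle}: {fetching:7} {executing}")

two_stage_timeline(4)
# cycle 1: F(I1)   -
# cycle 2: F(I2)   E(I1)
# cycle 3: F(I3)   E(I2)
# cycle 4: F(I4)   E(I3)
# cycle 5: -       E(I4)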


PIPELINE PERFORMANCE

n: number of instructions
k: number of stages in the pipeline
t: cycle time (time in seconds needed to advance a set of instructions one stage through the pipeline)
Tk: total time with pipelining
T0: total time without pipelining



The formula for calculating total time for equal stages

For n instructions, with k equal stages and delay of t for each stage
Total time with no pipelining, T0 = nkt
Total time with pipelining, Tk = (k + (n-1))t

The formula for calculating speedup 
Speedup of a k-stage pipeline for n instructions :
S = T0 / Tk
  = nkt / ((k + (n-1))t)
  → k (for large n)
Note: the equations for T0 and Tk above assume that every stage takes the same time t. If the stage times are not equal, each stage i from 1 to k has its own time ti, so T0 = n(t1 + t2 + ... + tk) and Tk = (t1 + t2 + ... + tk) + (n-1)*max(ti), where max(ti) is the time of the slowest stage.
Example 1: Based on the 2 stage timing diagram
With k=2 stages, n=9 and t=1:
T0 = 9*2*1 = 18, Tk=(2+(9-1)) *1=10
S= 18/10 = 1.8 times
Example 2: Based on the 6 stage timing diagram
With k=6 stages, n=9 and t=1:
T0 = 9*6*1= 54, Tk=(6+(9-1))*1=14
S= 54/14 = 3.857 times
We can see that increasing the number of stages, with the other values kept the same, improves the speedup.

Example 3: Based on the 3 unequal stages example
With k=3 stages (unequal), n=4 and stage times t1=3, t2=3, t3=2:
T0 = n(t1 + t2 + t3) = 4*(3+3+2) = 32
Tk = (t1 + t2 + t3) + (n-1)*max(ti)
   = (3+3+2) + (3)(3)
   = 8 + 9
   = 17
S = 32/17 = 1.88
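A minimal Python sketch of these formulas (illustrative only, not from the original notes); it reproduces the three speedup results above:

def totals_equal(n, k, t):
    """Total times when all k stages take the same time t."""
    t0 = n * k * t                     # no pipelining: each instruction passes through all k stages alone
    tk = (k + (n - 1)) * t             # pipelining: fill the pipe once, then finish one instruction per cycle
    return t0, tk

def totals_unequal(n, stage_times):
    """Total times when each stage i has its own time t_i."""
    t0 = n * sum(stage_times)                            # n * (t1 + t2 + ... + tk)
    tk = sum(stage_times) + (n - 1) * max(stage_times)   # first instruction, then one result per slowest-stage time
    return t0, tk

def speedup(t0, tk):
    return t0 / tk

print(speedup(*totals_equal(9, 2, 1)))          # 1.8    (Example 1)
print(speedup(*totals_equal(9, 6, 1)))          # ~3.857 (Example 2)
print(speedup(*totals_unequal(4, [3, 3, 2])))   # ~1.88  (Example 3)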



Calculating throughput
Pipelined throughput = n / Tk(n)
Non-pipelined throughput = n / T0(n)
where n = total number of instructions,
T0(n) is the total time without pipelining and Tk(n) is the total time with pipelining.
Both throughputs are measured in instructions per unit time.


Example 1: Based on the 6 stage timing diagram

n = 9, k = 6, t = 1
Non-pipelined throughput = 9 / T0(9) = 9 / (9*6*1) = 9/54 ≈ 0.17
Pipelined throughput = 9 / Tk(9) = 9 / ((6+(9-1))*1) = 9/14 ≈ 0.64
Example 2: Based on the 3 unequal stages example



With k=3 stages (unequal), n=4:
Non-pipelined throughput = 4 / T0(4) = 4 / (4*(3+3+2)) = 4/32 = 0.125
Pipelined throughput = 4 / Tk(4) = 4 / ((3+3+2) + (3)(3)) = 4/17 ≈ 0.235
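A quick check of both throughput examples in Python (a minimal sketch using the same illustrative numbers as above):

def throughput(n, total_time):
    return n / total_time              # instructions per unit time

print(throughput(9, 54), throughput(9, 14))   # non-pipelined ~0.17 vs pipelined ~0.64 (6 equal stages)
print(throughput(4, 32), throughput(4, 17))   # non-pipelined 0.125 vs pipelined ~0.235 (3 unequal stages)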
To increase the speedup, the most direct way is to increase the number of stages.

Factors that limit the performance of pipelining
Unequal duration/delay of stages
Conditional branch instructions or interrupts.
Example:
Instruction 3 is a conditional branch to instruction 15.
No instructions are completed during time units 9-12; this is the performance penalty incurred because we could not anticipate the branch.
Flushing of the pipeline
Pipelined operation cannot be maintained in the presence of branch or jump instructions.

There are three types of hazards that limit pipelining:
-Resource hazards: the hardware cannot support this combination of instructions (like a single person having to fold and put clothes away while running the washer-dryer)
-Data hazards: an instruction depends on the result of a prior instruction that is still in the pipeline
-Data dependency example (a hazard-detection sketch follows this list):
A = B + C
D = E + A
C = G x H

-Control hazards: caused by the delay between fetching instructions and making decisions about changes in control flow (branches and jumps).
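As a small illustration of the data-hazard case, the sketch below (a hypothetical representation, not from the notes) flags the read-after-write dependency in the sequence above, where D = E + A needs A before A = B + C has left the pipeline:

# Each instruction is written as (destination, list of source operands).
instructions = [
    ("A", ["B", "C"]),   # A = B + C
    ("D", ["E", "A"]),   # D = E + A  -- reads A, which is still being computed
    ("C", ["G", "H"]),   # C = G x H
]

for (prev_dest, _), (curr_dest, curr_sources) in zip(instructions, instructions[1:]):
    if prev_dest in curr_sources:
        print(f"RAW data hazard: {curr_dest} needs {prev_dest} before the pipeline has written it")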
How may the hazards be prevented?


INPUT OUTPUT INTERFACE

                                                                I/O INTERFACE




I/O modules are another key element of the computer system: each module acts as an interface to the system bus or central switch and controls one or more peripheral devices.
In other words, an I/O module handles the communication between the peripherals and the bus.


 Why are the peripherals not connected to the system bus directly?
o    Peripherals have different ways of operating; it is not practical to ask the processor to control all of the different devices.
o    The data transfer rate of most devices is much slower than that of the memory or processor, so it is not practical to link the high-speed bus directly to these devices.
o    Some devices may be faster than the bus, so a direct link is again not practical.
o    Peripherals may use different data formats and word lengths than the computer they are attached to.


Functions of an I/O module



I/O Operations
Three techniques are possible for I/O operation:
1.       Programmed I/O
o    Data are exchanged between the processor and the I/O module.
o    When the processor is executing a program and gets an instruction related to I/O, it issues a command to the respective I/O module.
o    The I/O module will perform the requested operation.
o    The I/O module will not alert or interrupt the processor once the operation is done.
o    It is the responsibility of the processor to keep checking the status of the I/O module to see if the operation is complete (see the polling sketch after this list).

2.     Interrupt driven I/O
o    This method overcomes CPU waiting time.
o    The processor issues an I/O command to a module and then goes on to do some other useful work.
o    The I/O module will then interrupt the processor to request service when it is ready to exchange data with the processor.
o    The processor then executes the data transfer and then resumes its processing.

3.       Direct Memory Access (DMA)
o    When large volumes of data are to be moved between memory and the peripherals, an efficient technique is Direct Memory Access (DMA).
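A minimal sketch of the programmed I/O idea (the module and its status flag are made-up stand-ins, not a real device API): the processor issues the command and then does nothing but poll the status until the operation completes.

class IOModule:
    """Hypothetical I/O module with a command register and a busy flag."""
    def __init__(self):
        self.busy = False
        self._polls_left = 0
    def issue_command(self, command):
        self.busy = True               # the module starts working on the request
        self._polls_left = 3           # pretend it finishes after a few status checks
    def status_ready(self):
        if self.busy:
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.busy = False      # operation done, but the module never interrupts the CPU
        return not self.busy

module = IOModule()
module.issue_command("read block 0")
while not module.status_ready():       # programmed I/O: the CPU busy-waits on the status
    pass
print("transfer complete (the CPU did no other work while waiting)")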

DMA Function
o    DMA involves an additional module on the system bus.
o    The DMA module is capable of imitating the processor and taking over control of the system from the processor.
o    It needs to do this to transfer data to and from memory over the system bus.


External Interface
·         Used to connect devices to the I/O module.
·         A common characteristic of an interface is whether it is serial or parallel.
·         Parallel Interface
o   Multiple lines connect the peripheral and the I/O module. So multiple bits are transferred at once.
o   Used in high-speed peripherals such as hard disks, printers, and scanners.
·         Serial Interface
o   There is only one line to transmit data and bits are transferred one at a time.
o   Used in slower devices such as the mouse and keyboard.
  











BUSES


BUS



what is a BUS?
A bus is a communication pathway connecting two or more devices.
If you take a motherboard and look at its underside, you can see lines connecting the components; those lines are buses.



How does a Bus work?
·         To send data to another device
o   First the module must get permission to use the bus; only then can it send the data.
·         To request data from another device
o   First the module must get permission to use the bus, then send a request signal to the other device, and finally wait for that device to send the data.

For more information on the bus, take a look at the mind map below.







The system bus consists of:
Data bus - carries data, instructions and addresses between main memory and the ALU.
Control bus - carries signals such as interrupt, timing and acknowledgement to and from other components.
Address bus - used to transfer an address from the PC to memory.



BUS HIERARCHY



The Characteristics of a BUS




The performance of a bus can be measured using these factors:
o   Transfer time
§  The amount of time taken for data to be delivered in a single transaction
o   Bandwidth
§  Measures the capacity of the bus: how much data the bus can send at one time
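As a rough worked example (the numbers are illustrative, not from the notes): a 32-bit wide bus that completes one transfer per cycle at 100 MHz moves 4 bytes per transfer, so its peak bandwidth is about 400 MB/s.

bus_width_bits = 32           # number of data lines (illustrative)
clock_rate_hz = 100_000_000   # assume one transfer per clock cycle (illustrative)

bytes_per_transfer = bus_width_bits // 8
bandwidth = bytes_per_transfer * clock_rate_hz
print(bandwidth / 1_000_000, "MB/s")    # 400.0 MB/s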








Instruction Sets: Addressing Modes and Formats



Addressing Modes and Formats



Immediate Addressing


Direct Addressing








Indirect Addressing 







Register Addressing






Register Indirect Addressing

The operand is in the memory cell pointed to by the contents of register R.
Advantages and limitations are basically the same as for indirect addressing.
The address-space limitation of the address field is overcome by having that field refer to a word-length location containing an address.
Uses one less memory reference than indirect addressing.
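A minimal sketch of register indirect addressing (the register names, addresses and contents are made up for illustration):

memory = {0x200: 42}            # the operand lives at address 0x200
registers = {"R1": 0x200}       # R1 holds the operand's address

effective_address = registers["R1"]      # EA = (R1): one register read, no extra memory reference
operand = memory[effective_address]      # single memory reference fetches the operand
print(hex(effective_address), operand)   # 0x200 42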



Displacement Addressing

Combines the capabilities of direct addressing and register indirect addressing
EA = A + (R)
The address fields hold two values:
A = base value
R = register that holds displacement
or vice versa
Most common uses:
Relative addressing
Base-register addressing
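A minimal sketch of the EA = A + (R) rule (illustrative values): using the program counter as R gives relative addressing, while using a base register gives base-register addressing.

memory = {0x10C: "operand"}
registers = {"PC": 0x100}       # R = PC here, so this is relative addressing

A = 0x00C                        # displacement taken from the instruction's address field
effective_address = A + registers["PC"]                     # EA = A + (R)
print(hex(effective_address), memory[effective_address])    # 0x10c operand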








Memory







One type of memory is RAM, which itself has two types: dynamic RAM and static RAM.




Another type of memory is ROM.




INTRODUCTION TO CACHE

CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast) -> known as Cache Hit
If not present, read required block from main memory to cache -> known as Cache Miss
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache slot
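A minimal sketch of this lookup sequence, assuming a tiny direct-mapped cache where a block maps to slot (block number mod number of slots) and the tag records which block currently occupies the slot (sizes and contents are illustrative):

NUM_SLOTS = 4
cache = [None] * NUM_SLOTS                                    # each entry holds (tag, block data)
main_memory = {block: f"data of block {block}" for block in range(16)}

def read(block):
    slot = block % NUM_SLOTS
    tag = block // NUM_SLOTS
    if cache[slot] is not None and cache[slot][0] == tag:
        print(f"block {block}: cache hit (slot {slot})")      # present: deliver from cache (fast)
    else:
        print(f"block {block}: cache miss, filling slot {slot}")
        cache[slot] = (tag, main_memory[block])               # read the block from main memory into the cache
    return cache[slot][1]                                     # then deliver from the cache to the CPU

read(5)    # miss: block 5 goes to slot 1 with tag 1
read(5)    # hit: same block, tag matches
read(9)    # miss: block 9 also maps to slot 1 but has tag 2, so it replaces block 5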



A typical view of the cache organization