Home > Branch Hazards


Branch Hazards

Branch Back Taken:

In the following program the reaction of Niloofar 1 regarding a branch instruction is elaborated. This is branch back that has to be taken. The prediction unit of Niloofar 1 is a very simple one. It will predict every branch back as taken.

HazBBT1.s




I00
I01
I02
I03

I04
I05
I06
I07
I08
I09
I0a

I0b
I0d

# Testing branch hazard in Niloofar 1
# Branch Back, Taken

lp1:    addi r3, r0, 3
        addi r5, r0, 5
        addi r6, r0, 6
        beq  r1, r2, lp1

        nop
        nop
        nop
        nop
        nop
        nop
        nop

hang:   movi r7, hang
        jalr r7, r7

In the second cycle, Just along with the fetch phase of the branch instruction, the prediction unit decodes it and realizes that it is a branch back. It will activate the branchback bit and will place the target address into backpc.

The simulation of HazBBT1.s, CurTime = 2

#
# Timings:  CurTime = 2  InstCnt = 0  nTCurInstNo = 4  nTCurInstPos = 4 nCurToQIBPos = 2
#
# I##: [00][01][02]  qid pc instruction
# I00: [  ][Fe][iQ] . 0  00 ADDI r3, r0, 0x0003
# I01: [  ][Fe][iQ] . 1  01 ADDI r5, r0, 0x0005
# I02: [  ][  ][Fe] . x  02 ADDI r6, r0, 0x0006
# I03: [  ][  ][Fe] . x  03 BEQ  r1, r2, 0xfffc
# I04: [  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I05: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I06: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I07: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I08: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I09: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0a: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0b: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0c: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ] . x  xx      rx, rx, 0xxxxx
#
# reg#   value v src    TIME 301   pc0=04  pc1=05
#   0 :  0000  1 xx
#   1 :  0000  1 xx
#   2 :  0000  1 xx
#   3 :  0000  0 00
#   4 :  0000  1 xx
#   5 :  0000  0 01
#   6 :  0000  1 xx
#   7 :  0000  1 xx
# Fetch Buffers:   branchback=1 backpc=00
#      fbA: v=0 instr=2c03 pc=00     -> fbC: v=1 instr=3806 pc=02
#      fbB: v=0 instr=3405 pc=01     -> fbD: v=1 instr=c57c pc=03
# IQ   : v d o g res  op bt tgt  a1  (v/s )  a2  (v/s )   im   pc  exc it I F S Mr Mi
#  H  0: 1 0 0 0 0000 1  0   3  0000 (1/xx) 0000 (1/xx)  0003  00   0   a 1 0 x  0  0
#     1: 1 0 0 0 0000 1  0   5  0000 (1/xx) 0000 (1/00)  0005  01   0   a 1 0 x  0  0
#   T 2: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     3: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     4: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     5: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     6: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     7: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
# ALU Inputs:  0: avail=1 data=0 id=xx [`]op=x a1=xxxx a2=xxxx im=xxxx bt=x sid=xx, pc=xx
#              1: avail=1 data=0 id=xx [`]op=x a1=xxxx a2=xxxx im=xxxx bt=x sid=xx, pc=xx
# ALU Outputs: 0: v=0 bus=xxxx id=xx bmiss=0 misspc=xx missid=xx
#              1: v=0 bus=xxxx id=xx bmiss=0 misspc=xx missid=xx
# DRAM Status:  OUT: v=0 data=xxxx id=xx exc=x
#               dram#: status addr data []op tgt id
#                  0 :   0     xx  xxxx [`]x  x  xx
#                  1 :   0     xx  xxxx [`]x  x  xx
#                  2 :   0     xx  xxxx [`]x  x  xx
# Commit Buses:  0: v=0 id=00 data=0000 tgt=03
#                1: v=0 id=01 data=0000 tgt=05
# Memory   :    0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
# 00000000 : 2c03 3405 3806 c57c 0000 0000 0000 0000 0000 0000 0000 7c00 3f8b ff80 0000 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
#

The branchback bit will stall the fetch unit for one cycle and the target address will be placed in the fetch unit's program counter.

The simulation of HazBBT1.s, CurTime = 3

#
# Timings:  CurTime = 3  InstCnt = 0  nTCurInstNo = 4  nTCurInstPos = 4 nCurToQIBPos = 4
#
# I##: [00][01][02][03]  qid pc instruction
# I00: [  ][Fe][iQ][ID] . 0  00 ADDI r3, r0, 0x0003
# I01: [  ][Fe][iQ][ID] . 1  01 ADDI r5, r0, 0x0005
# I02: [  ][  ][Fe][iQ] . 2  02 ADDI r6, r0, 0x0006
# I03: [  ][  ][Fe][iQ] . 3  03 BEQ  r1, r2, 0xfffc
# I04: [  ][  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I05: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I06: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I07: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I08: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I09: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0a: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0b: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0c: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
#
# reg#   value v src    TIME 401   pc0=00  pc1=01
#   0 :  0000  1 xx
#   1 :  0000  1 xx
#   2 :  0000  1 xx
#   3 :  0000  0 00
#   4 :  0000  1 xx
#   5 :  0000  0 01
#   6 :  0000  0 02
#   7 :  0000  1 xx
# Fetch Buffers:   branchback=0 backpc=07
#   -> fbA: v=0 instr=2c03 pc=00        fbC: v=0 instr=3806 pc=02
#   -> fbB: v=0 instr=3405 pc=01        fbD: v=0 instr=c57c pc=03
# IQ   : v d o g res  op bt tgt  a1  (v/s )  a2  (v/s )   im   pc  exc it I F S Mr Mi
#  H  0: 1 0 1 0 0000 1  0   3  0000 (1/xx) 0000 (1/xx)  0003  00   0   a 0 0 1  0  0
#     1: 1 0 1 0 0000 1  0   5  0000 (1/xx) 0000 (1/00)  0005  01   0   a 0 0 1  0  0
#     2: 1 0 0 0 0000 1  0   6  0000 (1/xx) 0000 (1/xx)  0006  02   0   a 1 0 1  0  0
#     3: 1 0 0 0 0000 6  1   0  0000 (1/xx) 0000 (1/xx)  fffc  03   0   b 1 0 0  0  0
#   T 4: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     5: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     6: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     7: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
# ALU Inputs:  0: avail=1 data=1 id=00 [a]op=0 a1=0000 a2=0003 im=0003 bt=0 sid=00, pc=00
#              1: avail=1 data=1 id=01 [a]op=0 a1=0000 a2=0005 im=0005 bt=0 sid=01, pc=01
# ALU Outputs: 0: v=1 bus=0003 id=00 bmiss=0 misspc=04 missid=00
#              1: v=1 bus=0005 id=01 bmiss=0 misspc=07 missid=01
# DRAM Status:  OUT: v=0 data=xxxx id=xx exc=x
#               dram#: status addr data []op tgt id
#                  0 :   0     xx  xxxx [`]x  x  xx
#                  1 :   0     xx  xxxx [`]x  x  xx
#                  2 :   0     xx  xxxx [`]x  x  xx
# Commit Buses:  0: v=0 id=00 data=0000 tgt=03
#                1: v=0 id=01 data=0000 tgt=05
# Memory   :    0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
# 00000000 : 2c03 3405 3806 c57c 0000 0000 0000 0000 0000 0000 0000 7c00 3f8b ff80 0000 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
#

The fetch unit will follow its action using the new target address. In the end of cycle 4, when the branch instruction reaches to the execute phase, the processor finds out that the prediction was correct and the speculatively fetched instructions will follow their way along the pipeline. The following output clip, shows cycle 5, where the branch instruction is passing the execute phase.

The simulation of HazBBT1.s, CurTime = 5

#
# Timings:  CurTime = 5  InstCnt = 2  nTCurInstNo = 8  nTCurInstPos = 8 nCurToQIBPos = 6
#
# I##: [00][01][02][03][04][05]  qid pc instruction
# I00: [  ][Fe][iQ][ID][Ex][Cm] . 0  00 ADDI r3, r0, 0x0003
# I01: [  ][Fe][iQ][ID][Ex][Cm] . 1  01 ADDI r5, r0, 0x0005
# I02: [  ][  ][Fe][iQ][ID][Ex] . 2  02 ADDI r6, r0, 0x0006
# I03: [  ][  ][Fe][iQ][ID][Ex] . 3  03 BEQ  r1, r2, 0xfffc
# I04: [  ][  ][  ][  ][Fe][iQ] . 4  00 ADDI r3, r0, 0x0003
# I05: [  ][  ][  ][  ][Fe][iQ] . 5  01 ADDI r5, r0, 0x0005
# I06: [  ][  ][  ][  ][  ][Fe] . x  02 ADDI r6, r0, 0x0006
# I07: [  ][  ][  ][  ][  ][Fe] . x  03 BEQ  r1, r2, 0xfffc
# I08: [  ][  ][  ][  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I09: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0a: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0b: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0c: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
#
# reg#   value v src    TIME 601   pc0=04  pc1=05
#   0 :  0000  1 xx
#   1 :  0000  1 xx
#   2 :  0000  1 xx
#   3 :  0003  0 04
#   4 :  0000  1 xx
#   5 :  0005  0 05
#   6 :  0000  0 02
#   7 :  0000  1 xx
# Fetch Buffers:   branchback=1 backpc=00
#      fbA: v=0 instr=2c03 pc=00     -> fbC: v=1 instr=3806 pc=02
#      fbB: v=0 instr=3405 pc=01     -> fbD: v=1 instr=c57c pc=03
# IQ   : v d o g res  op bt tgt  a1  (v/s )  a2  (v/s )   im   pc  exc it I F S Mr Mi
#     0: 0 1 0 1 0003 1  0   3  0000 (1/xx) 0000 (1/xx)  0003  00   0   a 0 0 0  0  0
#     1: 0 1 0 1 0005 1  0   5  0000 (1/xx) 0000 (1/00)  0005  01   0   a 0 0 0  0  0
#  H  2: 1 1 0 1 0006 1  0   6  0000 (1/xx) 0000 (1/xx)  0006  02   0   a 0 0 1  0  0
#     3: 1 1 0 1 0000 6  1   0  0000 (1/xx) 0000 (1/xx)  fffc  03   0   b 0 0 0  0  0
#     4: 1 0 0 0 0000 1  0   3  0000 (1/xx) 0003 (1/xx)  0003  00   0   a 1 0 1  0  0
#     5: 1 0 0 0 0000 1  0   5  0000 (1/xx) 0005 (1/00)  0005  01   0   a 1 0 1  0  0
#   T 6: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
#     7: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
# ALU Inputs:  0: avail=1 data=0 id=02 [a]op=0 a1=0000 a2=0006 im=0006 bt=0 sid=02, pc=02
#              1: avail=1 data=0 id=03 [b]op=6 a1=0000 a2=0000 im=fffc bt=1 sid=03, pc=03
# ALU Outputs: 0: v=0 bus=0006 id=02 bmiss=0 misspc=09 missid=02
#              1: v=0 bus=0000 id=03 bmiss=0 misspc=04 missid=03
# DRAM Status:  OUT: v=0 data=xxxx id=xx exc=x
#               dram#: status addr data []op tgt id
#                  0 :   0     xx  xxxx [`]x  x  xx
#                  1 :   0     xx  xxxx [`]x  x  xx
#                  2 :   0     xx  xxxx [`]x  x  xx
# Commit Buses:  0: v=1 id=02 data=0006 tgt=06
#                1: v=1 id=03 data=0000 tgt=00
# Memory   :    0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
# 00000000 : 2c03 3405 3806 c57c 0000 0000 0000 0000 0000 0000 0000 7c00 3f8b ff80 0000 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
#

Branch Back Not Taken:

Let's see what happens in the Niloofar 1 for a misprediction. In the following program, register #2 is filled with the value of 2. Register #1 is zero, so the branch instruction must be considered not taken. The static branch predictor of Niloofar 1 will consider it taken as it is a branch back.

HazBBNT1.s




I00
I01
I02
I03
I04

I05
I06
I07
I08
I09
I0a
I0b

I0c
I0e

# Testing branch hazard in Niloofar 1
# Branch Back, Not Taken

lp1:    addi r2, r0, 2
        addi r5, r0, 5
        addi r6, r0, 6
        addi r7, r0, 7
        beq  r1, r2, lp1

        nop
        nop
        nop
        nop
        nop
        nop
        nop

hang:   movi r7, hang
        jalr r7, r7

So many things may have to be described for this case. In cycle 3, the branch instruction is fetched. It is predicted taken, so the fetch unit will be stalled and in the next cycle its program counter will be filled with the target address. In cycle 5,  the branch instruction has to wait for the value of register #2 due to data dependency and its issue will be delayed one cycle. The nop command that was fetched along with branch instruction is flushed. The wrong fetches will continue as the issue of branch command is delayed. In the end of cycle 6, when the branch instruction reaches to the execute phase, the processor finds out that the prediction was not correct.

The simulation of HazBBNT1.s, CurTime = 6

#
# Timings:  CurTime = 6  InstCnt = 4  nTCurInstNo = a  nTCurInstPos = a nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06]  qid pc instruction
# I00: [  ][Fe][iQ][ID][Ex][Cm][  ] . 0  00 ADDI r2, r0, 0x0002
# I01: [  ][Fe][iQ][ID][Ex][Cm][  ] . 1  01 ADDI r5, r0, 0x0005
# I02: [  ][  ][Fe][iQ][ID][Ex][Cm] . 2  02 ADDI r6, r0, 0x0006
# I03: [  ][  ][Fe][iQ][ID][Ex][Cm] . 3  03 ADDI r7, r0, 0x0007
# I04: [  ][  ][  ][Fe][iQ][  ][ID] . 4  04 BEQ  r1, r2, 0xfffb
# I05: [  ][  ][  ][Fe][  ][X ][  ] . x  05 ADD  r0, r0, r0 
# I06: [  ][  ][  ][  ][  ][Fe][iQ] . 5  00 ADDI r2, r0, 0x0002
# I07: [  ][  ][  ][  ][  ][Fe][iQ] . 6  01 ADDI r5, r0, 0x0005
# I08: [  ][  ][  ][  ][  ][  ][Fe] . x  02 ADDI r6, r0, 0x0006
# I09: [  ][  ][  ][  ][  ][  ][Fe] . x  03 ADDI r7, r0, 0x0007
# I0a: [  ][  ][  ][  ][  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I0b: [  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0c: [  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
#
# reg#   value v src    TIME 701   pc0=04  pc1=05
#   0 :  0000  1 xx
#   1 :  0000  1 xx
#   2 :  0002  0 05
#   3 :  0000  1 xx
#   4 :  0000  1 xx
#   5 :  0005  0 06
#   6 :  0006  1 02
#   7 :  0007  1 03
# Fetch Buffers:   branchback=0 backpc=0b
#      fbA: v=0 instr=2802 pc=00     -> fbC: v=1 instr=3806 pc=02
#      fbB: v=0 instr=3405 pc=01     -> fbD: v=1 instr=3c07 pc=03
# IQ   : v d o g res  op bt tgt  a1  (v/s )  a2  (v/s )   im   pc  exc it I F S Mr Mi
#     0: 0 1 0 1 0002 1  0   2  0000 (1/xx) 0000 (1/xx)  0002  00   0   a 0 0 0  0  0
#     1: 0 1 0 1 0005 1  0   5  0000 (1/xx) 0000 (1/00)  0005  01   0   a 0 0 0  0  0
#     2: 0 1 0 1 0006 1  0   6  0000 (1/xx) 0000 (1/xx)  0006  02   0   a 0 0 0  0  0
#     3: 0 1 0 1 0007 1  0   7  0000 (1/xx) 0000 (1/00)  0007  03   0   a 0 0 0  0  0
#  H  4: 1 0 1 0 0000 6  1   0  0002 (1/00) 0000 (1/xx)  fffb  04   0   b 0 0 0  0  0
#     5: 1 0 0 0 0000 1  0   2  0000 (1/xx) 0002 (1/00)  0002  00   0   a 1 1 0  0  0
#     6: 1 0 0 0 0000 1  0   5  0000 (1/xx) 0005 (1/00)  0005  01   0   a 1 1 0  0  0
#   T 7: 0 x x x xxxx x  x   x  xxxx (x/xx) xxxx (x/xx)  xxxx  xx   x   ` 0 0 0  0  0
# ALU Inputs:  0: avail=1 data=1 id=04 [b]op=6 a1=0002 a2=0000 im=fffb bt=1 sid=04, pc=04
#              1: avail=1 data=0 id=03 [a]op=0 a1=0000 a2=0007 im=0007 bt=0 sid=03, pc=03
# ALU Outputs: 0: v=1 bus=0002 id=04 bmiss=1 misspc=05 missid=04
#              1: v=0 bus=0007 id=03 bmiss=0 misspc=0b missid=03
# DRAM Status:  OUT: v=0 data=xxxx id=xx exc=x
#               dram#: status addr data []op tgt id
#                  0 :   0     xx  xxxx [`]x  x  xx
#                  1 :   0     xx  xxxx [`]x  x  xx
#                  2 :   0     xx  xxxx [`]x  x  xx
# Commit Buses:  0: v=0 id=04 data=0000 tgt=00
#                1: v=0 id=05 data=0000 tgt=02
# Memory   :    0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
# 00000000 : 2802 3405 3806 3c07 c57b 0000 0000 0000 0000 0000 0000 0000 7c00 3f8c ff80 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
#

 In cycle 7, the speculatively fetched instructions are flushed. The fetch unit is stalled and its program counter is updated. In the next cycle, those instructions that follow the mispredicted branch instruction will be fetched.

The simulation of HazBBNT1.s, CurTime = 8

#
# Timings:  CurTime = 8  InstCnt = 5  nTCurInstNo = c  nTCurInstPos = c nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06][07][08]  qid pc instruction
# I00: [  ][Fe][iQ][ID][Ex][Cm][  ][  ][  ] . 0  00 ADDI r2, r0, 0x0002
# I01: [  ][Fe][iQ][ID][Ex][Cm][  ][  ][  ] . 1  01 ADDI r5, r0, 0x0005
# I02: [  ][  ][Fe][iQ][ID][Ex][Cm][  ][  ] . 2  02 ADDI r6, r0, 0x0006
# I03: [  ][  ][Fe][iQ][ID][Ex][Cm][  ][  ] . 3  03 ADDI r7, r0, 0x0007
# I04: [  ][  ][  ][Fe][iQ][  ][ID][Ex][Cm] . 4  04 BEQ  r1, r2, 0xfffb
# I05: [  ][  ][  ][Fe][  ][X ][  ][  ][  ] . x  05 ADD  r0, r0, r0 
# I06: [  ][  ][  ][  ][  ][Fe][iQ][X ][  ] . 5  00 ADDI r2, r0, 0x0002
# I07: [  ][  ][  ][  ][  ][Fe][iQ][X ][  ] . 6  01 ADDI r5, r0, 0x0005
# I08: [  ][  ][  ][  ][  ][  ][Fe][X ][  ] . x  02 ADDI r6, r0, 0x0006
# I09: [  ][  ][  ][  ][  ][  ][Fe][X ][  ] . x  03 ADDI r7, r0, 0x0007
# I0a: [  ][  ][  ][  ][  ][  ][  ][  ][Fe] . x  05 ADD  r0, r0, r0 
# I0b: [  ][  ][  ][  ][  ][  ][  ][  ][Fe] . x  06 ADD  r0, r0, r0 
# I0c: [  ][  ][  ][  ][  ][  ][  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
...

There was relatively, a big misprediction penalty as five instructions were flushed. In the absence of data dependency, it would not be so deep. Consider the following program.

HazBBNT2.s




I00
I01
I02
I03
I04
I05
I06
I07

I08
I09
I0a
I0b
I0c
I0d
I0e

I10
I12

# Testing branch hazard in Niloofar 1
# Branch Back, Not Taken

lp1:    addi r2, r0, 2
        addi r3, r0, 5
        addi r4, r0, 6
        addi r5, r0, 5
        addi r6, r0, 6
        addi r7, r0, 7
        nop
        beq  r1, r2, lp1

        nop
        nop
        nop
        nop
        nop
        nop
        nop

hang:   movi r7, hang
        jalr r7, r7

There was not a data dependency. Also the branch instruction was the second instruction in a fetch set that always contains two instructions. So just two instructions were flushed due to misprediction.

The simulation of HazBBNT2.s, CurTime = 8

#
# Timings:  CurTime = 8  InstCnt = 8  nTCurInstNo = c  nTCurInstPos = c nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06][07][08]  qid pc instruction
# I00: [  ][Fe][iQ][ID][Ex][Cm][  ][  ][  ] . 0  00 ADDI r2, r0, 0x0002
# I01: [  ][Fe][iQ][ID][Ex][Cm][  ][  ][  ] . 1  01 ADDI r3, r0, 0x0003
# I02: [  ][  ][Fe][iQ][ID][Ex][Cm][  ][  ] . 2  02 ADDI r4, r0, 0x0004
# I03: [  ][  ][Fe][iQ][ID][Ex][Cm][  ][  ] . 3  03 ADDI r5, r0, 0x0005
# I04: [  ][  ][  ][Fe][iQ][ID][Ex][Cm][  ] . 4  04 ADDI r6, r0, 0x0006
# I05: [  ][  ][  ][Fe][iQ][ID][Ex][Cm][  ] . 5  05 ADDI r7, r0, 0x0007
# I06: [  ][  ][  ][  ][Fe][iQ][ID][Ex][Cm] . 6  06 ADD  r0, r0, r0 
# I07: [  ][  ][  ][  ][Fe][iQ][ID][Ex][Cm] . 7  07 BEQ  r1, r2, 0xfff8
# I08: [  ][  ][  ][  ][  ][  ][Fe][X ][  ] . x  00 ADDI r2, r0, 0x0002
# I09: [  ][  ][  ][  ][  ][  ][Fe][X ][  ] . x  01 ADDI r3, r0, 0x0003
# I0a: [  ][  ][  ][  ][  ][  ][  ][  ][Fe] . x  08 ADD  r0, r0, r0 
# I0b: [  ][  ][  ][  ][  ][  ][  ][  ][Fe] . x  09 ADD  r0, r0, r0 
# I0c: [  ][  ][  ][  ][  ][  ][  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
...

Branch Forward Taken:

In this program there is a branch forward instruction. Register #1 and Register #2 are initially zero, so the branch has to be taken.

HazBFT1.s




I00
I01
I02
I03
I04
I05
I06

I07
I08
I09
I0a
I0b
I0c

I0d
I0f

# Testing branch hazard in Niloofar 1
# Branch Forward, Taken

        beq  r1, r2, bp1
        addi r5, r0, 5
        addi r6, r0, 6
        addi r7, r0, 7
bp1:    addi r3, r0, 3
        addi r4, r0, 4

        nop
        nop
        nop
        nop
        nop
        nop
        nop

hang:   movi r7, hang
        jalr r7, r7

Niloofar 1, always considers a branch forward as not taken. In the end of cycle 3, it finds out the misprediction and after flushing the speculatively fetched instructions, follows the fetch using the branch target address.

The simulation of HazBFT1.s, CurTime = 5

#
# Timings:  CurTime = 5  InstCnt = 1  nTCurInstNo = 8  nTCurInstPos = 8 nCurToQIBPos = 6
#
# I##: [00][01][02][03][04][05]  qid pc instruction
# I00: [  ][Fe][iQ][ID][Ex][Cm] . 0  00 BEQ  r1, r2, 0x0003
# I01: [  ][Fe][iQ][ID][X ][  ] . 1  01 ADDI r5, r0, 0x0005
# I02: [  ][  ][Fe][iQ][X ][  ] . 2  02 ADDI r6, r0, 0x0006
# I03: [  ][  ][Fe][iQ][X ][  ] . 3  03 ADDI r7, r0, 0x0007
# I04: [  ][  ][  ][Fe][X ][  ] . x  04 ADDI r3, r0, 0x0003
# I05: [  ][  ][  ][Fe][X ][  ] . x  05 ADDI r4, r0, 0x0004
# I06: [  ][  ][  ][  ][  ][Fe] . x  04 ADDI r3, r0, 0x0003
# I07: [  ][  ][  ][  ][  ][Fe] . x  05 ADDI r4, r0, 0x0004
# I08: [  ][  ][  ][  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I09: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0a: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0b: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0c: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
...

Branch Forward Not Taken:

In this program we have provided a branch forward case that has to be not taken.

HazBFNT1.s




I00
I01
I02
I03
I04
I05
I06

I07
I08
I09
I0a
I0b
I0c
I0d

I0e
I1
0

# Testing branch hazard in Niloofar 1
# Branch Forward, Not Taken

        addi r2, r0, 2
        beq  r1, r2, bp1
        addi r5, r0, 5
        addi r6, r0, 6
        addi r7, r0, 7
bp1:    addi r3, r0, 3
        addi r4, r0, 4

        nop
        nop
        nop
        nop
        nop
        nop
        nop

hang:   movi r7, hang
        jalr r7, r7

The processor will predict the branch not taken and there will not be any flush in the program timing.

The simulation of HazBFNT1.s, CurTime = 7

#
# Timings:  CurTime = 7  InstCnt = 5  nTCurInstNo = c  nTCurInstPos = c nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06][07]  qid pc instruction
# I00: [  ][Fe][iQ][ID][Ex][Cm][  ][  ] . 0  00 ADDI r2, r0, 0x0002
# I01: [  ][Fe][iQ][  ][ID][Ex][Cm][  ] . 1  01 BEQ  r1, r2, 0x0003
# I02: [  ][  ][Fe][iQ][ID][Ex][Cm][  ] . 2  02 ADDI r5, r0, 0x0005
# I03: [  ][  ][Fe][iQ][  ][ID][Ex][Cm] . 3  03 ADDI r6, r0, 0x0006
# I04: [  ][  ][  ][Fe][iQ][ID][Ex][Cm] . 4  04 ADDI r7, r0, 0x0007
# I05: [  ][  ][  ][Fe][iQ][  ][ID][Ex] . 5  05 ADDI r3, r0, 0x0003
# I06: [  ][  ][  ][  ][Fe][iQ][ID][Ex] . 6  06 ADDI r4, r0, 0x0004
# I07: [  ][  ][  ][  ][Fe][iQ][  ][ID] . 7  07 ADD  r0, r0, r0 
# I08: [  ][  ][  ][  ][  ][Fe][iQ][  ] . 0  08 ADD  r0, r0, r0 
# I09: [  ][  ][  ][  ][  ][Fe][  ][iQ] . 1  09 ADD  r0, r0, r0 
# I0a: [  ][  ][  ][  ][  ][  ][Fe][  ] . x  0a ADD  r0, r0, r0 
# I0b: [  ][  ][  ][  ][  ][  ][Fe][  ] . x  0b ADD  r0, r0, r0 
# I0c: [  ][  ][  ][  ][  ][  ][  ][  ]<- x  xx      rx, rx, 0xxxxx
# I0d: [  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0e: [  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I0f: [  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I10: [  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
# I11: [  ][  ][  ][  ][  ][  ][  ][  ] . x  xx      rx, rx, 0xxxxx
...

Next

Home

ISA

Phases Timing Branch Hazards Synthesis

References