Home > Branch Hazards |
Branch Hazards
|
Branch Back Taken:
In the following program the reaction of
Niloofar 1 regarding a branch instruction is elaborated. This is branch back
that has to be taken. The prediction unit of Niloofar 1 is a very simple one. It
will predict every branch back as taken.
|
HazBBT1.s |
|
I00
I01
I02
I03
I04
I05
I06
I07
I08
I09
I0a
I0b
I0d
|
# Testing branch
hazard in Niloofar 1
# Branch Back, Taken
lp1: addi
r3, r0, 3
addi
r5, r0, 5
addi
r6, r0, 6
beq r1, r2, lp1
nop
nop
nop
nop
nop
nop
nop
hang: movi r7, hang
jalr
r7, r7 |
In the second cycle, Just along with the fetch
phase of the branch instruction, the prediction unit decodes it and realizes
that it is a branch back. It will activate the
branchback bit and will place the
target address into backpc.
|
The simulation
of HazBBT1.s,
CurTime = 2 |
|
#
# Timings: CurTime = 2 InstCnt = 0 nTCurInstNo = 4 nTCurInstPos = 4
nCurToQIBPos = 2
#
# I##: [00][01][02] qid pc instruction
# I00: [ ][Fe][iQ] . 0 00 ADDI r3, r0, 0x0003
# I01: [ ][Fe][iQ] . 1 01 ADDI r5, r0, 0x0005
# I02: [ ][ ][Fe] . x 02 ADDI r6, r0, 0x0006
# I03: [ ][ ][Fe] . x 03
BEQ r1, r2, 0xfffc
# I04: [ ][ ][ ]<- x xx rx, rx, 0xxxxx
# I05: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I06: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I07: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I08: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I09: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0a: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0b: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0c: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ] . x xx rx, rx, 0xxxxx
#
# reg# value v src TIME 301 pc0=04
pc1=05
# 0 : 0000 1 xx
# 1 : 0000 1 xx
# 2 : 0000 1 xx
# 3 : 0000 0 00
# 4 : 0000 1 xx
# 5 : 0000 0 01
# 6 : 0000 1 xx
# 7 : 0000 1 xx
# Fetch Buffers: branchback=1 backpc=00
# fbA: v=0 instr=2c03 pc=00 -> fbC: v=1 instr=3806 pc=02
# fbB: v=0 instr=3405 pc=01 -> fbD: v=1 instr=c57c pc=03
# IQ : v d o g res op bt tgt a1 (v/s ) a2 (v/s ) im pc exc it I
F S Mr Mi
# H 0: 1 0 0 0 0000 1 0 3 0000 (1/xx) 0000 (1/xx) 0003 00 0 a 1
0 x 0 0
# 1: 1 0 0 0 0000 1 0 5 0000 (1/xx) 0000 (1/00) 0005 01 0 a 1
0 x 0 0
# T 2: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 3: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 4: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 5: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 6: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 7: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# ALU Inputs: 0: avail=1 data=0 id=xx [`]op=x a1=xxxx a2=xxxx im=xxxx bt=x
sid=xx, pc=xx
# 1: avail=1 data=0 id=xx [`]op=x a1=xxxx a2=xxxx im=xxxx bt=x
sid=xx, pc=xx
# ALU Outputs: 0: v=0 bus=xxxx id=xx bmiss=0 misspc=xx missid=xx
# 1: v=0 bus=xxxx id=xx bmiss=0 misspc=xx missid=xx
# DRAM Status: OUT: v=0 data=xxxx id=xx exc=x
# dram#: status addr data []op tgt id
# 0 : 0 xx xxxx [`]x x xx
# 1 : 0 xx xxxx [`]x x xx
# 2 : 0 xx xxxx [`]x x xx
# Commit Buses: 0: v=0 id=00 data=0000 tgt=03
# 1: v=0 id=01 data=0000 tgt=05
# Memory : 0 1 2 3 4 5 6 7 8 9 A B
C D E F
# 00000000 : 2c03 3405 3806 c57c 0000 0000 0000 0000 0000 0000 0000 7c00
3f8b ff80 0000 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000
# |
The
branchback bit will stall the fetch unit for one cycle and the target
address will be placed in the fetch unit's program counter.
|
The simulation
of HazBBT1.s,
CurTime = 3 |
|
#
# Timings: CurTime = 3 InstCnt = 0 nTCurInstNo = 4 nTCurInstPos = 4
nCurToQIBPos = 4
#
# I##: [00][01][02][03] qid pc instruction
# I00: [ ][Fe][iQ][ID] . 0 00 ADDI r3, r0, 0x0003
# I01: [ ][Fe][iQ][ID] . 1 01 ADDI r5, r0, 0x0005
# I02: [ ][ ][Fe][iQ] . 2 02 ADDI r6, r0, 0x0006
# I03: [ ][ ][Fe][iQ] . 3 03 BEQ r1, r2, 0xfffc
# I04: [ ][ ][ ][ ]<- x xx rx, rx,
0xxxxx
# I05: [ ][ ][ ][ ] . x xx rx, rx,
0xxxxx
# I06: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I07: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I08: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I09: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0a: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0b: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0c: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
#
# reg# value v src TIME 401 pc0=00
pc1=01
# 0 : 0000 1 xx
# 1 : 0000 1 xx
# 2 : 0000 1 xx
# 3 : 0000 0 00
# 4 : 0000 1 xx
# 5 : 0000 0 01
# 6 : 0000 0 02
# 7 : 0000 1 xx
# Fetch Buffers: branchback=0 backpc=07
# -> fbA: v=0 instr=2c03 pc=00 fbC: v=0 instr=3806 pc=02
# -> fbB: v=0 instr=3405 pc=01 fbD: v=0 instr=c57c pc=03
# IQ : v d o g res op bt tgt a1 (v/s ) a2 (v/s ) im pc exc it I
F S Mr Mi
# H 0: 1 0 1 0 0000 1 0 3 0000 (1/xx) 0000 (1/xx) 0003 00 0 a 0
0 1 0 0
# 1: 1 0 1 0 0000 1 0 5 0000 (1/xx) 0000 (1/00) 0005 01 0 a 0
0 1 0 0
# 2: 1 0 0 0 0000 1 0 6 0000 (1/xx) 0000 (1/xx) 0006 02 0 a 1
0 1 0 0
# 3: 1 0 0 0 0000 6 1 0 0000 (1/xx) 0000 (1/xx) fffc 03 0 b 1
0 0 0 0
# T 4: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 5: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 6: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 7: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# ALU Inputs: 0: avail=1 data=1 id=00 [a]op=0 a1=0000 a2=0003 im=0003 bt=0
sid=00, pc=00
# 1: avail=1 data=1 id=01 [a]op=0 a1=0000 a2=0005 im=0005 bt=0
sid=01, pc=01
# ALU Outputs: 0: v=1 bus=0003 id=00 bmiss=0 misspc=04 missid=00
# 1: v=1 bus=0005 id=01 bmiss=0 misspc=07 missid=01
# DRAM Status: OUT: v=0 data=xxxx id=xx exc=x
# dram#: status addr data []op tgt id
# 0 : 0 xx xxxx [`]x x xx
# 1 : 0 xx xxxx [`]x x xx
# 2 : 0 xx xxxx [`]x x xx
# Commit Buses: 0: v=0 id=00 data=0000 tgt=03
# 1: v=0 id=01 data=0000 tgt=05
# Memory : 0 1 2 3 4 5 6 7 8 9 A B
C D E F
# 00000000 : 2c03 3405 3806 c57c 0000 0000 0000 0000 0000 0000 0000 7c00
3f8b ff80 0000 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000
# |
The fetch unit will follow its action using the
new target address. In the end of cycle 4, when the branch instruction reaches
to the execute phase, the processor finds out that the prediction was correct
and the speculatively fetched instructions will follow their way along the
pipeline. The following output clip, shows cycle 5, where the branch instruction
is passing the execute phase.
|
The simulation
of HazBBT1.s,
CurTime = 5 |
|
#
# Timings: CurTime = 5 InstCnt = 2 nTCurInstNo = 8 nTCurInstPos = 8
nCurToQIBPos = 6
#
# I##: [00][01][02][03][04][05] qid pc instruction
# I00: [ ][Fe][iQ][ID][Ex][Cm] . 0 00 ADDI r3, r0, 0x0003
# I01: [ ][Fe][iQ][ID][Ex][Cm] . 1 01 ADDI r5, r0, 0x0005
# I02: [ ][ ][Fe][iQ][ID][Ex] . 2 02 ADDI r6, r0, 0x0006
# I03: [ ][ ][Fe][iQ][ID][Ex] .
3 03 BEQ r1, r2,
0xfffc
# I04: [ ][ ][ ][ ][Fe][iQ] . 4 00 ADDI
r3, r0, 0x0003
# I05: [ ][ ][ ][ ][Fe][iQ] . 5 01 ADDI
r5, r0, 0x0005
# I06: [ ][ ][ ][ ][ ][Fe] . x 02 ADDI
r6, r0, 0x0006
# I07: [ ][ ][ ][ ][ ][Fe] . x 03 BEQ
r1, r2, 0xfffc
# I08: [ ][ ][ ][ ][ ][ ]<- x xx rx, rx, 0xxxxx
# I09: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0a: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0b: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0c: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
#
# reg# value v src TIME 601 pc0=04 pc1=05
# 0 : 0000 1 xx
# 1 : 0000 1 xx
# 2 : 0000 1 xx
# 3 : 0003 0 04
# 4 : 0000 1 xx
# 5 : 0005 0 05
# 6 : 0000 0 02
# 7 : 0000 1 xx
# Fetch Buffers: branchback=1 backpc=00
# fbA: v=0 instr=2c03 pc=00 -> fbC: v=1 instr=3806 pc=02
# fbB: v=0 instr=3405 pc=01 -> fbD: v=1 instr=c57c pc=03
# IQ : v d o g res op bt tgt a1 (v/s )
a2 (v/s ) im pc exc it I F S Mr Mi
# 0: 0 1 0 1 0003 1 0 3 0000 (1/xx) 0000 (1/xx) 0003 00 0 a 0
0 0 0 0
# 1: 0 1 0 1 0005 1 0 5 0000 (1/xx) 0000 (1/00) 0005 01 0 a 0
0 0 0 0
# H 2: 1 1 0 1 0006 1 0 6 0000 (1/xx) 0000 (1/xx) 0006 02 0 a 0
0 1 0 0
# 3: 1 1 0 1 0000 6
1 0 0000 (1/xx) 0000 (1/xx) fffc 03 0 b 0 0 0 0 0
# 4: 1 0 0 0 0000 1 0 3 0000 (1/xx) 0003 (1/xx) 0003 00 0 a 1
0 1 0 0
# 5: 1 0 0 0 0000 1 0 5 0000 (1/xx) 0005 (1/00) 0005 01 0 a 1
0 1 0 0
# T 6: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# 7: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# ALU Inputs: 0: avail=1 data=0 id=02 [a]op=0 a1=0000 a2=0006 im=0006 bt=0
sid=02, pc=02
# 1: avail=1 data=0 id=03
[b]op=6 a1=0000 a2=0000 im=fffc bt=1 sid=03,
pc=03
# ALU Outputs: 0: v=0 bus=0006 id=02 bmiss=0 misspc=09 missid=02
# 1: v=0 bus=0000 id=03
bmiss=0 misspc=04 missid=03
# DRAM Status: OUT: v=0 data=xxxx id=xx exc=x
# dram#: status addr data []op tgt id
# 0 : 0 xx xxxx [`]x x xx
# 1 : 0 xx xxxx [`]x x xx
# 2 : 0 xx xxxx [`]x x xx
# Commit Buses: 0: v=1 id=02 data=0006 tgt=06
# 1: v=1 id=03 data=0000 tgt=00
# Memory : 0 1 2 3 4 5 6 7 8 9 A B
C D E F
# 00000000 : 2c03 3405 3806 c57c 0000 0000 0000 0000 0000 0000 0000 7c00
3f8b ff80 0000 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000
# |
Branch Back
Not Taken:
Let's see what happens in the Niloofar 1 for a
misprediction. In the following program, register #2 is filled with the value of
2. Register #1 is zero, so the branch instruction must be considered not taken.
The static branch predictor of Niloofar 1 will consider it taken as it is a
branch back.
|
HazBBNT1.s |
|
I00
I01
I02
I03
I04
I05
I06
I07
I08
I09
I0a
I0b
I0c
I0e
|
# Testing branch
hazard in Niloofar 1
# Branch Back, Not Taken
lp1:
addi r2, r0, 2
addi
r5, r0, 5
addi
r6, r0, 6
addi
r7, r0, 7
beq r1,
r2, lp1
nop
nop
nop
nop
nop
nop
nop
hang: movi r7, hang
jalr
r7, r7 |
So many things may have to be described for this
case. In cycle 3, the branch instruction is fetched. It is predicted taken, so
the fetch unit will be stalled and in the next cycle its program counter will be
filled with the target address. In cycle 5, the branch instruction has to
wait for the value of register #2 due to data dependency and its issue will be
delayed one cycle. The nop command that was fetched along with branch
instruction is flushed. The wrong fetches will continue as the issue of branch
command is delayed. In the end of cycle 6, when the branch instruction reaches
to the execute phase, the processor finds out that the prediction was not
correct.
|
The simulation
of HazBBNT1.s,
CurTime = 6 |
|
#
# Timings: CurTime = 6 InstCnt = 4 nTCurInstNo = a nTCurInstPos = a
nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06] qid pc instruction
# I00: [ ][Fe][iQ][ID][Ex][Cm][ ] . 0 00 ADDI r2, r0, 0x0002
# I01: [ ][Fe][iQ][ID][Ex][Cm][ ] . 1 01 ADDI r5, r0, 0x0005
# I02: [ ][ ][Fe][iQ][ID][Ex][Cm] . 2 02 ADDI r6, r0, 0x0006
# I03: [ ][ ][Fe][iQ][ID][Ex][Cm] . 3 03 ADDI r7, r0, 0x0007
# I04: [ ][ ][ ][Fe][iQ][ ][ID] . 4 04 BEQ r1, r2, 0xfffb
# I05: [ ][ ][ ][Fe][ ][X ][ ] . x 05 ADD r0, r0, r0
# I06: [ ][ ][ ][ ][ ][Fe][iQ] . 5 00
ADDI r2, r0, 0x0002
# I07: [ ][ ][ ][ ][ ][Fe][iQ] . 6 01
ADDI r5, r0, 0x0005
# I08: [ ][ ][ ][ ][ ][ ][Fe] . x 02
ADDI r6, r0, 0x0006
# I09: [ ][ ][ ][ ][ ][ ][Fe] . x 03
ADDI r7, r0, 0x0007
# I0a: [ ][ ][ ][ ][ ][ ][ ]<- x xx rx, rx, 0xxxxx
# I0b: [ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0c: [ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
#
# reg# value v src TIME 701 pc0=04
pc1=05
# 0 : 0000 1 xx
# 1 : 0000 1 xx
# 2 : 0002 0 05
# 3 : 0000 1 xx
# 4 : 0000 1 xx
# 5 : 0005 0 06
# 6 : 0006 1 02
# 7 : 0007 1 03
# Fetch Buffers: branchback=0 backpc=0b
# fbA: v=0 instr=2802 pc=00 -> fbC: v=1 instr=3806 pc=02
# fbB: v=0 instr=3405 pc=01 -> fbD: v=1 instr=3c07 pc=03
# IQ : v d o g res op bt tgt a1 (v/s ) a2 (v/s ) im pc exc it I
F S Mr Mi
# 0: 0 1 0 1 0002 1 0 2 0000 (1/xx) 0000 (1/xx) 0002 00 0 a 0
0 0 0 0
# 1: 0 1 0 1 0005 1 0 5 0000 (1/xx) 0000 (1/00) 0005 01 0 a 0
0 0 0 0
# 2: 0 1 0 1 0006 1 0 6 0000 (1/xx) 0000 (1/xx) 0006 02 0 a 0
0 0 0 0
# 3: 0 1 0 1 0007 1 0 7 0000 (1/xx) 0000 (1/00) 0007 03 0 a 0
0 0 0 0
# H 4: 1 0 1 0 0000 6 1 0 0002 (1/00) 0000 (1/xx) fffb 04 0 b 0
0 0 0 0
# 5: 1 0 0 0 0000 1 0 2 0000 (1/xx) 0002 (1/00) 0002 00 0 a 1
1 0 0 0
# 6: 1 0 0 0 0000 1 0 5 0000 (1/xx) 0005 (1/00) 0005 01 0 a 1
1 0 0 0
# T 7: 0 x x x xxxx x x x xxxx (x/xx) xxxx (x/xx) xxxx xx x ` 0
0 0 0 0
# ALU Inputs: 0: avail=1 data=1 id=04 [b]op=6
a1=0002 a2=0000 im=fffb bt=1 sid=04, pc=04
# 1: avail=1 data=0 id=03 [a]op=0 a1=0000 a2=0007 im=0007 bt=0
sid=03, pc=03
# ALU Outputs: 0: v=1 bus=0002 id=04
bmiss=1 misspc=05 missid=04
# 1: v=0 bus=0007 id=03 bmiss=0 misspc=0b missid=03
# DRAM Status: OUT: v=0 data=xxxx id=xx exc=x
# dram#: status addr data []op tgt id
# 0 : 0 xx xxxx [`]x x xx
# 1 : 0 xx xxxx [`]x x xx
# 2 : 0 xx xxxx [`]x x xx
# Commit Buses: 0: v=0 id=04 data=0000 tgt=00
# 1: v=0 id=05 data=0000 tgt=02
# Memory : 0 1 2 3 4 5 6 7 8 9 A B
C D E F
# 00000000 : 2802 3405 3806 3c07 c57b 0000 0000 0000 0000 0000 0000 0000
7c00 3f8c ff80 0000
# 00000001 : 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000
# |
In cycle 7, the speculatively fetched
instructions are flushed. The fetch unit is stalled and its program counter is
updated. In the next cycle, those instructions that follow the mispredicted
branch instruction will be fetched.
|
The simulation
of HazBBNT1.s,
CurTime = 8 |
|
#
# Timings: CurTime = 8 InstCnt = 5 nTCurInstNo = c nTCurInstPos = c
nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06][07][08] qid pc instruction
# I00: [ ][Fe][iQ][ID][Ex][Cm][ ][ ][ ] . 0 00 ADDI r2, r0, 0x0002
# I01: [ ][Fe][iQ][ID][Ex][Cm][ ][ ][ ] . 1 01 ADDI r5, r0, 0x0005
# I02: [ ][ ][Fe][iQ][ID][Ex][Cm][ ][ ] . 2 02 ADDI r6, r0, 0x0006
# I03: [ ][ ][Fe][iQ][ID][Ex][Cm][ ][ ] . 3 03 ADDI r7, r0, 0x0007
# I04: [ ][ ][ ][Fe][iQ][ ][ID][Ex][Cm] . 4 04 BEQ r1, r2, 0xfffb
# I05: [ ][ ][ ][Fe][ ][X ][ ][ ][ ] . x 05 ADD r0, r0, r0
# I06: [ ][ ][ ][ ][ ][Fe][iQ][X ][ ] . 5 00 ADDI r2, r0, 0x0002
# I07: [ ][ ][ ][ ][ ][Fe][iQ][X ][ ] . 6 01 ADDI r5, r0, 0x0005
# I08: [ ][ ][ ][ ][ ][ ][Fe][X ][ ] . x 02 ADDI r6, r0, 0x0006
# I09: [ ][ ][ ][ ][ ][ ][Fe][X ][ ] . x 03 ADDI r7, r0, 0x0007
# I0a: [ ][ ][ ][ ][ ][ ][ ][ ][Fe] . x 05
ADD r0, r0, r0
# I0b: [ ][ ][ ][ ][ ][ ][ ][ ][Fe] . x 06 ADD r0, r0, r0
# I0c: [ ][ ][ ][ ][ ][ ][ ][ ][ ]<- x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
... |
There was relatively, a big misprediction
penalty as five instructions were flushed. In the absence of data dependency, it
would not be so deep. Consider the following program.
|
HazBBNT2.s |
|
I00
I01
I02
I03
I04
I05
I06
I07
I08
I09
I0a
I0b
I0c
I0d
I0e
I10
I12
|
# Testing branch
hazard in Niloofar 1
# Branch Back, Not Taken
lp1:
addi r2, r0, 2
addi r3,
r0, 5
addi r4,
r0, 6
addi
r5, r0, 5
addi
r6, r0, 6
addi
r7, r0, 7
nop
beq r1,
r2, lp1
nop
nop
nop
nop
nop
nop
nop
hang: movi r7, hang
jalr
r7, r7 |
There was not a data dependency. Also the branch
instruction was the second instruction in a fetch set that always contains two
instructions. So just two instructions were flushed due to misprediction.
|
The simulation
of HazBBNT2.s,
CurTime = 8 |
|
#
# Timings: CurTime = 8 InstCnt = 8 nTCurInstNo = c nTCurInstPos = c
nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06][07][08] qid pc instruction
# I00: [ ][Fe][iQ][ID][Ex][Cm][ ][ ][ ] . 0 00 ADDI r2, r0, 0x0002
# I01: [ ][Fe][iQ][ID][Ex][Cm][ ][ ][ ] . 1 01 ADDI r3, r0, 0x0003
# I02: [ ][ ][Fe][iQ][ID][Ex][Cm][ ][ ] . 2 02 ADDI r4, r0, 0x0004
# I03: [ ][ ][Fe][iQ][ID][Ex][Cm][ ][ ] . 3 03 ADDI r5, r0, 0x0005
# I04: [ ][ ][ ][Fe][iQ][ID][Ex][Cm][ ] . 4 04 ADDI r6, r0, 0x0006
# I05: [ ][ ][ ][Fe][iQ][ID][Ex][Cm][ ] . 5 05 ADDI r7, r0, 0x0007
# I06: [ ][ ][ ][ ][Fe][iQ][ID][Ex][Cm] . 6 06 ADD r0, r0, r0
# I07: [ ][ ][ ][ ][Fe][iQ][ID][Ex][Cm] . 7 07 BEQ r1, r2, 0xfff8
# I08: [ ][ ][ ][ ][ ][ ][Fe][X ][ ] . x 00 ADDI r2, r0, 0x0002
# I09: [ ][ ][ ][ ][ ][ ][Fe][X ][ ] . x 01 ADDI r3, r0, 0x0003
# I0a: [ ][ ][ ][ ][ ][ ][ ][ ][Fe] . x 08 ADD r0, r0, r0
# I0b: [ ][ ][ ][ ][ ][ ][ ][ ][Fe] . x 09 ADD r0, r0, r0
# I0c: [ ][ ][ ][ ][ ][ ][ ][ ][ ]<- x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
... |
Branch Forward
Taken:
In this program there is a branch forward
instruction. Register #1 and Register #2 are initially zero, so the branch has
to be taken.
|
HazBFT1.s |
|
I00
I01
I02
I03
I04
I05
I06
I07
I08
I09
I0a
I0b
I0c
I0d
I0f
|
# Testing branch
hazard in Niloofar 1
# Branch Forward, Taken
beq r1, r2, bp1
addi
r5, r0, 5
addi
r6, r0, 6
addi
r7, r0, 7
bp1:
addi r3, r0, 3
addi
r4, r0, 4
nop
nop
nop
nop
nop
nop
nop
hang: movi r7, hang
jalr
r7, r7 |
Niloofar 1, always considers a branch forward as
not taken. In the end of cycle 3, it finds out the misprediction and after
flushing the speculatively fetched instructions, follows the fetch using the
branch target address.
|
The simulation
of HazBFT1.s,
CurTime = 5 |
|
#
# Timings: CurTime = 5 InstCnt = 1 nTCurInstNo = 8 nTCurInstPos = 8
nCurToQIBPos = 6
#
# I##: [00][01][02][03][04][05] qid pc instruction
# I00: [ ][Fe][iQ][ID][Ex][Cm] . 0 00 BEQ r1, r2, 0x0003
# I01: [ ][Fe][iQ][ID][X ][ ] . 1 01 ADDI r5, r0, 0x0005
# I02: [ ][ ][Fe][iQ][X ][ ] . 2 02 ADDI r6, r0, 0x0006
# I03: [ ][ ][Fe][iQ][X ][ ] . 3 03 ADDI r7, r0, 0x0007
# I04: [ ][ ][ ][Fe][X ][ ] . x 04 ADDI r3, r0, 0x0003
# I05: [ ][ ][ ][Fe][X ][ ] . x 05 ADDI r4, r0, 0x0004
# I06: [ ][ ][ ][ ][ ][Fe] . x 04 ADDI r3, r0, 0x0003
# I07: [ ][ ][ ][ ][ ][Fe] . x 05 ADDI r4, r0, 0x0004
# I08: [ ][ ][ ][ ][ ][ ]<- x xx rx, rx, 0xxxxx
# I09: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0a: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0b: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0c: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
... |
Branch
Forward Not Taken:
In this program we have provided a branch
forward case that has to be not taken.
|
HazBFNT1.s |
|
I00
I01
I02
I03
I04
I05
I06
I07
I08
I09
I0a
I0b
I0c
I0d
I0e
I10
|
# Testing branch
hazard in Niloofar 1
# Branch Forward, Not Taken
addi r2, r0, 2
beq r1, r2, bp1
addi
r5, r0, 5
addi
r6, r0, 6
addi
r7, r0, 7
bp1:
addi r3, r0, 3
addi
r4, r0, 4
nop
nop
nop
nop
nop
nop
nop
hang: movi r7, hang
jalr
r7, r7 |
The processor will predict the branch not taken
and there will not be any flush in the program timing.
|
The simulation
of HazBFNT1.s,
CurTime = 7 |
|
#
# Timings: CurTime = 7 InstCnt = 5 nTCurInstNo = c nTCurInstPos = c
nCurToQIBPos = a
#
# I##: [00][01][02][03][04][05][06][07] qid pc instruction
# I00: [ ][Fe][iQ][ID][Ex][Cm][ ][ ] . 0 00 ADDI r2, r0, 0x0002
# I01: [ ][Fe][iQ][ ][ID][Ex][Cm][ ] . 1 01 BEQ r1, r2, 0x0003
# I02: [ ][ ][Fe][iQ][ID][Ex][Cm][ ] . 2 02 ADDI r5, r0, 0x0005
# I03: [ ][ ][Fe][iQ][ ][ID][Ex][Cm] . 3 03 ADDI r6, r0, 0x0006
# I04: [ ][ ][ ][Fe][iQ][ID][Ex][Cm] . 4 04 ADDI r7, r0, 0x0007
# I05: [ ][ ][ ][Fe][iQ][ ][ID][Ex] . 5 05 ADDI r3, r0, 0x0003
# I06: [ ][ ][ ][ ][Fe][iQ][ID][Ex] . 6 06 ADDI r4, r0, 0x0004
# I07: [ ][ ][ ][ ][Fe][iQ][ ][ID] . 7 07 ADD r0, r0, r0
# I08: [ ][ ][ ][ ][ ][Fe][iQ][ ] . 0 08 ADD r0, r0, r0
# I09: [ ][ ][ ][ ][ ][Fe][ ][iQ] . 1 09 ADD r0, r0, r0
# I0a: [ ][ ][ ][ ][ ][ ][Fe][ ] . x 0a ADD r0, r0, r0
# I0b: [ ][ ][ ][ ][ ][ ][Fe][ ] . x 0b ADD r0, r0, r0
# I0c: [ ][ ][ ][ ][ ][ ][ ][ ]<- x xx rx, rx, 0xxxxx
# I0d: [ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0e: [ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I0f: [ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I10: [ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
# I11: [ ][ ][ ][ ][ ][ ][ ][ ] . x xx rx, rx, 0xxxxx
... |
Next