BOOM

Development environment

Chipyard vs. boom-template: Chipyard is not well documented yet, while boom-template is being deprecated. I am currently using boom-template.

Design-flow

1. Edit the BOOM source code with IntelliJ.
2. Compile, elaborate and build the Verilator simulator with the Makefiles. Running make debug inside the verisim folder generates a Verilator executable that also produces waveform output.
3. Run code on the Verilator executable.
4. Analyze the waveform output with GTKWave.
5. Go back to step 1.

Running your own tests

Understanding the BOOM

The main file is core.scala. All the modules are instantiated there and their interfaces are connected. It is somewhat counterintuitive, though: big bundles are passed to many interfaces even though only a few of their wires are actually used, and the design relies on the compilation tools to optimize away the unused ones.

ROB

The ROB is part of a Physical Register File design. This is unlike the data-in-ROB design found in e.g. Tomasulo's algorithm: all the data resides in the Register File. The RF holds many more registers than the ISA dictates, and a Rename Map Table keeps the information about both the committed and the speculative state.

When an instruction enters the Rename Stage, its logical register specifiers (rd, rs1, rs2) are first translated by the map table. prs1 and prs2 are read directly from the map table, while pdst is taken from the Free List. The logical destination register (lrd) is also looked up in the map table; the result is called the stale physical destination (stale pdst). When the instruction commits, the stale pdst is returned to the Free List.
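The rename step described above can be sketched as a small behavioural model. This is only an illustration in Python, not BOOM's Chisel code: the class and field names (RenameStage, RenamedUop, map_table, free_list) are made up for the example, and it assumes an identity mapping at reset and ignores special cases such as the hard-wired zero register or an empty Free List.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RenamedUop:
    prs1: int        # physical register holding source operand 1
    prs2: int        # physical register holding source operand 2
    pdst: int        # newly allocated physical destination register
    stale_pdst: int  # previous mapping of the logical destination


class RenameStage:
    """Toy model of a rename map table plus Free List (hypothetical names)."""

    def __init__(self, num_logical: int, num_physical: int):
        # Assumption: logical register i initially maps to physical register i.
        self.map_table = list(range(num_logical))
        # The remaining physical registers start out free.
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, rd: int, rs1: int, rs2: int) -> RenamedUop:
        # Sources are read straight from the map table.
        prs1 = self.map_table[rs1]
        prs2 = self.map_table[rs2]
        # The old mapping of rd becomes the stale pdst.
        stale_pdst = self.map_table[rd]
        # The new destination comes from the Free List.
        pdst = self.free_list.popleft()
        self.map_table[rd] = pdst
        return RenamedUop(prs1, prs2, pdst, stale_pdst)

    def commit(self, uop: RenamedUop) -> None:
        # At commit the stale pdst is returned to the Free List.
        self.free_list.append(uop.stale_pdst)
```

Note that the sources are read before the map table is updated, so an instruction that uses the same register as source and destination still sees the old mapping.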

The ROB is a bit tricky. The enqueue interface is simple: there is only a single one. The writeback (WB) interface is more complicated and consists of:

wb_resps (writeback from the execution units with the result of an instruction)
lsu_clr_unsafe (writeback from the LSU signalling that a load has been executed and can be committed at the head of the ROB)
lsu_clr_bsy (writeback from the LSU signalling that a store is done)
lxcpt (load exception, probably a memory ordering error)
bxcpt (branch exception from the branch unit)
brinfo (branch misspeculations, from the execution unit I think)

The ROB outputs io.commit, which is used for both commit and rollback. It is sent to many other modules, but one of the key consumers is the rename stage.

LSU

LSU has multiple interfaces to multiple parts of the pipeline.

Decode stage iface

When a load/store arrives in the Decode Stage, an entry in the LAQ/SAQ is allocated for it.
When the LSU is full, a signal is sent to the Decode Stage to stall the pipeline.

EXE stage iface

The addresses for loads/stores are calculated by the EXE stage and the results are passed into the LSU.

Eager Delay iface

The Release Queue has a configurable number of ports that can flip the is_shadowed bit of a LAQ entry.
Setting the bit is done in the same clock cycle (CC) as the LAQ entry is allocated.
Unsetting the bit is done when a WB to the ROB passes through the Shadow Buffer and an instruction shadowing the load is thereby marked as not speculative.

How does it work?

1. When loads/stores get to the Decode stage, an entry is allocated in the LAQ/SAQ.
2. When the address is calculated in the EXE stage, this address is:
2a. Translated by the TLB.
2b. Loads => sent to D$; if shadowed => put to sleep.
2c. Stores => put on the SAQ.
3. TLB misses are put to sleep and woken up later to retry the translation.
4. Loads are fired to D$ before we know whether there are memory ordering conflicts. Conflicting loads are then killed in the next CC.
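The decision taken when an address arrives from the EXE stage (steps 2a to 2c plus the TLB-miss case) can be written down as a small function. This is a hedged sketch, not the LSU's real logic: the Action enum and the function name are invented for the example, and checking the TLB result before the shadow bit is an assumption about the ordering.

```python
from enum import Enum


class Action(Enum):
    SLEEP_TLB_MISS = "sleep until the translation is retried"
    SLEEP_SHADOWED = "sleep until the shadow is lifted"
    FIRE_TO_DCACHE = "send load to D$"
    ENQUEUE_STORE = "put store on the SAQ"


def on_address_ready(is_load: bool, tlb_hit: bool, is_shadowed: bool) -> Action:
    """Pick the LSU action once the EXE stage has produced an address."""
    # Assumption: a TLB miss is handled first, for loads and stores alike.
    if not tlb_hit:
        return Action.SLEEP_TLB_MISS
    if is_load:
        # A shadowed load must not reach the D$ yet.
        if is_shadowed:
            return Action.SLEEP_SHADOWED
        return Action.FIRE_TO_DCACHE
    # Stores are only queued here; the shadow bit does not apply to them.
    return Action.ENQUEUE_STORE
```

The later kill of loads with memory ordering conflicts (step 4) is deliberately left out, since it happens one CC after the load has fired.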

Exceptions and flushing the pipeline

The Branch Unit deals with branch predictions and mispredictions. Most modules have an interface to the Branch Unit so that mispredicted instructions can be killed. Our Shadow Buffer also implements this interface.

Eager Delay implementation

Eager Delay for hiding speculative loads is an improvement on the Naive Delay method. Naive Delay only allows a load to fire to D$ when it is at the head of the ROB, i.e. no loads can fire out of order. Eager Delay improves on this: a load is stalled only until it is no longer speculative, i.e. until it is no longer "shadowed" by another instruction. It requires the following components:

Shadow Buffer

The Shadow Buffer (SB) stores the shadow-casting instructions in order. If the Shadow Buffer is not empty when a load arrives, the load is currently speculative (this could change by the time the address of the load is calculated). When an instruction writes back to the ROB, the SB is updated as well.

Release Queue

If a load passes through the Decode stage and the SB is NOT empty, a new entry is made in the Release Queue (RQ). The Release Queue maps LAQ => SB_idx. The RQ also signals to the LSU that the incoming load allocation is shadowed. When an instruction in the SB is marked as no longer speculative, the SB signals the RQ, which in turn signals to the LSU which loads can now safely be fired off to D$.
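The interplay between the Shadow Buffer and the Release Queue can be sketched as follows. This is a toy Python model under stated assumptions, not the Chisel implementation: all class and method names are hypothetical, a load is assumed to map to the youngest SB entry present when it is decoded, and SB entries are assumed to be released strictly from the head.

```python
from collections import deque


class ShadowBuffer:
    """In-order buffer of shadow-casting instructions (toy model)."""

    def __init__(self):
        self.entries = deque()  # each entry is [sb_idx, resolved]
        self.next_idx = 0

    def is_empty(self) -> bool:
        return not self.entries

    def allocate(self) -> int:
        idx = self.next_idx
        self.next_idx += 1
        self.entries.append([idx, False])
        return idx

    def youngest(self) -> int:
        return self.entries[-1][0]

    def resolve(self, sb_idx: int) -> list:
        """Mark one entry as not speculative; return the indices released, in order."""
        for entry in self.entries:
            if entry[0] == sb_idx:
                entry[1] = True
        released = []
        # Assumption: entries only leave the SB from the head.
        while self.entries and self.entries[0][1]:
            released.append(self.entries.popleft()[0])
        return released


class ReleaseQueue:
    """Maps LAQ index => SB index for shadowed loads (toy model)."""

    def __init__(self):
        self.laq_to_sb = {}

    def on_load_decoded(self, laq_idx: int, sb: ShadowBuffer) -> bool:
        """Return True if the load must be allocated as shadowed."""
        if sb.is_empty():
            return False
        self.laq_to_sb[laq_idx] = sb.youngest()
        return True

    def on_sb_release(self, sb_idx: int) -> list:
        """Return the LAQ indices that may now fire to D$."""
        freed = [laq for laq, idx in self.laq_to_sb.items() if idx == sb_idx]
        for laq in freed:
            del self.laq_to_sb[laq]
        return freed
```

A load decoded while the SB is empty is never entered into the RQ. A load decoded behind two unresolved shadow casters is only released once both have left the SB, even if the younger one resolves first.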

ROB

The SB is really an extension of the ROB, which keeps track of all the instructions in the pipeline in order.

Release Queue

LSU

TODOs Eager Delay

Clean up the writeback interfaces:

SB:

exe_resp WB goes via the ROB, but branches writing back here does not really make sense.
branch_xcpt goes via the ROB: I don't know exactly what this is.
Branch mispredicts go directly from BRU -> SB and kill the branch and its descendants.

RQ:

The SB writes to the RQ 1 instruction per CC. The RQ does not interface to any BRU or EXU.
The number of write ports on the RQ should be a config parameter.
When a killed branch is written back to the RQ, the RQ just drops that instruction.

LSU:

The LSU is also directly interfaced to the BRU.
It needs a WB from the RQ to flip the shadow bit in order to fire a load to memory.

Design choices in Eager Delay

NB: Eager Delay does not support superscalar because: ROB kill => SB is scalar.

Shadow Buffer

Keeps the following bits per instruction: Data (True/False), Valid, was_killed, uop (the micro-op).
Has an interface to the Branch Unit to deal with branch misspeculation (this could perhaps be changed so that the SB only interfaces to the ROB).
Has only a single commit port towards the Release Queue. Upon a branch mispredict or jmp xcpt the "was_killed" bit is flipped, but this is only propagated to the RQ when the entry is at the head of the SB. This simplifies the SB head/tail and commit logic: we do not want to commit anything from the SB out of order. This could also be a mistake.
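The head-only commit policy can be illustrated with a minimal cycle-level sketch. It is an assumption-laden model, not the actual design: SBEntry, ShadowBufferCommit and the resolved flag are hypothetical names, the uop is reduced to a string, and one call to tick() stands for one CC of the single commit port.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SBEntry:
    uop: str                  # stand-in for the micro-op
    resolved: bool = False    # hypothetical flag: no longer speculative
    was_killed: bool = False  # set on branch mispredict or jmp xcpt


class ShadowBufferCommit:
    """Toy model of a Shadow Buffer with a single, in-order commit port."""

    def __init__(self):
        self.queue = deque()

    def push(self, uop: str) -> SBEntry:
        entry = SBEntry(uop)
        self.queue.append(entry)
        return entry

    def tick(self):
        """One CC: commit at most the head entry towards the RQ."""
        if not self.queue:
            return None
        head = self.queue[0]
        # A kill further down the queue is not visible to the RQ
        # until that entry has reached the head.
        if head.resolved or head.was_killed:
            self.queue.popleft()
            return (head.uop, head.was_killed)
        return None
```

In this model a killed entry behind an unresolved head stays in the buffer, which is exactly the simplification (and the possible mistake) noted above: the kill reaches the RQ later than it strictly has to.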

Release Queue

Has 2 commit ports to the LSU, through which it sets the is_shadowed bit to 0 when the SB entry is invalidated.

LSU

The LSU will not retry the TLB for a shadowed load, and it will not wake up a shadowed load either. Could this cause a deadlock?
