BOOM
Chipyard vs boom-template. Chipyard is not well documented yet, but boom-template is being deprecated. I am currently using boom-template.
- Edit the BOOM source code with IntelliJ.
- Compile, elaborate and build the Verilator simulator with the Makefiles. Running make debug inside the verisim folder generates a Verilator executable that also produces waveform output.
- Run code on the Verilator executable.
- Analyze the waveform output with GTKWave.
- Go back to the first step.
The main file is core.scala. Here all the modules are instantiated and their ifaces are connected. It is somewhat counterintuitive, though: big bundles are passed to many ifaces, but in reality only a few of the wires are used, and the design relies on the compilation tool to optimize away the unused ones.
The ROB follows a Physical Register File design. This is unlike the data-in-ROB design that we find in e.g. Tomasulo's algorithm: all the data resides in the Register File. The RF holds many more registers than the ISA dictates, and we keep a Rename Map Table that contains the information about both the committed and the speculative state.
When an instr enters the Rename Stage, its logical registers rd, rs1 and rs2 are translated first. prs1 and prs2 are read directly from the map table, while pdst is taken from the Free List. The lrd (logical dest reg) is also looked up in the map table; the result is called the stale physical destination (stale pdst). When the instruction commits, the stale pdst is returned to the Free List.
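The renaming steps above can be sketched as a small behavioural model. This is only a toy under stated assumptions, not BOOM's actual Chisel code: the class name `RenameStage`, the register counts, and the identity initial mapping are all made up for illustration, and special cases such as x0 are ignored.

```python
from collections import deque

class RenameStage:
    """Toy model of map-table renaming with a free list (hypothetical, not BOOM code)."""

    def __init__(self, num_logical=32, num_physical=64):
        # Assumption: logical register i initially lives in physical register i.
        self.map_table = list(range(num_logical))
        # All remaining physical registers start out on the free list.
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, lrd, lrs1, lrs2):
        # Sources are read from the map table before the destination is remapped.
        prs1 = self.map_table[lrs1]
        prs2 = self.map_table[lrs2]
        # The previous mapping of the destination becomes the stale pdst.
        stale_pdst = self.map_table[lrd]
        # The new physical destination comes from the free list.
        pdst = self.free_list.popleft()
        self.map_table[lrd] = pdst
        return {"pdst": pdst, "prs1": prs1, "prs2": prs2, "stale_pdst": stale_pdst}

    def commit(self, uop):
        # At commit, the stale pdst is returned to the free list.
        self.free_list.append(uop["stale_pdst"])


r = RenameStage()
a = r.rename(lrd=5, lrs1=1, lrs2=2)  # x5 = x1 op x2
b = r.rename(lrd=6, lrs1=5, lrs2=5)  # x6 = x5 op x5, sees the new mapping of x5
r.commit(a)                          # physical register 5 becomes free again
```

The second instruction reads x5 through the map table and therefore picks up the speculative pdst of the first one, which is the point of keeping both committed and speculative state in the same table.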
The ROB is a bit tricky. The enqueuing iface is simple: there is a single one. The WB iface is more complicated. We have:
- wb_resps (writeback from the execution unit with the result of an instruction)
- lsu_clr_unsafe (writeback from the LSU that a load has been executed and can be committed at the head of the ROB)
- lsu_clr_bsy (writeback from the LSU that a store is done)
- lxcpt (load exception, probably a memory ordering error)
- bxcpt (branch exception from the branch unit)
- brinfo (branch misspeculations, from the exe unit I think)
The ROB outputs an io.commit, which is used both for commit and for rollback. It is sent to many other modules, but one of the key consumers is the rename stages.
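To make the interplay between the writeback signals and io.commit a bit more concrete, here is a rough sketch of how I read it. It is a guess at the behaviour, not the real ROB: `ToyRob`, `RobEntry`, the `busy`/`unsafe`/`exception` fields and the `step` method are hypothetical names, and the rule that a load needs both a writeback and lsu_clr_unsafe before it commits is an assumption taken from the descriptions above.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class RobEntry:
    tag: int
    busy: bool = True        # assumption: cleared by a writeback (e.g. wb_resps)
    unsafe: bool = False     # assumption: set for loads, cleared by lsu_clr_unsafe
    exception: bool = False  # assumption: set by lxcpt / bxcpt

class ToyRob:
    """Hypothetical ROB model: single enqueue port, several writeback ports."""

    def __init__(self):
        self.entries = deque()  # program order, head on the left

    def enqueue(self, tag, is_load=False):
        self.entries.append(RobEntry(tag, unsafe=is_load))

    def _entry(self, tag):
        return next(e for e in self.entries if e.tag == tag)

    def wb_resp(self, tag):
        self._entry(tag).busy = False

    def lsu_clr_unsafe(self, tag):
        self._entry(tag).unsafe = False

    def xcpt(self, tag):
        self._entry(tag).exception = True

    def step(self):
        """Head logic for one cycle: ('commit', tag), ('rollback', tags) or None."""
        if not self.entries:
            return None
        head = self.entries[0]
        if head.exception:
            # The same output is reused for rollback: undo youngest first.
            tags = [e.tag for e in reversed(self.entries)]
            self.entries.clear()
            return ("rollback", tags)
        if head.busy or head.unsafe:
            return None
        self.entries.popleft()
        return ("commit", head.tag)
```

A consumer such as the rename stage would then look at the first element of the tuple to decide whether to free the stale pdst (commit) or restore the old mapping (rollback).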
LSU has multiple interfaces to multiple parts of the pipeline.
- When a load/store arrives in the Decode Stage a place in the LAQ/SAQ is already allocated.
- When the LSU is full, a signal is sent to the Decode Stage to stall the pipeline.
- The addresses for loads/stores are calculated by the EXE stage and the results are passed into the LSU.
- The ReleaseQueue has a configurable number of ports that can flip the is_shadowed bit of LAQ entries.
  - The bit is set in the same CC as the LAQ entry is allocated.
  - The bit is cleared when a WB to the ROB passes through the Shadow Buffer and an instruction shadowing the load gets marked as not speculative.
- When loads/stores get to the Decode stage, an entry is allocated in the LAQ/SAQ.
- When the address is calculated in the EXE stage, this address is:
  - translated by the TLB;
  - for loads: sent to D$, or, if the load is shadowed, put to sleep;
  - for stores: put on the SAQ.
- TLB misses are put to sleep and woken up later to retry the translation.
- Loads are fired to D$ before we know if we have memory ordering conflicts. Those loads are then killed in the next CC.
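The load path described above can be sketched as follows. This is a simplified model under assumptions, not the real LSU: `ToyLsu`, `LaqEntry`, the dictionary-based TLB, the 4 KiB page size and the returned status strings are all invented for illustration. Stores, memory ordering checks and LAQ deallocation are left out.

```python
from dataclasses import dataclass
from typing import Optional

PAGE = 4096  # assumed page size for the toy TLB

@dataclass
class LaqEntry:
    is_shadowed: bool
    paddr: Optional[int] = None
    asleep: bool = False
    fired: bool = False

class ToyLsu:
    """Hypothetical load-only LSU model following the steps in the notes."""

    def __init__(self, size=4, tlb=None):
        self.size = size
        self.laq = []
        self.tlb = tlb or {}  # hypothetical mapping: virtual page -> physical page

    def allocate(self, shadowed):
        """Decode stage: returns the LAQ index, or None to signal a stall."""
        if len(self.laq) >= self.size:
            return None
        self.laq.append(LaqEntry(is_shadowed=shadowed))
        return len(self.laq) - 1

    def address_ready(self, idx, vaddr):
        """EXE stage has calculated the address of load idx."""
        entry = self.laq[idx]
        page, offset = divmod(vaddr, PAGE)
        if page not in self.tlb:
            entry.asleep = True  # TLB miss: sleep and retry later
            return "tlb_miss"
        entry.paddr = self.tlb[page] * PAGE + offset
        if entry.is_shadowed:
            entry.asleep = True  # shadowed: do not touch D$ yet
            return "sleep"
        entry.fired = True       # not shadowed: fire to D$
        return "fired"

    def release(self, idx):
        """Release Queue port: clear is_shadowed and wake the load if possible."""
        entry = self.laq[idx]
        entry.is_shadowed = False
        if entry.asleep and entry.paddr is not None:
            entry.asleep = False
            entry.fired = True
            return "fired"
        return "waiting"


lsu = ToyLsu(size=2, tlb={1: 7})
shadowed = lsu.allocate(shadowed=True)
plain = lsu.allocate(shadowed=False)
lsu.address_ready(plain, PAGE + 8)   # fires immediately
lsu.address_ready(shadowed, PAGE)    # translated, then put to sleep
lsu.release(shadowed)                # shadow lifted, load fires now
```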
The Branch Unit deals with branch predictions and mispredictions. Most modules have an interface to the branch unit to kill mispredicted instructions. Our Shadow Buffer also implements this.
The Eager Delay method for hiding speculative loads is an improvement of the Naive Delay method. Naive Delay only allows a load to fire to D$ when it is at the head of the ROB, i.e. no loads can fire out of order. Eager Delay improves upon this: we stall a load only until it is no longer speculative, i.e. until it is not "shadowed" by another instruction. It requires the following components:
The Shadow Buffer stores the shadow-casting instructions in-order. If the Shadow Buffer is not empty when a load arrives, the load is currently speculative (this could change by the time the address of the load is calculated). When an instruction writes back to the ROB, the SB is updated as well.
If a load passes through the Decode stage and the SB is NOT empty, we make a new entry in the Release Queue. The Release Queue maps LAQ => SB_idx. The RQ also signals the LSU that the incoming load allocation is shadowed. When an instruction in the SB is marked as no longer speculative, the SB signals the RQ, which in turn signals the LSU which loads can now safely be fired off to D$.
The SB is really an extension of the ROB which keeps track of all the instructions in the pipeline, in-order.
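The interaction between the Shadow Buffer and the Release Queue can be sketched like this. It is a toy model under assumptions: the class and method names are hypothetical, and mapping each load to the youngest shadow-caster present at decode time is my guess at what "LAQ => SB_idx" means. Since the SB releases in order, the youngest older caster leaving the head implies that all older ones have left too.

```python
from collections import deque

class ShadowBuffer:
    """Hypothetical in-order buffer of shadow-casting instructions."""

    def __init__(self):
        self.entries = deque()  # [sb_idx, resolved] pairs in program order
        self.next_idx = 0

    def empty(self):
        return not self.entries

    def allocate(self):
        idx = self.next_idx
        self.next_idx += 1
        self.entries.append([idx, False])
        return idx

    def youngest(self):
        return self.entries[-1][0]

    def writeback(self, sb_idx):
        # A writeback to the ROB also marks the matching SB entry as resolved.
        for entry in self.entries:
            if entry[0] == sb_idx:
                entry[1] = True

    def release_head(self):
        """Release at most one resolved entry per call, always from the head."""
        if self.entries and self.entries[0][1]:
            return self.entries.popleft()[0]
        return None

class ReleaseQueue:
    """Hypothetical LAQ index -> SB index mapping."""

    def __init__(self):
        self.entries = deque()  # (laq_idx, sb_idx)

    def on_load_decode(self, laq_idx, sb):
        """Returns True if the load is shadowed and got an RQ entry."""
        if sb.empty():
            return False
        self.entries.append((laq_idx, sb.youngest()))
        return True

    def on_sb_release(self, sb_idx):
        """Returns the LAQ indices whose is_shadowed bit can now be cleared."""
        freed = [laq for laq, idx in self.entries if idx == sb_idx]
        self.entries = deque(p for p in self.entries if p[1] != sb_idx)
        return freed
```

Note that a caster resolving out of order does not release anything by itself; the loads behind it stay shadowed until every older caster has left the head of the SB.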
Clean the Writeback interfaces:
SB:
- exe_resp WB goes via the ROB. But branches writing back here doesn't really make any sense.
- branch_xcpt via the ROB: I don't even know exactly what this is.
- Branch mispredict goes directly from BRU->SB. Kills the branch and its descendants.
RQ:
- SB writes to RQ 1 instr per CC. RQ does not iface to any BRU or EXU.
- Num WritePorts on RQ should be a config param
- When a killed branch is written back to the RQ, it just drops that instr.
LSU:
- LSU is also directly ifaced to BRU
- Needs WB from RQ to flip the shadow bit in order to fire a load to mem
NB: Eager Delay doesn't support superscalar because: ROB kill => SB is scalar
- Keeps the following bits per instruction: Data (True/False), Valid, was_killed, uop (the micro-op)
- Has an interface to the Branch Unit to deal with branch misspeculation (can maybe be moved to only iface the ROB)
- Has only a single commit port towards the Release Queue. Upon a branch mispredict or jmp xcpt the "was_killed" bit is flipped, but this is only propagated to the RQ when the entry is at the head of the SB. This is to simplify the SB head/tail and commit logic. We don't want to commit anything from the SB out-of-order. This could also be a mistake.
- Has 2 commit ports to the LSU, where it sets the is_shadowed bit to 0 when the SB is invalidated.
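The head-only propagation of was_killed can be sketched as below. This is only an illustration under assumptions: `SbEntry`, `ToyShadowBuffer` and the `resolved` flag are hypothetical names, the uop is reduced to a string, and the Data and Valid bits are not modelled.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class SbEntry:
    uop: str                  # stand-in for the micro-op
    resolved: bool = False    # hypothetical flag: no longer speculative
    was_killed: bool = False  # flipped on mispredict, acted on only at the head

class ToyShadowBuffer:
    """Hypothetical SB with a single, in-order commit port towards the RQ."""

    def __init__(self):
        self.entries = deque()

    def allocate(self, uop):
        self.entries.append(SbEntry(uop))

    def resolve(self, uop):
        for entry in self.entries:
            if entry.uop == uop:
                entry.resolved = True

    def kill_younger_than(self, uop):
        """Mispredict on uop: flag its descendants, but do not remove them yet."""
        seen = False
        for entry in self.entries:
            if seen:
                entry.was_killed = True
            elif entry.uop == uop:
                seen = True

    def commit_head(self):
        """At most one entry leaves per call: (uop, was_killed) or None."""
        if not self.entries:
            return None
        head = self.entries[0]
        if not (head.resolved or head.was_killed):
            return None
        self.entries.popleft()
        return (head.uop, head.was_killed)


sb = ToyShadowBuffer()
for name in ("br0", "br1", "br2"):
    sb.allocate(name)
sb.kill_younger_than("br0")  # br1 and br2 are flagged but stay queued
sb.resolve("br0")
first = sb.commit_head()     # br0 leaves first, then the killed entries follow
```

The RQ side would drop any entry it receives with was_killed set, which keeps the head/tail logic simple at the cost of killed entries occupying SB slots until they reach the head.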
The LSU will not retry the TLB on a shadowed load, and it will not wake up a shadowed load either. Could this cause a deadlock?