-
Notifications
You must be signed in to change notification settings - Fork 10
2....a)Hardware:⑤KWTA
The allocation for the components in dynamic data structures, e.g. tree, are fine-grained and frequent. Therefore, even FBTA, PATA and HTA can be the performance bottleneck for these accelerators. However, most requests for these allocations are usually C struct with constant size. Considering this characteristic, Hi-DMM provides designers with high-performance fixed-size allocator based on K-way tree which can handle more than 2048 fixed-size blocks.
The structure of KWTA is a K-way AND-gate tree with depth of 2 instead of the binary OR-gate tree for other allocators in this paper. An example of a 8-way tree is shown in the figure below. The first layer is summary layer and the second one is leaf layer. Each node marked with zero in summary layer has available child(children) and each node marked with zero in leaf layer represents an available mini-heap, a small memory space with constant size, which can hold one or more fixed-size blocks. Each mini-heap is maintained by two registers, and . records how many fixed-size blocks have been allocated in the mini-heap, no matter whether they have been freed or not, while records how many fixed-size blocks have been freed from the mini-heap.
By applying the search of the first zero bit twice on the BVs of K-way AND-gate tree, an available mini-heap will be located. According to the offset of the leaf , the size of mini-heap and the register , the allocated address will be . Note that the allocation in a mini-heap is one-way, which means the fixed-size blocks will be allocated from the head of mini-heap to the tail, according to the increasing value of the register , even if there are some freed space at the head. In this way, the latency of allocation stage will be decreased. Moreover, the mini-heap will be recorded if the value of its register is smaller than the size of mini-heap, so that the next request will not need any search. The allocation of a new mini-heap will cost 9 cycles based on 64-way tree. However, if the allocation can be handled by the recorded mini-heap, the latency will be just 1 cycle. As the size of mini-heap increases, the average latency of allocation stage will decrease significantly, because each allocated heap can be recorded for several subsequent requests. For example, when the size of mini-heap is 16, the average latency of allocation stage is 1.5 cycles. The procedure of KWTA allocation described above is also presented in the following figure.
After each allocation in the mini-heap, the value of register will be increased by 1. If the value of register is equal to the size of mini-heap and the value of register is smaller than that of register , the corresponding leaf in K-way tree will be marked with 1, indicating that it is unavailable. In contrast, after each de-allocation, the value of register will be increased by 1 and if the value of register is equal to that of register , the corresponding leaf in K-way tree will be marked with 0 and both registers will be reset to 0. The summary layer will be updated according to the definition of AND-gate tree. In other situations, the value of the leaf in K-way tree won't be changed.
As an example presented in the figure above, the BV value of node 34 is 1, indicating the mini-heap is unavailable, because the value of register is 4, equal to . Since 3 blocks have been freed in the mini-heap, one more de-allocation in this mini-heap can make it available again and reset the value to 0.