Front End Passes for FPGA Oriented Optimizations and Transformation

Tingyuan LIANG edited this page Jan 8, 2020 · 1 revision

2. Front-End Passes for FPGA-Oriented Optimizations and Transformation

In this part, Light-HLS transforms the IR code to suit the characteristics of FPGAs.

A. Function Level:

(A.1) Function Instantiation: In C/C++ source code, a function may be reused to process different data. For a CPU, a single compiled copy of the function can serve every data source. For an FPGA, however, if the function processes data with different structures, e.g. arrays partitioned with different factors, Light-HLS needs to instantiate the function separately for each kind of data, so that each instance can be analyzed and optimized on its own.
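As a rough illustration of the idea (not Light-HLS's actual implementation), the sketch below assigns one clone of a function to each distinct partition configuration seen at its call sites. The call-site representation, the `instantiate_functions` helper and the `name.N` clone naming are all assumptions made for this example.

```python
from collections import defaultdict

def instantiate_functions(call_sites):
    """Pick one function clone per distinct partition configuration.

    call_sites: list of (callee_name, partition_factors) tuples, where
    partition_factors is a tuple with one factor per array argument.
    Returns the specialized callee name for each call site, in order.
    """
    clones = defaultdict(dict)  # callee -> {partition_factors: clone name}
    specialized = []
    for callee, factors in call_sites:
        per_callee = clones[callee]
        if factors not in per_callee:
            # First time this configuration is seen: create a new clone.
            per_callee[factors] = f"{callee}.{len(per_callee)}"
        specialized.append(per_callee[factors])
    return specialized

# Hypothetical call sites: "scale" is called on arrays partitioned by 2, 4 and 2.
calls = [("scale", (2,)), ("scale", (4,)), ("scale", (2,))]
print(instantiate_functions(calls))  # ['scale.0', 'scale.1', 'scale.0']
```

Call sites that share a configuration share a clone, so the number of instances grows with the number of distinct data layouts rather than with the number of calls.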

B. Loop Level:

(B.1) Loop Extraction: This transformation prepares for loop-level optimization. In HLS, functions that process independent objects can run concurrently, so extracting loops into functions helps to raise the parallelism among loops. LLVM 9.0.0 already implements this as the LoopExtractor Pass.
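The effect of loop extraction can be pictured at source level as follows. This is only a behavioural sketch with made-up function names; the real pass works on LLVM IR.

```python
def kernel_before(a, b):
    # Two loops in one function; they touch independent data.
    doubled = []
    for x in a:
        doubled.append(2 * x)
    shifted = []
    for y in b:
        shifted.append(y + 1)
    return doubled, shifted

def loop_a(a):
    # First loop, extracted into its own function.
    doubled = []
    for x in a:
        doubled.append(2 * x)
    return doubled

def loop_b(b):
    # Second loop, extracted into its own function.
    shifted = []
    for y in b:
        shifted.append(y + 1)
    return shifted

def kernel_after(a, b):
    # The two calls share no data, so an HLS tool may run them concurrently.
    return loop_a(a), loop_b(b)
```

Both versions return the same results; only the structure that the HLS tool sees has changed.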

(B.2) Loop Simplification / Index Variable Simplify: For further analysis of loops, e.g. trip count evaluation and header/exit detection, Light-HLS needs to canonicalize natural loops. In LLVM's loop-simplify form, for instance, a canonical loop has a preheader, a single backedge and dedicated exit blocks. WARNING: Currently, Light-HLS cannot process non-canonical loops; support for them is still being worked on.

(B.3) Loop Strength Reduction (LSR): Loops often contain expressions that involve the loop index variables. Loop strength reduction tries to transform those operations into cheaper ones, e.g. from multiplication to addition (A = i*30 => A = A + 30, where i is a loop index). LLVM provides an LSR Pass, but some multiplications used in address calculation may remain after it if the target machine has a scaled-index addressing mode. FPGAs are not among the target machines of official LLVM, so Light-HLS includes its own LSR Pass to enforce strength reduction for HLS.
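The A = i*30 example can be checked with a small behavioural model. The function names are illustrative only; the sketch shows that a running addition produces the same sequence as the per-iteration multiplication.

```python
def values_with_multiply(n, stride=30):
    # Original form: one multiplication per iteration (A = i * 30).
    return [i * stride for i in range(n)]

def values_strength_reduced(n, stride=30):
    # Reduced form: A starts at 0 and is updated by A = A + 30.
    out = []
    a = 0
    for _ in range(n):
        out.append(a)
        a += stride
    return out

print(values_with_multiply(4))     # [0, 30, 60, 90]
print(values_strength_reduced(4))  # [0, 30, 60, 90]
```

On an FPGA, the reduced form replaces a multiplier in the loop body with an adder and a register.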

(B.4) Loop Unrolling: Unrolling a loop can be beneficial to FPGA parallelism, since after unrolling the independent instructions can be executed concurrently. LLVM provides a Loop Unrolling Pass, which targets CPU-like devices and unrolls loops according to specific criteria. For FPGA, however, we want to force unrolling of particular loops. Therefore, based on the LLVM Pass, Light-HLS provides a loop unroller that unrolls loops according to a configuration file, which specifies the unrolling factor and the labels of the loops to be unrolled.
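A minimal model of configuration-driven unrolling is sketched below. The `UNROLL_CONFIG` dictionary and the loop label `accumulate_loop` are hypothetical and do not reflect the format of Light-HLS's configuration file. The inner `for k` loop stands in for the copies of the loop body that unrolling would lay out back to back.

```python
# Hypothetical configuration: loop label -> unrolling factor.
UNROLL_CONFIG = {"accumulate_loop": 4}

def sum_rolled(data):
    total = 0
    for i in range(len(data)):
        total += data[i]
    return total

def sum_unrolled(data, label="accumulate_loop"):
    factor = UNROLL_CONFIG.get(label, 1)  # loops without an entry stay rolled
    n = len(data)
    main = n - n % factor                 # iterations covered by full groups
    total = 0
    for base in range(0, main, factor):
        for k in range(factor):           # models `factor` copies of the body
            total += data[base + k]
    for i in range(main, n):              # epilogue for the leftover iterations
        total += data[i]
    return total

print(sum_unrolled(list(range(10))))  # 45
```

The epilogue loop handles trip counts that are not a multiple of the factor, which any forced unroller has to account for in some way.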

C. General Instruction Level:

(C.1) Duplicated Instruction Removal: Duplicated instructions are instructions with the same opcode and the same operands in the IR code. They commonly appear after loop unrolling and GEP lowering, since different iterations, or the calculations of different addresses, can contain identical instructions. Light-HLS removes the duplicates to reduce the cost of the FPGA implementation.
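The sketch below applies the "same opcode, same operands" rule to a toy straight-line instruction list. The tuple encoding, the set of pure opcodes and the helper name are assumptions for illustration. A real pass on LLVM IR must also respect dominance and side effects, which this model ignores.

```python
PURE_OPCODES = {"add", "sub", "mul", "shl"}  # assumed side-effect-free opcodes

def remove_duplicates(instructions):
    """instructions: list of (result, opcode, operands) in program order,
    with each result name defined exactly once (SSA-like)."""
    seen = {}      # (opcode, operands) -> name of the surviving result
    replaced = {}  # removed result name -> surviving result name
    kept = []
    for result, opcode, operands in instructions:
        # Rewrite operands that referred to an already removed duplicate.
        operands = tuple(replaced.get(op, op) for op in operands)
        key = (opcode, operands)
        if opcode in PURE_OPCODES and key in seen:
            replaced[result] = seen[key]  # duplicate: drop it, remember alias
            continue
        if opcode in PURE_OPCODES:
            seen[key] = result
        kept.append((result, opcode, operands))
    return kept

program = [
    ("t0", "mul", ("i", "20")),
    ("t1", "add", ("t0", "j")),
    ("t2", "mul", ("i", "20")),   # duplicate of t0
    ("t3", "add", ("t2", "j")),   # duplicate of t1 once t2 is replaced by t0
    ("t4", "add", ("t1", "t3")),
]
for inst in remove_duplicates(program):
    print(inst)
```

Rewriting operands before the lookup is what lets the removal cascade: `t3` only becomes a duplicate after `t2` has been replaced by `t0`.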

(C.2) Multiplication Optimization: A series of multiplication operations (a reduction) can be rebalanced into a multiplier tree, or simply reordered so that operands coming from memory accesses are consumed as late as possible, to reduce computation latency. Moreover, some multiplications by a constant, e.g. a power of two, can be transformed into shift operations to reduce overhead.
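To make the latency argument concrete, the sketch below compares a left-to-right chain with a balanced tree, counting depth in multiplier stages, and shows the shift rewrite for a power-of-two constant. All function names are illustrative, and the depth count is a simplified model rather than a scheduling result from Light-HLS.

```python
from functools import reduce
import operator

def chain_product(values):
    """Left-to-right chain: returns (product, depth in multiplier stages)."""
    return reduce(operator.mul, values), len(values) - 1

def tree_product(values):
    """Balanced tree: neighbours are multiplied level by level."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        nxt = [level[k] * level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        depth += 1
    return level[0], depth

def mul_by_power_of_two(x, c):
    """Replace x * c by a shift when c is a positive power of two."""
    if c <= 0 or c & (c - 1):
        raise ValueError("c must be a positive power of two")
    return x << (c.bit_length() - 1)

vals = [3, 5, 7, 2, 4, 6, 1, 9]
print(chain_product(vals))         # (45360, 7)
print(tree_product(vals))          # (45360, 3)
print(mul_by_power_of_two(5, 8))   # 40
```

For eight operands the tree needs three stages instead of seven. The rebalancing is exact for integers; for floating-point data it can change rounding, so a tool would have to be told that reassociation is acceptable.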

(C.3) Addition Optimization: Likewise, a series of addition operations (a reduction) can be reordered so that operands coming from memory accesses are consumed as late as possible.
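One way to read "postponing the accesses" is that operands which become available late, such as values loaded from memory, should enter the addition chain last. The toy timing model below follows that reading. The ready times, the one-cycle adder latency and the function names are assumptions for illustration.

```python
def chain_finish_time(ready_times, add_latency=1):
    """Cycle at which a left-to-right chain of additions finishes.

    ready_times[k] is the cycle at which operand k becomes available.
    """
    t = ready_times[0]
    for r in ready_times[1:]:
        # Each addition starts once both the partial sum and the operand exist.
        t = max(t, r) + add_latency
    return t

def reorder_by_ready_time(ready_times):
    # Consume early operands first so that late ones are needed last.
    return sorted(ready_times)

ready = [5, 0, 0, 0]  # the first operand comes from a load that arrives at cycle 5
print(chain_finish_time(ready))                         # 8
print(chain_finish_time(reorder_by_ready_time(ready)))  # 6
```

In the original order every addition waits for the load. After reordering, only the last addition does.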

(C.4) Instruction Hoisting: Branches can contain instructions that do not actually depend on the branch or its PHI operations, or that are shared among the branches. In these situations, the instructions may be hoisted out of the branches into their dominator nodes, so that they can be executed earlier or in parallel with preceding operations, which lowers the latency.
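At source level, hoisting a computation shared by both branches looks roughly like this. The functions are invented for illustration; the actual transformation operates on basic blocks in the IR.

```python
def before_hoisting(cond, a, b, c):
    if cond:
        t = a * b      # computed inside the taken branch
        r = t + c
    else:
        t = a * b      # the same computation repeated in the other branch
        r = t - c
    return r

def after_hoisting(cond, a, b, c):
    t = a * b          # hoisted into the block that dominates both branches
    if cond:
        r = t + c
    else:
        r = t - c
    return r
```

After hoisting, the multiplication no longer waits for the branch condition, so it can overlap with whatever computes `cond`.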

(C.5) Bitwidth Optimization: On FPGA, arbitrary-precision integers are supported, and the cost of an operation is sensitive to its bitwidth. Bitwidth optimization is therefore important for FPGA HLS, and Light-HLS implements it.
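The arithmetic behind bitwidth reduction can be sketched for unsigned values as follows. The helper names are illustrative, and this is not a description of the analysis Light-HLS performs. The example reuses the index ranges of a 70 x 20 array.

```python
def unsigned_width(max_value):
    """Bits needed to represent every value in [0, max_value]."""
    return max(1, max_value.bit_length())

def add_result_width(wa, wb):
    # The sum of two unsigned values needs one extra bit for the carry.
    return max(wa, wb) + 1

def mul_result_width(wa, wb):
    # The product of two unsigned values needs at most wa + wb bits.
    return wa + wb

w_i = unsigned_width(69)   # i in [0, 69]
w_j = unsigned_width(19)   # j in [0, 19]
print(w_i, w_j)                                          # 7 5
print(add_result_width(mul_result_width(w_i, w_j), w_j)) # 13
print(unsigned_width(69 * 20 + 19))                      # 11
```

Propagating widths operator by operator gives 13 bits for i*20+j, while tracking the actual value range gives 11. Both are far below the 32 bits a plain C `int` would imply.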

D. Memory Access Level:

(D.1) GEPLowering: GEP (getelementptr) is the LLVM operation that computes the element pointer for accesses to arrays. An array can have multiple dimensions, and GEP maps an array access to the exact memory address. However, the on-chip memory of an FPGA consists mainly of BRAMs, which are effectively one-dimensional. To ensure that instructions can fetch data from BRAMs, Light-HLS lowers each GEP into the explicit operations of the address calculation. For example, for the access B[i][j] to the array B[70][20], Light-HLS transforms the GEP operation into a multiplication and an addition, i.e. i*20+j.
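The B[70][20] example can be verified with a small model in which a flat list plays the role of the one-dimensional BRAM. The names `lowered_address`, `bram` and `B` are used only for this illustration.

```python
ROWS, COLS = 70, 20

def lowered_address(i, j, cols=COLS):
    # Explicit address arithmetic that replaces the two-dimensional index.
    return i * cols + j

# A flat memory with distinct contents, plus a 2-D view of the same data.
bram = [1000 + k for k in range(ROWS * COLS)]
B = [[bram[r * COLS + c] for c in range(COLS)] for r in range(ROWS)]

print(lowered_address(3, 7))                   # 67
print(B[3][7] == bram[lowered_address(3, 7)])  # True
```

The multiplication introduced here is what later passes such as strength reduction and duplicated instruction removal can then simplify.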

(D.2) Redundant Access Removal: There can be many redundant memory accesses in IR code, especially after loop unrolling. Light-HLS removes some of those accesses, assuming that the FPGA is the only device processing the data.
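A simplified version of this idea is shown below on a straight-line trace of loads and stores with exactly known addresses. The trace encoding and the helper name are assumptions for illustration. The sketch relies on the same premise as the text: nothing else modifies the memory between accesses.

```python
def remove_redundant_loads(accesses):
    """accesses: list of ("load", addr, dst) or ("store", addr, src) tuples
    in program order, with exact (non-aliasing) addresses.

    Returns (kept, forwarded), where forwarded maps the destination of each
    removed load to the value name that already holds the same data.
    """
    known = {}      # addr -> value name known to match the memory contents
    forwarded = {}
    kept = []
    for kind, addr, name in accesses:
        if kind == "load":
            if addr in known:
                forwarded[name] = known[addr]  # reuse the value, skip the load
                continue
            known[addr] = name
        else:  # store
            name = forwarded.get(name, name)   # use the surviving value name
            known[addr] = name                 # memory now holds this value
        kept.append((kind, addr, name))
    return kept, forwarded

trace = [
    ("load", 0, "v0"),
    ("load", 0, "v1"),   # same address again: redundant
    ("store", 1, "v1"),
    ("load", 1, "v2"),   # reads back what was just stored: redundant
    ("load", 0, "v3"),   # address 0 is unchanged: redundant
]
kept, forwarded = remove_redundant_loads(trace)
print(kept)       # [('load', 0, 'v0'), ('store', 1, 'v0')]
print(forwarded)  # {'v1': 'v0', 'v2': 'v0', 'v3': 'v0'}
```

If another device could write to the same memory, the `known` table would no longer be trustworthy, which is why the single-device assumption matters.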