forked from Realeyes/pencil-benchmarks-imageproc
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
238 lines (201 loc) · 12.1 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
#################################
PENCIL Image Processing Benchmark
#################################
# Prerequisites
################
- An OpenCL driver, header files and libraries.
- The following packages:
* build-essential
* cmake
* TBB (Threading Building Blocks)
- A PENCIL compiler (PPCG for example).
- The PRL runtime library (included in PPCG).
- The PENCIL header files (included in PPCG).
- OpenCV (2.4.9.1 or above):
* Tested with 2.4.9.1 and 2.4.10.1
* You can install the OpenCV packages (not tested)
or download and compile the sources (as described below).
* Download OpenCV
git clone https://github.com/Itseez/opencv.git
* Checkout 2.4.9.1 or newer:
cd opencv
git checkout 2.4.10.1
cd ..
* If you are running the benchmark on ARM Mali GPU you need to apply the patch
0001-Make-image-filtering-erode-dilate-work-with-ARM-Mali.patch
provided with the benchmark to OpenCV. This patch changes the workgroup sizes to smaller ones so that OpenCV kernels can run on Mali (which only accepts small workgroups).
cd opencv
git apply ../pencil-benchmarks-imageproc/0001-Make-image-filtering-erode-dilate-work-with-ARM-Mali.patch
cd ..
* Configure build with CMake:
mkdir opencv-build
cd opencv-build
cmake ../opencv
* Details:
- If your CPU supports AVX/SSEx instruction sets, you can add the following to the "cmake ../opencv" command:
-DENABLE_AVX=ON -DENABLE_SSE41=ON -DENABLE_SSE42=ON -DENABLE_SSSE3=ON -DENABLE_SSE3=ON
- If your CPU is ARM, set the appropriate NEON/VFP switches.
- Optional: add TBB usage with -DWITH_TBB=ON (requires libtbb-dev package).
- Make sure the necessary modules are to be built - CMake should print a (long) status with a "To be built:" list, it should contain core, ocl and highgui (plus their dependencies).
* Build OpenCV
make all -j12
* Details: You can replace the number 12 with the number of threads of your processor(s)
* Optionally, install OpenCV as a system library:
sudo make install
############
# Building
############
Two methods are possible. You can either use bash scripts (in scripts/) or
use CMake.
# Building Using Bash Scripts (recommended):
##############################################
We recommend this method as you can use the same scripts to perform auto-tuning
which is not possible with the Cmake method.
- Set the variables in scripts/scripts_config.conf to define the paths of the
libraries, headers and tools used in the benchmark.
PENCIL_COMPILER_BINARY: the full path of the PENCIL compiler binary
PENCIL_INCLUDE_DIR: path of the directory containing "pencil.h"
PRL_LIB_DIR: path of the directory containing the PRL library (libprl.so)
PRL_INCLUDE_DIR: path of the directory containing "prl.h"
OPENCL_LIB_DIR: path of the directory containing the OpenCL library
OPENCL_INCLUDE_DIR: path of the OpenCL header files
OPENCV_INCLUDE_DIR: path of the OpenCV header files
OPENCV_LIB_DIR: path of the OpenCV library files
LIST_OF_KERNELS: defines a blank separated list of the kernels that
should be compiled or autotuned.
Please look into the file scripts/scripts_config.conf for an example
and for more details.
You can use the default values of the variables except the path variables
which should be set correctly.
- You can use the following script to build the benchmark:
./scripts/compile_and_run_kernels.sh
A log is generated in build/benchmark_building_log.txt
# Building Using CMake :
#########################
To build the benchmark, you need:
- Compile benchmark:
Crate a directory for out-of-source build, run cmake and make:
- CMake can be configured from a GUI window using cmake-gui. (recommended: list all parameters that can be change)
- OpenCL might not be found by the CMake script. In this case, set OPENCL_LIBRARY manually. It is a required parameter for ARM Mali: list BOTH libOpenCL.so and libmali.so.
- Based on your OpenCL device, it is recommended to set up PENCIL_DEFAULT_FLAGS_* variables:
The default values are set to be the common denominator to allow it to work everywhere, but the performance is not optimal.
PENCIL_DEFAULT_FLAGS_BLOCKSIZE should be set to the maximum workgroup sizes. For instance, AMD Radeon cards should use "16,16" (256 threads total)
PENCIL_DEFAULT_FLAGS_LOCAL_MEMORY_SIZE is the amount of used __local memory. Alternatively set to 0 to disable __local memory usage (useful for ARM Mali).
These parameters are reported by the tool clinfo: First is "Max work items[x]:" (individually) and "Max work group size:" (the multiply of values); second is "Max local memory:"
- Optionally, as performance tuning try turning
PENCIL_DEFAULT_FLAGS_MAXFUSE,
PENCIL_DEFAULT_FLAGS_NO_SEPARATE_COMP,
PENCIL_DEFAULT_FLAGS_DISABLE_PRIVATE
flags ON or OFF.
- If something is not installed at the standard paths (/usr, /usr/local), then run cmake-gui and edit the variables (should be straight-forward)
- Set CMAKE_BUILD_TYPE to Release for optimized build, set to Debug to build without optimizations and with debug info.
mkdir pencil-benchmarks-imageproc-build
cd pencil-benchmarks-imageproc-build
cmake -DCMAKE_BUILD_TYPE=Release -DOPENCL_LIBRARY=/path/to/libOpenCL.so -DPENCIL_DEFAULT_FLAGS_BLOCKSIZE="A,B" -DPENCIL_DEFAULT_FLAGS_LOCAL_MEMORY_SIZE=C path/to/pencil-benchmarks-imageproc-repo
make all -j12
The CMake cache is persistent between builds, you only need to supply a parameter when you want to override it.
- (Optional) Tune per-kernel parameters:
We usually need to support separate tile-grid-block sizes for different kernels, and allows access to advanced parameters of the polyhedral compiler.
You can supply these parameters using a corresponding PENCIL_FLAGS_* parameter, separate for every benchmark item. If this is not supplied, the default flags are used.
- Warning: Due to CMake restrictions, some characters (quotes, backslash) needs escaping. Also, the semicolon character cannot be used in these strings.
instead of: --sizes={kernel[i]->tile[8,8];kernel[i]->grid[8,8];kernel[i]->block[8,8]}
use: --sizes=\"{kernel[i]->tile[8,8]}\" --sizes=\"{kernel[i]->grid[8,8]}\" --sizes=\"{kernel[i]->block[8,8]}\"
- Build benchmark:
cd repository_root_path
make all -j12
- Examples of tested hardware parameters (to use with CMake)
AMD Radeon R9 290
-DPENCIL_DEFAULT_FLAGS_BLOCKSIZE="16,16" -DPENCIL_DEFAULT_FLAGS_LOCAL_MEMORY_SIZE=32768 -DPENCIL_DEFAULT_FLAGS_MAXFUSE=ON -DPENCIL_DEFAULT_FLAGS_NO_SEPARATE_COMP=ON -DPENCIL_DEFAULT_FLAGS_DISABLE_PRIVATE=OFF
nVidia Tesla M2050
-DPENCIL_DEFAULT_FLAGS_BLOCKSIZE="32,32" -DPENCIL_DEFAULT_FLAGS_LOCAL_MEMORY_SIZE=49152 -DPENCIL_DEFAULT_FLAGS_MAXFUSE=ON -DPENCIL_DEFAULT_FLAGS_NO_SEPARATE_COMP=ON -DPENCIL_DEFAULT_FLAGS_DISABLE_PRIVATE=OFF
Intel HD Graphics 4000
-DPENCIL_DEFAULT_FLAGS_BLOCKSIZE="32,16" -DPENCIL_DEFAULT_FLAGS_LOCAL_MEMORY_SIZE=65536 -DPENCIL_DEFAULT_FLAGS_MAXFUSE=ON -DPENCIL_DEFAULT_FLAGS_NO_SEPARATE_COMP=ON -DPENCIL_DEFAULT_FLAGS_DISABLE_PRIVATE=OFF
ARM Mali T628
-DPENCIL_DEFAULT_FLAGS_BLOCKSIZE="8,8" -DPENCIL_DEFAULT_FLAGS_LOCAL_MEMORY_SIZE=0 -DPENCIL_DEFAULT_FLAGS_MAXFUSE=ON -DPENCIL_DEFAULT_FLAGS_NO_SEPARATE_COMP=ON -DPENCIL_DEFAULT_FLAGS_DISABLE_PRIVATE=OFF
Please, share the parameters to your device here
# Auto-tuning and Time Measurements
######################################
- You can use the following script to perform auto-tuning for the
PPCG PENCIL compiler.
./scripts/ppcg_tuning.sh
The script generates many PPCG options internaly, and
launches PPCG with each one of these options, then will compile each
file generated by PPCG and will execute it.
The execution times measured for each kernel are reported in
build/output_time.$KERNEL_NAME.csv (a list of all the options explored
by the auto-tuner is provided along with the execution time for each
option).
The best execution times for each kernel are reported in
build/output_time.csv
The best PPCG options obtained through tuning are reported in
the file build/best_optimizations_log.sh
A log is generated in build/benchmark_building_log.txt
- You can use the following script to build the benchmark using
the default PPCG options:
./scripts/compile_and_run_kernels.sh
The script will generate the file build/output_time.csv that
constains the timings for each kernel.
In the generated timings file, the total_execution_time_reference
column indicates the total execution time for the OpenCV calls (this
includes the data copy time and the kernel execution time but does
not include kernel compilation time, the OpenCV kernel is precompiled
by being run once before time measurements are taken).
The total_execution_time_optimized column indicates the total execution
time for the PPCG generated code (this includes the data copy times, the
kernel execution time, the time spent by any host code generated by the
PENCIL compiler but does not include kernel compilation time, kernels are
precompiled also by being run once before taking time measurements).
kernel_only_execution_time_optimized column indicates kernel execution
time measured using OpenCL profiling (does not include data copies and
kernel compilation).
The script uses default PPCG options except for the workgroup sizes
where workgroups of size 16x16 are set instead of the default 32x32
workgroups which may not work on some architectures. Please note that in some cases (e.g. for ARM Mali) you might need to set an even lower value.
- You can also use the previous script to build the kernels with specific
PPCG options. In this case you need to pass a file containing the PPCG
options as an argument.
./scripts/compile_and_run_kernels.sh <ppcg_options_file>
The previous command compiles and runs the kernels using the options
defined in <ppcg_options_file> (for example you can use the file
scripts/ppcg_preset_options/best_optimizations_log.Nvidia_GTX470.sh).
Such a file is obtained in general using autotuning (as described above
the auto-tuning script generates a file containing the best optimization
options that were found during auto-tuning and which you can reuse if you
don't want to redo autotuning).
<ppcg_options_file> provides an option for each kernel listed in
$LIST_OF_KERNELS. The option that corresponds to the i'th kernel
in the $LIST_OF_KERNELS is stored in best_optimization_options[$i].
The folder scripts/ppcg_preset_options/ contains many files with preset
options.
# Running Individual Kernels Manually
#######################################
cd build/
./ppcg_test_KERNEL_NAME <path_to_image>
Details:
- Example images are in images/
- If you use CMake, it will also produce a test_* files.
- These binaries call the original PENCIL code not the code generated by the PENCIL compiler.
- Each executable runs a different operator with all three (C++, OpenCL, PENCIL) implementations.
- Results are cross-checked, and if there is no difference
(within a small allowed precision error), total times are
reported at the end.
# Troubleshooting
##################
- In case of a compilation/execution error, please check the log file in
build/benchmark_building_log.txt
- If you get an execution error while runing a script, you should run
a single kernel to see what is exactly the error message, to run the
resize kernel for example:
cd build/
./ppcg_test_resize ../images/M104_ngc4594_sombrero_galaxy_hi-res.jpg
this will run the resize kernel (OpenCV CPU, OpenCV OpenCL and the code
generated using the PENCIL compiler) on the M104_ngc4594_sombrero_galaxy_hi-res.jpg
image.
- You may get a CL_INVALID_WORK_GROUP_SIZE error, this means that the
workgroup sizes set by the PENCIL compiler are too big for your
architecture.
You can reduce the workgroup sizes for autotuning by removing big
workgroup sizes from the variable POSSIBLE_BLOCK_SIZES
If you are using the compile_and_run_kernels.sh script, you can set
the options of the PENCIL compiler by creating a file similar to
the file scripts/ppcg_preset_options/ppcg_default_options.sh