Forward pass takes more time when ACL is enabled #14
Comments
mtk_disable_acl.log
mtk_open_acl.log
-DCPU_ONLY=ON
P.S. You can check whether the GPU was actually used by running the command below before and after the test app.
For reference, in my tests the GPU was slower than the CPU (OpenBLAS) for some computations and faster for others. I would exclude the first measurement, because the CL kernels are compiled at runtime on first use, which adds some overhead.
However, I guess that if CaffeOnACL supported NNPACK (which is the only one Caffe2 supports), the CPU would be faster than the GPU in almost all cases. Of course, this depends on how powerful the GPU is.
I think the benefit of CaffeOnACL is that it can combine paths (OpenBLAS + ACL GPU, or OpenBLAS + ACL NEON, by bypassing ACL or not) for forward computations such as convolution, activation functions, and pooling, depending on your hardware's performance.
Thanks,
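The advice above about discarding the first measurement can be sketched as a small timing harness. This is only an illustration, not CaffeOnACL code: `time_forward`, `forward_fn`, and the `warmup` parameter are hypothetical names, and `forward_fn` stands in for whatever call runs one forward pass.

```python
import time
from typing import Callable, List


def time_forward(forward_fn: Callable[[], None],
                 iterations: int = 30,
                 warmup: int = 1) -> List[float]:
    """Return per-iteration wall-clock times, excluding warm-up runs.

    forward_fn is a hypothetical stand-in for one forward pass. The warm-up
    runs absorb one-time costs such as runtime OpenCL kernel compilation.
    """
    for _ in range(warmup):
        forward_fn()  # executed but deliberately not timed

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        forward_fn()
        timings.append(time.perf_counter() - start)
    return timings


def mean(values: List[float]) -> float:
    """Arithmetic mean of the recorded timings."""
    return sum(values) / len(values)
```

Comparing `mean(time_forward(...))` for the ACL and non-ACL builds would give a steady-state figure that is not skewed by first-use overhead.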
Hello, note that both runs are on the CPU version. Is there any possible hypothesis for these results?
Issue summary
The forward pass takes more time when ACL is enabled.
Steps to reproduce
1: Build https://github.com/ARM-software/ComputeLibrary with the command:
scons Werror=1 -j8 debug=0 asserts=1 neon=1 opencl=1 embed_kernels=1 os=android arch=arm64-v8a
2: Build ACLCAFFE for the Android platform with the following settings enabled:
export ACL_DIR=${ANDROID_LIB_ROOT}/ComputeLibrary
-DCPU_ONLY=ON
-DUSE_PROFILING=ON
-DUSE_ACL=ON \
3: Run Caffe on an Android MTK/QCOM ARM platform with the MNIST test.
4: Using the same model and protobuf, the MNIST forward test takes 0m20.48s for 30 iterations.
P.S. With the same code and only -DUSE_ACL=OFF set (meaning CPU only), the MNIST forward test takes just 0m15.26s for 30 iterations.
Why is CPU-only more efficient than NEON+GPU?
P.S. This uses an MTK arm64 chip with a Mali T88 GPU.
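To put the two totals above on a per-iteration footing, the arithmetic can be sketched as follows. The two totals and the iteration count are taken from the report; the variable names are illustrative, and the calculation assumes the totals cover the whole run, so any one-time startup cost is averaged into each iteration.

```python
ITERATIONS = 30

# Totals reported in the issue, in seconds.
acl_total_s = 20.48   # built with -DUSE_ACL=ON
cpu_total_s = 15.26   # built with -DUSE_ACL=OFF

# Average cost of one forward iteration for each build.
acl_per_iter_s = acl_total_s / ITERATIONS
cpu_per_iter_s = cpu_total_s / ITERATIONS

# Ratio greater than 1 means the ACL build is slower.
slowdown = acl_total_s / cpu_total_s

print(f"ACL on : {acl_per_iter_s:.3f} s/iteration")
print(f"ACL off: {cpu_per_iter_s:.3f} s/iteration")
print(f"ACL build takes {slowdown:.2f}x the CPU-only time")
```

Because this is an average over the whole run, it cannot distinguish a constant per-iteration penalty from a large one-time cost in the first iteration.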
Your system configuration
Operating system:
Compiler:
CUDA version (if applicable):
CUDNN version (if applicable):
BLAS:
Python or MATLAB version (for pycaffe and matcaffe respectively):