performance gain with ACL #2
Yes, kaishijeng. The performance gain percentage we got is just like what you got. Due to the time of loading the model's parameters, the time in a real classification application will be much faster (the parameters only need to be loaded once).
If you see how I measure the time duration, I actually measure starting from the 2nd prediction, so loading the parameters should not affect my time measurement.
Thanks,
Kaishijeng, your measured time is much longer than what I measured. In the Arm Compute Library, there's a line "force_number_of_threads(0)" in the file src/runtime/CPP/CPPScheduler.cpp. You may change that line to "force_number_of_threads(1)" and try again.
honggui
I can't find the force_number_of_threads function in src/runtime/CPP/CPPScheduler.cpp in the ComputeLibrary. Can you check it?
I have a couple of questions about your measurements:
1) What platform do you use, and what time do you get on it?
2) Which portion of the code do you measure the time spent in?
3) How do I know the GPU has been used? I modified the arm_gpu_mode function in include/caffe/common.hpp as below, and I am not sure whether it correctly forces GPU mode:
//inline static bool arm_gpu_mode() {return Get().use_mali_gpu_;}
inline static bool arm_gpu_mode() {return true;}
Thanks,
…On Sun, Jul 2, 2017 at 1:15 AM, honggui ***@***.***> wrote:
Kaishijeng, you time is much longer than what I measured. there's a line
"force_number_of_threads(0)" in the file src\runtime\CPP\CPPScheduler.cpp.
You may change the line to "force_number_of_threads(1)", and have a try
again.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AMGg3iSp9i_UcMhOD90AK9f42Y5aOYbYks5sJ1GvgaJpZM4OLa-n>
.
|
Hi Kaishijeng,
set_num_threads(1) reduces the time by 0.3 s, from 4.5 to 4.2. Thanks,
kaishijeng,
firefly@firefly:~/caffeOnACL$ ./build/examples/cpp_classification/classification_profiling.bin models/bvlc_reference_caffenet/deploy.prototxt models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel data/ilsvrc12/imagenet_mean.binaryproto data/ilsvrc12/synset_words.txt examples/images/cat.jpg
LAYER IDX: 23 name: prob type: Softmax
LAYER IDX: 22 name: fc8 type: InnerProduct
LAYER IDX: 21 name: drop7 type: Dropout
LAYER IDX: 20 name: relu7 type: ReLU
LAYER IDX: 19 name: fc7 type: InnerProduct
LAYER IDX: 18 name: drop6 type: Dropout
LAYER IDX: 17 name: relu6 type: ReLU
LAYER IDX: 16 name: fc6 type: InnerProduct
LAYER IDX: 15 name: pool5 type: Pooling
LAYER IDX: 14 name: relu5 type: ReLU
LAYER IDX: 13 name: conv5 type: Convolution
LAYER IDX: 12 name: relu4 type: ReLU
LAYER IDX: 11 name: conv4 type: Convolution
LAYER IDX: 10 name: relu3 type: ReLU
LAYER IDX: 9 name: conv3 type: Convolution
LAYER IDX: 8 name: norm2 type: LRN
LAYER IDX: 7 name: pool2 type: Pooling
LAYER IDX: 6 name: relu2 type: ReLU
LAYER IDX: 5 name: conv2 type: Convolution
LAYER IDX: 4 name: norm1 type: LRN
LAYER IDX: 3 name: pool1 type: Pooling
LAYER IDX: 2 name: relu1 type: ReLU
LAYER IDX: 1 name: conv1 type: Convolution
LAYER IDX: 0 name: data type: Input
LAYER IDX: 23 name: prob type: Softmax ratio: 0
LAYER IDX: 22 name: fc8 type: InnerProduct ratio: 4.23632
LAYER IDX: 21 name: drop7 type: Dropout ratio: 0
LAYER IDX: 20 name: relu7 type: ReLU ratio: 0
LAYER IDX: 19 name: fc7 type: InnerProduct ratio: 20.903
LAYER IDX: 18 name: drop6 type: Dropout ratio: 0
LAYER IDX: 17 name: relu6 type: ReLU ratio: 0
LAYER IDX: 16 name: fc6 type: InnerProduct ratio: 42.5307
LAYER IDX: 15 name: pool5 type: Pooling ratio: 1.05905
LAYER IDX: 14 name: relu5 type: ReLU ratio: 0
LAYER IDX: 13 name: conv5 type: Convolution ratio: 1.61653
LAYER IDX: 12 name: relu4 type: ReLU ratio: 0.0557367
LAYER IDX: 11 name: conv4 type: Convolution ratio: 2.73132
LAYER IDX: 10 name: relu3 type: ReLU ratio: 0.0557367
LAYER IDX: 9 name: conv3 type: Convolution ratio: 10.7581
LAYER IDX: 8 name: norm2 type: LRN ratio: 0.334476
LAYER IDX: 7 name: pool2 type: Pooling ratio: 1.95095
LAYER IDX: 6 name: relu2 type: ReLU ratio: 0.222947
LAYER IDX: 5 name: conv2 type: Convolution ratio: 8.97438
LAYER IDX: 4 name: norm1 type: LRN ratio: 0.390212
LAYER IDX: 3 name: pool1 type: Pooling ratio: 1.11484
LAYER IDX: 2 name: relu1 type: ReLU ratio: 0.33442
LAYER IDX: 1 name: conv1 type: Convolution ratio: 2.73132
LAYER IDX: 0 name: data type: Input ratio: 0
STATS for 10 reptitions: ... time cost top 10 layers are: ...
0.3134 - "n02123045 tabby, tabby cat"
How do you get the log? I run:
./build/examples/cpp_classification/classification_profiling.bin models/bvlc_reference_caffenet/deploy.prototxt models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel data/ilsvrc12/imagenet_mean.binaryproto data/ilsvrc12/synset_words.txt examples/images/cat.jpg
---------- Prediction for examples/images/cat.jpg ----------
hi kaishijeng,
Below is my profiling result, which is very similar to what you get: STATS for 10 reptitions: ... time cost top 10 layers are: ...
Does it mean the time per forward pass is 607 msec? Thanks,
Hi Kaishijeng,
Honggui
How do you profile with the original Caffe, so that I can compare its performance with CaffeOnACL?
Thanks
…On Thu, Jul 6, 2017 at 8:30 PM, honggui ***@***.***> wrote:
Hi Kaishijeng,
Yes, you are right.
Regards,
Honggui
@kaishijeng you may use your/caffe/binary/caffe time -model alexnet.prototxt. @kaishijeng @honggui By the way, did you test the performance on a desktop processor? Are there any statistics from mobile devices? Also, in the doc, ACL_NEON seems slower than official Caffe with OpenBLAS. Which devices were tested? There seems to be a long way to go if testing on a 32-bit platform, since 32-bit OpenBLAS doesn't use NEON to speed things up.
Not sure the performance number from the following command is a fair comparison to CaffeOnACL:
I0707 06:58:46.069133 9441 caffe.cpp:417] Average Forward pass: 1580.33 ms.
firefly@firefly:~/2TB/src/caffe$ ./.build_release/tools/caffe time -model models/bvlc_reference_caffenet/deploy.prototxt
I0707 06:56:26.991888 9441 caffe.cpp:352] Use CPU.
I0707 06:56:27.031245 9441 net.cpp:51] Initializing net from parameters:
name: "CaffeNet"
state {
phase: TRAIN
level: 0
stage: ""
}
layer {
name: "data"
type: "Input"
top: "data"
input_param {
shape {
dim: 10
dim: 3
dim: 227
dim: 227
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
inner_product_param {
num_output: 1000
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc8"
top: "prob"
}
I0707 06:56:27.254829 9441 caffe.cpp:360] Performing Forward
I0707 06:56:28.812577 9441 caffe.cpp:365] Initial loss: 0
I0707 06:56:28.812671 9441 caffe.cpp:366] Performing Backward
I0707 06:56:28.812688 9441 caffe.cpp:374] *** Benchmark begins ***
I0707 06:56:28.812697 9441 caffe.cpp:375] Testing for 50 iterations.
I0707 06:56:31.571130 9441 caffe.cpp:403] Iteration: 1 forward-backward
time: 2758 ms.
I0707 06:56:34.219300 9441 caffe.cpp:403] Iteration: 2 forward-backward
time: 2647 ms.
I0707 06:56:36.851164 9441 caffe.cpp:403] Iteration: 3 forward-backward
time: 2631 ms.
I0707 06:56:39.500258 9441 caffe.cpp:403] Iteration: 4 forward-backward
time: 2648 ms.
I0707 06:56:42.151398 9441 caffe.cpp:403] Iteration: 5 forward-backward
time: 2650 ms.
I0707 06:56:44.799932 9441 caffe.cpp:403] Iteration: 6 forward-backward
time: 2648 ms.
I0707 06:56:47.448256 9441 caffe.cpp:403] Iteration: 7 forward-backward
time: 2648 ms.
I0707 06:56:50.095988 9441 caffe.cpp:403] Iteration: 8 forward-backward
time: 2647 ms.
I0707 06:56:52.744285 9441 caffe.cpp:403] Iteration: 9 forward-backward
time: 2648 ms.
I0707 06:56:55.396378 9441 caffe.cpp:403] Iteration: 10 forward-backward
time: 2651 ms.
I0707 06:56:58.047657 9441 caffe.cpp:403] Iteration: 11 forward-backward
time: 2651 ms.
I0707 06:57:00.724208 9441 caffe.cpp:403] Iteration: 12 forward-backward
time: 2676 ms.
I0707 06:57:03.415966 9441 caffe.cpp:403] Iteration: 13 forward-backward
time: 2691 ms.
I0707 06:57:06.115960 9441 caffe.cpp:403] Iteration: 14 forward-backward
time: 2699 ms.
I0707 06:57:08.835702 9441 caffe.cpp:403] Iteration: 15 forward-backward
time: 2719 ms.
I0707 06:57:11.555269 9441 caffe.cpp:403] Iteration: 16 forward-backward
time: 2719 ms.
I0707 06:57:14.274786 9441 caffe.cpp:403] Iteration: 17 forward-backward
time: 2719 ms.
I0707 06:57:17.010529 9441 caffe.cpp:403] Iteration: 18 forward-backward
time: 2735 ms.
I0707 06:57:19.747344 9441 caffe.cpp:403] Iteration: 19 forward-backward
time: 2736 ms.
I0707 06:57:22.479828 9441 caffe.cpp:403] Iteration: 20 forward-backward
time: 2732 ms.
I0707 06:57:25.228466 9441 caffe.cpp:403] Iteration: 21 forward-backward
time: 2748 ms.
I0707 06:57:27.979506 9441 caffe.cpp:403] Iteration: 22 forward-backward
time: 2750 ms.
I0707 06:57:30.732939 9441 caffe.cpp:403] Iteration: 23 forward-backward
time: 2753 ms.
I0707 06:57:33.488718 9441 caffe.cpp:403] Iteration: 24 forward-backward
time: 2755 ms.
I0707 06:57:36.250659 9441 caffe.cpp:403] Iteration: 25 forward-backward
time: 2761 ms.
I0707 06:57:38.991574 9441 caffe.cpp:403] Iteration: 26 forward-backward
time: 2740 ms.
I0707 06:57:41.754909 9441 caffe.cpp:403] Iteration: 27 forward-backward
time: 2763 ms.
I0707 06:57:44.510370 9441 caffe.cpp:403] Iteration: 28 forward-backward
time: 2755 ms.
I0707 06:57:47.282030 9441 caffe.cpp:403] Iteration: 29 forward-backward
time: 2771 ms.
I0707 06:57:50.053514 9441 caffe.cpp:403] Iteration: 30 forward-backward
time: 2771 ms.
I0707 06:57:53.114980 9441 caffe.cpp:403] Iteration: 31 forward-backward
time: 3061 ms.
I0707 06:57:56.100261 9441 caffe.cpp:403] Iteration: 32 forward-backward
time: 2985 ms.
I0707 06:57:58.875066 9441 caffe.cpp:403] Iteration: 33 forward-backward
time: 2774 ms.
I0707 06:58:01.651820 9441 caffe.cpp:403] Iteration: 34 forward-backward
time: 2776 ms.
I0707 06:58:04.404618 9441 caffe.cpp:403] Iteration: 35 forward-backward
time: 2752 ms.
I0707 06:58:07.187002 9441 caffe.cpp:403] Iteration: 36 forward-backward
time: 2782 ms.
I0707 06:58:09.971091 9441 caffe.cpp:403] Iteration: 37 forward-backward
time: 2783 ms.
I0707 06:58:12.750619 9441 caffe.cpp:403] Iteration: 38 forward-backward
time: 2779 ms.
I0707 06:58:15.513088 9441 caffe.cpp:403] Iteration: 39 forward-backward
time: 2762 ms.
I0707 06:58:18.293782 9441 caffe.cpp:403] Iteration: 40 forward-backward
time: 2780 ms.
I0707 06:58:21.070822 9441 caffe.cpp:403] Iteration: 41 forward-backward
time: 2776 ms.
I0707 06:58:23.830873 9441 caffe.cpp:403] Iteration: 42 forward-backward
time: 2759 ms.
I0707 06:58:26.594636 9441 caffe.cpp:403] Iteration: 43 forward-backward
time: 2763 ms.
I0707 06:58:29.376324 9441 caffe.cpp:403] Iteration: 44 forward-backward
time: 2781 ms.
I0707 06:58:32.151278 9441 caffe.cpp:403] Iteration: 45 forward-backward
time: 2774 ms.
I0707 06:58:34.932479 9441 caffe.cpp:403] Iteration: 46 forward-backward
time: 2780 ms.
I0707 06:58:37.702002 9441 caffe.cpp:403] Iteration: 47 forward-backward
time: 2769 ms.
I0707 06:58:40.484354 9441 caffe.cpp:403] Iteration: 48 forward-backward
time: 2782 ms.
I0707 06:58:43.274502 9441 caffe.cpp:403] Iteration: 49 forward-backward
time: 2789 ms.
I0707 06:58:46.065948 9441 caffe.cpp:403] Iteration: 50 forward-backward
time: 2791 ms.
I0707 06:58:46.066244 9441 caffe.cpp:406] Average time per layer:
I0707 06:58:46.066313 9441 caffe.cpp:409] data forward: 0.00226 ms.
I0707 06:58:46.066375 9441 caffe.cpp:412] data backward: 0.0033 ms.
I0707 06:58:46.066432 9441 caffe.cpp:409] conv1 forward: 151.357 ms.
I0707 06:58:46.066490 9441 caffe.cpp:412] conv1 backward: 134.551
ms.
I0707 06:58:46.066547 9441 caffe.cpp:409] relu1 forward: 7.30002 ms.
I0707 06:58:46.066602 9441 caffe.cpp:412] relu1 backward: 0.00226
ms.
I0707 06:58:46.066658 9441 caffe.cpp:409] pool1 forward: 36.679 ms.
I0707 06:58:46.066712 9441 caffe.cpp:412] pool1 backward: 0.0037 ms.
I0707 06:58:46.066767 9441 caffe.cpp:409] norm1 forward: 67.7754 ms.
I0707 06:58:46.066823 9441 caffe.cpp:412] norm1 backward: 69.7601
ms.
I0707 06:58:46.066876 9441 caffe.cpp:409] conv2 forward: 354.68 ms.
I0707 06:58:46.066968 9441 caffe.cpp:412] conv2 backward: 339.333
ms.
I0707 06:58:46.067028 9441 caffe.cpp:409] relu2 forward: 4.3349 ms.
I0707 06:58:46.067081 9441 caffe.cpp:412] relu2 backward: 0.00196
ms.
I0707 06:58:46.067137 9441 caffe.cpp:409] pool2 forward: 23.469 ms.
I0707 06:58:46.067190 9441 caffe.cpp:412] pool2 backward: 0.00356
ms.
I0707 06:58:46.067245 9441 caffe.cpp:409] norm2 forward: 44.1165 ms.
I0707 06:58:46.067299 9441 caffe.cpp:412] norm2 backward: 45.2355
ms.
I0707 06:58:46.067378 9441 caffe.cpp:409] conv3 forward: 182.216 ms.
I0707 06:58:46.067433 9441 caffe.cpp:412] conv3 backward: 146.802
ms.
I0707 06:58:46.067489 9441 caffe.cpp:409] relu3 forward: 1.48994 ms.
I0707 06:58:46.067543 9441 caffe.cpp:412] relu3 backward: 0.0036 ms.
I0707 06:58:46.067597 9441 caffe.cpp:409] conv4 forward: 145.296 ms.
I0707 06:58:46.067652 9441 caffe.cpp:412] conv4 backward: 121.937
ms.
I0707 06:58:46.067708 9441 caffe.cpp:409] relu4 forward: 1.4964 ms.
I0707 06:58:46.067761 9441 caffe.cpp:412] relu4 backward: 0.00316
ms.
I0707 06:58:46.067816 9441 caffe.cpp:409] conv5 forward: 122.753 ms.
I0707 06:58:46.067870 9441 caffe.cpp:412] conv5 backward: 111.253
ms.
I0707 06:58:46.067925 9441 caffe.cpp:409] relu5 forward: 0.9969 ms.
I0707 06:58:46.067980 9441 caffe.cpp:412] relu5 backward: 0.00196
ms.
I0707 06:58:46.068033 9441 caffe.cpp:409] pool5 forward: 6.49218 ms.
I0707 06:58:46.068087 9441 caffe.cpp:412] pool5 backward: 0.00324
ms.
I0707 06:58:46.068141 9441 caffe.cpp:409] fc6 forward: 256.357 ms.
I0707 06:58:46.068197 9441 caffe.cpp:412] fc6 backward: 117.352
ms.
I0707 06:58:46.068250 9441 caffe.cpp:409] relu6 forward: 0.10042 ms.
I0707 06:58:46.068305 9441 caffe.cpp:412] relu6 backward: 0.00174
ms.
I0707 06:58:46.068358 9441 caffe.cpp:409] drop6 forward: 0.42372 ms.
I0707 06:58:46.068413 9441 caffe.cpp:412] drop6 backward: 0.00324
ms.
I0707 06:58:46.068469 9441 caffe.cpp:409] fc7 forward: 136.134 ms.
I0707 06:58:46.068522 9441 caffe.cpp:412] fc7 backward: 57.1792
ms.
I0707 06:58:46.068577 9441 caffe.cpp:409] relu7 forward: 0.09016 ms.
I0707 06:58:46.068631 9441 caffe.cpp:412] relu7 backward: 0.00196
ms.
I0707 06:58:46.068686 9441 caffe.cpp:409] drop7 forward: 0.37678 ms.
I0707 06:58:46.068739 9441 caffe.cpp:412] drop7 backward: 0.0037 ms.
I0707 06:58:46.068794 9441 caffe.cpp:409] fc8 forward: 35.62 ms.
I0707 06:58:46.068850 9441 caffe.cpp:412] fc8 backward: 20.6572
ms.
I0707 06:58:46.068903 9441 caffe.cpp:409] prob forward: 0.48392 ms.
I0707 06:58:46.068958 9441 caffe.cpp:412] prob backward: 0.13076
ms.
I0707 06:58:46.069133 9441 caffe.cpp:417] Average Forward pass: 1580.33 ms.
I0707 06:58:46.069190 9441 caffe.cpp:419] Average Backward pass: 1164.42
ms.
I0707 06:58:46.069244 9441 caffe.cpp:421] Average Forward-Backward:
2745.12 ms.
I0707 06:58:46.069344 9441 caffe.cpp:423] Total Time: 137256 ms.
I0707 06:58:46.069401 9441 caffe.cpp:424] *** Benchmark ends ***
If the above numbers are a fair comparison, then ACL gives about a 2.5x speedup over pure CPU on the Firefly platform. I saw there is an MXNetOnACL on GitHub. Not sure whether there is a plan for a TensorFlowOnACL.
Hi Kaishijeng:
Thanks for the update.
@xhbdahai, @honggui, @openailab-sh You will invariably end up with more questions about benchmarking Caffe-on-ACL against Caffe (or indeed other frameworks). Have you considered using / contributing to CK-Caffe? It's part of a growing suite of AI benchmarking tools based on Collective Knowledge, also including e.g. CK-Caffe2, CK-TensorFlow, CK-TensorRT, CK-KaNN. For example, we have released benchmarking data for the Firefly-RK3399 platform that @kaishijeng uses. For instance, for the batch size of 2 (the smallest we have measured) on AlexNet (the closest to CaffeNet we have measured), we have obtained the following data for forward propagation (inference):
(I can easily benchmark CaffeNet with the batch size of 1 if you are interested.) Would you be interested in collaborating on adding Caffe-on-ACL to CK-Caffe?
As an added bonus, we already support an ACL package and crowd-benchmarking across mobile devices.
@psyhtest adding CaffeOnACL to CK-Caffe is a good idea. Will give you feedback after checking the effort.
@OAIL How is the effort looking to you? :)
Hello @honggui, I am testing CaffeOnACL vs. Caffe on a TX2 board; note that both are running the CPU version. Any possible hypothesis for these results?
Hi pcub,
Thanks @honggui
Hi honggui, I want to test the performance of a face recognition application with multiple threads. Where can I add "CPPScheduler::set_num_threads(x)" to enable a multi-threaded test for ACL? The executable I use is OAID/FaceRecognition/bin/face-recognition.cpp. Also, I want to know whether there is any interface to modify the number of threads used. Thanks a lot!
I did performance profiling on classification with the BVLC model between the original Caffe and CaffeOnACL and saw some gain, but not as big as I was hoping. Is this also what you observe on your platform?
I use the following command on a Firefly RK3399:
./build/examples/cpp_classification/classification.bin models/bvlc_reference_caffenet/deploy.prototxt models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel data/ilsvrc12/imagenet_mean.binaryproto data/ilsvrc12/synset_words.txt examples/images/cat.jpg
and measure the time spent as below:
std::vector<Prediction> Classifier::Classify(const cv::Mat& img, int N) {
  // Warm-up call so model-parameter loading is excluded from the measurement.
  std::vector<float> output = Predict(img);
  std::clock_t begin = std::clock();
  output = Predict(img);
  N = std::min<int>(labels_.size(), N);
  std::vector<int> maxN = Argmax(output, N);
  std::vector<Prediction> predictions;
  for (int i = 0; i < N; ++i) {
    int idx = maxN[i];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }
  std::clock_t end = std::clock();
  double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
  std::cout << "Time spent: " << elapsed_secs << std::endl;
  return predictions;
}
The time measurements for Caffe and CaffeOnACL are below:
CaffeonACL
Time spent: 4.53536
0.3134 - "n02123045 tabby, tabby cat"
0.2380 - "n02123159 tiger cat"
0.1235 - "n02124075 Egyptian cat"
0.1003 - "n02119022 red fox, Vulpes vulpes"
0.0715 - "n02127052 lynx, catamount"
Original Caffe
Time spent: 5.5306
0.3134 - "n02123045 tabby, tabby cat"
0.2380 - "n02123159 tiger cat"
0.1235 - "n02124075 Egyptian cat"
0.1003 - "n02119022 red fox, Vulpes vulpes"
0.0715 - "n02127052 lynx, catamount"