only inferring a single image frame once for a huge performance boost… #301
Conversation
@zoombinis Some tests fail: https://github.com/leggedrobotics/darknet_ros/pull/301/checks?check_run_id=1893822361
… Before, the same image would run inference multiple times, redundantly hogging resources.
365b80b to 02f2170
@tomlankhorst Should the unit test be setting a header on the image message (specifying the timestamp and, especially, the sequence index)? This change relies on the image having an index, which it always should, so the failure could just be due to the test setup. I'm away from my development environment for a while, so unfortunately I can't verify right now.
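For reference, a minimal sketch of what such a test could set, assuming ROS1 `sensor_msgs::Image` messages built with `cv_bridge` (the helper name and image contents are illustrative assumptions, not the actual darknet_ros test code):

```cpp
#include <cstdint>
#include <cv_bridge/cv_bridge.h>
#include <opencv2/core.hpp>
#include <ros/ros.h>
#include <sensor_msgs/image_encodings.h>
#include <std_msgs/Header.h>

// Build a test image whose header carries a timestamp and sequence number,
// so the node's "have I seen this frame?" check has something to compare.
sensor_msgs::ImagePtr makeTestImage(std::uint32_t seq) {
  std_msgs::Header header;
  header.seq = seq;                 // consecutively increasing frame index
  header.stamp = ros::Time::now();  // requires ros::init / ros::Time::init first
  header.frame_id = "camera";
  cv::Mat frame(480, 640, CV_8UC3, cv::Scalar(0, 0, 0));
  return cv_bridge::CvImage(header, sensor_msgs::image_encodings::BGR8, frame).toImageMsg();
}
```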
The proposed changes do improve the performance, but the results are no longer accurate all the time.
Because of the multithreading, the buffer works as follows (a standalone sketch of the index rotation follows this list):
- `(buffIndex_ + 3) % 3` (in short, `buffIndex_`) is currently being fetched in the `fetch_thread`
- `(buffIndex_ + 2) % 3` is currently being processed in the `detect_thread`
- `(buffIndex_ + 1) % 3` is a completely processed frame with detections from the previous loop iteration
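As a standalone illustration (not darknet_ros code, just the same modulo arithmetic), the three slots rotate like this each iteration:

```cpp
#include <cstdio>

int main() {
  int buffIndex_ = 0;
  for (int iter = 0; iter < 4; ++iter) {
    buffIndex_ = (buffIndex_ + 1) % 3;
    int fetchSlot = (buffIndex_ + 3) % 3;    // equals buffIndex_: slot being fetched
    int detectSlot = (buffIndex_ + 2) % 3;   // slot being processed by detect_thread
    int publishSlot = (buffIndex_ + 1) % 3;  // finished slot from the previous iteration
    std::printf("iter %d: fetch=%d detect=%d publish=%d\n", iter, fetchSlot, detectSlot, publishSlot);
  }
}
```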
When a new image has been fetched ("new" meaning it carries a new sequence number) that needs to be processed by the detection thread, the last image is still the old one (with the old sequence number), which did not get any detections.
This is because when `prevSeq_ != headerBuff_[(buffIndex_ + 2) % 3].seq` yields true, the `publishInThread` function uses information from the previous iteration (from `buff_[(buffIndex_ + 1) % 3]`).
Therefore it takes three iterations before an image is fully processed and detections are published:
- fetching the image
- detecting objects
- publishing the bounding boxes and the detection image
Either the publishing is done right after `detect_thread.join()`, or it has to wait until the next iteration, as suggested in the following changes:
```cpp
// Fix0: keep track of which std_msgs::Header sequence number was last processed
// (consecutively increasing)
std::uint32_t prevSeq_ = 0;
bool newImageForDetection = false;
bool hasDetectionsReady = false;
while (!demoDone_)
{
  buffIndex_ = (buffIndex_ + 1) % 3;
  // Fix1: check that this isn't an image we have already seen
  newImageForDetection = (prevSeq_ != headerBuff_[(buffIndex_ + 2) % 3].seq);
  fetch_thread = std::thread(&YoloObjectDetector::fetchInThread, this);
  if (newImageForDetection)
  {
    // Fix2: only detect if this is an image we haven't seen before
    detect_thread = std::thread(&YoloObjectDetector::detectInThread, this);
  }
  // Only publish bounding boxes if detection was done in the last iteration.
  if (hasDetectionsReady)
  {
    if (!demoPrefix_)
    {
      fps_ = 1. / (what_time_is_it_now() - demoTime_);
      demoTime_ = what_time_is_it_now();
      if (viewImage_)
      {
        displayInThread(0);
      }
      else
      {
        generate_image(buff_[(buffIndex_ + 1) % 3], ipl_);
      }
      publishInThread();
    }
    else
    {
      char name[256];
      sprintf(name, "%s_%08d", demoPrefix_, count);
      save_image(buff_[(buffIndex_ + 1) % 3], name);
      ++count;
    }
    // State that the image has been published.
    hasDetectionsReady = false;
  }
  fetch_thread.join();
  if (newImageForDetection)
  {
    // Fix3: record the new sequence number to avoid detecting the same frame more than once
    prevSeq_ = headerBuff_[(buffIndex_ + 2) % 3].seq;
    // Fix4: a detection thread was started this iteration, so let it complete
    // so that it can be destroyed safely
    detect_thread.join();
    // Only after the detect thread is joined, set this flag to true.
    hasDetectionsReady = true;
  }
  if (!isNodeRunning())
  {
    demoDone_ = true;
  }
}
```
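Note the ordering here: `hasDetectionsReady` is only set after `detect_thread.join()` returns, so `publishInThread` in the next iteration never reads a buffer slot that the detector might still be writing to.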
@tomlankhorst I got the tests passing and even included @Zorbing's improvement. As predicted, the issue was due to missing header information; the tests now correctly pass that information through. Here is my commit diff: feature/zoombinis-infer-single-image...zoombinis:feature/zoombinis-infer-single-image However, there's still some inaccuracy (<300%) since we're no longer iterating detection on the same image. If you look at the output, it's not unacceptable by some standards; good enough for me, at least :)
I don't think you can change the source branch of a PR (changing the target branch works, though). So you probably have to create a new PR, link it here, and close this one.
@tomlankhorst I don't have the ability to close this, but here is a new PR: #305
Thanks for the new PR
Author: @zoombinis
When I first used darknet_ros, I discovered that a single image frame from a video camera was being redundantly used as input for inference up to 70 times on my NVIDIA Quadro RTX 3000!
To avoid wasting processing resources, this PR simply keeps track of the last consumed image frame's ID to ensure inference doesn't run on it again.
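In distilled form, the idea looks something like this (the names here are illustrative, not the actual darknet_ros members):

```cpp
#include <cstdint>

// Gate that lets each distinct frame sequence number trigger inference once.
struct FrameGate {
  std::uint32_t lastSeq_ = 0;
  bool seenAny_ = false;

  // Returns true exactly once per new sequence number.
  bool isNewFrame(std::uint32_t seq) {
    if (seenAny_ && seq == lastSeq_) {
      return false;  // same frame as last time: skip redundant inference
    }
    lastSeq_ = seq;
    seenAny_ = true;
    return true;
  }
};
```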
This resolves these issues:
#150
ZhiangChen/target_mapping#8