Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(autoware_pointcloud_preprocessor): reuse collectors to reduce creation of collector and timer #10074

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

vividf
Copy link
Contributor

@vividf vividf commented Feb 5, 2025

Description

#8300 introduces a collector and initializes a timer in each collector with a 100 ms interval, which leads to infinite object creation and deletion.

This PR reuses collectors that are no longer in use (i.e., those that have completed concatenation) by resetting their timers, effectively converting them into new collectors. This approach helps limit unnecessary collector creation and timer initialization.

Related links

Parent Issue:

  • Link

How was this PR tested?

Notes for reviewers

None.

Interface changes

None.

Effects on system behavior

None.

@vividf vividf added component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) labels Feb 5, 2025
@vividf vividf self-assigned this Feb 5, 2025
Copy link

github-actions bot commented Feb 5, 2025

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@vividf vividf requested a review from mojomex February 5, 2025 12:23
Copy link

codecov bot commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 56.25000% with 21 lines in your changes missing coverage. Please review.

Project coverage is 28.32%. Comparing base (bd84834) to head (a8444cb).
Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
...oncatenate_data/concatenate_and_time_sync_node.cpp 53.33% 11 Missing and 3 partials ⚠️
...processor/src/concatenate_data/cloud_collector.cpp 61.11% 2 Missing and 5 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #10074   +/-   ##
=======================================
  Coverage   28.32%   28.32%           
=======================================
  Files        1485     1486    +1     
  Lines      111090   111153   +63     
  Branches    43150    43176   +26     
=======================================
+ Hits        31466    31487   +21     
- Misses      76603    76643   +40     
- Partials     3021     3023    +2     
Flag Coverage Δ *Carryforward flag
differential 22.88% <56.25%> (?)
differential-cuda 20.64% <56.25%> (?)
total 28.31% <ø> (-0.01%) ⬇️ Carriedforward from 81ec941

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@YoshiRi YoshiRi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in codespace, but I did not check with any rosbags.

Copy link
Contributor

@mojomex mojomex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff!
I've left a few comments in places I think could be simplified or where I'd like the more safety guarantees.

Comment on lines 242 to 247
for (auto & collector : cloud_collectors_) {
if (collector->get_status() == CollectorStatus::Idle) {
selected_collector = collector;
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be expressed as a std::find_if instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! fixed in c99f1fe

Comment on lines 77 to 85
const auto period_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
std::chrono::duration<double>(timeout_sec_));
try {
set_period(period_ns.count());
} catch (rclcpp::exceptions::RCLError & ex) {
RCLCPP_WARN_THROTTLE(
ros2_parent_node_->get_logger(), *ros2_parent_node_->get_clock(), 5000, "%s", ex.what());
}
timer_->reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? It seems to me reading the RCL source code that the period is unaffected by cancel() and so just calling reset() without re-setting the period should be fine, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks for pointing out!
cbb94ec

Comment on lines 371 to 378
if (cloud_collectors_.size() > collectors_threshold) {
RCLCPP_WARN_STREAM_THROTTLE(
get_logger(), *get_clock(), 1000,
"The number of cloud collectors (" << cloud_collectors_.size() << ") exceeds the threshold ("
<< collectors_threshold
<< "), be careful if it keeps increasing.");
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this to be a hard limit. In production, nobody will look at those logs, so the memory footprint could increase forever.

Copy link
Contributor Author

@vividf vividf Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Fixed with resetting collector that has the oldest timestamp:
81ec941 and 3375c43

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix logging: a8444cb

@vividf vividf requested a review from mojomex February 7, 2025 06:59
Signed-off-by: vividf <[email protected]>
Copy link
Contributor

@mojomex mojomex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably hard-coding number of collectors to 2 or 3 would be okay. I don't see a use case with more collectors at the moment.
Also would allow us to allocate all of them once on startup instead of dynamically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with @drwnz just now, it's probably a goof idea to allocate (on startup) collectors_threshold collectors and then not add/remove any during operation.
This makes it easier to reason about the program state.
(would also get rid of L251-L260)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!!
Fixed in 85f75e6


// Reset the collector with the oldest timestamp if found
if (min_it != cloud_collectors_.end()) {
RCLCPP_WARN_STREAM(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably warn_throttle would be good here as this can run n_lidars x frequency times per second in the worst case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in a8b512f

if (!selected_collector) {
// If no idle collector exists, create a new collector
selected_collector = std::make_shared<CloudCollector>(
std::dynamic_pointer_cast<PointCloudConcatenateDataSynchronizerComponent>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic creation of a new collector may not be ideal. How about we have a fixed number of collectors that are initialized, and then use the first idle one - if none are idle, kick the oldest one and reuse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!!
Fixed in 85f75e6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci)
Projects
Status: To Triage
Development

Successfully merging this pull request may close these issues.

4 participants