Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix abort in some cases with DDS #13627

Merged
merged 1 commit into from
Dec 31, 2024
Merged

Conversation

AviaAv
Copy link
Contributor

@AviaAv AviaAv commented Dec 24, 2024

Fixed an issue where we randomly see an error where abort() is called after we finish using DDS (in the d'tor), added a try catch block due to the race condition there
Tracked on: [LRS-1212]

@AviaAv AviaAv requested a review from Nir-Az December 24, 2024 12:34
@Nir-Az Nir-Az requested a review from maloel December 24, 2024 13:03
@Nir-Az
Copy link
Collaborator

Nir-Az commented Dec 24, 2024

@maloel can you review?
We saw sporadic crash by this 2 lines:

_th.detach(); // so it's not joinable

the condition for joinable and the actual join actions are not atomic.
Maybe we can drop the join at all?

Currently this issue was also raised by validation as a critical BUG.

Copy link
Collaborator

@maloel maloel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comments; otherwise good to go

@AviaAv
Copy link
Contributor Author

AviaAv commented Dec 29, 2024

Seeing we still could throw because of a failed join, even with the try catch, updating to a different approach of removing the thread detach. As far as I've noticed, we don't wait additional time because of this change

@Nir-Az
Copy link
Collaborator

Nir-Az commented Dec 30, 2024

Seeing we still could throw because of a failed join, even with the try catch, updating to a different approach of removing the thread detach. As far as I've noticed, we don't wait additional time because of this change

It's probably fast enough but the theoretical time to wait is 15 sec.
@maloel what do you think?

@maloel
Copy link
Collaborator

maloel commented Dec 30, 2024

In general, after reviewing the code, I think removing the detach() should be good - I arrived at the same conclusion. I assume this was tested and works.
I don't think a 15-second period of waiting on exit will happen: we reset the shared-ptr and the current code will pick up on it.
There is a small chance that we're done but waiting on the thread, which will still trigger our callbacks while we're waiting to destruct. To fix this we need additional changes.
About the join() throwing, I think it still makes sense to put a try-catch block just to protect against some unforeseen issues.

@Nir-Az
Copy link
Collaborator

Nir-Az commented Dec 30, 2024

In general, after reviewing the code, I think removing the detach() should be good - I arrived at the same conclusion. I assume this was tested and works. I don't think a 15-second period of waiting on exit will happen: we reset the shared-ptr and the current code will pick up on it. There is a small chance that we're done but waiting on the thread, which will still trigger our callbacks while we're waiting to destruct. To fix this we need additional changes. About the join() throwing, I think it still makes sense to put a try-catch block just to protect against some unforeseen issues.

I agree
@AviaAv lets add it

Copy link
Collaborator

@Nir-Az Nir-Az left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Nir-Az Nir-Az merged commit 0bad4a1 into IntelRealSense:development Dec 31, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants