Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make rmm::prefetch fault tolerant #1649

Open
wants to merge 1 commit into
base: branch-24.10
Choose a base branch
from

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Aug 15, 2024

Description

Closes #1648.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice bdice requested review from a team as code owners August 15, 2024 20:10
@bdice bdice requested review from vyasr and miscco August 15, 2024 20:10
@github-actions github-actions bot added Python Related to RMM Python API cpp Pertains to C++ code labels Aug 15, 2024
@bdice bdice added feature request New feature or request non-breaking Non-breaking change labels Aug 15, 2024
@bdice bdice self-assigned this Aug 15, 2024
@@ -33,6 +33,11 @@ namespace rmm::mr {
/**
* @brief Resource that prefetches all memory allocations.
*
* Note that prefetching is a no-op if the upstream resource is not using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adaptor has more information than rmm::prefetch does. It could (in its ctor) detect that the upstream MR is not managed memory, or that the device doesn't support managed memory. Then it could skip the prefetching, saving a call to cudaMemPrefetchAsync on every alloc.

If it doesn't do this, then there is overhead on alloc on systems that don't support prefetching, and so this isn't truly a no-op and the docs are technically inaccurate.

I think it would be better to make it truly a no-op. What do you think @bdice ?

Copy link
Contributor Author

@bdice bdice Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 100% and want to do that! I did not implement this because I did not know how to check if the upstream MR is using managed memory. Do you have advice on how to do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do a test allocation, try to prefetch it, check for errors, deallocate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp Pertains to C++ code feature request New feature or request non-breaking Non-breaking change Python Related to RMM Python API
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

[FEA] Make rmm::prefetch fault tolerant
2 participants