Remove maxPassCount from NVPW_RawMetricsConfig_BeginPassGroup_Params Initialization #303

Merged


Treece-Burgess
Contributor

Treece-Burgess commented on Jan 9, 2025

Pull Request Description

With CUDA 12.4.1 and 12.5.1, running ./papi_native_avail reported Numpass=0 for events that require more than one pass, instead of their actual pass count. Example from an A100:

--------------------------------------------------------------------------------
| cuda:::gpu__compute_memory_access_throughput.avg.pct_of_peak_sustained_elapsed    |
|            Compute Memory Pipeline : throughput of internal activity within c|
|            aches and DRAM. Units=(percent) Numpass=0                         |
|     :device=0                                                                |
|            Mandatory device qualifier [0]                                    |
--------------------------------------------------------------------------------

The cause was the function calculate_num_passes, which set maxPassCount to 1 in NVPW_RawMetricsConfig_BeginPassGroup_Params. This PR removes the maxPassCount assignment from calculate_num_passes and adds documentation describing the behavior of maxPassCount.

Note: I was able to reproduce this behavior with NVIDIA's simpleQuery.cpp example using CUDA 12.5.1.
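To make the change concrete, here is a minimal C sketch of the parameter initialization before and after this PR. It assumes a hypothetical stand-in struct, begin_pass_group_params, that mirrors only the fields of NVPW_RawMetricsConfig_BeginPassGroup_Params relevant here, so it compiles without NVIDIA's headers; the helpers init_params_before and init_params_after are likewise hypothetical and are not the actual PAPI code. In real code the struct comes from NVIDIA's Perfworks host header and is typically initialized with its NVPW_RawMetricsConfig_BeginPassGroup_Params_STRUCT_SIZE macro.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NVPW_RawMetricsConfig_BeginPassGroup_Params,
 * reduced to the fields discussed in this PR. Real code should use the
 * struct declared in NVIDIA's headers instead. */
typedef struct {
    size_t structSize;        /* real code: the *_STRUCT_SIZE macro */
    void *pPriv;              /* reserved, left NULL */
    void *pRawMetricsConfig;  /* raw metrics config created earlier */
    size_t maxPassCount;      /* the field this PR stops setting */
} begin_pass_group_params;

/* Before the fix: the pass group was capped at one pass, which is what
 * led papi_native_avail to print Numpass=0 for multi-pass events. */
static void init_params_before(begin_pass_group_params *p, void *config)
{
    memset(p, 0, sizeof *p);
    p->structSize = sizeof *p;
    p->pRawMetricsConfig = config;
    p->maxPassCount = 1;
}

/* After the fix: maxPassCount is no longer assigned, so it keeps its
 * zero-initialized value and the caller requests no explicit cap. */
static void init_params_after(begin_pass_group_params *p, void *config)
{
    memset(p, 0, sizeof *p);
    p->structSize = sizeof *p;
    p->pRawMetricsConfig = config;
}
```

The only difference between the two helpers is the maxPassCount assignment; everything else about how the pass group is opened stays the same.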

Output of papi_native_avail after removing maxPassCount from calculate_num_passes, for the same event on the A100:

--------------------------------------------------------------------------------
| cuda:::gpu__compute_memory_access_throughput.avg.pct_of_peak_sustained_elapsed    |
|            Compute Memory Pipeline : throughput of internal activity within c|
|            aches and DRAM. Units=(percent) Numpass=4 (multi-pass not supporte|
|            d)                                                                |
|     :device=0                                                                |
|            Mandatory device qualifier [0]                                    |
--------------------------------------------------------------------------------
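The two outputs also show how the pass count drives the event description. The sketch below is a hypothetical helper, format_numpass, written only to reproduce the strings seen in the papi_native_avail output; the rule it encodes (append the suffix when the count exceeds 1) is an assumption, not PAPI's implementation. Under that assumption, the buggy count of 0 never takes the multi-pass branch, which matches the first output lacking the "(multi-pass not supported)" suffix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not PAPI's actual code): builds the pass-count
 * fragment of an event description as it appears in papi_native_avail. */
static void format_numpass(char *buf, size_t len, int numpass)
{
    if (numpass > 1)
        /* Events needing several passes are reported but flagged. */
        snprintf(buf, len, "Numpass=%d (multi-pass not supported)", numpass);
    else
        /* 0 (the pre-fix bug) and 1 both print the bare count. */
        snprintf(buf, len, "Numpass=%d", numpass);
}
```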

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

Treece-Burgess self-assigned this on Jan 9, 2025
Treece-Burgess added the labels component-cuda (PRs and Issues related to the cuda component), type-bug (Issues discussing bugs or PRs fixing bugs), and status-ready-for-review (PR is ready to be reviewed) on Jan 9, 2025
Treece-Burgess merged commit 894f65e into icl-utk-edu:master on Feb 13, 2025
4 checks passed