[Mellanox] update asic and module temperature in a thread for CMIS management #16955

Merged

Collaborator

@Junchao-Mellanox Junchao-Mellanox commented Oct 20, 2023

Why I did it

When a module is fully under software control, the driver cannot get the module temperature or temperature thresholds from firmware. In this case, SONiC needs to read the temperature and its thresholds from the module EEPROM. This PR adds a thermal updater thread that updates the module temperature and temperature thresholds while software control is enabled.

Work item tracking
  • Microsoft ADO (number only):

How I did it

  • Query the ASIC temperature from SDK sysfs and update hw-management-tc periodically
  • Query the module temperature from EEPROM and update hw-management-tc periodically
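
The periodic update described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name `ThermalUpdater`, the `readers`/`writer` callables, and the polling interval are all hypothetical. In practice the readers would wrap the SDK sysfs and EEPROM reads, and the writer would publish to hw-management-tc; neither is reproduced here.

```python
import threading
from typing import Callable, Dict, Optional


class ThermalUpdater:
    """Hypothetical sketch: poll temperature readers and publish the values."""

    def __init__(self,
                 readers: Dict[str, Callable[[], int]],
                 writer: Callable[[str, int], None],
                 interval_s: float = 3.0) -> None:
        self._readers = readers        # sensor name -> function returning a reading
        self._writer = writer          # called as writer(name, value) for each reading
        self._interval_s = interval_s  # assumed polling period, not taken from the PR
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def update_once(self) -> None:
        """Read every sensor once and publish each successful reading."""
        for name, read in self._readers.items():
            try:
                value = read()
            except OSError:
                # Assumption: a failed read is skipped and retried next cycle.
                continue
            self._writer(name, value)

    def _run(self) -> None:
        while not self._stop.is_set():
            self.update_once()
            # Event.wait returns early when stop() is called, unlike time.sleep.
            self._stop.wait(self._interval_s)

    def start(self) -> None:
        if self._thread is None:
            self._stop.clear()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Waiting on a `threading.Event` instead of sleeping lets `stop()` end the loop promptly, which matters if the updater must be stopped and restarted when software control is toggled.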

How to verify it

Manual test
New unit tests

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202311

@Junchao-Mellanox Junchao-Mellanox changed the title [Mellanox] update asic and module temperature in a thread [Mellanox] update asic and module temperature in a thread for CMIS management Oct 23, 2023
@liat-grozovik
Collaborator

@prgeor kind reminder to review this PR.

@xincunli-sonic
Contributor

Can we add an HLD to describe how this thread would work?

@prgeor
Contributor

prgeor commented Dec 1, 2023

@Junchao-Mellanox Module temperature is already read by Xcvrd DOM thread so why not use that?

@Junchao-Mellanox
Collaborator Author

Hi Prince, thanks for the comments. DOM info is updated only once per minute, whereas this get_temperature function might be called at any time. So we decided not to base the implementation on the DB.

@@ -785,6 +791,77 @@ def get_tx_fault(self):
        api = self.get_xcvr_api()
        return [False] * api.NUM_CHANNELS if api else None

    def get_temperature(self):
Contributor

@Junchao-Mellanox why do we need this thread to read the optics temperature when the same value is available in the TRANSCEIVER_DOM_SENSOR table?

Collaborator Author

Hi Prince, thanks for the comments. DOM info is updated only once per minute, whereas this get_temperature function might be called at any time. So we decided not to base the implementation on the DB.

        if not api:
            return None

        thresh_support = api.get_transceiver_thresholds_support()
Contributor

@Junchao-Mellanox the thresholds are available in the TRANSCEIVER_DOM_THRESHOLD table, so why read them again?

Collaborator Author

Hi Prince, thanks for the comments. DOM info is updated only once per minute, whereas this get_temperature function might be called at any time. So we decided not to base the implementation on the DB.
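
To illustrate the direct-read approach discussed in this thread, here is a rough sketch that reads the temperature and its thresholds through a transceiver API object on every call instead of from the DB. Only `get_transceiver_thresholds_support()` appears in the diff above. `FakeXcvrApi`, `read_temperature_info`, the other two method names, and the threshold keys are assumptions made for this example and may not match the merged code.

```python
from typing import Dict, Optional, Tuple


class FakeXcvrApi:
    """Hypothetical stand-in for an EEPROM-backed transceiver API object."""

    def __init__(self, temperature: float, thresholds: Dict[str, float],
                 thresholds_supported: bool = True) -> None:
        self._temperature = temperature
        self._thresholds = thresholds
        self._thresholds_supported = thresholds_supported

    def get_module_temperature(self) -> float:  # assumed method name
        return self._temperature

    def get_transceiver_thresholds_support(self) -> bool:  # name taken from the diff
        return self._thresholds_supported

    def get_transceiver_threshold_info(self) -> Dict[str, float]:  # assumed method name
        return dict(self._thresholds)


def read_temperature_info(
    api: Optional[FakeXcvrApi],
) -> Optional[Tuple[float, Optional[float], Optional[float]]]:
    """Return (temperature, high_warning, high_alarm), read fresh on each call.

    Returns None when no API object is available, mirroring the early
    return shown in the diff. Thresholds are None when unsupported.
    """
    if api is None:
        return None
    temperature = api.get_module_temperature()
    if not api.get_transceiver_thresholds_support():
        return temperature, None, None
    info = api.get_transceiver_threshold_info()
    # The key names below are assumptions for this sketch.
    return temperature, info.get("temphighwarning"), info.get("temphighalarm")
```

Because nothing is cached, a caller always sees the current reading. The cost is one EEPROM access per call, where a DB lookup could return a value up to a minute old.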

@Junchao-Mellanox
Collaborator Author

Hi @prgeor, could you please review again?

@liat-grozovik liat-grozovik merged commit 1b84f3d into sonic-net:master Dec 13, 2023
keboliu pushed a commit to keboliu/sonic-buildimage that referenced this pull request Dec 19, 2023
…nagement (sonic-net#16955)

@Junchao-Mellanox Junchao-Mellanox deleted the master_thermal_updater branch January 8, 2024 01:51
Junchao-Mellanox added a commit to Junchao-Mellanox/sonic-buildimage that referenced this pull request Jan 8, 2024
…r CMIS management (sonic-net#16955)

yxieca pushed a commit that referenced this pull request Jan 8, 2024
…r CMIS management (#16955) (#17699)
