Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

portsorch: don't call updateDbPortOperStatus on all port types #3505

Merged
merged 3 commits into from
Feb 18, 2025

Conversation

bradh352
Copy link
Contributor

@bradh352 bradh352 commented Feb 10, 2025

What I did

Only call updateDbPortOperStatus() on PHY and TUNNEL ports.

Why I did it

PORT_TABLE contains PortChannel oper_status entries which are not expected by portsorch which leads to warm/fastreboot failures like:

2025 Feb 10 09:33:07.111055 sonic NOTICE swss#orchagent: :- bake: foundPortConfigDone = 1
2025 Feb 10 09:33:07.111080 sonic NOTICE swss#orchagent: :- bake: foundPortInitDone = 1
2025 Feb 10 09:33:07.111395 sonic NOTICE swss#orchagent: :- bake: m_portTable->getKeys 263
2025 Feb 10 09:33:07.111403 sonic NOTICE swss#orchagent: :- bake: portCount = 257, m_portCount = 0
2025 Feb 10 09:33:07.111403 sonic ERR swss#orchagent: :- bake: Invalid port table: portCount, expecting 257, got 261

Regression caused by #3383

How I verified it

Asked @stepanblyschak to verify who reported issue.

Details if related
Fixes sonic-net/sonic-buildimage#21688

Signed-off-by: Brad House (@bradh352)

@bradh352 bradh352 requested a review from prsunny as a code owner February 10, 2025 12:40
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

PORT_TABLE contains PortChannel oper_status entries which are not
expected by portsorch which leads to warm/fastreboot failures
like:
```
2025 Feb 10 09:33:07.111055 sonic NOTICE swss#orchagent: :- bake: foundPortConfigDone = 1
2025 Feb 10 09:33:07.111080 sonic NOTICE swss#orchagent: :- bake: foundPortInitDone = 1
2025 Feb 10 09:33:07.111395 sonic NOTICE swss#orchagent: :- bake: m_portTable->getKeys 263
2025 Feb 10 09:33:07.111403 sonic NOTICE swss#orchagent: :- bake: portCount = 257, m_portCount = 0
2025 Feb 10 09:33:07.111403 sonic ERR swss#orchagent: :- bake: Invalid port table: portCount, expecting 257, got 261
```

Fixes sonic-net/sonic-buildimage#21688

Signed-off-by: Brad House (@bradh352)
@bradh352 bradh352 force-pushed the bradh352/fastreboot-regression branch from 5172cec to 6480e04 Compare February 10, 2025 12:41
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@stepanblyschak stepanblyschak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, can you cover this with a VS test?

@bradh352
Copy link
Contributor Author

Looks good to me, can you cover this with a VS test?

I'm not at all familiar with the vs testing framework. It would be a lot easier in test_portchannel.py to check to make sure there are no PortChannel entries in the port table. Would that be acceptable since that would have caught this regression?

@stepanblyschak
Copy link
Contributor

Looks good to me, can you cover this with a VS test?

I'm not at all familiar with the vs testing framework. It would be a lot easier in test_portchannel.py to check to make sure there are no PortChannel entries in the port table. Would that be acceptable since that would have caught this regression?

Sure

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bradh352
Copy link
Contributor Author

Looks good to me, can you cover this with a VS test?

I'm not at all familiar with the vs testing framework. It would be a lot easier in test_portchannel.py to check to make sure there are no PortChannel entries in the port table. Would that be acceptable since that would have caught this regression?

Sure

Test added, I haven't tested locally but CI here should validate I didn't make a typo or something.

@bradh352
Copy link
Contributor Author

@stepanblyschak

Looks good to me, can you cover this with a VS test?

I'm not at all familiar with the vs testing framework. It would be a lot easier in test_portchannel.py to check to make sure there are no PortChannel entries in the port table. Would that be acceptable since that would have caught this regression?

Sure

Test added, I haven't tested locally but CI here should validate I didn't make a typo or something.

Looks like all tests passed. Can you approve?

Co-authored-by: Stepan Blyshchak <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bradh352
Copy link
Contributor Author

@stepanblyschak maybe now approve it? :)

@bingwang-ms
Copy link
Contributor

@prsunny Can you help review?

@prsunny
Copy link
Collaborator

prsunny commented Feb 18, 2025

@prgeor, @vaibhavhd for viz

@prsunny prsunny merged commit d3d95ab into sonic-net:master Feb 18, 2025
15 checks passed
@vaibhavhd
Copy link
Contributor

Has someone verified that the warm-reboot issue does not exist with the TUNNEL oper status still present in the DB? The failure was seen due to portchannel in the DB and that I believe is fixed. But, we may still have a problem with TUNNEL status in DB. That test_wr_arp validates that part by running warm-reboot with VXLAN tunnel programmed. Please check.

@bradh352
Copy link
Contributor Author

bradh352 commented Feb 18, 2025

Has someone verified that the warm-reboot issue does not exist with the TUNNEL oper status still present in the DB? The failure was seen due to portchannel in the DB and that I believe is fixed. But, we may still have a problem with TUNNEL status in DB. That test_wr_arp validates that part by running warm-reboot with VXLAN tunnel programmed. Please check.

The tunnel port type gets routed to a different table so there's no way it could cause the same issue:

void PortsOrch::updateDbPortOperStatus(const Port& port, sai_port_oper_status_t status) const
{
SWSS_LOG_ENTER();
if(port.m_type == Port::TUNNEL)
{
VxlanTunnelOrch* tunnel_orch = gDirectory.get<VxlanTunnelOrch*>();
tunnel_orch->updateDbTunnelOperStatus(port.m_alias, status);
return;
}
vector<FieldValueTuple> tuples;
FieldValueTuple tuple("oper_status", oper_status_strings.at(status));
tuples.push_back(tuple);
m_portTable->set(port.m_alias, tuples);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

warm/fast-reboot fails on latest master due to error in PortsOrch::bake(): Invalid port table
6 participants