
Optimize counter polling interval by making it more accurate #1457

Merged
kcudnik merged 6 commits into sonic-net:master from counter-optimization-all-in-one on Feb 7, 2025

Conversation

stephenxs
Contributor

@stephenxs stephenxs commented Nov 8, 2024

What I did

Optimize counter-polling performance in terms of polling-interval accuracy.

  1. Enable bulk counter-polling to run at a smaller chunk size.
    There is one counter-polling thread for each counter group. All of these threads compete for critical sections at the vendor SAI level, so a counter-polling thread may have to wait for a critical section that another thread already holds, which introduces latency for the waiting counter group.
    An example is the competition between the PFC watchdog and the port counter groups.
    The port counter group contains many counters and is polled in bulk mode, which takes a relatively long time. The PFC watchdog counter group contains only a few counters but is polled at a short interval. Sometimes the PFC watchdog counters have to wait before polling, which makes their polling interval inaccurate and prevents a PFC storm from being detected in time.
    To resolve this, we reduce the chunk size of the port counter group. By default, the port counter group polls the counters of all ports in a single bulk operation. With a smaller chunk size, it polls the counters in several bulk operations, each covering a subset of ports no larger than the chunk size (a minimal chunking sketch follows this list).
    This keeps the port counter group in the critical section for a shorter time, so the PFC watchdog is more likely to be scheduled in time to poll its counters and detect a PFC storm.

  2. Collect the timestamp immediately after the vendor SAI API returns.
    Currently, many counter groups rely on a Lua plugin that executes once per polling interval to calculate rates, detect certain events, and so on; e.g., the PFC watchdog counter group uses it to detect PFC storms. The polling interval is calculated from the difference between the timestamps of the current and previous polls, to avoid deviation caused by scheduling latency. However, the timestamp is currently collected in the Lua plugin, which runs several steps after the SAI API returns and executes in a different context (redis-server); both factors introduce even larger deviations. To overcome this, we collect the timestamp immediately after the SAI API returns (a sketch follows this list).
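
For item 1, here is a minimal C++ sketch of the chunking idea. `pollInChunks`, `bulkGet`, and `ObjectId` are illustrative assumptions for this PR description, not the real FlexCounter interfaces:

```cpp
// A minimal sketch (not the actual FlexCounter code) of chunked bulk polling.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using ObjectId = std::uint64_t; // stands in for sai_object_id_t

// bulkGet represents one vendor-SAI bulk-stats call; the vendor SAI is
// assumed to serialize such calls across counter-polling threads.
void pollInChunks(const std::vector<ObjectId>& objects,
                  std::size_t chunkSize,
                  const std::function<void(const ObjectId*, std::size_t)>& bulkGet)
{
    // chunkSize == 0 keeps the default behavior: all objects in one call.
    if (chunkSize == 0)
    {
        bulkGet(objects.data(), objects.size());
        return;
    }

    for (std::size_t offset = 0; offset < objects.size(); offset += chunkSize)
    {
        // Each call covers at most chunkSize objects, so the thread holds
        // the vendor-SAI critical section for a shorter time per call and
        // the PFC watchdog thread can be scheduled between chunks.
        std::size_t count = std::min(chunkSize, objects.size() - offset);
        bulkGet(objects.data() + offset, count);
    }
}
```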

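For item 2, a sketch of where the timestamp is taken; the struct, function name, and the placeholder SAI call are assumptions for illustration:

```cpp
// Sketch only: capture the poll timestamp right after the vendor SAI API
// returns, instead of reading the clock later inside the Lua plugin.
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

struct PolledCounters
{
    std::map<std::string, std::uint64_t> values; // counter ID -> value
    std::uint64_t timestampUs = 0;               // stored as e.g. PFC_WD_timestamp
};

PolledCounters pollCounterGroup()
{
    PolledCounters result;

    // ... the vendor SAI get_stats / bulk-stats call would happen here ...

    // Timestamp taken immediately after the SAI call returns, so the
    // interval the Lua plugin computes (current minus previous timestamp)
    // no longer includes redis queuing/scheduling latency.
    result.timestampUs = static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());

    return result;
}
```
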
Depends on

  1. Define bulk chunk size and bulk chunk size per counter ID #1519
  2. Optimize counter polling interval by making it more accurate sonic-swss#3391

Why I did it

How I verified it

Run regression tests and observe counter-polling performance.

A comparison test shows very good results with any or all of the above optimizations applied.

Details if related

For item 2: each counter group contains more than one counter context, based on the type of objects; a counter context is keyed by (group, object type). However, counters fetched by different counter groups are pushed into the same entry for the same object.
E.g., the PFC_WD group contains counters of ports and queues, the PORT group contains counters of ports, and the QUEUE_STAT group contains counters of queues.
Both the PFC_WD and PORT groups push counter data into the item representing a port, but each counter group has its own polling interval, which means counter IDs polled by different counter groups can carry different timestamps.
We therefore use the name of a counter group to identify that group's timestamp.
E.g., in a port counter entry, PORT_timestamp records the last time the port counter group polled the counters, and PFC_WD_timestamp records the last time the PFC watchdog counter group polled them (an illustrative entry follows).
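
As an illustration, a port entry in COUNTERS_DB could then carry one timestamp per counter group. The OID and counter values below are made up; only the *_timestamp field names come from this PR:

```
COUNTERS:oid:0x1000000000012              # hash entry for one port
  SAI_PORT_STAT_IF_IN_OCTETS   "12345"    # written by the PORT group
  SAI_PORT_STAT_PFC_3_RX_PKTS  "7"        # written by the PFC_WD group
  PORT_timestamp    "1738915200000000"    # last poll by the PORT group
  PFC_WD_timestamp  "1738915200100000"    # last poll by the PFC_WD group
```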

@stephenxs
Contributor Author

This PR requires swss to be updated correspondingly. The swss PR will be opened soon.

@stephenxs
Contributor Author

Depends on sonic-net/sonic-swss-common#950

@stephenxs stephenxs force-pushed the counter-optimization-all-in-one branch from c14fd22 to 6b362f6 on November 18, 2024 at 02:15
@stephenxs stephenxs marked this pull request as ready for review on November 25, 2024 at 12:06
@stephenxs stephenxs force-pushed the counter-optimization-all-in-one branch from 6b362f6 to b82e233 on November 25, 2024 at 12:06
@stephenxs
Contributor Author

HLD sonic-net/SONiC#1864

@stephenxs stephenxs force-pushed the counter-optimization-all-in-one branch from b82e233 to 5442b37 on December 13, 2024 at 06:54
@mssonicbld
Collaborator

/azp run


Azure Pipelines successfully started running 1 pipeline(s).


@kcudnik
Collaborator

kcudnik commented Dec 24, 2024

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run


Azure Pipelines successfully started running 1 pipeline(s).


@mssonicbld
Collaborator

/AzurePipelines run


Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run


Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Contributor Author

Hi @kcudnik,
Would you please review and approve this PR?
I added two commits to adjust the code for the WRED/ECN and policer counters, which were merged after the last approval.
Thanks.

@stephenxs
Contributor Author

Hi @kcudnik,
All tests pass. Could you please help merge it?
Thanks

@kcudnik kcudnik merged commit 8fe5596 into sonic-net:master Feb 7, 2025
15 checks passed
@stephenxs stephenxs deleted the counter-optimization-all-in-one branch February 7, 2025 20:54
@r12f
Contributor

r12f commented Feb 9, 2025

Hi @kperumalbfn, would you mind helping get this PR merged into 202411?

mssonicbld added a commit to mssonicbld/sonic-sairedis that referenced this pull request Feb 10, 2025
Define bulk chunk size and bulk chunk size per counter ID.
This resolves the VS test failure in sonic-net#1457, which is caused by a circular dependency.
In PR sonic-net#1457, the new fields `bulk_chunk_size` and `bulk_chunk_size_per_prefix` were introduced to `sai_redis_flex_counter_group_parameter_t`, whose instances are initialized by orchagent.
However, orchagent is still compiled against the old sairedis header, which leaves both new fields uninitialized and in turn fails the VS test.

We have to split the change into two PRs:
1. sonic-net#1519, which updates the sairedis.h header only; the motivation is to compile swss (orchagent) with both new fields initialized.
2. sonic-net#1457, which contains the rest of the code.

The order to merge:
1. sonic-net#1519
2. sonic-net/sonic-swss#3391
3. sonic-net#1457
@kperumalbfn

@stephenxs Could you please check the conflict in 202411?

mssonicbld added a commit that referenced this pull request Feb 10, 2025
@stephenxs
Contributor Author

stephenxs commented Feb 13, 2025

Hi @kperumalbfn,
We need the following PRs to be cherry-picked first to resolve the conflict:

  1. sonic-sairedis: Wred stats feature changes on Sai-redis and Syncd #1234
  2. [FC] Support Policer Counter #1484

However, there is also a conflict when cherry-picking #1234 to 202411.
Could you help resolve the conflict or create a backport PR?
Thanks.

@r12f
Contributor

r12f commented Feb 14, 2025

Hi @kperumalbfn, would you mind helping with the previous PRs?

@kperumalbfn

@stephenxs We are not planning to merge WRED stats to the 202411 branch, as it is a full feature and requires multiple PRs to be merged. Could you double-commit this PR to the 202411 branch?

#1484 is cherry-picked to the 202411 branch.

@stephenxs
Contributor Author

> @stephenxs We are not planning to merge WRED stats to the 202411 branch, as it is a full feature and requires multiple PRs to be merged. Could you double-commit this PR to the 202411 branch?
>
> #1484 is cherry-picked to the 202411 branch.

Thank you for letting me know.
I will create a backport PR.

@stephenxs
Contributor Author

> @stephenxs We are not planning to merge WRED stats to the 202411 branch, as it is a full feature and requires multiple PRs to be merged. Could you double-commit this PR to the 202411 branch?
>
> #1484 is cherry-picked to the 202411 branch.

@kperumalbfn What about 202012-msft? Thanks.

@kperumalbfn

@stephenxs The 202412-msft branch will have all the WRED stats related PRs, so no additional PR is needed.

@kperumalbfn

@r12f Could you merge the WRED PRs to the 202412 branch? We have a few PRs in swss and sairedis and have tagged them.

stephenxs added a commit to stephenxs/sonic-sairedis that referenced this pull request Feb 19, 2025
stephenxs added a commit to stephenxs/sonic-sairedis that referenced this pull request Feb 19, 2025
stephenxs added a commit to stephenxs/sonic-sairedis that referenced this pull request Feb 20, 2025