diff --git a/doc/bulk_counter/bulk_counter.md b/doc/bulk_counter/bulk_counter.md
index f4eb344a6d..c3e963ad1e 100644
--- a/doc/bulk_counter/bulk_counter.md
+++ b/doc/bulk_counter/bulk_counter.md
@@ -23,28 +23,26 @@ PR https://github.com/opencomputeproject/SAI/pull/1352/files introduced new SAI
 - sai_bulk_object_get_stats
 - sai_bulk_object_clear_stats
 
-SONiC flex counter infrastructure shall utilize bulk stats API to gain better performance. This document discusses how to integrate these two new APIs to SONiC.
+SONiC flex counter infrastructure shall utilize bulk stats API to gain better performance. This document discusses how to integrate these two new APIs to SONiC.
 
 ### Requirements
 
 - Syncd shall use bulk stats APIs based on object type. E.g. for a counter group that queries queue and pg stats, queue stats support bulk while pg stats does not, in that case queue stats shall use bulk API, pg stats shall use non bulk API
-- For a certain object type in a counter group, it shall use bulk stats only if:
-  - The stats capability for each counter IDs shall match the stats mode of the counter group
-  - Each object queries exactly the same counter IDs. (Requirement from function signature of sai_bulk_object_get_stats and sai_bulk_object_clear_stats)
+- For a certain object in a counter group, it shall use bulk stats only if all counter IDs support bulk API
 - Syncd shall automatically fall back to old way if bulk stats APIs are not supported
-- Syncd shall utilize API sai_query_stats_capability to query bulk capability. Syncd shall treat counter as no bulk capability if API sai_query_stats_capability return error except SAI_STATUS_BUFFER_OVERFLOW (SAI_STATUS_BUFFER_OVERFLOW requires a retry with larger buffer)
-- Syncd shall call bulk stats API in flex counter thread and avoid calling it in main thread to make sure main thread only handles short and high priority tasks. (This is the default behavior in flex counter infrastructure)
+- Syncd shall utilize sai_bulk_object_get_stats/sai_bulk_object_clear_stats to query bulk capability. Syncd shall treat counter as no bulk capability if API return error
+- Syncd shall call bulk stats API in flex counter thread and avoid calling it in main thread to make sure main thread only handles short and high priority tasks. (This is the default behavior in current flex counter infrastructure)
 - In phase 1, the change is limited to syncd only, no CLI/swss change. Syncd shall deduce the bulk stats mode according to the stats mode defined in FLEX DB:
   - SAI_STATS_MODE_READ -> SAI_STATS_MODE_BULK_READ
   - SAI_STATS_MODE_READ_AND_CLEAR -> SAI_STATS_MODE_BULK_READ_AND_CLEAR
 
 ### Architecture Design
 
-For each counter group, different statistic type is allowed to chooose bulk or non-bulk API based on SAI capability.
+For each counter group, different statistic type is allowed to choose bulk or non-bulk API based on vendor SAI implementation.
 
 ![architecture](/doc/bulk_counter/bulk_counter.svg).
 
-> Note: In the picture, pg/queue watermark statistic use bulk API and buffer watermark statistic uses non-bulk API. This is just an example to show the design idea.
+> Note: In the picture, pg/queue watermark statistic use bulk API and buffer watermark statistic uses non-bulk API. This is just an example to show the design idea.
 
 ### High-Level Design
 
@@ -56,7 +54,7 @@ Changes shall be made to sonic-sairedis to support this feature. No CLI change.
 
 ##### Bulk Statistic Context
 
-A new structure shall be added to FlexCounter class.
+A new structure shall be added to FlexCounter class.
 
 This structure is created because:
 
@@ -64,35 +62,32 @@ This structure is created because:
 - Avoid constructing these information each time collecting statistic. The bulk context shall only be updated under below cases:
   - New object join counter group. E.g. adding a new port object.
   - Existing object leave counter group. E.g removing an existing port object.
-  - Other case such as counter IDs is updated by user.
+  - Other case such as counter IDs is updated by upper layer.
 
 ```cpp
 struct BulkStatsContext {
     sai_object_type_t object_type;
     std::vector object_vids;
-    std::vector object_keys;
+    std::vector object_keys;
     std::vector counter_ids;
-    std::vector statuses;
+    std::vector object_statuses;
     std::vector counters;
-    std::shared_ptr stats_capas;
 };
 ```
 
 - object_type: object type.
-- object_keys: objects that participate the bulk call. E.g. for port, SAI object id value shall be put into sai_object_key_t structure.
+- object_vids: virtual IDs.
+- object_keys: real IDs.
 - counter_ids: SAI statistic IDs that will be queried/cleared by the bulk call.
-- statuses: SAI bulk API return value for each object.
+- object_statuses: SAI bulk API return value for each object.
 - counters: counter values that will be fill by vendor SAI.
-- stats_capas: stats capability for each statitstic IDs for current object type.
 
 The flow of how to updating bulk context will be discussed in following section.
 
-For a given object type, diffrent object instance may support different stats capability, so, a list of BulkStatsContext shall be added to FlexCounter class for each object type.
+For a given object type, different object instance may support different stats capability, so, a map of BulkStatsContext shall be added to FlexCounter class for each object type.
 
 ```cpp
-
-std::vector m_portBulkContexts;
-std::vector m_priorityGroupBulkContexts;
+std::map, BulkStatsContext> m_portBulkContexts;
 ...
 ```
 
@@ -124,7 +119,7 @@ N/A
 
 No extra logic on SONiC side is needed to handle warmboot/fastboot.
 
-- As fastboot dealys all counters querying, this feature does not affect fastboot.
+- As fastboot delays all counters querying, this feature does not affect fastboot.
 - For warmboot, it is vendor SAI implementation's responsible to make sure that there must be no error if warmboot starts while bulk API is called.
 
 ### Restrictions/Limitations
@@ -135,12 +130,12 @@ No extra logic on SONiC side is needed to handle warmboot/fastboot.
 
 ### Performance Improvement
 
-A rough test has been done on Nvidia platform for queue.
+A rough test has been done on Nvidia platform for queue.
 
 - Non bulk API: get stats for one queue takes X seconds; get stats for 32 port * 8 queue is 256X seconds;
 - Bulk API: get stats for one queue takes Y seconds; get stats for 32 port * 8 queue is almost Y seconds;
 
-X is almost euqal to Y. So, more object instances, more performance improvement.
+X is almost equal to Y. So, more object instances, more performance improvement.
 
 ### Testing Requirements/Design
 
@@ -148,5 +143,10 @@ As this feature does not introduce any new function, unit test shall be good eno
 
 #### Unit Test cases
 
-- test_update_bulk_context
-- test_bulk_collect_stats
+- addRemoveBulkCounter
+- counterIdChange
+  - not support bulk -> support bulk
+  - support bulk but counter IDs change
+  - support bulk with different counter IDs
+  - support bulk -> not support bulk
+  - not support bulk but counter IDs change
diff --git a/doc/bulk_counter/counter_collect.svg b/doc/bulk_counter/counter_collect.svg
index f8d4364f46..cc167c4fda 100644
--- a/doc/bulk_counter/counter_collect.svg
+++ b/doc/bulk_counter/counter_collect.svg
@@ -1 +1 @@
-
[counter_collect.svg, old version. Flowchart node labels: Collect stats start; Old way; Support bulk?; Call sai_object_bulk_get_stats; Success?; Need Clear?; Call sai_object_bulk_clear_stats; Success?; Fill Counters DB. Edge labels: yes / no.]
\ No newline at end of file +
[counter_collect.svg, new version. Flowchart node labels: For each bulk context; Call sai_object_bulk_get_stats; Success?; Fill Counters DB; Log warning; For each item in object_statuses; Success?; Log error. Edge labels: yes / no.]
\ No newline at end of file diff --git a/doc/bulk_counter/object_join_counter_group.svg b/doc/bulk_counter/object_join_counter_group.svg index 5b242c530f..ce5475b421 100644 --- a/doc/bulk_counter/object_join_counter_group.svg +++ b/doc/bulk_counter/object_join_counter_group.svg @@ -1 +1 @@ -
[object_join_counter_group.svg, old version. Flowchart node labels: New entry to FlexCounter table; Query stats capability; Extract object ID and counter IDs; FlexCounter::addCounter; Success?; All Counter IDs support bulk?; Bulk not supported; Find existing bulk context?; Update bulk context; Create bulk context. Edge labels: yes / no.]
\ No newline at end of file +
[object_join_counter_group.svg, new version. Flowchart node labels: New entry to FlexCounter table; Get supported counter IDs; Extract object ID and counter IDs; FlexCounter::addCounter; Success?; All Counter IDs support bulk?; Bulk not supported; Find existing bulk context?; Update bulk context; Create bulk context. Edge labels: yes / no.]
\ No newline at end of file
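
The object-join flow named in the flowchart labels (check that all counter IDs support bulk, then find or create a bulk context) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the sonic-sairedis implementation: `ObjectId`, `CounterId`, `BulkContextTable`, `addObject`, `find`, and `contextCount` are hypothetical names, the map is assumed to be keyed by the counter ID list, and the set of bulk-capable counter IDs is passed in rather than probed through the SAI bulk API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the SAI object ID and statistic ID types.
using ObjectId = std::uint64_t;
using CounterId = std::uint32_t;

// Simplified bulk context: all objects that query the same counter ID list.
struct BulkStatsContext {
    std::vector<ObjectId> object_vids;
    std::vector<CounterId> counter_ids;
    std::vector<std::uint64_t> counters; // one slot per (object, counter) pair
};

class BulkContextTable {
public:
    // Assumption: the bulk-capable counter IDs are already known.
    explicit BulkContextTable(std::set<CounterId> bulkCapable)
        : m_bulkCapable(std::move(bulkCapable)) {}

    // Returns true if the object joined a bulk context,
    // false if it has to use the non-bulk path.
    bool addObject(ObjectId vid, const std::vector<CounterId>& counterIds) {
        if (counterIds.empty()) {
            return false;
        }
        // "All Counter IDs support bulk?"
        for (CounterId id : counterIds) {
            if (m_bulkCapable.count(id) == 0) {
                return false; // "Bulk not supported"
            }
        }
        // "Find existing bulk context?": operator[] creates one if absent.
        BulkStatsContext& ctx = m_contexts[counterIds];
        if (ctx.counter_ids.empty()) {
            ctx.counter_ids = counterIds; // "Create bulk context"
        }
        ctx.object_vids.push_back(vid);   // "Update bulk context"
        ctx.counters.resize(ctx.object_vids.size() * ctx.counter_ids.size());
        assert(ctx.counters.size() % ctx.counter_ids.size() == 0);
        return true;
    }

    // Looks up the context for a counter ID list; nullptr if none exists.
    const BulkStatsContext* find(const std::vector<CounterId>& counterIds) const {
        auto it = m_contexts.find(counterIds);
        return it == m_contexts.end() ? nullptr : &it->second;
    }

    std::size_t contextCount() const { return m_contexts.size(); }

private:
    std::set<CounterId> m_bulkCapable;
    std::map<std::vector<CounterId>, BulkStatsContext> m_contexts;
};
```

Keying the map by the counter ID list means objects that query identical counters share one context, while an object with a different list gets its own context instead of forcing the whole group back to the non-bulk path.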