[cacl] caclmgrd reports errors when dataplane ACL rules are deleted #5544

Closed
daall opened this issue Oct 5, 2020 · 2 comments · Fixed by #5560

Comments

daall commented Oct 5, 2020

Description
The ACL counter CRM test fails against the 201911 image because loganalyzer flags errors in the syslog. caclmgrd is writing error messages that reference dataplane ACL rules, and these messages appear during test teardown.

Steps to reproduce the issue:

  1. Run test_crm.py::test_acl_counter against 201911 image

Describe the results you received:

Oct  4 09:25:39.761878 str-dx010-acs-1 ERR caclmgrd[2593]: :- pops: Failed to get content for table key ACL_RULE|DATAACL|RULE_47

Describe the results you expected:
Tests should pass, no errors should be reported for dataplane ACL rules.

Output of show version:

admin@str-dx010-acs-1:~$ show ver

SONiC Software Version: SONiC.20191130.50
Distribution: Debian 9.13
Kernel: 4.9.0-11-2-amd64
Build commit: fb8dbcdd3
Build date: Wed Sep 30 23:47:35 UTC 2020
Built by: sonicbld@jenkins-slave-phx-2

Platform: x86_64-cel_seastone-r0
HwSKU: Celestica-DX010-C32
ASIC: broadcom
Serial Number: DX010F2B018B03BY100009
Uptime: 18:29:34 up  3:35,  2 users,  load average: 0.71, 0.71, 1.17

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-snmp-sv2            20191130.50         8028250707ba        348MB
docker-snmp-sv2            latest              8028250707ba        348MB
docker-fpm-frr             20191130.50         f4baba63a326        335MB
docker-fpm-frr             latest              f4baba63a326        335MB
docker-lldp-sv2            20191130.50         f0ba69be6205        312MB
docker-lldp-sv2            latest              f0ba69be6205        312MB
docker-orchagent           20191130.50         627673b32849        333MB
docker-orchagent           latest              627673b32849        333MB
docker-teamd               20191130.50         c0cd4579c21d        314MB
docker-teamd               latest              c0cd4579c21d        314MB
docker-syncd-brcm          20191130.50         38d2e8f41d91        436MB
docker-syncd-brcm          latest              38d2e8f41d91        436MB
docker-platform-monitor    20191130.50         77d4b95e5eaa        357MB
docker-platform-monitor    latest              77d4b95e5eaa        357MB
docker-sonic-telemetry     20191130.50         3130b2643362        353MB
docker-sonic-telemetry     latest              3130b2643362        353MB
docker-database            20191130.50         ef988f7f6dc3        289MB
docker-database            latest              ef988f7f6dc3        289MB
docker-dhcp-relay          20191130.50         2659b59c9097        299MB
docker-dhcp-relay          latest              2659b59c9097        299MB
docker-router-advertiser   20191130.50         9985f9526e08        289MB
docker-router-advertiser   latest              9985f9526e08        289MB
k8s.gcr.io/pause           3.2                 80d28bedfe5d        683kB

jleveque commented Oct 5, 2020

@abdosi: The log message in question is being generated by swsscommon, so it appears that this was introduced when you migrated caclmgrd to use swsscommon in lieu of swsssdk when adding multi-ASIC support. I assume this issue is also present in the master branch. Can you please investigate?

@jleveque jleveque assigned abdosi and unassigned jleveque Oct 5, 2020
@daall daall added the P0 Priority of the issue label Oct 7, 2020
@abdosi abdosi linked a pull request Oct 7, 2020 that will close this issue

abdosi commented Oct 7, 2020

Possible root cause (I was not able to reproduce this on my local setup):

caclmgrd subscribes to swsscommon.CFG_ACL_RULE_TABLE_NAME, which Orchagent also subscribes to.

In this test case we load and delete DATAACL rules. caclmgrd processes each of these events and reprograms iptables every time. This does not cause any problem with the rules themselves, but it takes time.

Meanwhile, acl-loader delete is called on DATAACL and Orchagent processes those events. caclmgrd, being slow, is still working through SET events, so it can trigger the error above: the delete has already been handled by Orchagent while the caclmgrd SET handler is still trying to read the rule.
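
To make the suspected ordering concrete, here is a minimal pure-Python simulation of it. It does not use swsscommon: the `config_db` dict, the `pending_events` queue, the helper names, and the rule fields are all hypothetical stand-ins, and the real notification path may behave differently.

```python
from collections import deque

# Hypothetical stand-ins: a dict plays the role of the config database and a
# deque plays the role of the slow subscriber's pending-notification queue.
config_db = {}
pending_events = deque()
errors = []

def set_rule(key, fields):
    """Writer side: store the rule and queue a SET notification."""
    config_db[key] = fields
    pending_events.append(("SET", key))

def delete_rule(key):
    """Writer side: remove the rule and queue a DEL notification."""
    config_db.pop(key, None)
    pending_events.append(("DEL", key))

def process_one_event():
    """Slow subscriber side: handle one queued event by re-reading the key."""
    op, key = pending_events.popleft()
    if op == "SET" and key not in config_db:
        # The key vanished between the notification and the read.
        errors.append("Failed to get content for table key " + key)

key = "ACL_RULE|DATAACL|RULE_47"
# Illustrative field values only; they are not taken from the failing test.
set_rule(key, {"PRIORITY": "9953", "PACKET_ACTION": "DROP"})
delete_rule(key)  # the delete lands before the queued SET is handled
while pending_events:
    process_one_event()

print(errors)
```

In this sketch the stale SET event is the only one that produces an error, which matches the shape of the syslog line in the report.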

PR #5560 makes caclmgrd ignore DATAACL events, which should keep caclmgrd from falling behind and should hopefully eliminate the error messages above.
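
As a rough illustration of that kind of filtering (this is not the code from the PR), a subscriber could check which ACL table a rule key belongs to before doing any work. The helper name `is_control_plane_rule`, the sample tables, and the assumption that control plane tables are marked with type `CTRLPLANE` are mine.

```python
def is_control_plane_rule(rule_key, acl_tables):
    """Return True if rule_key belongs to a table whose type is CTRLPLANE.

    rule_key is expected to look like "ACL_RULE|<table name>|<rule name>".
    acl_tables maps table names to their fields (hypothetical layout).
    """
    parts = rule_key.split("|")
    if len(parts) != 3 or parts[0] != "ACL_RULE":
        return False
    table = acl_tables.get(parts[1], {})
    # Assumption: control plane ACL tables carry type "CTRLPLANE".
    return table.get("type") == "CTRLPLANE"

# Sample tables for illustration only.
acl_tables = {
    "DATAACL": {"type": "L3"},
    "SSH_ONLY": {"type": "CTRLPLANE", "services": ["SSH"]},
}

for rule_key in ("ACL_RULE|DATAACL|RULE_47", "ACL_RULE|SSH_ONLY|RULE_1"):
    if is_control_plane_rule(rule_key, acl_tables):
        print("process", rule_key)
    else:
        print("ignore", rule_key)
```

Skipping dataplane rules up front means the subscriber neither reprograms iptables for them nor reads keys that may already have been deleted.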
