[SmartSwitch] Gnmi resets due to memory exceeding threshold when scaled DASH config is applied #21590

Open
vivekrnv opened this issue Jan 31, 2025 · 6 comments
Labels
Triaged this issue has been triaged

Comments

@vivekrnv
Contributor

vivekrnv commented Jan 31, 2025

  • Apply a large config via gNMI
  • E.g., apply 5M DASH_ROUTE_TABLE entries in chunks of 20K

While pushing the DASH config to gNMI, the gnmi container exceeds its memory threshold and is restarted by monit. The config is not applied as expected in this case.
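
The chunked push described above can be sketched roughly as follows. This only illustrates the batching; it makes no gNMI call, and `make_route_key`, `chunked`, and `count_batches` are hypothetical helpers, not part of any SONiC tool. The real key layout and client used in the test are not shown in this issue.

```python
from itertools import islice
from typing import Iterable, Iterator, List

CHUNK_SIZE = 20_000        # batch size quoted in the report
TOTAL_ROUTES = 5_000_000   # total DASH_ROUTE_TABLE entries quoted in the report


def make_route_key(i: int) -> str:
    """Hypothetical placeholder key; the real DASH_ROUTE_TABLE key format is not assumed."""
    return f"route-{i}"


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def count_batches(total: int, size: int) -> int:
    """Number of batches needed to cover `total` entries (ceiling division)."""
    return -(-total // size)
```

With the figures from the report, 5M entries at 20K per request comes to 250 requests. Generating keys lazily, as `chunked` allows, keeps the sender's own memory flat, but it says nothing about how much the gnmi container holds on to per request.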

2024 Dec 19 09:46:24.706341 smartswitch ERR monit[2176]: 'container_memory_gnmi' status failed (3) -- [gnmi]: Memory usage (588218368 Bytes) is larger than the threshold (419430400 Bytes)!
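
To put the numbers in that monit line in context, here is a small sketch that pulls the usage and threshold out of the message and converts them to MiB. The regular expression is written against the exact message text above; `parse_monit_alert` is a hypothetical helper, not something monit or SONiC provides.

```python
import re
from typing import Optional, Tuple

MIB = 1024 * 1024

# Matches the wording of the monit message quoted in this issue.
MONIT_RE = re.compile(
    r"Memory usage \((\d+) Bytes\) is larger than the threshold \((\d+) Bytes\)"
)


def parse_monit_alert(line: str) -> Optional[Tuple[int, int]]:
    """Return (usage_bytes, threshold_bytes), or None if the line does not match."""
    m = MONIT_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


line = (
    "'container_memory_gnmi' status failed (3) -- [gnmi]: Memory usage "
    "(588218368 Bytes) is larger than the threshold (419430400 Bytes)!"
)
usage, threshold = parse_monit_alert(line)
overshoot = usage - threshold  # bytes above the configured limit
print(f"usage={usage / MIB:.1f} MiB threshold={threshold / MIB:.1f} MiB "
      f"ratio={usage / threshold:.2f}")
```

The threshold is exactly 400 MiB and the reported usage is about 561 MiB, roughly 1.4 times the limit.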

syslog:

2024 Dec 19 09:46:24.567180 smartswitch NOTICE gnmi#telemetry: :- dbUpdateThread: dbUpdateThread for table: DASH_ROUTE_TABLE is exiting
2024 Dec 19 09:46:24.706341 smartswitch ERR monit[2176]: 'container_memory_gnmi' status failed (3) -- [gnmi]: Memory usage (588218368 Bytes) is larger than the threshold (419430400 Bytes)!
2024 Dec 19 09:46:24.706617 smartswitch INFO monit[2176]: 'container_memory_gnmi' exec: '/usr/bin/restart_service gnmi'
2024 Dec 19 09:46:24.778007 smartswitch INFO restart_service: Resetting failed status of service 'gnmi' ...
2024 Dec 19 09:46:24.809153 smartswitch INFO restart_service: Succeeded to reset failed status of service 'gnmi.service'.
2024 Dec 19 09:46:24.809302 smartswitch INFO restart_service: Restarting service 'gnmi' ...
2024 Dec 19 09:46:24.874363 smartswitch INFO systemd[1]: Stopping gnmi.service - GNMI container...
2024 Dec 19 09:46:24.894671 smartswitch NOTICE admin: Stopping gnmi service...
2024 Dec 19 09:46:24.971303 smartswitch NOTICE gnmi#telemetry: :- dbUpdateThread: dbUpdateThread begin
2024 Dec 19 09:46:24.990763 smartswitch NOTICE admin: Warm boot flag: gnmi false.
2024 Dec 19 09:46:24.997049 smartswitch NOTICE admin: Fast boot flag: gnmi false.
2024 Dec 19 09:46:25.025371 smartswitch NOTICE gnmi#telemetry: :- dbUpdateThread: dbUpdateThread for table: DASH_ROUTE_TABLE is exiting
2024 Dec 19 09:46:25.148253 smartswitch INFO memory_checker: [memory_checker] Container ID of 'gnmi' is: '2feb3b901b952f73f57162817a4393654c13dc6899730a089970dd1902f14d90'.
2024 Dec 19 09:46:25.148332 smartswitch INFO memory_checker: [memory_checker] The memory usage of container 'gnmi' is '702083072' Bytes!
2024 Dec 19 09:46:25.149295 smartswitch INFO memory_checker: [memory_checker] The cache usage of container 'gnmi' is '45056' Bytes!
2024 Dec 19 09:46:25.149350 smartswitch INFO memory_checker: [memory_checker] Total memory usage of container 'gnmi' is '702038016' Bytes!
2024 Dec 19 09:46:25.152430 smartswitch NOTICE memory_checker: :- publish: EVENT_PUBLISHED: {"sonic-events-host:mem-threshold-exceeded":{"ctr_name":"gnmi","mem_usage":"702038016.00","threshold":"419430400","timestamp":"2024-12-19T09:46:25.152331Z"}}
2024 Dec 19 09:46:25.341549 smartswitch DEBUG container: read_data: config:True feature:gnmi fields:[('set_owner', 'local'), ('no_fallback_to_local', False), ('state', 'disabled')] val:['local', False, 'enabled']
2024 Dec 19 09:46:25.342403 smartswitch DEBUG container: read_data: config:False feature:gnmi fields:[('current_owner', 'none'), ('remote_state', 'none'), ('container_id', '')] val:['none', 'none', '']
2024 Dec 19 09:46:25.343268 smartswitch DEBUG container: container_stop: gnmi: set_owner:local current_owner:none remote_state:none docker_id:gnmi
2024 Dec 19 09:46:25.409693 smartswitch INFO gnmi#supervisord 2024-12-19 09:46:25,409 WARN received SIGTERM indicating exit request
2024 Dec 19 09:46:25.410346 smartswitch INFO gnmi#supervisord 2024-12-19 09:46:25,409 INFO waiting for supervisor-proc-exit-listener, rsyslogd, gnmi-native, dialout to die
2024 Dec 19 09:46:25.416604 smartswitch INFO gnmi#supervisord 2024-12-19 09:46:25,416 WARN stopped: dialout (terminated by SIGTERM)
2024 Dec 19 09:46:25.438753 smartswitch INFO gnmi#supervisord 2024-12-19 09:46:25,438 INFO stopped: gnmi-native (exit status 0)
2024 Dec 19 09:46:26.606091 smartswitch INFO container: docker cmd: wait for gnmi
2024 Dec 19 09:46:26.606606 smartswitch INFO container: docker cmd: stop for gnmi
2024 Dec 19 09:46:26.680504 smartswitch NOTICE admin: Stopped gnmi service...
2024 Dec 19 09:46:26.683450 smartswitch INFO systemd[1]: gnmi.service: Deactivated successfully.
2024 Dec 19 09:46:26.684326 smartswitch INFO systemd[1]: Stopped gnmi.service - GNMI container.
2024 Dec 19 09:46:26.730690 smartswitch INFO systemd[1]: Starting gnmi.service - GNMI container...
2024 Dec 19 09:46:26.750226 smartswitch NOTICE admin: Starting gnmi service...
2024 Dec 19 09:46:26.950197 smartswitch INFO gnmi.sh[4070087]: Starting existing gnmi container with HWSKU ACS-SN4280
2024 Dec 19 09:46:27.282127 smartswitch DEBUG container: read_data: config:True feature:gnmi fields:[('set_owner', 'local'), ('no_fallback_to_local', False), ('state', 'disabled')] val:['local', False, 'enabled']
2024 Dec 19 09:46:27.282901 smartswitch DEBUG container: read_data: config:False feature:gnmi fields:[('current_owner', 'none'), ('remote_state', 'none'), ('container_id', '')] val:['none', 'none', '']
2024 Dec 19 09:46:27.283784 smartswitch DEBUG container: container_start: gnmi: set_owner:local fallback:True remote_state:none server_connected:false
2024 Dec 19 09:46:27.485569 smartswitch INFO container: docker cmd: start for gnmi
2024 Dec 19 09:46:27.547144 smartswitch NOTICE admin: Started gnmi service...
2024 Dec 19 09:46:27.593990 smartswitch INFO systemd[1]: Started gnmi.service - GNMI container.

Seen on master images
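
The three memory_checker lines in the syslog are consistent with the total being the raw usage minus the cache usage. The sketch below checks that arithmetic using only values copied from the log. That memory_checker actually computes the total this way is an assumption that fits the logged numbers; it was not verified against its source here.

```python
MIB = 1024 * 1024

# Values copied from the memory_checker and monit lines in the syslog above.
THRESHOLD = 419_430_400      # bytes, monit threshold for the gnmi container
MEMORY_USAGE = 702_083_072   # bytes, "memory usage of container 'gnmi'"
CACHE_USAGE = 45_056         # bytes, "cache usage of container 'gnmi'"
LOGGED_TOTAL = 702_038_016   # bytes, "Total memory usage of container 'gnmi'"


def total_usage(memory_usage: int, cache_usage: int) -> int:
    """Assumed formula: total = usage - cache (matches the logged values)."""
    return memory_usage - cache_usage


def exceeds(total: int, threshold: int) -> bool:
    """True when the container is over its memory threshold."""
    return total > threshold


total = total_usage(MEMORY_USAGE, CACHE_USAGE)
print(total == LOGGED_TOTAL, exceeds(total, THRESHOLD), f"{total / MIB:.1f} MiB")
```

The cache share is negligible (44 KiB), so nearly all of the roughly 670 MiB is non-cache memory. The total is also higher than the 588218368 bytes monit reported about half a second earlier, so usage was still growing when the restart was triggered.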

@vivekrnv
Contributor Author

@prsunny FYI

@prsunny
Contributor

prsunny commented Feb 1, 2025

@qiluo-msft, @prabhataravind for visibility

@qiluo-msft
Collaborator

@ganglyu could you check?

@ganglyu
Contributor

ganglyu commented Feb 2, 2025

@vivekrnv what's the SONiC version?

@vivekrnv
Contributor Author

vivekrnv commented Feb 3, 2025

@ganglyu Updated the description

@ganglyu
Contributor

ganglyu commented Feb 4, 2025

@vivekrnv
Could you please provide a sample configuration?

@bingwang-ms bingwang-ms added the Triaged this issue has been triaged label Feb 12, 2025