Last Friday we detected that the maintenance agents on NY2 were deployed on ASD VMs.
This is not what we configured in the agents_layout config.
After manually running the checkup_maintenance_agents method, the services were removed from the ASD VMs and deployed on the correct nodes.
I'm not able to find any information in the logs explaining why the maintenance agents were deployed on different nodes.
Maybe the update triggered something, or they were deployed that way from the beginning...
What I don't understand is why the cron didn't detect this while a manual run did.
Agents layouts:
root@NY2SRV0011:~# ovs config get ovs/alba/backends/1ae3eb7e-a197-4021-bec1-888e167bba05/maintenance/agents_layout
["M9e2WY2yg13NlsEg7ssx7nmWmCRmqhsY", "ClniWKVepnpkVIXUHxtLoi6MJmr69wSb"]
root@NY2SRV0011:~# ovs config get ovs/alba/backends/3b408aa1-0407-4d9e-be3a-babce370ab13/maintenance/agents_layout
["RAAF6YiDaEWlmKvoS3Q9m3CRdUG9Dr8k", "iZ577tqejLcOesIg011uVZo2H475CIzN"]
root@NY2SRV0011:~# ovs config get ovs/alba/backends/460620d3-984b-4feb-a217-adf56fb14038/maintenance/agents_layout
["RAAF6YiDaEWlmKvoS3Q9m3CRdUG9Dr8k", "iZ577tqejLcOesIg011uVZo2H475CIzN"]
root@NY2SRV0011:~# ovs config get ovs/alba/backends/e4a9beee-2eff-466a-951d-257bf8395a0a/maintenance/agents_layout
["QBLmEzzfbnL6glKNVkGOR0VoKWiMDxNS", "OKf0P4IhuPdPcZlTFQm6AvXNOyIx8EaV"]
We discovered this when we added a second backend to globalbackend02.
At a certain point some ASD VMs stopped responding to CheckMK (the ones running a maintenance agent).
At that point we saw some errors in the celery log:
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 20700 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/ensure single - 58714 - INFO - Ensure single CHAINED mode - ID 1515837600_shv8q5CzvC - New task alba.checkup_maintenance_agents with default params scheduled for execution
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 20900 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58715 - INFO - Loading maintenance information
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 22900 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.strategy - 58722 - INFO - Received task: get asd statistics[f9d36c0b-9c1b-473d-b4a3-67b51be2adcf]
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 23000 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.autoscale - 58723 - INFO - Scaling up 1 processes.
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 27400 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.strategy - 58724 - INFO - Received task: get asd statistics[ca549abb-c0e1-4226-9e2e-27338cce3cae]
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 27500 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.autoscale - 58725 - INFO - Scaling up 1 processes.
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 32200 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58726 - INFO - Task statsmonkey.get_mds_loads[8d36a846-81dd-409e-a98d-ad84a6f8117b] succeeded in 0
.139715231024s: []
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 52200 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58727 - INFO - Task get asd statistics[52511ca2-cda2-461d-ad1a-dcc47430aa0a] succeeded in 0.198631
620035s: [{'fields': {'disk_usage': 403495647668.0, 'MultiGet2_low_max': 0.0325810909, 'GetDiskUsage_avg': 3.1140400000000004e-05,...
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 56100 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58728 - INFO - Task get asd statistics[f9d36c0b-9c1b-473d-b4a3-67b51be2adcf] succeeded in 0.236320
186872s: [{'fields': {'disk_usage': 114842563930.0, 'MultiGet2_low_max': 45.2979691029, 'GetDiskUsage_avg': 2.9235600000000002e-05,...
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 58800 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58729 - INFO - Task get asd statistics[57adccab-20b1-4c61-9fdd-fda75e1c8c9e] succeeded in 0.264004
799072s: [{'fields': {'PartialGets_histogram_1e+04': 1.0, 'disk_usage': 1911106014156.0, 'PartialGets_histogram_1': 1295.0,...
Jan 13 05:00:00 NY2SRV0014 celery[29157]: 2018-01-13 05:00:00 60500 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58730 - INFO - Task get asd statistics[ca549abb-c0e1-4226-9e2e-27338cce3cae] succeeded in 0.280572
557356s: [{'fields': {'disk_usage': 1190283083963.0, 'MultiGet2_low_max': 32.6431541443, 'GetDiskUsage_avg': 2.55124e-05,...
Jan 13 05:00:01 NY2SRV0014 celery[29157]: 2018-01-13 05:00:01 02400 -0500 - NY2SRV0014 - 25832/139935744538368 - lib/ensure single - 58717 - INFO - Ensure single DEFAULT mode - ID 1515837600_K6VxAtZDNp - Task statsmonkey.get_nsm_stats fin
ished successfully
Jan 13 05:00:01 NY2SRV0014 celery[29157]: 2018-01-13 05:00:01 02400 -0500 - NY2SRV0014 - 25832/139935744538368 - lib/ensure single - 58718 - INFO - Ensure single DEFAULT mode - ID 1515837600_K6VxAtZDNp - Deleting key ovs_ensure_single_sta
tsmonkey.get_nsm_stats
Jan 13 05:00:01 NY2SRV0014 celery[29157]: 2018-01-13 05:00:01 06900 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58731 - INFO - Task statsmonkey.get_nsm_stats[21ed941f-b1dc-445a-afb1-44112e4d0788] succeeded in 0
.886369946878s: [{'fields': {'CleanupForNamespace_min': 0.0010738372802734375, 'UpdateObject3_avg': 0.0024721622467041016,...
Jan 13 05:00:01 NY2SRV0014 celery[29157]: 2018-01-13 05:00:01 63400 -0500 - NY2SRV0014 - 24782/139935744538368 - lib/ensure single - 58702 - INFO - Ensure single DEFAULT mode - ID 1515837600_C6uyMekIFy - Task statsmonkey.get_disk_safety f
inished successfully
Jan 13 05:00:01 NY2SRV0014 celery[29157]: 2018-01-13 05:00:01 63400 -0500 - NY2SRV0014 - 24782/139935744538368 - lib/ensure single - 58703 - INFO - Ensure single DEFAULT mode - ID 1515837600_C6uyMekIFy - Deleting key ovs_ensure_single_sta
tsmonkey.get_disk_safety
Jan 13 05:00:01 NY2SRV0014 celery[29157]: 2018-01-13 05:00:01 66200 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58732 - INFO - Task statsmonkey.get_disk_safety[7bf95072-1ea0-473a-aff2-205d888f80b4] succeeded in
1.59923929581s: [{'fields': {'total_objects': 5081976, 'objects': 5081976}, 'tags': {'environment': u'NY2', 'disk_lost': 0, 'backend_name':...
Jan 13 05:00:02 NY2SRV0014 celery[29157]: 2018-01-13 05:00:02 06600 -0500 - NY2SRV0014 - 25828/139935744538368 - celery/celery.redirected - 58718 - WARNING - 2018-01-13 05:00:02 06600 -0500 - NY2SRV0014 - 25828/139935744538368 - extensions/asdmanagerclient - 58717 - INFO - Request "list_maintenance_services" took 1.18 seconds (internal duration 1.18 seconds)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: 2018-01-13 05:00:02 08400 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58719 - ERROR - * Cannot fetch maintenance information for 172.17.23.32
Jan 13 05:00:02 NY2SRV0014 celery[29157]: Traceback (most recent call last):
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/lib/alba.py", line 1408, in checkup_maintenance_agents
Jan 13 05:00:02 NY2SRV0014 celery[29157]: service_names = node.client.list_maintenance_services()
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 304, in list_maintenance_services
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return self._call(requests.get, 'maintenance', clean=True)['services']
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 95, in _call
Jan 13 05:00:02 NY2SRV0014 celery[29157]: response = method(**kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 67, in get
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return request('get', url, params=params, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 53, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return session.request(method=method, url=url, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: resp = self.send(prep, **send_kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: r = adapter.send(request, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 437, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: raise ConnectionError(e, request=request)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: ConnectionError: HTTPSConnectionPool(host='172.17.23.32', port=8500): Max retries exceeded with url: /maintenance (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f453c4d1f50>: Failed to establish a new connection: [Errno 111] Connection refused',))
Jan 13 05:00:02 NY2SRV0014 celery[29157]: 2018-01-13 05:00:02 08700 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58720 - ERROR - * Cannot fetch maintenance information for 172.17.23.41
Jan 13 05:00:02 NY2SRV0014 celery[29157]: Traceback (most recent call last):
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/lib/alba.py", line 1408, in checkup_maintenance_agents
Jan 13 05:00:02 NY2SRV0014 celery[29157]: service_names = node.client.list_maintenance_services()
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 304, in list_maintenance_services
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return self._call(requests.get, 'maintenance', clean=True)['services']
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 95, in _call
Jan 13 05:00:02 NY2SRV0014 celery[29157]: response = method(**kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 67, in get
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return request('get', url, params=params, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 53, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return session.request(method=method, url=url, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: resp = self.send(prep, **send_kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: r = adapter.send(request, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 437, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: raise ConnectionError(e, request=request)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: ConnectionError: HTTPSConnectionPool(host='172.17.23.41', port=8500): Max retries exceeded with url: /maintenance (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f453c48e050>: Failed to establish a new connection: [Errno 111] Connection refused',))
Jan 13 05:00:02 NY2SRV0014 celery[29157]: 2018-01-13 05:00:02 08900 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58721 - ERROR - * Cannot fetch maintenance information for 172.17.23.9
Jan 13 05:00:02 NY2SRV0014 celery[29157]: Traceback (most recent call last):
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/lib/alba.py", line 1408, in checkup_maintenance_agents
Jan 13 05:00:02 NY2SRV0014 celery[29157]: service_names = node.client.list_maintenance_services()
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 304, in list_maintenance_services
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return self._call(requests.get, 'maintenance', clean=True)['services']
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 95, in _call
Jan 13 05:00:02 NY2SRV0014 celery[29157]: response = method(**kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 67, in get
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return request('get', url, params=params, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 53, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return session.request(method=method, url=url, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: resp = self.send(prep, **send_kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: r = adapter.send(request, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 437, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: raise ConnectionError(e, request=request)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: ConnectionError: HTTPSConnectionPool(host='172.17.23.9', port=8500): Max retries exceeded with url: /maintenance (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f453c48e190>: Failed to establish a new connection: [Errno 111] Connection refused',))
Jan 13 05:00:02 NY2SRV0014 celery[29157]: 2018-01-13 05:00:02 09200 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58722 - ERROR - * Cannot fetch maintenance information for 172.17.23.38
Jan 13 05:00:02 NY2SRV0014 celery[29157]: Traceback (most recent call last):
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/lib/alba.py", line 1408, in checkup_maintenance_agents
Jan 13 05:00:02 NY2SRV0014 celery[29157]: service_names = node.client.list_maintenance_services()
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 304, in list_maintenance_services
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return self._call(requests.get, 'maintenance', clean=True)['services']
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 95, in _call
Jan 13 05:00:02 NY2SRV0014 celery[29157]: response = method(**kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 67, in get
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return request('get', url, params=params, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 53, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: return session.request(method=method, url=url, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
Jan 13 05:00:02 NY2SRV0014 celery[29157]: resp = self.send(prep, **send_kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: r = adapter.send(request, **kwargs)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 437, in send
Jan 13 05:00:02 NY2SRV0014 celery[29157]: raise ConnectionError(e, request=request)
Jan 13 05:00:02 NY2SRV0014 celery[29157]: ConnectionError: HTTPSConnectionPool(host='172.17.23.38', port=8500): Max retries exceeded with url: /maintenance (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f453c48e2d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
Jan 13 05:00:02 NY2SRV0014 celery[29157]: 2018-01-13 05:00:02 51900 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.strategy - 58733 - INFO - Received task: ovs.storagerouter.ping[f551dacd-6c61-4037-b7ca-87102661c6f3]
Jan 13 05:00:02 NY2SRV0014 celery[29157]: 2018-01-13 05:00:02 56100 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58734 - INFO - Task ovs.storagerouter.ping[f551dacd-6c61-4037-b7ca-87102661c6f3] succeeded in 0.0409898106009s: None
Jan 13 05:00:03 NY2SRV0014 celery[29157]: 2018-01-13 05:00:03 33100 -0500 - NY2SRV0014 - 25828/139935744538368 - celery/celery.redirected - 58724 - WARNING - 2018-01-13 05:00:03 33100 -0500 - NY2SRV0014 - 25828/139935744538368 - extensions/asdmanagerclient - 58723 - INFO - Request "list_maintenance_services" took 1.23 seconds (internal duration 0.07 seconds)
Jan 13 05:00:04 NY2SRV0014 celery[29157]: 2018-01-13 05:00:04 63000 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58726 - ERROR - * Cannot fetch maintenance information for 172.17.23.33
Jan 13 05:00:04 NY2SRV0014 celery[29157]: Traceback (most recent call last):
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/lib/alba.py", line 1408, in checkup_maintenance_agents
Jan 13 05:00:04 NY2SRV0014 celery[29157]: service_names = node.client.list_maintenance_services()
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 304, in list_maintenance_services
Jan 13 05:00:04 NY2SRV0014 celery[29157]: return self._call(requests.get, 'maintenance', clean=True)['services']
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/opt/OpenvStorage/ovs/extensions/plugins/asdmanager.py", line 95, in _call
Jan 13 05:00:04 NY2SRV0014 celery[29157]: response = method(**kwargs)
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 67, in get
Jan 13 05:00:04 NY2SRV0014 celery[29157]: return request('get', url, params=params, **kwargs)
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/api.py", line 53, in request
Jan 13 05:00:04 NY2SRV0014 celery[29157]: return session.request(method=method, url=url, **kwargs)
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
Jan 13 05:00:04 NY2SRV0014 celery[29157]: resp = self.send(prep, **send_kwargs)
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
Jan 13 05:00:04 NY2SRV0014 celery[29157]: r = adapter.send(request, **kwargs)
Jan 13 05:00:04 NY2SRV0014 celery[29157]: File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 437, in send
Jan 13 05:00:04 NY2SRV0014 celery[29157]: raise ConnectionError(e, request=request)
Jan 13 05:00:04 NY2SRV0014 celery[29157]: ConnectionError: HTTPSConnectionPool(host='172.17.23.33', port=8500): Max retries exceeded with url: /maintenance (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f453c3987d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 77800 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58731 - INFO - Generating service work log for ny2-ssdbackend01
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 78500 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58733 - INFO - Applying service work log for ny2-ssdbackend01
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 78500 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58734 - INFO - Finished service work log for ny2-ssdbackend01
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 78700 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58735 - INFO - Generating service work log for ny2-hddbackend03
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 79400 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58737 - WARNING - * Layout contains unknown node RAAF6YiDaEWlmKvoS3Q9m3CRdUG9Dr8k
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 79400 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58738 - INFO - Applying service work log for ny2-hddbackend03
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 79400 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58739 - INFO - Finished service work log for ny2-hddbackend03
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 79600 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58740 - INFO - Generating service work log for ny2-hddbackend02
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 80200 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58742 - WARNING - * Layout contains unknown node RAAF6YiDaEWlmKvoS3Q9m3CRdUG9Dr8k
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 80200 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58743 - INFO - Applying service work log for ny2-hddbackend02
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 80200 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58744 - INFO - Finished service work log for ny2-hddbackend02
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 80400 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58745 - INFO - Generating service work log for ny2-hddbackend01
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 81000 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58747 - INFO - Applying service work log for ny2-hddbackend01
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 81000 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/alba - 58748 - INFO - Finished service work log for ny2-hddbackend01
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 81100 -0500 - NY2SRV0014 - 25828/139935744538368 - lib/ensure single - 58749 - INFO - Ensure single CHAINED mode - ID 1515837600_shv8q5CzvC - Task alba.checkup_maintenance_agents finished successfully
Jan 13 05:00:07 NY2SRV0014 celery[29157]: 2018-01-13 05:00:07 87800 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 58735 - INFO - Task alba.checkup_maintenance_agents[dd8e1c4d-54a4-46ed-90a2-71c144156679] succeeded in 7.69593919115s: None
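To see at a glance which nodes a checkup run could not reach, the error lines above can be filtered with a few lines of Python. This is only a rough sketch: the function name `unreachable_nodes` is made up for illustration, and the sample lines are abbreviated copies of the ERROR lines in the log above.

```python
import re

# Matches the ERROR line that lib/alba logs for every node it could not query.
PATTERN = re.compile(r"Cannot fetch maintenance information for (\d+\.\d+\.\d+\.\d+)")


def unreachable_nodes(log_lines):
    """Return the IPs of nodes that could not be queried, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Abbreviated sample lines taken from the celery log above.
sample = [
    "lib/alba - 58715 - INFO - Loading maintenance information",
    "lib/alba - 58719 - ERROR - * Cannot fetch maintenance information for 172.17.23.32",
    "lib/alba - 58720 - ERROR - * Cannot fetch maintenance information for 172.17.23.41",
    "lib/alba - 58721 - ERROR - * Cannot fetch maintenance information for 172.17.23.9",
]
print(unreachable_nodes(sample))
```

In practice the lines would come from the celery log file instead of the hard-coded `sample` list.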
Before the errors everything was normal even when the agents_layout was not respected.
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 18400 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/ensure single - 56427 - INFO - Ensure single CHAINED mode - ID 1515830400_OYs6jfrM4h - New task alba.checkup_maintenance_agents with default params scheduled for execution
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 18600 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56428 - INFO - Loading maintenance information
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 19200 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.strategy - 56435 - INFO - Received task: get asd statistics[f65e7dfd-59ec-4746-b5f6-34ddf14c9134]
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 19300 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.autoscale - 56436 - INFO - Scaling up 1 processes.
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 20000 -0500 - NY2SRV0014 - 4010/139935744538368 - lib/ensure single - 56429 - INFO - Ensure single DEFAULT mode - ID 1515830400_193ERl1JF1 - Setting key ovs_ensure_single_stats
monkey.get_mds_loads
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 22400 -0500 - NY2SRV0014 - 4010/139935744538368 - lib/ensure single - 56430 - INFO - Ensure single DEFAULT mode - ID 1515830400_193ERl1JF1 - Task statsmonkey.get_mds_loads fini
shed successfully
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 22500 -0500 - NY2SRV0014 - 4010/139935744538368 - lib/ensure single - 56431 - INFO - Ensure single DEFAULT mode - ID 1515830400_193ERl1JF1 - Deleting key ovs_ensure_single_stat
smonkey.get_mds_loads
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 23000 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.strategy - 56437 - INFO - Received task: get asd statistics[d4df3516-95e3-4359-829c-ca3a2f94a426]
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 23100 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.autoscale - 56438 - INFO - Scaling up 1 processes.
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 26900 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56439 - INFO - Task statsmonkey.get_mds_loads[346efcc8-90ae-450f-b854-dfdf47310d7e] succeeded in 0
.11239305418s: []
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 39900 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56440 - INFO - Task get asd statistics[e0946e8c-f83c-407f-a121-c5fa0db5e785] succeeded in 0.128930
922132s: [{'fields': {'disk_usage': 403727929203.0, 'MultiGet2_low_max': 0.0325810909, 'GetDiskUsage_avg': 3.11421e-05,...
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 42000 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56441 - INFO - Task get asd statistics[d4df3516-95e3-4359-829c-ca3a2f94a426] succeeded in 0.149578
18063s: [{'fields': {'disk_usage': 1181972018035.0, 'MultiGet2_low_max': 32.6431541443, 'GetDiskUsage_avg': 2.60286e-05,...
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 43300 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56442 - INFO - Task get asd statistics[f65e7dfd-59ec-4746-b5f6-34ddf14c9134] succeeded in 0.162970
298901s: [{'fields': {'disk_usage': 108035437824.0, 'MultiGet2_low_max': 45.2979691029, 'GetDiskUsage_avg': 2.93947e-05,...
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 45800 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56443 - INFO - Task get asd statistics[838dc903-3db5-4dd7-bdf6-470930b82e92] succeeded in 0.186922
571156s: [{'fields': {'PartialGets_histogram_1e+04': 1.0, 'disk_usage': 1906268525337.0, 'PartialGets_histogram_1': 1293.0,...
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 95200 -0500 - NY2SRV0014 - 2897/139935744538368 - lib/ensure single - 56416 - INFO - Ensure single DEFAULT mode - ID 1515830400_K6mCA1I1fO - Task statsmonkey.get_nsm_stats fini
shed successfully
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 95300 -0500 - NY2SRV0014 - 2897/139935744538368 - lib/ensure single - 56417 - INFO - Ensure single DEFAULT mode - ID 1515830400_K6mCA1I1fO - Deleting key ovs_ensure_single_stat
smonkey.get_nsm_stats
Jan 13 03:00:00 NY2SRV0014 celery[29157]: 2018-01-13 03:00:00 99000 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56444 - INFO - Task statsmonkey.get_nsm_stats[4b1789aa-69a7-4630-ae9d-075f6a1dcb2a] succeeded in 0
.838836473878s: [{'fields': {'CleanupForNamespace_min': 0.0010738372802734375, 'UpdateObject3_avg': 0.0024721622467041016,...
Jan 13 03:00:01 NY2SRV0014 celery[29157]: 2018-01-13 03:00:01 74900 -0500 - NY2SRV0014 - 2895/139935744538368 - lib/ensure single - 56412 - INFO - Ensure single DEFAULT mode - ID 1515830400_TOOOT4azLJ - Task statsmonkey.get_asd_stats fini
shed successfully
Jan 13 03:00:01 NY2SRV0014 celery[29157]: 2018-01-13 03:00:01 75000 -0500 - NY2SRV0014 - 2895/139935744538368 - lib/ensure single - 56413 - INFO - Ensure single DEFAULT mode - ID 1515830400_TOOOT4azLJ - Deleting key ovs_ensure_single_stat
smonkey.get_asd_stats
Jan 13 03:00:01 NY2SRV0014 celery[29157]: 2018-01-13 03:00:01 83100 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56445 - INFO - Task statsmonkey.get_asd_stats[2ec82f45-d09a-44a8-a9d2-c790c88a1e4c] succeeded in 1
.79584695678s: [{'fields': {'disk_usage': 403727929203.0, 'MultiGet2_low_max': 0.0325810909, 'GetDiskUsage_avg': 3.11421e-05,...
Jan 13 03:00:02 NY2SRV0014 celery[29157]: 2018-01-13 03:00:02 24500 -0500 - NY2SRV0014 - 4009/139935744538368 - celery/celery.redirected - 56431 - WARNING - 2018-01-13 03:00:02 24500 -0500 - NY2SRV0014 - 4009/139935744538368 - extensions/asdmanagerclient - 56430 - INFO - Request "list_maintenance_services" took 1.38 seconds (internal duration 1.37 seconds)
Jan 13 03:00:02 NY2SRV0014 celery[29157]: 2018-01-13 03:00:02 53000 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.strategy - 56446 - INFO - Received task: ovs.storagerouter.ping[0e834f7e-d7ef-4677-b58a-5e34bad68936]
Jan 13 03:00:02 NY2SRV0014 celery[29157]: 2018-01-13 03:00:02 55600 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56447 - INFO - Task ovs.storagerouter.ping[0e834f7e-d7ef-4677-b58a-5e34bad68936] succeeded in 0.02
50106919557s: None
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 45000 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56439 - INFO - Generating service work log for ny2-ssdbackend01
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 45700 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56441 - INFO - Applying service work log for ny2-ssdbackend01
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 45700 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56442 - INFO - Finished service work log for ny2-ssdbackend01
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 45900 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56443 - INFO - Generating service work log for ny2-hddbackend03
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 46600 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56445 - INFO - Applying service work log for ny2-hddbackend03
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 46700 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56446 - INFO - Finished service work log for ny2-hddbackend03
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 46900 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56447 - INFO - Generating service work log for ny2-hddbackend02
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 47600 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56449 - INFO - Applying service work log for ny2-hddbackend02
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 47600 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56450 - INFO - Finished service work log for ny2-hddbackend02
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 47800 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56451 - INFO - Generating service work log for ny2-hddbackend01
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 48400 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56453 - INFO - Applying service work log for ny2-hddbackend01
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 48400 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/alba - 56454 - INFO - Finished service work log for ny2-hddbackend01
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 48400 -0500 - NY2SRV0014 - 4009/139935744538368 - lib/ensure single - 56455 - INFO - Ensure single CHAINED mode - ID 1515830400_OYs6jfrM4h - Task alba.checkup_maintenance_agents finished successfully
Jan 13 03:00:05 NY2SRV0014 celery[29157]: 2018-01-13 03:00:05 50400 -0500 - NY2SRV0014 - 29157/139935744538368 - celery/celery.worker.job - 56448 - INFO - Task alba.checkup_maintenance_agents[e784a051-99c3-4a14-bcb0-a9141ee24aa3] succeeded in 5.35206198692s: None
In CheckMK we added an extra monitor so we can detect when one of the maintenance agents goes down.
This morning, while the maintenance node (NY1SRV1000) was down, the maintenance agents started being deployed on ASD VMs, which brought the ASD VMs down as they ran out of memory.
There is only 1 node configured as a candidate:
root@NY1SRV0019:~# ovs config get ovs/alba/backends/f9599945-f1f5-44d3-8497-0c462ede4ef9/maintenance/agents_layout
["BYkbxixfsebWk78YJtUXpYwEYIg4Teex"]
It might be that all the ASD nodes were unresponsive when the checkup was invoked.
When none of the ASD nodes return their maintenance services, the layout is ignored.
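To make that failure mode concrete, here is a minimal sketch of how such a fallback could behave. It is not the actual `checkup_maintenance_agents` code: the function `select_agent_nodes`, the `strict` flag, and the `asd-vm-*` node IDs are all hypothetical, and it assumes that nodes which failed to answer the maintenance query are simply left out of the set of known nodes.

```python
def select_agent_nodes(layout, responsive_nodes, strict=False):
    """Pick the node IDs on which maintenance agents may be deployed.

    layout           -- node IDs from the agents_layout config
    responsive_nodes -- node IDs that answered the maintenance query
    strict           -- hypothetical switch: never fall back outside the layout
    """
    # Keep only the layout entries that actually responded.
    usable = [node_id for node_id in layout if node_id in responsive_nodes]
    if usable:
        return usable
    if layout and strict:
        # A layout is configured but none of its nodes responded: deploy nothing.
        return []
    # Fallback: the layout is empty or unusable, so every responsive node qualifies.
    return sorted(responsive_nodes)


# The single configured candidate is down; only two (hypothetical) ASD VMs respond.
layout = ["BYkbxixfsebWk78YJtUXpYwEYIg4Teex"]
responsive = {"asd-vm-1", "asd-vm-2"}

print(select_agent_nodes(layout, responsive))
print(select_agent_nodes(layout, responsive, strict=True))
```

With the non-strict fallback the agents end up on the ASD VMs as soon as the only layout node is unreachable, which matches what was observed. A strict variant would instead leave the backend without agents until a layout node comes back, which may or may not be the preferable trade-off.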