Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Old NSM interfaces might remain in proxy after OOM kill #407

Open
zolug opened this issue Apr 20, 2023 · 1 comment
Open

Old NSM interfaces might remain in proxy after OOM kill #407

zolug opened this issue Apr 20, 2023 · 1 comment

Comments

@zolug
Copy link
Collaborator

zolug commented Apr 20, 2023

Describe the bug
When proxy process gets killed by OOM, there's no chance for the proxy to close them obviously. However, NSM might not be able to clean them up either when the MaxTokenLifetime expires due to rpc error: code = PermissionDenied desc = no sufficient privileges.

The old interfaces do not seem to cause any problem, but they are confusing. (Their IPs are also leaked.)
The connected peer interface in LB also remains. (and can be used to reach the proxy).

To Reproduce
Steps to reproduce the behavior:

  1. Kill a proxy process using KILL signal. (For example from its worker.)
  2. Wait the container to recover and check the NSM interfaces towards LBs.
  3. Wait MaxTokenLifetime (10 minutes by default) and check the interfaces agains. If they are present check the logs of the local nsmgr or forwarder for the Close.

Expected behavior
The interfaces should be event removed (probably by NSM).
(Resolving the leaked IP issue sound much more complex.)

Context

  • Network Service Mesh: v1.6.2

  • Meridio: 115ca87
    Author: Lionel Jouin [email protected]
    Date: Fri Mar 24 19:25:56 2023 +0100

    Use Helm chart to deploy e2e operator env

    • Helm charts are now used in the kind-operator test env
    • Enable Meridio and TAPA versions in the test scope
    • Set hostPID to true in vpp-forwarder

Logs

Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(2.1)    close={"id":"4967924d-fc4b-42df-b071-20824f7fa158","network_service":"load-balancer.trench-a.default","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"inode://4/4026532326","name":"load-balan-fb82"}},"context":{"ip_context":{"src_ip_addrs":["172.16.3.3/24","fd00:0:0:3::3/64"],"dst_ip_addrs":["172.16.3.2/24","fd00:0:0:3::2/64"],"excluded_prefixes":["11.0.0.0/22","11.0.4.0/23","11.0.6.0/24","12.0.0.1/32","12.0.27.146/32","12.0.116.178/32","12.0.155.217/32","12.0.184.219/32","12.0.201.243/32","12.0.250.201/32"]},"MTU":1446},"labels":{"nodeName":"vm-002"},"path":{"index":2,"path_segments":[{"name":"proxy-trench-a-cfqvk","id":"fb82d6e2-ecaf-496e-8072-0c90da0ddfd1-proxy-trench-a-cfqvk-load-balancer.trench-a.default-load-balancer-trench-a-7c885b6d9b-xgvx8","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL25zbWdyLXNhIl0sImV4cCI6MTY4MjAwODE3NH0.0QtsWuA3vEKWj0dEwIkkCgPu5Rf9uBD5hzT50GAqnUcWrQ0P_XfEZyCbX-Z7l-Bzzg_S0tiyAp0pcDVrVg-CFw","expires":{"seconds":1682008174,"nanos":455339069}},{"name":"nsmgr-tr5n7","id":"44a7009f-5a59-4329-a253-38ae6ccd93d5","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNjgyMDA4MTc1fQ.exoviVXnqYh4aYdzm1Qx9JpxVS5mYlMYzDQaPs7RiMjm7ilyy9rJJAQLhY258EHpd1gUa6vlqy5LfD9-bJqc2Q","expires":{"seconds":1682008175,"nanos":429149769}},{"name":"forwarder-vpp-thdt2","id":"4967924d-fc4b-42df-b071-20824f7fa158","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNjgyMDA4MTc2fQ.kcaJ8HkJmLu5A9Q02R7ACiolZ6JUS00Vcuzxt1MgC5_fq4Vc-ob57E-yQTqDky9suZjjlMrevtJWw9fLX4pwfg","expires":{"seconds":1682008176,"nanos":273042269},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"nsmgr-27j6j","id":"ddfd106c-e964-4f9f-bb54-9b6e1912a0f8","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNjgyMDA4MTc3fQ.8BBI4hhSXXw-yoXJwSsNP3O08x9Q4TOLAuS_oTOnu37ZtjnFo3DbC6nmAJCUpSrfEOQesq8Nk2MhFeLqrKZrWA","expires":{"seconds":1682008177,"nanos":830338082}},{"name":"forwarder-vpp-cp8sp","id":"eebdbac2-5393-4cd1-a148-6aa2fb41544d","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSJdLCJleHAiOjE2ODIwMDgxNzh9.0FEurb6Wm_U1OXlwGNQ7CEHWHwx1uB37m3I_oXYotGckctlmfYihBcNDlPvvX3LKWGKbjzihbf8CQfYvHhfOqw","expires":{"seconds":1682008178,"nanos":249942792},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"load-balancer-trench-a-7c885b6d9b-xgvx8","id":"30958abe-2bde-4f45-83a0-e67fa66d801b","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCJdLCJleHAiOjE2ODIwMDgxNzh9.5q5UYPUNFCV5aw-4yPSmG0sjGNW9JhsdXTlq_xO2wjVsV5Bcx1EMSNCevk5PBtLNthIg3O152flg6lLUjOQFZA","expires":{"seconds":1682008178,"nanos":250666242}}]},"network_service_endpoint_name":"load-balancer-trench-a-7c885b6d9b-xgvx8","payload":"ETHERNET"}
Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3)   ⎆ sdk/pkg/networkservice/common/updatetoken/updateTokenServer.Close()
Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3.1)     close-diff={"path":{"path_segments":{"2":{"expires":{"nanos":434310849,"seconds":1682008775},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNjgyMDA4Nzc1fQ.nbjavxp1tsSZxvPiiKISnXo8WFsVmRWhX9AL7-ZuSLEpPjtfh8Xpddia3SKjjL7hDWS6hAlMOpT1NTXqM8s0Zg"}}}}
Apr 20 16:29:35.434ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(4)    ⎆ sdk/pkg/networkservice/common/authorize/authorizeServer.Close()
Apr 20 16:29:35.435ESC[31m [ERRO] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(4.1)      rpc error: code = PermissionDenied desc = no sufficient privileges;  Error returned from sdk/pkg/networkservice/common/authorize/authorizeServer.Close;      github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.logError;               /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/trace/common.go:206;  github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceServer).Close;              /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/trace/server.go:71;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceServer).Close;                /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/trace/server.go:86;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/common/updatetoken.(*updateTokenServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/common/updatetoken/server.go:84;   github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*beginTraceServer).Close;              /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/trace/server.go:69;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/core/trace.(*endTraceServer).Close;                /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/trace/server.go:86;   github.com/networkservicemesh/sdk/pkg/networkservice/core/next.(*nextServer).Close;             /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/core/next/server.go:65;    github.com/networkservicemesh/sdk/pkg/networkservice/common/begin.(*eventFactoryServer).Close.func1;            /root/go/pkg/mod/github.com/networkservicemesh/[email protected]/pkg/networkservice/common/begin/event_factory.go:225; github.com/edwarnicke/serialize.(*Executor).process;            /root/go/pkg/mod/github.com/edwarnicke/[email protected]/serialize.go:68;        runtime.goexit;         /go/src/runtime/asm_amd64.s:1571;       
Apr 20 16:29:35.435ESC[37m [TRAC] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3.2)     close-response-diff={"id":"4967924d-fc4b-42df-b071-20824f7fa158","path":{"index":2,"path_segments":{"2":{"expires":{"nanos":434310849,"seconds":1682008775},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiXSwiZXhwIjoxNjgyMDA4Nzc1fQ.nbjavxp1tsSZxvPiiKISnXo8WFsVmRWhX9AL7-ZuSLEpPjtfh8Xpddia3SKjjL7hDWS6hAlMOpT1NTXqM8s0Zg"}}}}
Apr 20 16:29:35.435ESC[31m [ERRO] [id:4967924d-fc4b-42df-b071-20824f7fa158] [type:networkService] ESC[0m(3.3)     Error returned from sdk/pkg/networkservice/common/authorize/authorizeServer.Close: rpc error: code = PermissionDenied desc = no sufficient privileges
Apr 20 16:29:35.854ESC[37m [TRAC] [id:5ae244a3-d784-4ec5-b3b4-0d5951b1b7db] [type:networkService] ESC[0m(2.1)    close={"id":"5ae244a3-d784-4ec5-b3b4-0d5951b1b7db","network_service":"load-balancer.trench-a.default","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"inode://4/4026532326","name":"load-balan-0515"}},"context":{"ip_context":{"src_ip_addrs":["172.16.3.5/24","fd00:0:0:3::5/64"],"dst_ip_addrs":["172.16.3.4/24","fd00:0:0:3::4/64"],"excluded_prefixes":["11.0.0.0/22","11.0.4.0/23","11.0.6.0/24","12.0.0.1/32","12.0.27.146/32","12.0.116.178/32","12.0.155.217/32","12.0.184.219/32","12.0.201.243/32","12.0.250.201/32"]},"MTU":1500},"labels":{"nodeName":"vm-002"},"path":{"index":2,"path_segments":[{"name":"proxy-trench-a-cfqvk","id":"051579bb-4ced-459c-bd02-e17721370210-proxy-trench-a-cfqvk-load-balancer.trench-a.default-load-balancer-trench-a-7c885b6d9b-k7fnk","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL25zbWdyLXNhIl0sImV4cCI6MTY4MjAwODE3NH0.uNZ4KPC2NZbqHDIxpXFbLhxxXSWhkZ1VbD3s3k4yVr86mMP0w24Vw7DsWmVyIv3ywDMMqUIHyOvA2tHnIdY7UQ","expires":{"seconds":1682008174,"nanos":472660529}},{"name":"nsmgr-tr5n7","id":"3902abe8-3678-43b9-bcbf-569c716a8685","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvbnNtZ3Itc2EiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvbnNtL3NhL2RlZmF1bHQiXSwiZXhwIjoxNjgyMDA4MTc1fQ.1FuetkXxojB8W_CsYdydm3Bswz_e27-ttZoTvfCL_lRQiHDvCQtvOcBhitrIRq64snn8dfMSWw_p6LoNwGCkAw","expires":{"seconds":1682008175,"nanos":852200649}},{"name":"forwarder-vpp-thdt2","id":"5ae244a3-d784-4ec5-b3b4-0d5951b1b7db","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSJdLCJleHAiOjE2ODIwMDgxNzZ9.LNIU7MDeu7AJtRjU2n5xPRa4Q-zn_uagGjQ5NQTe3oSrbLf-r_Mw6AvnzR_Z_3izq77Yq4FEu3P9MxmMjvnFbg","expires":{"seconds":1682008176,"nanos":643886529},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"load-balancer-trench-a-7c885b6d9b-k7fnk","id":"db2bd643-5505-4c38-9704-7eb194b51f44","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL21lcmlkaW8tYXV0aC10cmVuY2gtYSIsImF1ZCI6WyJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20vc2EvZGVmYXVsdCJdLCJleHAiOjE2ODIwMDgxNzZ9.m0FIfhJPKfomZ6eXTBBWef7gjuc1xddGLfo959MFBg4zvNapP2Ephv-sYKVSrcunCRzE2uU88cyTdU0CkkXMFw","expires":{"seconds":1682008176,"nanos":644706639}}]},"network_service_endpoint_name":"load-balancer-trench-a-7c885b6d9b-k7fnk","payload":"ETHERNET"}

@zolug
Copy link
Collaborator Author

zolug commented Sep 23, 2024

There were improvements in NSM related to cleaning up connections. Haven't noticed "PersmissionDenied" error for a while.

There were also improvements in Meridio:
-IP garbage collection for Meridio IPAM:
#527
-Proxy can restore previous connections to LBs by learning them from NSMgr using NSM's Monitor Connection feature:
#536
#542

Run verification tests with most recent Meridio and NSM >= v.1.13.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 📋 To Do
Development

No branches or pull requests

1 participant