I interpret the "signal: killed" as a failed probe, which is also indicated by:
# kubectl describe pod forwarder-vpp-kksqq
...
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  8m4s                 default-scheduler  Successfully assigned default/forwarder-vpp-kksqq to vm-002
  Normal   Pulling    8m4s                 kubelet            Pulling image "ghcr.io/networkservicemesh/cmd-forwarder-vpp:v1.8.0"
  Normal   Pulled     8m1s                 kubelet            Successfully pulled image "ghcr.io/networkservicemesh/cmd-forwarder-vpp:v1.8.0" in 3.191323307s (3.191332663s including waiting)
  Warning  Unhealthy  97s (x3 over 7m59s)  kubelet            Readiness probe failed: timeout: failed to connect service "unix:///listen.on.sock" within 1s
  Normal   Created    58s (x4 over 8m)     kubelet            Created container forwarder-vpp
  Normal   Started    58s (x4 over 8m)     kubelet            Started container forwarder-vpp
  Normal   Pulled     58s (x3 over 2m5s)   kubelet            Container image "ghcr.io/networkservicemesh/cmd-forwarder-vpp:v1.8.0" already present on machine
  Warning  Unhealthy  57s                  kubelet            Readiness probe failed: command timed out
  Warning  BackOff    44s (x9 over 115s)   kubelet            Back-off restarting failed container forwarder-vpp in pod forwarder-vpp-kksqq_default(afab3247-3397-415d-a4a1-f943c279a981)
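A side note on the probe events above: a failing readiness probe only marks the pod NotReady; it is a failing liveness probe that makes the kubelet kill and restart the container. A rough way to check which probe is configured with what command and timeout, and why the previous container instance was terminated (a sketch, assuming the pod name from the events above and that forwarder-vpp is the only container in the pod):

# kubectl get pod forwarder-vpp-kksqq -o jsonpath='{.spec.containers[0].livenessProbe}'
# kubectl get pod forwarder-vpp-kksqq -o jsonpath='{.spec.containers[0].readinessProbe}'
# kubectl get pod forwarder-vpp-kksqq -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

The first two print the configured probe command, timeout and failure threshold; the last one shows the reason and exit code of the previous container instance, e.g. reason "OOMKilled" if the kernel OOM killer sent the SIGKILL, or typically "Error" with exit code 137 for other kill-based terminations such as a liveness-probe restart.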
Even when everything is OK (one trench), there are error printouts in the forwarder-vpp log. The problem seems to be transient, though:
Mar 3 07:39:26.091 [ERRO] [cmd:/bin/forwarder] [component:netNsMonitor] [inodeURL:inode://4/4026532665] inode 4026532665 is not found in /proc
time="2023-03-03T07:39:26Z" level=info msg="No subscription found for the notification message." msg_id=701 msg_size=19
time="2023-03-03T07:39:26Z" level=info msg="No subscription found for the notification message." msg_id=701 msg_size=19
time="2023-03-03T07:39:26Z" level=error msg="Channel ID not known, ignoring the message." channel=132 msg_id=24
time="2023-03-03T07:39:26Z" level=info msg="No subscription found for the notification message." msg_id=701 msg_size=19
time="2023-03-03T07:41:15Z" level=error msg="Channel ID not known, ignoring the message." channel=176 msg_id=24
Altering the liveness/readiness timeouts did not help, and neither did removing the probes entirely. forwarder-vpp still crashes with "signal: killed". Since no probe fails, I don't know where the signal comes from.
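If no probe is failing, the other usual source of an unexplained SIGKILL is memory: the container hitting its cgroup memory limit, or the node running low on memory and the kernel OOM killer stepping in. A rough way to check, assuming node vm-002 from the events above and that metrics-server is installed for the last command:

# kubectl describe node vm-002 | grep -i -A3 oom
# dmesg -T | grep -iE 'out of memory|oom-kill|killed process'        (run on vm-002 itself)
# kubectl top pod forwarder-vpp-kksqq --containers

A SystemOOM event on the node or an oom-kill line in the kernel log, together with reason "OOMKilled" in the lastState check above, would point at the OOM killer rather than at the probes; if nothing memory-related shows up, the SIGKILL comes from somewhere else, for example a kubelet eviction, which would also be visible in the pod events.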
Describe the bug
One trench usually works, but when starting more trenches the forwarder-vpp on the nodes where the Meridio LB is deployed dives into a crash-loop. I saw an OOM-kill once when monitoring with "watch", but I am very unsure that's the root cause.
Expected behavior
No crash
Context
...
Logs
The kubectl describe output and the forwarder-vpp log excerpts are quoted above. I also got an enormous stack trace once.
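For the record, the log of the previous (crashed) container instance, including the last entries before the re-start and any panic stack trace, can be retrieved with --previous; a sketch assuming the same pod name as above:

# kubectl logs forwarder-vpp-kksqq --previous --tail=100
# kubectl logs forwarder-vpp-kksqq --previous > forwarder-vpp-crash.log

The first command shows just the tail around the crash; the second saves the whole previous log (the file name is only an example) so the stack trace can be attached to the issue.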