Xcluster; Target misconfigured in larger clusters (infrequent) #376

Open
uablrek opened this issue Feb 20, 2023 · 10 comments

@uablrek
Contributor

uablrek commented Feb 20, 2023

When starting a larger cluster (more than about 8 nodes), the targets may get faulty routes:

# ip ro show table 1
default dev nsm-0       # Not good!

Return traffic is dropped.

The route should look something like this:

# ip ro show table 1
default via 172.17.1.1 dev nsm-0 onlink     # Good

This happens infrequently, in fewer than 1 in 20 runs it seems. But when it happens, at least 80% of the targets seem to get this configuration. Some targets still work, though.

Everything else seems to work. There are no errors in the tapa logs.
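
To spot the faulty route quickly on many targets, the check can be scripted. The following is only a sketch: the function name `find_gatewayless_defaults` is hypothetical, and it assumes the text passed in is the output of `ip ro show table 1` as shown above. It flags any `default` route that has no `via <gateway>` next hop.

```python
def find_gatewayless_defaults(route_output: str) -> list[str]:
    """Return default routes that lack a 'via <gateway>' next hop."""
    bad = []
    for line in route_output.splitlines():
        # Drop anything after '#', such as the annotations used in this issue
        # (this also skips the pasted prompt line "# ip ro show table 1").
        fields = line.split("#")[0].split()
        if fields[:1] == ["default"] and "via" not in fields:
            bad.append(" ".join(fields))
    return bad


faulty = "# ip ro show table 1\ndefault dev nsm-0       # Not good!"
good = "# ip ro show table 1\ndefault via 172.17.1.1 dev nsm-0 onlink     # Good"

print(find_gatewayless_defaults(faulty))  # ['default dev nsm-0']
print(find_gatewayless_defaults(good))    # []
```

How the route output is collected from each target (for example via `kubectl exec`) is left out, since it depends on the test setup.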

To Reproduce

With ovl/forwarder-test do:

./forwarder-test.sh test --nvm=10 --trenches=blue > $log

Repeat this many times, since the fault is infrequent.
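
As a rough guide to how many repetitions are needed, the estimate below assumes that runs are independent and that the fault appears with probability 1/20 per run (the issue says "less than 1/20", so the real number of runs needed is probably higher). The function name `runs_needed` is made up for this sketch.

```python
import math


def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p_fail)**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# With a 1/20 failure rate, about 59 runs give a 95% chance of seeing
# the fault at least once.
print(runs_needed(1 / 20))  # 59
```

A rarer fault needs proportionally more runs, so a clean series of 20 runs says little about whether the problem is gone.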

Context

  • Kernel: 6.1.0
  • Kubernetes: Local build
  • Spire: v1.5.1
  • Network Service Mesh: v1.7.1
  • Meridio: Local build
    ...
LionelJouin moved this to 📋 To Do in Meridio on Feb 22, 2023
@uablrek
Contributor Author

uablrek commented Mar 27, 2023

Still present in v1.8.0 with /dev/vhost-net and forwarder-vpp locally built from "main".

@uablrek
Contributor Author

uablrek commented Mar 27, 2023

On this particular run all targets on vm-006 (out of 10) were OK, while all other targets failed.

@uablrek
Contributor Author

uablrek commented Mar 27, 2023

Repeated error in tapa logs:

{
  "severity": "error",
  "timestamp": "2023-03-27T11:02:24.48+00:00",
  "service_id": "Meridio-tapa",
  "message": "opening stream",
  "version": "1.0.0",
  "extra_data": {
    "stream": "name:\"stream1\" conduit:{name:\"load-balancer\" trench:{name:\"blue\"}}",
    "error": "ips not set"
  }
}

@uablrek
Contributor Author

uablrek commented Mar 28, 2023

The Conduit seems to be connected OK. So why the "ips not set" error?

{
  "severity": "info",
  "timestamp": "2023-03-28T05:47:04.256+00:00",
  "service_id": "Meridio-tapa",
  "message": "Connect",
  "version": "1.0.0",
  "extra_data": {
    "class": "Conduit",
    "instance": "load-balancer"
  }
}
{
  "severity": "info",
  "timestamp": "2023-03-28T05:47:05.867+00:00",
  "service_id": "Meridio-tapa",
  "message": "Connected",
  "version": "1.0.0",
  "extra_data": {
    "class": "Conduit",
    "instance": "load-balancer",
    "connection": "id:\"meridio-app-745c7774fc-mbbj6-proxy.load-balancer.blue.blue-0\" network_service:\"proxy.load-balancer.blue.blue\" mechanism:{cls:\"LOCAL\" type:\"KERNEL\" parameters:{key:\"inodeURL\" value:\"file:///proc/thread-self/ns/net\"} parameters:{key:\"name\" value:\"nsm-0\"}} context:{ip_context:{src_ip_addrs:\"172.17.9.11/24\" src_ip_addrs:\"fd00::ac11:90b/120\" dst_ip_addrs:\"172.17.9.12/24\" dst_ip_addrs:\"fd00::ac11:90c/120\" excluded_prefixes:\"11.0.0.0/21\" excluded_prefixes:\"11.0.8.0/23\" excluded_prefixes:\"12.0.0.1/32\" excluded_prefixes:\"12.0.6.4/32\" excluded_prefixes:\"12.0.63.27/32\" excluded_prefixes:\"12.0.137.166/32\" excluded_prefixes:\"12.0.228.80/32\" excluded_prefixes:\"12.0.255.75/32\"} MTU:1500} labels:{key:\"nodeName\" value:\"vm-008\"} path:{path_segments:{name:\"meridio-app-745c7774fc-mbbj6\" id:\"meridio-app-745c7774fc-mbbj6-proxy.load-balancer.blue.blue-0\" token:\"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9ibHVlL3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvZGVmYXVsdC9zYS9kZWZhdWx0Il0sImV4cCI6MTY3OTk4MzAyNH0.ifMGTbcw4AM0H0RlZl8zq4BTTRNYlCzDbNdWKlCHqoOeiww1M0YXS1kKJ9cyOvhMZ6L-XzBhymL9Lan3_goXRg\" expires:{seconds:1679983024 nanos:260833707}} path_segments:{name:\"nsmgr-s9c6b\" id:\"7c7ae118-78dd-4349-9b98-d514dcd30a45\" token:\"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvZGVmYXVsdC9zYS9kZWZhdWx0Il0sImV4cCI6MTY3OTk4MzAyNX0.VLUpDreEhxZ0P-l65aE82NUSPg89PAL2E1_s2liEXf__VUg1XLHrqT0l0y1HfGcbjB2oYfa84UP6AFpD7flg7w\" expires:{seconds:1679983025 nanos:72233057}} path_segments:{name:\"forwarder-vpp-2qcgn\" id:\"f9752724-c6b2-484c-a91a-ed4e2996a2c6\" token:\"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9kZWZhdWx0L3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvYmx1ZS9zYS9kZWZhdWx0Il0sImV4cCI6MTY3OTk4MzAyNX0.QD6b0zs7yAtdH9WheCwrDO4BvqEV1zhoHEjuzjAmYGt6Rxgs7GmsAfJ888KzflX2OeofpSOuo5SEmes2IURurQ\" expires:{seconds:1679983025 nanos:666906540} metrics:{key:\"client_drops\" value:\"0\"} metrics:{key:\"client_rx_bytes\" value:\"0\"} metrics:{key:\"client_rx_packets\" value:\"0\"} metrics:{key:\"client_tx_bytes\" value:\"0\"} metrics:{key:\"client_tx_packets\" value:\"0\"} metrics:{key:\"server_drops\" value:\"0\"} metrics:{key:\"server_rx_bytes\" value:\"0\"} metrics:{key:\"server_rx_packets\" value:\"0\"} metrics:{key:\"server_tx_bytes\" value:\"0\"} metrics:{key:\"server_tx_packets\" value:\"0\"}} path_segments:{name:\"meridio-proxy-6558f\" id:\"8f406552-5c11-4f03-aec2-ac623b9191e0\" token:\"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9ibHVlL3NhL2RlZmF1bHQiLCJhdWQiOlsic3BpZmZlOi8vZXhhbXBsZS5vcmcvbnMvZGVmYXVsdC9zYS9kZWZhdWx0Il0sImV4cCI6MTY3OTk4MzAyNX0.-yr5vQetqRMJELQr8CQUyvOLRRV-JwmRZdlOEQaM_z75DuFl3kxYr9Y1dAJtKW9ppxX2oBtyAzYTvWtSZhazVw\" expires:{seconds:1679983025 nanos:667213424}}} network_service_endpoint_name:\"meridio-proxy-6558f\" payload:\"ETHERNET\""
  }
}
{
  "severity": "info",
  "timestamp": "2023-03-28T05:47:06.688+00:00",
  "service_id": "Meridio-tapa",
  "message": "VIPs updated",
  "version": "1.0.0",
  "extra_data": {
    "class": "Conduit",
    "instance": "load-balancer",
    "vips": [
      "1000::1:a00:2/128",
      "10.0.0.32/28",
      "1000::1:a00:20/124",
      "10.0.0.2/32"
    ]
  }
}
{
  "severity": "error",
  "timestamp": "2023-03-28T05:47:06.688+00:00",
  "service_id": "Meridio-tapa",
  "message": "opening stream",
  "version": "1.0.0",
  "extra_data": {
    "stream": "name:\"stream1\" conduit:{name:\"load-balancer\" trench:{name:\"blue\"}}",
    "error": "ips not set"
  }
}

@uablrek
Contributor Author

uablrek commented Mar 28, 2023

I added a log printout, and when this happens, "gateways" is empty.

{
  "severity": "info",
  "timestamp": "2023-03-28T07:47:19.268+00:00",
  "service_id": "Meridio-tapa",
  "message": "No IPs",
  "version": "1.0.0",
  "extra_data": {
    "class": "Conduit",
    "instance": "load-balancer",
    "addresses": [
      "10.0.0.32/28",
      "1000::1:a00:20/124",
      "10.0.0.2/32",
      "1000::1:a00:2/128",
      "172.17.14.5/24",
      "fd00::ac11:905/120"
    ],
    "vips": [
      "10.0.0.32/28",
      "1000::1:a00:20/124",
      "10.0.0.2/32",
      "1000::1:a00:2/128"
    ],
    "gateways": []
  }
}
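
Filtering the tapa logs for these symptoms by hand gets tedious across many targets. The sketch below assumes the log is a stream of concatenated, pretty-printed JSON objects like the ones pasted in this thread. The helper names `iter_records` and `is_suspicious` are hypothetical, and the "gateways" field only exists with the locally added printout mentioned above.

```python
import json


def iter_records(text: str):
    """Yield each JSON object from a stream of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        record, pos = decoder.raw_decode(text, pos)
        yield record


def is_suspicious(record: dict) -> bool:
    """True for an 'ips not set' error or a present-but-empty gateway list."""
    extra = record.get("extra_data", {})
    if extra.get("error") == "ips not set":
        return True
    return "gateways" in extra and not extra["gateways"]


sample = """
{"severity": "error", "message": "opening stream",
 "extra_data": {"error": "ips not set"}}
{
  "severity": "info",
  "message": "VIPs updated",
  "extra_data": {"vips": ["10.0.0.2/32"]}
}
{"severity": "info", "message": "No IPs", "extra_data": {"gateways": []}}
"""

for rec in iter_records(sample):
    if is_suspicious(rec):
        print(rec["message"])
```

On the sample this prints the "opening stream" and "No IPs" records and skips the "VIPs updated" one.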

@uablrek
Contributor Author

uablrek commented Apr 4, 2023

Logs in the "proxy" show empty gateways:

{
  "severity": "info",
  "timestamp": "2023-04-04T05:25:48.134+00:00",
  "service_id": "Meridio-proxy",
  "message": "InterfaceCreated",
  "version": "1.0.0",
  "extra_data": {
    "class": "Proxy",
    "intf": {
      "LocalIPs": [
        "172.17.2.4/24",
        "fd00::ac11:504/120"
      ],
      "NeighborIPs": [
        "172.17.2.3/24",
        "fd00::ac11:503/120"
      ],
      "Gateways": null,
      "InterfaceType": 1
    },
    "nexthops": [
      "172.17.2.1/24",
      "fd00::ac11:501/120",
      "172.17.2.3/24",
      "fd00::ac11:503/120"
    ]
  }
}

These errors may be related:

{
  "severity": "error",
  "timestamp": "2023-04-04T05:21:34.759+00:00",
  "service_id": "Meridio-proxy",
  "message": "Setting the bridge IP",
  "version": "1.0.0",
  "extra_data": {
    "class": "Proxy",
    "error": "rpc error: code = Unknown desc = no more prefix available"
  }
}

{
  "severity": "error",
  "timestamp": "2023-04-04T05:21:36.446+00:00",
  "service_id": "Meridio-proxy",
  "message": "try attempt has failed: Error returned from github.com/nordix/meridio/pkg/nsm/ipcontext/ipcontextClient.Request: rpcerror: code = Unknown desc = no more prefix available: cannot support any of the requested mechanism",
  "version": "1.0.0",
  "extra_data": {
    "subsystem": "NSM",
    "retryClient": "Request"
  }
}

@LionelJouin
Member

Have you changed the environment variables of the IPAM? There is a limit on the number of conduits/nodes/targets, specified here: https://meridio.nordix.org/docs/concepts/trench#limitations
The env variables are called IPAM_PREFIX_IPV4, IPAM_CONDUIT_PREFIX_LENGTH_IPV4 and IPAM_NODE_PREFIX_LENGTH_IPV4

@uablrek
Contributor Author

uablrek commented Apr 4, 2023

My values are below. IPAM_PREFIX_IPV4 is not set:

            - name: IPAM_CONDUIT_PREFIX_LENGTH_IPV4
              value: "20"
            - name: IPAM_CONDUIT_PREFIX_LENGTH_IPV6
              value: "116"
            - name: IPAM_NODE_PREFIX_LENGTH_IPV4
              value: "24"
            - name: IPAM_NODE_PREFIX_LENGTH_IPV6
              value: "120"
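
For reference, these prefix lengths translate into the following subnet arithmetic. This is a sketch only: the conduit prefix `172.17.0.0/20` is an assumption inferred from the addresses seen in the logs in this thread, not a confirmed default, and whether the IPAM hands out exactly one such subnet per node (or reserves some) is not established here.

```python
import ipaddress


def subnets_per_parent(parent_len: int, child_len: int) -> int:
    """Number of child prefixes that fit in one parent prefix."""
    return 2 ** (child_len - parent_len)


# IPAM_CONDUIT_PREFIX_LENGTH_* -> IPAM_NODE_PREFIX_LENGTH_*
print(subnets_per_parent(20, 24))    # 16 IPv4 node subnets per conduit
print(subnets_per_parent(116, 120))  # 16 IPv6 node subnets per conduit

# Assumed conduit prefix, inferred from addresses in the logs.
conduit = ipaddress.ip_network("172.17.0.0/20")
node_subnets = list(conduit.subnets(new_prefix=24))
print(len(node_subnets))               # 16
print(node_subnets[0].num_addresses)   # 256 addresses per /24

# Addresses quoted in this issue all fall inside the assumed conduit prefix.
observed = ["172.17.1.1", "172.17.9.11", "172.17.14.5", "172.17.2.4"]
print(all(ipaddress.ip_address(a) in conduit for a in observed))  # True
```

With 10 VMs this leaves little headroom: 16 node-sized subnets per conduit for each address family.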

@LionelJouin
Copy link
Member

This message comes from the IPAM; I am not sure why it happens only sometimes. When you reinstall, you need to delete the persistent storage used by the IPAM and NSP. It might help.

@uablrek
Contributor Author

uablrek commented Apr 4, 2023

Xcluster always starts from scratch.
