Very low throughput when using L7NetworkPolicies with an external host #6806
Comments
@tnqn @hongliangl it seems that this PR was aimed at solving this issue: #3957. Any idea why it was abandoned? On a side note, we should also find a way to catch this condition with our L7NP e2e tests. |
@antoninbas IIRC, we didn't get a good idea how to deal with the default behavior of checksum for antrea-gw0 and the option
cc @tnqn |
@antoninbas Interesting that for you the download is just too slow. For me it fails
Even with the workaround, I can still see incorrect checksums in tcpdump
|
@jsalatiel
This is traffic from the Pod to the external server. If
The only explanation I can think of (assuming |
@hongliangl I don't have a great idea. Maybe we could change
We do need to address this however, as the L7NP feature is currently broken because of this issue. |
You are correct. Restarting the client Pod (with IP |
While this is not fixed, this systemd unit may help.
|
I investigated the root cause of the low throughput and identified that Suricata is unable to send oversized packets back to OVS through For a connection between a Pod and an external network governed by an L7 NetworkPolicy, reply packets traverse the following network adapters:
Packet Flow Analysis:
Solution: To resolve the issue, the following changes must be applied to
After disabling TX checksum of l7np-antrea-gw0-tx-checksum-on.tar.gz |
@antoninbas, I discussed this with @tnqn , and we agreed it’s better to keep the current bool type. This ensures compatibility with released versions while allowing continued use of the existing option. Proper documentation is essential to clearly inform users about how the option works (if we disable TX checksum with this option), what they need to be aware of, and how to restore the default behavior if necessary. |
@antoninbas the problem is that the option is already set to |
Describe the bug
A user reported low throughput when using the following L7 NP:
While running
curl https://ash-speed.hetzner.com
will work fine, running
curl https://ash-speed.hetzner.com/100MB.bin -o /dev/null
(essentially a speed test) will show very low throughput, less than 10 Kbps.
When removing the policy, the speed is much better (a few tens of Mbps in my case).
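To put a number on the speed test rather than reading it off the progress meter, a minimal sketch follows. It relies on curl's `%{speed_download}` write-out variable, which reports the average download speed in bytes per second. The function names `bytes_per_sec_to_kbps` and `measure_kbps` are hypothetical helpers introduced here for illustration and are not part of the original report.

```shell
# Convert a download speed in bytes per second (as printed by curl's
# %{speed_download} write-out variable) to kilobits per second.
bytes_per_sec_to_kbps() {
  awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps * 8 / 1000 }'
}

# Download a URL, discard the body, and print the average speed in Kbps.
# Hypothetical helper: needs curl and network access to the target host.
measure_kbps() {
  speed=$(curl -s -o /dev/null -w '%{speed_download}' "$1") || return 1
  bytes_per_sec_to_kbps "$speed"
}
```

Running `measure_kbps https://ash-speed.hetzner.com/100MB.bin` from the client Pod once with the policy applied and once without it gives two directly comparable figures.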
The same issue can be observed with no-TLS HTTP traffic (using http://ash-speed.hetzner.com as the host).
When capturing the traffic on antrea-gw0, I observed some large packets (larger than the 1500 MTU) with an incorrect checksum:
I was able to resolve the issue by disabling transmit checksum offload on antrea-gw0 (ethtool -K antrea-gw0 tx-checksumming off).
Versions:
Antrea v2.2.0
Additional information
The packets captured on eth0 on the receive path are also larger than the MTU, which surprises me, as GRO is disabled on eth0. But maybe I am misunderstanding something and GRO doesn't apply here, for this traffic which is forwarded by the Linux kernel from eth0 to antrea-gw0.
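The offload settings mentioned in this report (TX checksum offload on antrea-gw0, GRO on eth0) can be read from `ethtool -k`, which prints one `feature: state` line per offload. The sketch below parses that output and applies the workaround from the description only when TX checksum offload is still on. The function names `offload_state` and `disable_tx_checksum` are hypothetical helpers; this is an illustration of the manual workaround, not how Antrea itself configures the interface.

```shell
# Print the state ("on" or "off") of one offload feature, reading
# `ethtool -k <iface>` output on stdin. Indented sub-features are ignored.
offload_state() {
  awk -F': ' -v feat="$1" '$1 == feat { split($2, f, " "); print f[1]; exit }'
}

# Disable TX checksum offload on an interface only if it is currently on.
# Hypothetical helper: requires root and the ethtool binary on the Node.
disable_tx_checksum() {
  iface="$1"
  state=$(ethtool -k "$iface" | offload_state tx-checksumming)
  if [ "$state" = "on" ]; then
    ethtool -K "$iface" tx-checksumming off
  fi
}
```

For example, `ethtool -k eth0 | offload_state generic-receive-offload` reports whether GRO is enabled on eth0, and `disable_tx_checksum antrea-gw0` applies the workaround. The setting made with ethtool is not persistent, so it needs to be reapplied if the interface is recreated.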