[Flaky test] TestPacketCapture e2e test #6815
Comments
cc @hangyan |
There is a small time window between when we start the capture and when we apply the filter, so the first few packets may be unrelated. This is not easy to address in the current architecture. Quan's last follow-up PR improved this, and the issue seemed to be gone by that time; I haven't reproduced it since then either. A possible solution is to add an extra layer of checks after we get the packet, but that would make the code more complicated. Do you have any thoughts on this? Or can we temporarily disable this test case? Meanwhile, I will post an issue to the gopacket repo. |
@hangyan Thanks for the info. Is it possible to implement this workaround on our side: https://natanyellin.com/posts/ebpf-filtering-done-right/. Apparently, this is what libpcap does (or did at the time the post was written?)
However, I am not 100% sure we have a good way to do the second step (drain the socket). When I look at https://github.com/gopacket/gopacket/blob/v1.3.1/pcapgo/capture.go, it seems that the socket is blocking, which is not ideal. That means that we may have to use the packet source as follows:

    func (p *pcapCapture) Capture(ctx context.Context, device string, srcIP, dstIP net.IP, packet *crdv1alpha1.Packet) (chan gopacket.Packet, error) {
    	// Compile the BPF filter in advance to reduce the time window between starting the capture and applying the filter.
    	inst := compilePacketFilter(packet, srcIP, dstIP)
    	klog.V(5).InfoS("Generated bpf instructions for PacketCapture", "device", device, "srcIP", srcIP, "dstIP", dstIP, "packetSpec", packet, "bpf", inst)
    	rawInst, err := bpf.Assemble(inst)
    	if err != nil {
    		return nil, err
    	}
    	eth, err := pcapgo.NewEthernetHandle(device)
    	if err != nil {
    		return nil, err
    	}
    	if err = eth.SetPromiscuous(false); err != nil {
    		return nil, err
    	}
    	// Install a BPF filter that won't match any packets
    	if err = eth.SetBPF(rawInstForZeroFilter); err != nil {
    		return nil, err
    	}
    	if err = eth.SetCaptureLength(maxSnapshotBytes); err != nil {
    		return nil, err
    	}
    	packetSource := gopacket.NewPacketSource(eth, layers.LinkTypeEthernet, gopacket.WithNoCopy(true))
    	packetCh := packetSource.PacketsCtx(ctx)
    	// Drain the channel
    	for {
    		select {
    		case <-ctx.Done():
    			return nil, ctx.Err()
    		case <-packetCh:
    			// Discard the packet; break only exits the select, so the loop keeps draining.
    			break
    		case <-time.After(50 * time.Millisecond):
    			// timeout: channel is drained so socket is drained
    			// install the correct BPF filter
    			if err := eth.SetBPF(rawInst); err != nil {
    				return nil, err
    			}
    			return packetCh, nil
    		}
    	}
    }

It would be more elegant if we could call |
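The snippet above refers to `rawInstForZeroFilter` without defining it. As a rough sketch of what such a filter contains: in classic BPF, a program consisting of the single instruction `ret #0` accepts zero bytes of every packet, so nothing is queued on the socket. The `rawInstruction` type below is a local stand-in that mirrors the usual instruction layout (opcode, two jump offsets, constant); it is an assumption for illustration, not the type used by the capture code. With golang.org/x/net/bpf, the same program could presumably be obtained by assembling a single `bpf.RetConstant{Val: 0}`.

```go
package main

// rawInstruction is a local stand-in mirroring the layout of a classic BPF
// instruction (struct sock_filter on Linux).
type rawInstruction struct {
	Op uint16 // opcode
	Jt uint8  // jump offset if true (unused by ret)
	Jf uint8  // jump offset if false (unused by ret)
	K  uint32 // constant operand
}

const (
	bpfRet = 0x06 // BPF_RET instruction class
	bpfK   = 0x00 // BPF_K: the return value is the constant K
)

// zeroFilter returns the one-instruction program "ret #0". The return value
// of a socket filter is the number of bytes to keep, so 0 drops every packet.
func zeroFilter() []rawInstruction {
	return []rawInstruction{{Op: bpfRet | bpfK, K: 0}}
}
```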
I will take a look and try this out; it seems promising. Thanks. libbpf still uses this. |
In PacketCapture, packets which don’t match the target BPF can be received after the socket is created and before the bpf filter is applied. This patch uses a zero bpf filter (matches no packet), then empties out any packets that arrived before the “zero-BPF” filter was applied. At this point the socket is definitely empty and it can’t fill up with junk because the zero-BPF is in place. Then we replace the zero-BPF with the real BPF we want. Signed-off-by: Hang Yan <[email protected]> Co-authored-by: Antonin Bas <[email protected]>
I have created an MR based on your suggestions. It works well, but the test results are confusing compared to the old ones: in roughly 10 out of 15 runs, 1-4 packets are discarded before we apply the 'real' filter. That rate is a bit high, and I don't know why we didn't hit this so often before. |
I have seen the issue quite often, even after Quan's patch. |
In PacketCapture, packets which don’t match the target BPF can be received after the socket is created and before the bpf filter is applied. This patch uses a zero bpf filter (matches no packet), then empties out any packets that arrived before the "zero-BPF" filter was applied. At this point the socket is definitely empty and it can’t fill up with junk because the zero-BPF is in place. Then we replace the zero-BPF with the real BPF we want. Signed-off-by: Hang Yan <[email protected]> Co-authored-by: Antonin Bas <[email protected]>
@hangyan I assume we can close this now that the PR is merged? |
Yes! |
@hangyan I just noticed https://github.com/antrea-io/antrea/actions/runs/11938753429/job/33277918607. It seems to be the same type of failure, even though we have merged the fix? Edit: I guess not exactly the same type of failure, as we are actually missing a packet now, and the capture times out... |
I will take a look. My first guess is that the problem is caused by the time window between when we send the test packet and when we apply the real filter. It shouldn't be a problem in real-world use. |
In PacketCapture, packets which don’t match the target BPF can be received after the socket is created and before the bpf filter is applied. This patch uses a zero bpf filter (matches no packet), then empties out any packets that arrived before the "zero-BPF" filter was applied. At this point the socket is definitely empty and it can’t fill up with junk because the zero-BPF is in place. Then we replace the zero-BPF with the real BPF we want. Signed-off-by: Hang Yan <[email protected]> Co-authored-by: Antonin Bas <[email protected]>
@hangyan I still see frequent e2e test failures. Should we reopen this issue or open a new one? |
Reopened this one. I will create a PR to test this and see if there is a solution. Do you think we can temporarily remove this case and add it back later, once we figure out the root cause? |
Describe the bug
The TestPacketCapture/testPacketCaptureBasic/testPacketCaptureBasic/ipv4-icmp-timeout e2e test has failed in CI.