Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DHCP server reconfiguration does not always take effect #936

Open
mattiaswal opened this issue Feb 8, 2025 · 2 comments
Open

DHCP server reconfiguration does not always take effect #936

mattiaswal opened this issue Feb 8, 2025 · 2 comments
Assignees
Labels
bug Something isn't working triage Pending investigation & classification (CCB)

Comments

@mattiaswal
Copy link
Contributor

mattiaswal commented Feb 8, 2025

NOTE I have seen this in the test server_host and server_subnet, in both tests DHCP server starts working if I:

killall dnsmasq

When looping until fail (on my computer, x86_64) with some PYTHONHASHSEEDs:

e=0; while [ $? -eq 0  ]; do e=$((e+1)); echo "Loop $e"; PYTHONHASHSEED=12395139  ./case/infix_dhcp/server_host/test.py  && ./case/infix_dhcp/server_subnets/test.py  ; done

the test may sometimes fail:

Starting test 0007-server_subnets
2025-02-08 12:42:57 # Starting (2025-02-08 12:42:57)
2025-02-08 12:42:57 # host:    host
2025-02-08 12:42:57 #     client1: d1a
2025-02-08 12:42:57 #     client3: d3a
2025-02-08 12:42:57 #     server:  d2a
2025-02-08 12:42:57 #     client2: d4a
2025-02-08 12:42:57 # client1: dut1
2025-02-08 12:42:57 #     server:  e5
2025-02-08 12:42:57 #     mgmt:    e1
2025-02-08 12:42:57 # server:  dut2
2025-02-08 12:42:57 #     client1: e6
2025-02-08 12:42:57 #     client3: e5
2025-02-08 12:42:57 #     mgmt:    e1
2025-02-08 12:42:57 #     client2: e7
2025-02-08 12:42:57 # client2: dut4
2025-02-08 12:42:57 #     server:  e7
2025-02-08 12:42:57 #     mgmt:    e1
2025-02-08 12:42:57 # client3: dut3
2025-02-08 12:42:57 #     server:  e6
2025-02-08 12:42:57 #     mgmt:    e1
2025-02-08 12:42:57 #
2025-02-08 12:42:57 # Probing dut2 on port d2a for IPv6LL mgmt address ...
2025-02-08 12:42:57 # Testing using RESTCONF
2025-02-08 12:42:57 # YANG models downloaded.
2025-02-08 12:42:58 # Probing dut1 on port d1a for IPv6LL mgmt address ...
2025-02-08 12:42:58 # Testing using RESTCONF
2025-02-08 12:42:58 # YANG models downloaded.
2025-02-08 12:43:00 # Probing dut4 on port d4a for IPv6LL mgmt address ...
2025-02-08 12:43:00 # Testing using RESTCONF
2025-02-08 12:43:00 # YANG models downloaded.
2025-02-08 12:43:01 # Probing dut3 on port d3a for IPv6LL mgmt address ...
2025-02-08 12:43:01 # Testing using RESTCONF
2025-02-08 12:43:01 # YANG models downloaded.
2025-02-08 12:43:02 ok 1 - Set up topology and attach to client and server DUTs
2025-02-08 12:43:04 ok 2 - Configure DHCP server and clients
2025-02-08 12:43:14 not ok 3 - Verify DHCP client1 get correct lease
2025-02-08 12:43:14 # Exiting (2025-02-08 12:43:14)
2025-02-08 12:43:14 # Traceback (most recent call last):
2025-02-08 12:43:14 #   File "/home/lazzer/Documents/github/infix/test/case/infix_dhcp/server_subnets/test.py", line 287, in 
2025-02-08 12:43:14 #     until(lambda: iface.address_exist(client1, client1["server"], POOL1))
2025-02-08 12:43:14 #   File "/home/lazzer/Documents/github/infix/test/infamy/util.py", line 46, in until
2025-02-08 12:43:14 #     raise Exception("Expected condition did not materialize")
2025-02-08 12:43:14 # Exception: Expected condition did not materialize
2025-02-08 12:43:14 #
test error, no plan
Aborting execution
@mattiaswal mattiaswal added bug Something isn't working triage Pending investigation & classification (CCB) labels Feb 8, 2025
@mattiaswal mattiaswal changed the title DHCP subnet test fails sometimes DHCP server reconfiguration does not always take effect Feb 10, 2025
@troglobit
Copy link
Contributor

Reproduced, both on my lingering-containers branch and main.

Analysis

  • Server DUT has applied most of the config, e.g., hostname has changed but /log/syslog still reports test-config hostname
  • Calling initctl reload manually rolls out remaining configuration changes, restarting all queued up daemons: syslogd, dnsmasq, etc.

Hypothesis A

  • Final call to initctl reload in confd:core_post_hook() is not made
  • Possibly the core_post_hook() is sometimes not called properly

Hypothesis B

Possible bug in netopeer2-server:

Feb 11 13:53:19 foo udhcpc[5334]: deleting DHCP routes from e7
Feb 11 13:53:19 foo confd[3287]: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! CALLED INITCTL RELOAD !!!!!!!!!!!!!!!!!!!!!!!!!!!!
Feb 11 13:53:19 foo finit[1]: Restarting syslogd[2393], sending SIGHUP ...

>>> System 'reset' by applying test-config', first daemon restarted <<<

Feb 11 13:53:19 test-00-01-00 syslogd[2393]: syslogd v2.5.2: restart.
Feb 11 13:53:19 test-00-01-00 finit[1]: Restarting sshd[3917], sending SIGHUP ...
Feb 11 13:53:19 test-00-01-00 staticd[5234]: route_notify_owner: Route 0.0.0.0/0 failed to install for table: 254
Feb 11 13:53:23 test-00-01-00 netopeer2-server[3875]: user "admin" called RPC /ietf-netconf:edit-config
Feb 11 13:53:24 test-00-01-00 netopeer2-server[3875]: user "admin" commiting changes to running ...

>>> New config from ./case/infix_dhcp/server_subnets/test.py is applied <<<

Feb 11 13:53:24 test-00-01-00 kernel: 8021q: adding VLAN 0 to HW filter on device e5
Feb 11 13:53:24 test-00-01-00 kernel: 8021q: adding VLAN 0 to HW filter on device e6
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 1(e5) entered blocking state
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 1(e5) entered disabled state
Feb 11 13:53:24 test-00-01-00 kernel: virtio_net virtio4 e5: entered allmulticast mode
Feb 11 13:53:24 test-00-01-00 kernel: virtio_net virtio4 e5: entered promiscuous mode
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 1(e5) entered blocking state
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 1(e5) entered forwarding state
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 2(e6) entered blocking state
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 2(e6) entered disabled state
Feb 11 13:53:24 test-00-01-00 kernel: virtio_net virtio5 e6: entered allmulticast mode
Feb 11 13:53:24 test-00-01-00 kernel: virtio_net virtio5 e6: entered promiscuous mode
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 2(e6) entered blocking state
Feb 11 13:53:24 test-00-01-00 kernel: br0: port 2(e6) entered forwarding state
Feb 11 13:53:24 test-00-01-00 kernel: 8021q: adding VLAN 0 to HW filter on device e7
Feb 11 13:53:24 test-00-01-00 dagger[3287]: Evolved to generation 3

>>> confd has finished setting up networking using 'dagger evolve' <<<

Feb 11 13:53:24 test-00-01-00 finit[1]: service_register():/etc/finit.d/enabled/ttyd.conf: skipping ttyd: No such file or directory
Feb 11 13:53:24 test-00-01-00 confd[3287]: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! CALLED INITCTL RELOAD !!!!!!!!!!!!!!!!!!!!!!!!!!!!

>>> User debug instrumentation in confd:core.c <<<

Feb 11 13:53:24 test-00-01-00 netopeer2-server[3875]: user "admin" committed changes to running.
Feb 11 13:53:36 test-00-01-00 netopeer2-server[3875]: SSH: ssh_socket_exception_callback: Socket exception callback: 1 (0)
Feb 11 13:53:36 test-00-01-00 netopeer2-server[3875]: SSH: ssh_socket_exception_callback: Socket error: disconnected
Feb 11 13:53:36 test-00-01-00 netopeer2-server[3875]: Session 2: SSH channel poll error (Socket error: disconnected).

>>> Test b0rks here: possible bug in netopper2-server??? <<<

Feb 11 13:53:41 test-00-01-00 finit[3883]: Starting getty on /dev/hvc0

>>> User logs in here to debug. <<<

>>> User calls 'initctl reload' manually <<<

Feb 11 13:56:10 test-00-01-00 finit[1]: service_register():/etc/finit.d/enabled/ttyd.conf: skipping ttyd: No such file or directory
Feb 11 13:56:10 test-00-01-00 finit[1]: Restarting syslogd[2393], sending SIGHUP ...
Feb 11 13:56:10 server syslogd[2393]: syslogd v2.5.2: restart.

>>> Hostname from new config in test is applied to first daemon ... <<<<

Feb 11 13:56:10 server finit[1]: Stopping dnsmasq[5082], sending SIGTERM ...
Feb 11 13:56:10 server dnsmasq[5082]: exiting on receipt of SIGTERM
Feb 11 13:56:10 server dnsmasq[5809]: started, version 2.90 cachesize 150
Feb 11 13:56:10 server dnsmasq[5809]: compile time options: IPv6 GNU-getopt DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth no-cryptohash no-DNSSEC loop-detect inotify dumpfile
Feb 11 13:56:10 server dnsmasq[5809]: DBus support enabled: connected to system bus
Feb 11 13:56:10 server dnsmasq-dhcp[5809]: DHCP, IP range 192.168.2.200 -- 192.168.2.200, lease time 1h
Feb 11 13:56:10 server dnsmasq-dhcp[5809]: DHCP, IP range 192.168.1.100 -- 192.168.1.100, lease time 1h
Feb 11 13:56:10 server dnsmasq[5809]: no servers found in /var/lib/misc/resolv.conf, will retry
Feb 11 13:56:10 server dnsmasq[5809]: read /etc/hosts - 2 names
Feb 11 13:56:10 server finit[1]: Starting dnsmasq[5809]
Feb 11 13:56:18 server dnsmasq-dhcp[5809]: DHCPDISCOVER(br0) 00:a0:85:00:02:06 
Feb 11 13:56:18 server dnsmasq-dhcp[5809]: DHCPOFFER(br0) 192.168.2.200 00:a0:85:00:02:06 
Feb 11 13:56:18 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPDISCOVER(e7) 00:a0:85:00:03:07 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPOFFER(e7) 192.168.1.100 00:a0:85:00:03:07 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPDISCOVER(br0) 00:a0:85:00:04:05 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPOFFER(br0) 192.168.2.22 00:a0:85:00:04:05 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPDISCOVER(br0) 00:a0:85:00:02:06 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPOFFER(br0) 192.168.2.200 00:a0:85:00:02:06 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPREQUEST(br0) 192.168.2.200 00:a0:85:00:02:06 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPACK(br0) 192.168.2.200 00:a0:85:00:02:06 client3
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPDISCOVER(e7) 00:a0:85:00:03:07 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPOFFER(e7) 192.168.1.100 00:a0:85:00:03:07 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPDISCOVER(br0) 00:a0:85:00:04:05 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPOFFER(br0) 192.168.2.22 00:a0:85:00:04:05 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPDISCOVER(e7) 00:a0:85:00:03:07 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPOFFER(e7) 192.168.1.100 00:a0:85:00:03:07 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPREQUEST(br0) 192.168.2.200 00:a0:85:00:02:06 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPACK(br0) 192.168.2.200 00:a0:85:00:02:06 client3
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPREQUEST(br0) 192.168.2.22 00:a0:85:00:04:05 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPACK(br0) 192.168.2.22 00:a0:85:00:04:05 client2
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPREQUEST(e7) 192.168.1.100 00:a0:85:00:03:07 
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: DHCPACK(e7) 192.168.1.100 00:a0:85:00:03:07 client1
Feb 11 13:56:21 server dnsmasq-dhcp[5809]: Ignoring duplicate dhcp-option 6

@troglobit
Copy link
Contributor

Root cause found. Event ordering issue in Finit, see troglobit/finit#425 for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Pending investigation & classification (CCB)
Projects
Status: No status
Development

No branches or pull requests

2 participants