-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sled-agent] Use zone-network-setup service after underlay is up #7260
base: main
Are you sure you want to change the base?
Conversation
Tried this on London but it didn't work. What's weird is that the sled agent never reached the part where it detects a different address https://github.com/oxidecomputer/omicron/blob/main/sled-agent/src/services.rs#L4261-L4272 . This means it technically never even touched the new code? Unsure about what's going on here. Will look later |
I've tested on a4x2 and everything came up successfully. root@g3:~# cat $(svcs -L sled-agent) | grep refreshed | looker
04:36:11.926Z INFO SledAgent (ServiceManager): refreshed zone-network-setup service with new configuration
file = sled-agent/src/services.rs:4328
04:36:12.102Z INFO SledAgent (ServiceManager): refreshed dendrite service with new configuration
file = sled-agent/src/services.rs:4452
04:36:12.152Z INFO SledAgent (ServiceManager): refreshed lldpd service with new configuration
file = sled-agent/src/services.rs:4492
04:36:12.208Z INFO SledAgent (ServiceManager): refreshed MGS service with new configuration
file = sled-agent/src/services.rs:4380
04:36:12.331Z INFO SledAgent (ServiceManager): refreshed wicketd service with new configuration
file = sled-agent/src/services.rs:4470
04:36:12.577Z INFO SledAgent (ServiceManager): refreshed mgd service with new configuration
file = sled-agent/src/services.rs:4544
04:36:12.768Z INFO SledAgent (ServiceManager): refreshed mg-ddm service with new configuration
file = sled-agent/src/services.rs:4591
root@g3:~# zlogin oxz_switch
root@oxz_switch:~# svccfg -s svc:/oxide/zone-network-setup listprop config
config application
config/datalink astring oxControlService34
config/gateway astring fd00:1122:3344:104::1
config/static_addr astring fd00:1122:3344:104::2
root@oxz_switch:~# cat /var/svc/log/oxide-zone-network-setup:default.log
[ Dec 19 04:36:00 Enabled. ]
[ Dec 19 04:36:00 Rereading configuration. ]
[ Dec 19 04:36:00 Rereading configuration. ]
[ Dec 19 04:36:11 Executing start method ("/opt/oxide/zone-setup-cli/bin/zone-setup common-networking -d oxControlService34 -s ::1 -g "). ]
note: configured to log to "/dev/stderr"
{"msg":"Ensuring IP interface exists on datalink","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.4686168Z","hostname":"oxz_switch","pid":3090,"datalink":"oxControlService34"}
{"msg":"Setting MTU to 9000 for IPv6 and IPv4","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.534105902Z","hostname":"oxz_switch","pid":3090,"datalink":"oxControlService34"}
{"msg":"Setting TCP recv_buf size to 1 MB","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.542986021Z","hostname":"oxz_switch","pid":3090}
{"msg":"Setting TCP congestion control algorithm to cubic","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.546753526Z","hostname":"oxz_switch","pid":3090}
{"msg":"Static address is localhost; will not ensure it's set on the IP interface","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.55043676Z","hostname":"oxz_switch","pid":3090}
{"msg":"Underlay is not available yet; will not ensure default route","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.550488207Z","hostname":"oxz_switch","pid":3090}
{"msg":"Populating hosts file for zone","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.550496408Z","hostname":"oxz_switch","pid":3090,"zonename":"\"oxz_switch\""}
[ Dec 19 04:36:11 Method "start" exited with status 0. ]
[ Dec 19 04:36:11 Rereading configuration. ]
[ Dec 19 04:36:11 Executing refresh method ("/opt/oxide/zone-setup-cli/bin/zone-setup common-networking -d oxControlService34 -s fd00:1122:3344:104::2 -g fd00:1122:3344:104::1"). ]
note: configured to log to "/dev/stderr"
{"msg":"Ensuring IP interface exists on datalink","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.990693716Z","hostname":"oxz_switch","pid":3121,"datalink":"oxControlService34"}
{"msg":"Setting MTU to 9000 for IPv6 and IPv4","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:11.995634708Z","hostname":"oxz_switch","pid":3121,"datalink":"oxControlService34"}
{"msg":"Setting TCP recv_buf size to 1 MB","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:12.00612112Z","hostname":"oxz_switch","pid":3121}
{"msg":"Setting TCP congestion control algorithm to cubic","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:12.010800129Z","hostname":"oxz_switch","pid":3121}
{"msg":"Ensuring static and auto-configured addresses are set on the IP interface","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:12.016213931Z","hostname":"oxz_switch","pid":3121,"static address":"fd00:1122:3344:104::2","data link":"\"oxControlService34\""}
{"msg":"Ensuring there is a default route","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:12.500717653Z","hostname":"oxz_switch","pid":3121,"gateway":"Ipv6(fd00:1122:3344:104::1)"}
{"msg":"Populating hosts file for zone","v":0,"name":"zone-setup","level":30,"time":"2024-12-19T04:36:12.540005264Z","hostname":"oxz_switch","pid":3121,"zonename":"\"oxz_switch\""}
[ Dec 19 04:36:12 Method "refresh" exited with status 0. ]
root@oxz_switch:~# cat /etc/inet/hosts
::1 localhost loghost
127.0.0.1 localhost loghost
fd00:1122:3344:104::2 oxz_switch.local oxz_switch
root@oxz_switch:~# netstat -rcnv
IRE Table: IPv4
Destination Mask Gateway Device MTU Ref Flg Out In/Fwd
-------------------- --------------- -------------------- ------ ----- --- --- ------ ------
127.0.0.1 255.255.255.255 127.0.0.1 lo0 8232 2 UH 146 146
198.51.101.4/30 255.255.255.252 198.51.101.6 tfportqsfp0_0 1500 3 U 4952 0
198.51.101.12/30 255.255.255.252 198.51.101.14 tfportqsfp1_0 1500 3 U 4953 0
IRE Table: IPv6
Destination/Mask Gateway If MTU Ref Flags Out In/Fwd
--------------------------- --------------------------- ----- ----- --- ----- ------ ------
::1 ::1 lo0 8252 36 UH 307206 307272
fdb0:a840:2500:7::/64 fdb0:a840:2500:7::2 oxBootstrap34 1500 3 U 981 0
fd00:1122:3344:104::/64 fd00:1122:3344:104::2 oxControlService34 9000 4 U 3791 0
fdb0::/16 fe80::8:20ff:feb6:f3cb oxBootstrap34 1500 8 UG 2838 0
fe80::/10 fe80::aa40:25ff:fe10:2680 tfportrear3_0 1500 3 U 3297 0
fe80::/10 fe80::aa40:25ff:fe10:267f tfportrear2_0 1500 3 U 3296 0
fe80::/10 fe80::aa40:25ff:fe10:267e tfportrear1_0 1500 3 U 3212 0
fe80::/10 fe80::aa40:25ff:fe10:267d tfportrear0_0 1500 3 U 3283 0
fe80::/10 fe80::aa40:25ff:fe10:267c tfportint0_0 1500 2 U 0 0
fe80::/10 fe80::8:20ff:fe77:5014 oxControlService34 9000 2 U 0 0
fe80::/10 fe80::8:20ff:fe99:833d oxBootstrap34 1500 3 U 2 0
default fd00:1122:3344:104::1 0 6 UG 65795 0
root@oxz_switch:~# ipadm
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
tfportqsfp0_0/uplink1 static ok 198.51.101.6/30
tfportqsfp1_0/uplink1 static ok 198.51.101.14/30
lo0/v6 static ok ::1/128
oxBootstrap34/ll addrconf ok fe80::8:20ff:fe99:833d%oxBootstrap34/10
oxBootstrap34/bootstrap6 static ok fdb0:a840:2500:7::2/64
oxControlService34/ll addrconf ok fe80::8:20ff:fe77:5014%oxControlService34/10
oxControlService34/omicron6 static ok fd00:1122:3344:104::2/64
tfportint0_0/ll addrconf ok fe80::aa40:25ff:fe10:267c%tfportint0_0/10
tfportrear0_0/ll addrconf ok fe80::aa40:25ff:fe10:267d%tfportrear0_0/10
tfportrear1_0/ll addrconf ok fe80::aa40:25ff:fe10:267e%tfportrear1_0/10
tfportrear2_0/ll addrconf ok fe80::aa40:25ff:fe10:267f%tfportrear2_0/10
tfportrear3_0/ll addrconf ok fe80::aa40:25ff:fe10:2680%tfportrear3_0/10
root@oxz_switch:~# dladm
LINK CLASS MTU STATE BRIDGE OVER
vioif0 phys 1500 up ? --
oxBootstrap34 vnic 1500 up -- ?
oxControlService34 vnic 9000 up -- ?
tfport0 tfport 1500 up -- vioif0
tfportint0_0 tfport 1500 up -- vioif0
tfportrear0_0 tfport 1500 up -- vioif0
tfportrear1_0 tfport 1500 up -- vioif0
tfportrear2_0 tfport 1500 up -- vioif0
tfportrear3_0 tfport 1500 up -- vioif0
tfportqsfp0_0 tfport 1500 up -- vioif0
tfportqsfp1_0 tfport 1500 up -- vioif0 I'm pretty confident this will work just fine, but as a precaution I'd like to run it on madrid or london before merging. Will do so once either are available to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a few small questions, but mostly pretty minimal. Thanks!
Err(e) => { | ||
// If a property already doesn't exist we don't need to | ||
// return an error | ||
if !e.to_string().contains("No such property") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd hit this if one tried to delete a property that doesn't exist, right? As opposed to deleting a value that doesn't exist from a property that does. Is that what we want? I could see an argument for failing if we try to deleting a non-existent property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yeah I see your point.
In this case, after deleting we run the svccfg addpropvalue
command with the value type, which means the property is created if it doesn't exist. So, it seems like it's not a big deal if we don't return an error if the property didn't exist. If the property wasn't created previously for whatever reason, it's OK because we're creating it anyway.
That said, returning the error like you propose, would guarantee that we are not adding a property where it's not meant to be.
I'm a little torn on this one. WDYT?
sled-agent/src/services.rs
Outdated
info!( | ||
nsmfh.addpropvalue_type( | ||
"config/gateway", | ||
&format!("{}", info.underlay_address), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method takes any type that impls ToString
, so we don't need to format here. We can pass &info.underlay_address
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! yes. good catch
@@ -165,6 +165,38 @@ impl<'t> SmfHelper<'t> { | |||
Ok(()) | |||
} | |||
|
|||
pub fn delpropvalue<P, V>(&self, prop: P, val: V) -> Result<(), Error> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this differ from the existing delpropvalue_default_instance
method below? There's only one (default) instance of this service in a zone AFAIU, so they seem like they should be equivalent. I could be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're almost the same, yes. You can set properties on the instance itself and/or on the service.
Example:
coatlicue@centzon:~$ svccfg -s svc:/system/fmd:default listprop
general framework
general/comment astring
general/enabled boolean true
restarter framework NONPERSISTENT
restarter/logfile astring /var/svc/log/system-fmd:default.log
restarter/contract count 89
restarter/start_pid count 1516
restarter/start_method_timestamp time 1734493550.186287000
restarter/start_method_waitstatus integer 0
restarter/auxiliary_state astring dependencies_satisfied
restarter/next_state astring none
restarter/state astring online
restarter/state_timestamp time 1734493550.187733000
coatlicue@centzon:~$ svccfg -s svc:/system/fmd listprop
dumpadm-fmd dependency
dumpadm-fmd/entities fmri svc:/system/dumpadm
dumpadm-fmd/external boolean true
dumpadm-fmd/grouping astring require_all
dumpadm-fmd/restart_on astring none
dumpadm-fmd/type astring service
SUNWfmd dependency
SUNWfmd/entities fmri file://localhost/usr/lib/fm/fmd/fmd
SUNWfmd/grouping astring require_all
SUNWfmd/restart_on astring none
SUNWfmd/type astring path
<more properties>
The zone-network-service has it's properties set on the service directly so the other command would not delete the correct property values
<!-- We use the same command as the start method as it is safe to rerun. --> | ||
<exec_method type='method' name='refresh' | ||
exec='/opt/oxide/zone-setup-cli/bin/zone-setup common-networking -d %{config/datalink} -s %{config/static_addr} -g %{config/gateway}' | ||
timeout_seconds='0' /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the service is currently running when it's refreshed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing really. All of the commands on that service are idempotent (I also manually refreshed this service multiple times to see what happened).
Additionally, it's highly unlikely this would happen. Several other services on the switch zone depend on this service, which means we wouldn't even get to the point where the service is refreshed, if the zone-network-setup service hadn't exited with 0 (as far as I understand how the switch zone startup works, I could be wrong though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to review! I've addressed your comments and still on the fence about one. Let me know what you think :)
sled-agent/src/services.rs
Outdated
info!( | ||
nsmfh.addpropvalue_type( | ||
"config/gateway", | ||
&format!("{}", info.underlay_address), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! yes. good catch
@@ -165,6 +165,38 @@ impl<'t> SmfHelper<'t> { | |||
Ok(()) | |||
} | |||
|
|||
pub fn delpropvalue<P, V>(&self, prop: P, val: V) -> Result<(), Error> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're almost the same, yes. You can set properties on the instance itself and/or on the service.
Example:
coatlicue@centzon:~$ svccfg -s svc:/system/fmd:default listprop
general framework
general/comment astring
general/enabled boolean true
restarter framework NONPERSISTENT
restarter/logfile astring /var/svc/log/system-fmd:default.log
restarter/contract count 89
restarter/start_pid count 1516
restarter/start_method_timestamp time 1734493550.186287000
restarter/start_method_waitstatus integer 0
restarter/auxiliary_state astring dependencies_satisfied
restarter/next_state astring none
restarter/state astring online
restarter/state_timestamp time 1734493550.187733000
coatlicue@centzon:~$ svccfg -s svc:/system/fmd listprop
dumpadm-fmd dependency
dumpadm-fmd/entities fmri svc:/system/dumpadm
dumpadm-fmd/external boolean true
dumpadm-fmd/grouping astring require_all
dumpadm-fmd/restart_on astring none
dumpadm-fmd/type astring service
SUNWfmd dependency
SUNWfmd/entities fmri file://localhost/usr/lib/fm/fmd/fmd
SUNWfmd/grouping astring require_all
SUNWfmd/restart_on astring none
SUNWfmd/type astring path
<more properties>
The zone-network-service has it's properties set on the service directly so the other command would not delete the correct property values
Err(e) => { | ||
// If a property already doesn't exist we don't need to | ||
// return an error | ||
if !e.to_string().contains("No such property") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yeah I see your point.
In this case, after deleting we run the svccfg addpropvalue
command with the value type, which means the property is created if it doesn't exist. So, it seems like it's not a big deal if we don't return an error if the property didn't exist. If the property wasn't created previously for whatever reason, it's OK because we're creating it anyway.
That said, returning the error like you propose, would guarantee that we are not adding a property where it's not meant to be.
I'm a little torn on this one. WDYT?
<!-- We use the same command as the start method as it is safe to rerun. --> | ||
<exec_method type='method' name='refresh' | ||
exec='/opt/oxide/zone-setup-cli/bin/zone-setup common-networking -d %{config/datalink} -s %{config/static_addr} -g %{config/gateway}' | ||
timeout_seconds='0' /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing really. All of the commands on that service are idempotent (I also manually refreshed this service multiple times to see what happened).
Additionally, it's highly unlikely this would happen. Several other services on the switch zone depend on this service, which means we wouldn't even get to the point where the service is refreshed, if the zone-network-setup service hadn't exited with 0 (as far as I understand how the switch zone startup works, I could be wrong though).
Thanks for the review! I'll merge once I've ran this on London or Madrid |
This commit removes the need to zlogin into the switch zone to set the underlay IP address and new static address after the underlay is up. Instead it adds the new values to the SMF properties and refreshes the zone-network-setup service. This service is safe to rerun as all it's commands ensure things are there, but does not throw an error if they are.
Closes: #6157