Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not expect set_technology to succeed #420

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

LostKobrakai
Copy link
Contributor

We're seeing a lot of noice on error tracking from vintage_net_mobile's signal monitor failing to start. I'm not sure it's a good idea to expect a technology supervision tree to successfully start. This is an attempt at this (and also to see if CI is happy about this). My mac surely didn't like having tests run on it.

@LostKobrakai LostKobrakai force-pushed the pr/failing-technology-start branch from 41b850a to ef9eb48 Compare August 22, 2022 17:00
@fhunleth
Copy link
Member

I think that I need more details on why the technology is failing to start. That really seems like a programming bug. The answer my be to handle it here, but I think that the handling would be to keep the interface down and not retry.

@LostKobrakai
Copy link
Contributor Author

13:05:34.991 [error] GenServer :"Elixir.VintageNetMobile.ExChat.ttyACM3" terminating
** (stop) :tty_error
Last message: {:continue, [speed: 115200, framing: {Circuits.UART.Framing.Line, [separator: "\r\n"]}, rx_framing_timeout: 500]}

13:05:35.023 [info]  Jul 13 13:05:35 pppd[21987]: Terminating on signal 15

13:05:35.023 [info]  Jul 13 13:05:35 pppd[21987]: Exit.

13:05:35.011 [error] GenStateMachine {VintageNet.Interface.Registry, "ppp0"} terminating
** (MatchError) no match of right hand side value: {:error, {{:shutdown, {:failed_to_start_child, VintageNetMobile.SignalMonitor, {:tty_error, {GenServer, :call, [:"Elixir.VintageNetMobile.ExChat.ttyACM3", {:register, "+CSQ", #Function<0.71622191/1 in VintageNetMobile.SignalMonitor.init/1>}, 5000]}}}}, {:child, :undefined, :technology, {Supervisor, :start_link, [[{MuonTrap.Daemon, ["pppd", ["connect", "chat -v -f /tmp/vintage_net/chatscript.ppp0", "ttyACM4", "115200", "noipdefault", "usepeerdns", "persist", "noauth", "nodetach", "debug"], [env: [{"PRIV_DIR", "/srv/erlang/lib/vintage_net_mobile-0.10.2/priv"}, {"LD_PRELOAD", "/srv/erlang/lib/vintage_net_mobile-0.10.2/priv/pppd_shim.so"}]]]}, {VintageNetMobile.ExChat, [tty: "ttyACM3", speed: 115200]}, {VintageNetMobile.SignalMonitor, [ifname: "ppp0", tty: "ttyACM3"]}, {Edith.VintageNetMobile.Modem.SierraModemInfo, [ifname: "ppp0", tty: "ttyACM3"]}, {Task, #Function<0.105909169/0 in Edith.VintageNetMobile.Modem.SierraHL8548.add_raw_config/3>}], [strategy: :one_for_all]]}, :permanent, 5000, :worker, [Supervisor]}}}
    (vintage_net 0.11.2) lib/vintage_net/interface/supervisor.ex:45: VintageNet.Interface.Supervisor.set_technology/3
    (vintage_net 0.11.2) lib/vintage_net/interface.ex:247: VintageNet.Interface.handle_event/4
    (stdlib 3.13) gen_statem.erl:1168: :gen_statem.loop_state_callback/11
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

That's the error we're seeing. Tbh I don't know enough about ppp/linux networking to judge if this is just a race condition or something we could control better.

@fhunleth
Copy link
Member

Got it. I think that ExChat needs to be improved. It sounds like there's a race condition between when the network interface is marked "up" and the UART appearing. That means that the code at https://github.com/nerves-networking/vintage_net_mobile/blob/main/lib/vintage_net_mobile/ex_chat.ex#L124 needs to retry after a timeout. I expect that one retry will suffice. A hack would be to add a short sleep right before the UART is opened.

The "better" fix would be for the appearance of the UART in the filesystem to trigger configuration, but that's a bigger change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants