Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wwan0 gets stuck on :lan connection status after a VintageNet.deconfigure/configure #124

Open
taun opened this issue Sep 26, 2022 · 4 comments

Comments

@taun
Copy link

taun commented Sep 26, 2022

Describe the bug
Modem: Quectel BG96
Starting with a good working :internet connection on wwan0

To Reproduce
Steps to reproduce the behavior:

  1. Get and save current configuration: iex>VintageNet.get_by_prefix(["interface", "wwan0", "config"])
  2. iex>VintageNet.deconfigure("wwan0", persist: false)
  3. iex>VintageNet.configure("wwan0", config)
  4. iex>VintageNet.info() # will say :lan and never change to :internet
  5. iex>ping("google.com", ifname: "wwan0") # responds with pings results that show there is internet

Expected behavior
It was expected that deconfigure followed by a configure with no other change would result in restoring the :internet connection for "wwan0"

VintageNet Info

Output of VintageNet.info

Note, we have given route priority to wwan0 over eth0.

Before deconfigure

iex([email protected])2> VintageNet.info
VintageNet 0.12.2

All interfaces:       ["eth0", "lo", "usb0", "wwan0"]
Available interfaces: ["wwan0", "eth0"]

Interface eth0
  Type: VintageNetEthernet
  Present: true
  State: :configured (0:01:11)
  Connection: :internet (54.6 s)
  Addresses: fd0f:849b:62fc:483c:3684:e4ff:fe1c:32cc/64, fe80::3684:e4ff:fe1c:32cc/64, 192.168.121.155/24
  MAC Address: "34:84:e4:1c:32:cc"
  Configuration:
    %{
      ipv4: %{method: :dhcp},
      mac_address: {MeshGateway.FirmwareKv, :ethernet_mac, []},
      type: VintageNetEthernet
    }

Interface wlan0
  Type: VintageNetWiFi
  Present: false
  Configuration:
    %{
      ipv4: %{method: :dhcp},
      type: VintageNetWiFi,
      vintage_net_wifi: %{
        networks: [
          %{key_mgmt: :wpa_psk, priority: 75, psk: "....", ssid: nil},
          %{key_mgmt: :wpa_psk, priority: 20, psk: "....", ssid: nil}
        ]
      }
    }

Interface wwan0
  Type: VintageNetQMI
  Present: true
  State: :configured (0:01:11)
  Connection: :internet (38.4 s)
  Addresses: 100.76.249.106/30, fe80::719b:17a6:45ca:413/64
  MAC Address: nil
  Configuration:
    %{
      type: VintageNetQMI,
      vintage_net_qmi: %{
        service_providers: [
          %{apn: "super", only_iccid_prefixes: ["8988307"]},
          %{apn: "fast.t-mobile.com", only_iccid_prefixes: ["890126"]},
          %{apn: "fast.t-mobile.com"}
        ]
      }
    }


:ok

After deconfigure

iex([email protected])8> VintageNet.deconfigure("wwan0", persist: false)
:ok
iex([email protected])9> VintageNet.info()                              
VintageNet 0.12.2

All interfaces:       ["eth0", "lo", "usb0", "wwan0"]
Available interfaces: ["eth0"]

Interface eth0
  Type: VintageNetEthernet
  Present: true
  State: :configured (0:03:37)
  Connection: :internet (0:03:21)
  Addresses: fd0f:849b:62fc:483c:3684:e4ff:fe1c:32cc/64, fe80::3684:e4ff:fe1c:32cc/64, 192.168.121.155/24
  MAC Address: "34:84:e4:1c:32:cc"
  Configuration:
    %{
      ipv4: %{method: :dhcp},
      mac_address: {MeshGateway.FirmwareKv, :ethernet_mac, []},
      type: VintageNetEthernet
    }

Interface wlan0
  Type: VintageNetWiFi
  Present: false
  Configuration:
    %{
      ipv4: %{method: :dhcp},
      type: VintageNetWiFi,
      vintage_net_wifi: %{
        networks: [
          %{key_mgmt: :wpa_psk, priority: 75, psk: "....", ssid: nil},
          %{key_mgmt: :wpa_psk, priority: 20, psk: "....", ssid: nil}
        ]
      }
    }

Interface wwan0
  Type: VintageNet.Technology.Null
  Present: true
  State: :configured (2.6 s)
  Connection: :disconnected (2.6 s)
  MAC Address: nil
  Configuration:
    %{type: VintageNet.Technology.Null}


:ok

After configure

iex([email protected])10> VintageNet.configure("wwan0", config)
:ok
iex([email protected])11> VintageNet.info()                    
VintageNet 0.12.2

All interfaces:       ["eth0", "lo", "usb0", "wwan0"]
Available interfaces: ["eth0", "wwan0"]

Interface eth0
  Type: VintageNetEthernet
  Present: true
  State: :configured (0:04:02)
  Connection: :internet (0:03:46)
  Addresses: fd0f:849b:62fc:483c:3684:e4ff:fe1c:32cc/64, fe80::3684:e4ff:fe1c:32cc/64, 192.168.121.155/24
  MAC Address: "34:84:e4:1c:32:cc"
  Configuration:
    %{
      ipv4: %{method: :dhcp},
      mac_address: {MeshGateway.FirmwareKv, :ethernet_mac, []},
      type: VintageNetEthernet
    }

Interface wlan0
  Type: VintageNetWiFi
  Present: false
  Configuration:
    %{
      ipv4: %{method: :dhcp},
      type: VintageNetWiFi,
      vintage_net_wifi: %{
        networks: [
          %{key_mgmt: :wpa_psk, priority: 75, psk: "....", ssid: nil},
          %{key_mgmt: :wpa_psk, priority: 20, psk: "....", ssid: nil}
        ]
      }
    }

Interface wwan0
  Type: VintageNetQMI
  Present: true
  State: :configured (2.3 s)
  Connection: :lan (1.9 s)
  Addresses: 100.76.249.106/30, fe80::719b:17a6:45ca:413/64
  MAC Address: nil
  Configuration:
    %{
      type: VintageNetQMI,
      vintage_net_qmi: %{
        service_providers: [
          %{apn: "super", only_iccid_prefixes: ["8988307"]},
          %{apn: "fast.t-mobile.com", only_iccid_prefixes: ["890126"]},
          %{apn: "fast.t-mobile.com"}
        ]
      }
    }


:ok

iex([email protected])18> ping("google.com", ifname: "wwan0")
Press enter to stop
Response from google.com (142.250.80.110:80): time=289.843ms
Response from google.com (142.250.80.110:80): time=111.614ms
Response from google.com (142.250.80.110:80): time=114.4ms
Response from google.com (142.250.80.110:80): time=108.359ms


Environment

  • Nerves environment: (mix nerves.env --info)
 mix nerves.env --info
==> nerves
warning: String.strip/1 is deprecated. Use String.trim/1 instead
  /Users/taun/Development/Client/ROAR/development/Purple/backend/elixir/purple_umbrella_nerves/apps/mesh_gateway/deps/poison/mix.exs:4: Poison.Mixfile

==> mesh_gateway
|nerves_bootstrap| Environment Package List

  Pkg:         bbb_purple
  Vsn:         2.15.2
  Type:        system
  BuildRunner: {Nerves.Artifact.BuildRunners.Docker, [make_args: ["source", "all", "legal-info"]]}

  Pkg:         nerves_system_br
  Vsn:         1.20.4
  Type:        system_platform
  BuildRunner: {nil, []}

  Pkg:         nerves_toolchain_ctng
  Vsn:         1.9.0
  Type:        toolchain_platform
  BuildRunner: {nil, []}

  Pkg:         nerves_toolchain_armv7_nerves_linux_gnueabihf
  Vsn:         1.6.0
  Type:        toolchain
  BuildRunner: {Nerves.Artifact.BuildRunners.Local, []}

|nerves_bootstrap| Loadpaths Start

|nerves_bootstrap| Precompile Start

==> nerves_toolchain_armv7_nerves_linux_gnueabihf
|nerves_bootstrap| Compile.NervesPackage start

|nerves_bootstrap| Compile.NervesPackage end

==> bbb_purple
|nerves_bootstrap| Compile.NervesPackage start

|nerves_bootstrap| Compile.NervesPackage end

==> mesh_gateway
|nerves_bootstrap| Precompile End


Nerves environment
  MIX_TARGET:   bbb_purple
  MIX_ENV:      dev

|nerves_bootstrap| Environment Variable List
  target:     bbb_purple
  toolchain:  /Users/taun/.nerves/artifacts/nerves_toolchain_armv7_nerves_linux_gnueabihf-darwin_arm-1.6.0
  system:     /Users/taun/.nerves/artifacts/bbb_purple-portable-2.15.2
  app:        .

|nerves_bootstrap| Loadpaths End
  • Additional information about your host, target hardware or environment that might help
  • target is a BeagleBone Green (not wireless or gateway)
  • The Quectel BG96 modem is on a PCIe to USB adapter and plugged into the BBG USB.

Additional context
Add any other context about the problem here.

  • We are occasionally (after hours or days) losing our LTE connection. It appears that part of the problem is when we lose and regain the connection, it is stuck in :lan. Similar to when performing the deconfigure/configure outlined above.
@taun taun changed the title wwan0 gets stuck on :lan status after a VintageNet.deconfigure/configure wwan0 gets stuck on :lan connection status after a VintageNet.deconfigure/configure Sep 26, 2022
@fhunleth
Copy link
Member

Hi @taun,

Thanks for reporting this and giving all of the detail. There's definitely an issue here.

I'm distracted by one thing in the report which is may be a red herring. Are you using a Twilio or T-Mobile SIM on the device that's losing its connection?

If you're using Twilio, then roaming needs to be allowed for the super SIM so the line should be:

%{apn: "super", roaming_allowed?: true, only_iccid_prefixes: ["8988307"]}

See the Twilio roaming document.

Of course, vintage_net shouldn't revert the interface to :lan, but please humor me with the roaming question.

@taun
Copy link
Author

taun commented Sep 27, 2022

Thanks for looking at it @fhunleth.

Yes, all of the above is with Twilio Super SIMs.

Roaming was not specifically enabled however I have been seeing "roaming", true as shown below. The following is from a test site where the uptime has been 4 days as of today. None of the other units are currently showing a roaming status although I have seen a roaming, true status for some of the others occasionally.

I will specifically enable roaming and continue to test. But as you say, that is a separate issue. Every little bit helps.

iex([email protected])1> VintageNet.get_by_prefix(["interface", "wwan0"])
[
  {["interface", "wwan0", "addresses"],
   [
     %{
       address: {100, 65, 198, 252},
       family: :inet,
       netmask: {255, 255, 255, 248},
       prefix_length: 29,
       scope: :universe
     },
     %{
       address: {65152, 0, 0, 0, 15176, 46037, 33688, 13116},
       family: :inet6,
       netmask: {65535, 65535, 65535, 65535, 0, 0, 0, 0},
       prefix_length: 64,
       scope: :link
     }
   ]},
  {["interface", "wwan0", "config"],
   %{
     type: VintageNetQMI,
     vintage_net_qmi: %{service_providers: [%{apn: "super"}]}
   }},
  {["interface", "wwan0", "connection"], :internet},
  {["interface", "wwan0", "hw_path"],
   "/devices/platform/ocp/47400000.target-module/47401c00.usb/musb-hdrc.1/usb1/1-1/1-1:1.4"},
  {["interface", "wwan0", "lower_up"], true},
  {["interface", "wwan0", "mobile", "access_technology"], :lte},
  {["interface", "wwan0", "mobile", "apn"], "super"},
  {["interface", "wwan0", "mobile", "band"], "LTE Band 12"},
  {["interface", "wwan0", "mobile", "channel"], 5035},
  {["interface", "wwan0", "mobile", "cid"], 37682181},
  {["interface", "wwan0", "mobile", "iccid"], "89883070000029937537"},
  {["interface", "wwan0", "mobile", "imei"], "860640055596581"},
  {["interface", "wwan0", "mobile", "imeisv_svn"], "7"},
  {["interface", "wwan0", "mobile", "lac"], 65534},
  {["interface", "wwan0", "mobile", "manufacturer"], "Quectel"},
  {["interface", "wwan0", "mobile", "mcc"], 234},
  {["interface", "wwan0", "mobile", "mnc"], 10},
  {["interface", "wwan0", "mobile", "model"], "0"},
  {["interface", "wwan0", "mobile", "network_datetime"],
   ~N[2022-09-23 20:21:20]},
  {["interface", "wwan0", "mobile", "provider"], "O2 - UK"},
  {["interface", "wwan0", "mobile", "roaming"], true},
  {["interface", "wwan0", "mobile", "signal_4bars"], 3},
  {["interface", "wwan0", "mobile", "signal_asu"], 24},
  {["interface", "wwan0", "mobile", "signal_dbm"], -65},
  {["interface", "wwan0", "mobile", "statistics"],
   %{
     rx_bytes: 38904,
     rx_drops: 0,
     rx_errors: 0,
     rx_overflows: 0,
     rx_packets: 510,
     timestamp: 347542328721809,
     tx_bytes: 39640,
     tx_drops: 0,
     tx_errors: 0,
     tx_overflows: 0,
     tx_packets: 522
   }},
  {["interface", "wwan0", "mobile", "std_offset"], 3600},
  {["interface", "wwan0", "mobile", "utc_offset"], -18000},
  {["interface", "wwan0", "present"], true},
  {["interface", "wwan0", "state"], :configured},
  {["interface", "wwan0", "type"], VintageNetQMI}
]

@taun
Copy link
Author

taun commented Oct 21, 2022

Roaming has been enabled and we are still seeing connection state issues. Meaning, the modem is connected and the internet is pingable when specifying the ifname but the VintageNet connection state is :lan or even :disconnected.


After more investigation, it seems our issues and this one might be related to :client_ids_exhausted

I have lots of logs but the current premise is:

  • The modem driver has a limited number of client_ids it can hand out.
  • QMI.ClientIDCache GenServer caches these ids to be efficient and prevent id exhaustion.
  • If QMI.ClientIDCache is terminated, the cache is lost and new clients ids need to be requested. HOWEVER, the modem has not restarted nor been aware of any of this.
  • After an unknown number of QMI.ClientIDCache crashes, the modem client_id pool is exhausted and VintageNetQMI stops working properly. Somethings work but anything requiring a client_id fails. For example, get properties works. deconfigure works. configure sort of works but the state never gets to :internet.
  • So far, the only way to fix this has been to restart the modem which gives a fresh client_id pool.

Maybe this is one case where "let it crash" should be avoided? I am still trying to determine the root cause of the crashes besides:

The below are just various crashes from the logs. Some like the :timeout are probably expected. I am trying to figure out which results in the cache being lost and then the client_id pool exhausted.

022-10-19 18:56:01.722 [error] GenServer :"Elixir.VintageNetQMI.Connection.wwan0" terminating
** (stop) exited in: GenServer.call(:"Elixir.VintageNetQMI.QMI.wwan0.ClientIDCache", {:get_client_id, :"Elixir.VintageNetQMI.QMI.wwan0", 1}, 5000)
    ** (EXIT) time out
    (elixir 1.13.2) lib/gen_server.ex:1030: GenServer.call/3
    (qmi 0.8.4) lib/qmi.ex:75: QMI.call/2
    (vintage_net_qmi 0.3.2) lib/vintage_net_qmi/connecion/configuration.ex:70: anonymous fn/3 in VintageNetQMI.Connection.Configuration.do_run_configurations/3
    (elixir 1.13.2) lib/enum.ex:4475: Enumerable.List.reduce/3
    (elixir 1.13.2) lib/enum.ex:2442: Enum.reduce_while/3
    (vintage_net_qmi 0.3.2) lib/vintage_net_qmi/connecion/configuration.ex:62: VintageNetQMI.Connection.Configuration.run_configurations/2
    (vintage_net_qmi 0.3.2) lib/vintage_net_qmi/connection.ex:72: VintageNetQMI.Connection.try_to_configure_modem/1
    (vintage_net_qmi 0.3.2) lib/vintage_net_qmi/connection.ex:127: VintageNetQMI.Connection.handle_info/2
Last message: :try_to_configure
 module=gen_server function=error_info/8 line=1399 
2022-10-19 18:56:39.927 [warn] QMI.ClientIDCache starting for Elixir.VintageNetQMI.QMI.wwan0
 module=QMI.ClientIDCache function=start_link/1 line=11 

2022-10-19 18:56:40.050 [info] NETWORK connection for interface wwan0 changed from :lan to :disconnected
 module=MeshGateway.MeshPublisher.NetworkHandler function=handle_network_interface_change/2 line=166 

2022-10-19 18:56:40.075 [error] GenServer :"Elixir.VintageNetQMI.QMI.wwan0.Driver" terminating
** (MatchError) no match of right hand side value: {nil, %{1 => {{#PID<0.2701.0>, [:alias | #Reference<0.1726387962.2148073473.84351>]}, %{decode: #Function<0.44348571/1 in QMI.Codec.Control.parse_get_client_id_response>, payload: <<34, 0, 4, 0, 1, 1, 0, 1>>, service_id: 0}, #Reference<0.1726387962.2148007937.84355>}}}
    (qmi 0.8.4) lib/qmi/driver.ex:219: QMI.Driver.fail_transaction_id/3
    (qmi 0.8.4) lib/qmi/driver.ex:215: QMI.Driver.handle_report/2
    (stdlib 4.0.1) gen_server.erl:1120: :gen_server.try_dispatch/4
    (stdlib 4.0.1) gen_server.erl:1197: :gen_server.handle_msg/6
    (stdlib 4.0.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: {:dev_bridge, #Reference<0.1726387962.2148007937.84311>, :read, <<1, 26, 0, 128, 1, 24, 2, 9, 1, 32, 0, 14, 0, 2, 4, 0, 1, 0, 26, 0, 1, 4, 0, 0, 0, 0, 0>>}
 module=gen_server function=error_info/8 line=1399 

2022-10-19 18:56:40.090 [error] GenServer :"Elixir.VintageNetQMI.QMI.wwan0.ClientIDCache" terminating
** (stop) exited in: GenServer.call(:"Elixir.VintageNetQMI.QMI.wwan0.Driver", {:call, 0, %{decode: #Function<0.44348571/1 in QMI.Codec.Control.parse_get_client_id_response>, payload: <<34, 0, 4, 0, 1, 1, 0, 1>>, service_id: 0}, 4000}, 8000)
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {nil, %{1 => {{#PID<0.2701.0>, [:alias | #Reference<0.1726387962.2148073473.84351>]}, %{decode: #Function<0.44348571/1 in QMI.Codec.Control.parse_get_client_id_response>, payload: <<34, 0, 4, 0, 1, 1, 0, 1>>, service_id: 0}, #Reference<0.1726387962.2148007937.84355>}}}
            (qmi 0.8.4) lib/qmi/driver.ex:219: QMI.Driver.fail_transaction_id/3
            (qmi 0.8.4) lib/qmi/driver.ex:215: QMI.Driver.handle_report/2
            (stdlib 4.0.1) gen_server.erl:1120: :gen_server.try_dispatch/4
            (stdlib 4.0.1) gen_server.erl:1197: :gen_server.handle_msg/6
            (stdlib 4.0.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
    (elixir 1.13.2) lib/gen_server.ex:1030: GenServer.call/3
    (qmi 0.8.4) lib/qmi/client_id_cache.ex:35: QMI.ClientIDCache.handle_call/3
    (stdlib 4.0.1) gen_server.erl:1146: :gen_server.try_handle_call/4
    (stdlib 4.0.1) gen_server.erl:1175: :gen_server.handle_msg/6
    (stdlib 4.0.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message (from :"Elixir.VintageNetQMI.Connection.wwan0"): {:get_client_id, :"Elixir.VintageNetQMI.QMI.wwan0", 1}
 module=gen_server function=error_info/8 line=1399 

2022-10-19 18:56:40.114 [warn] QMI.ClientIDCache starting for Elixir.VintageNetQMI.QMI.wwan0
 module=QMI.ClientIDCache function=start_link/1 line=11 

@fhunleth
Copy link
Member

Interesting. Thank you for posting the update. I'd like to offer some wisdom, but I have none at the moment. I do appreciate the update, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants