Zwave binding does not come back online after hard reset #1871
Please post the full logs here, as the logs you reference aren't very helpful (pasted below for reference). However, it sounds like there's an issue with the udev links. The binding doesn't have any control over these: it just tries to open the port that is defined, and if that port isn't available, it won't be able to open it. That seems to be the case from your explanation above.
I did a fresh openHABian install for a simple reproduction of what I'm seeing.
BRING UP AND INITIAL INSTALL OF ZWAVE BINDING - ONLINE @ 16:18:10.547
2023-05-24 16:07:09.035 [INFO ] [.core.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
Hard reset and zwave does not come back.
HARD RESET @ 16:20 - Initializing Zwave Serial Controller @ 16:22:48.792 but never comes ONLINE
2023-05-24 16:22:34.003 [INFO ] [.core.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
I tried adding a udev rule.
ADDED A NEW UDEV RULE
SUBSYSTEM=="tty", ATTRS{idVendor}=="0658", ATTRS{idProduct}=="0200", SYMLINK+="zwave"
RELOADED RULES
sudo udevadm control --reload-rules
NEW DEVICE NOT IN /dev
SYSLOG SHOWS
May 24 16:27:53 openhabian karaf[611]: RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..ttyACM0: File exists. It is NOT mine
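The syslog line above suggests that RXTX found a lock file it does not consider its own. As a rough diagnostic sketch, the following Python checks whether the process recorded in such a lock file still exists. It assumes the lock file begins with the owner's PID as ASCII decimal text (the common UUCP-style convention, not verified here against the RXTX source), and the function names are hypothetical helpers, not part of openHAB or RXTX.

```python
import os
from typing import Optional


def read_lock_pid(lock_path: str) -> Optional[int]:
    """Return the PID at the start of a lock file, or None if unreadable.

    Assumption: the first whitespace-separated field is the owner's PID.
    """
    try:
        with open(lock_path, "r", encoding="ascii", errors="replace") as fh:
            fields = fh.read().split()
    except OSError:
        return None
    if not fields:
        return None
    try:
        return int(fields[0])
    except ValueError:
        return None


def pid_is_running(pid: int) -> bool:
    """Probe a PID with signal 0, which tests existence without signalling."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def lock_owner_is_gone(lock_path: str) -> bool:
    """True only when the lock names a PID and that PID no longer exists."""
    pid = read_lock_pid(lock_path)
    return pid is not None and not pid_is_running(pid)
```

For example, `lock_owner_is_gone("/var/lock/LCK..ttyACM0")` would report whether the recorded owner has disappeared. After a hard reset the old PID may have been reused by an unrelated process, so a result of `False` does not prove the lock is still valid.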
Do you know where the lock files come from or if they're meant to be cleaned up on a reboot? I rebooted and got this log file:
REBOOTED at 16:32 Initializing ZWAVE @ 16:33:30.457
==> /var/log/openhab/openhab.log <==
Let me know if there are any other tests you'd like me to run. The install is on an RPi4 with the latest firmware.
FWIW, I tried deleting /var/lock/LCK..ttyACM0, and after that the binding can restart. Apparently there's a script floating around that deletes the LCK files on system start. Would it make sense for the zwave binding to delete lock files on initialization? I have no idea whether the binding can see the absolute path or whether there's some other risk in ignoring lock file status. https://community.openhab.org/t/openhabian-rpi4-zwave-gen5-falls-offline-on-restart/146694/41
The binding shouldn't be deleting lock files - they are there for a reason. I thought there had been a change to the driver to resolve the lock file issue, but in any case I think we can close this as it's not a binding issue.
Is there any valid reason for a lock file to be older than the current system uptime? I wonder if a middle ground would be to delete only lock files that predate the last boot.
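To make the "older than uptime" idea concrete, here is a minimal Python sketch that lists lock files last modified before the most recent boot. It assumes a Linux system exposing `/proc/uptime`, assumes lock files follow the `LCK..` naming seen in the syslog line earlier in this thread, and uses hypothetical function names. It only reports candidates and deletes nothing.

```python
import os
import time
from typing import List, Optional


def boot_time(uptime_path: str = "/proc/uptime",
              now: Optional[float] = None) -> float:
    """Estimate the boot timestamp as current time minus uptime in seconds."""
    with open(uptime_path) as fh:
        uptime_seconds = float(fh.read().split()[0])
    current = time.time() if now is None else now
    return current - uptime_seconds


def locks_older_than_boot(lock_dir: str, booted_at: float) -> List[str]:
    """Return LCK.. files in lock_dir whose mtime is before booted_at."""
    stale = []
    for name in sorted(os.listdir(lock_dir)):
        path = os.path.join(lock_dir, name)
        if name.startswith("LCK..") and os.path.isfile(path):
            if os.path.getmtime(path) < booted_at:
                stale.append(path)
    return stale
```

A startup script could call `locks_older_than_boot("/var/lock", boot_time())` and review the result before removing anything. One caveat: on a board without a battery-backed clock, timestamps written before the clock is synchronized may be unreliable, so the comparison is a heuristic rather than a guarantee.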
Again though - the binding shouldn't be trying to manage lock files. As I said earlier, I thought that there had been a fix for the lock file issue in the latest drivers, but in any case this is not something that bindings should be trying to resolve, as COM ports are implemented differently in different environments.
Totally agree that the binding shouldn't have to do this. But we can't seem to get anyone's attention to resolve lock files in the drivers. On the OH forum I'm getting pushback that recovering from a hard reset is a non-goal for the platform. But it's hard to guarantee that hard resets never happen IRL. I'm running 3.4.4, which I think is up to date.
I agree that the system should start properly after a hard reset, but I also don't want to venture into the messy area of COM port implementations in the binding. The binding doesn't know how a COM port is implemented - it's abstracted away by another OH serial port layer. So the binding can't know whether there are lock files (Windows doesn't use them at all) or where they are stored (different drivers put them in different places), and I don't think anything can be done to resolve this in the binding without it becoming a big mess.
I was afraid there might be some problem with too much abstraction. Do you have any ability to poke the serial port maintainers? I'm implementing a workaround with a shell script in the init.d directory, but that feels like an even bigger hack. I also think this issue is worth documenting in the binding, because I'm neither the first nor the last person who will trip over this failure to recover from a reset.
And before I forget: thanks for the work you're doing on the binding!
The intermediate layer is part of the OH core, but it's basically just a middle layer that allows the core to change the serial driver without impacting the binding, so it simply defers to another library. I don't mind if you want to add something to the binding docs. The other option is to add it to the OH docs: ultimately this issue will affect all serial port users, so possibly the best thing is to document it in the main OH docs and then refer to that from the binding docs. https://www.openhab.org/docs/administration/serial.html#serial-port-configuration
Is there a place where a non-developer can open an issue with the serial core? I feel like I'm a little out of my depth, since the symptom I observe relates to the operation of the zwave binding and I don't want the core to just bounce it back to me as "User error". I'm also hesitant to add something to the OH serial docs, since I've only observed the lockup problem in this one place and don't have the bandwidth right now to search for broader anomalous serial behavior. Unrelated, but I had other issues with the serial binding: I couldn't pass control characters like "\n\r" and couldn't get any response on the OH forum, so I just gave up on the serial binding.
Sure - you can open an issue here: https://github.com/wborn/openhab-core/issues You don't need to be a developer - it's the same as the issue you opened here.
I don't see why it would be bounced as user error. In any case, as discussed above, I don't think the binding has any way to manage the lock files, so the OH core, where the serial libraries are provided, seems to me the best place to discuss it. Of course, they might also point you further down to the actual low-level driver - I don't know.
Ok, that's fine. I was just suggesting adding your workaround regarding lock files to the docs.
No response from the OH serial core group. Do you have any friends over there?
Set up the binding on OH 3.4 on a Pi4 with an Aeotec Gen5+ zwave stick.
The binding comes up and shows status ONLINE using either the device /dev/ttyACM0 or the udev symlink /dev/zwave. The expected behavior is that the binding returns to its configured operation after a reboot.
Problem: on a hard reset of the Pi, the binding does not come back online. I looked at the debug messages and the code, and it seems that the binding cannot reopen the device.
If I started on /dev/ttyACM0 and then change the device to /dev/zwave, the binding restarts successfully.
However, the binding's failure to connect on the original device name persists through an OH restart and a Pi reboot.
If I add a new udev symlink, the binding can connect to the device under the new name.
I don't know whether this is a problem with the binding, the OH serial library or the Java serial library, but it seems like a lock file is getting stuck, and the binding could probably do something to clear out a stale lock.
For what it's worth, I have a UPS on the Pi, but as you know it's impossible to guarantee that a computer will never hang or crash. The forum response is mostly "Make sure your computer never crashes".
Debug traces are posted on this thread:
https://community.openhab.org/t/openhabian-rpi4-zwave-gen5-falls-offline-on-restart/146694/17