-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add in option for btcpay_backup to backup full lnd state for migration #928
base: master
Are you sure you want to change the base?
Conversation
The deleted line removes the failure to backup lnd database files channel.db and sphinxreplay.db.
Thanks @johnheenan for all the debugging and figuring out. 💚 I think the CLN RPC data is not needed, there are other CLN data volumes backed up and it seems to work. But it seems for LND it either never worked or the graph data is something that is needed since a few versions. |
The problem with including the graph data is, that this ends up being rather large. For instance I have a node where — even though LND compacts the data — just this folder alone is > 20GB:
This makes it very unpractical for backups, especially if you are doing daily snapshots. I haven't read all of the debugging conversations, but we'd need to ensure what exactly was the problem here and whether or not this is data that needs to get backed up and cannot be replicated via sync. @johnheenan Can you provide us with a TL;DR version of the finding when debuggin this together with @damanic? |
I reached out to Oliver Gugger from Lightning Labs and he confirmed, that the graph data is needed, when restoring the backup on a new device. The whole process is described here: https://github.com/lightningnetwork/lnd/blob/master/docs/safety.md#migrating-a-node-to-a-new-device |
The TL;DR version. An out of date channel.db file makes a lnd node toxic. It is excluded from backup (as is any other files in graph directory). An up to date channels.backup file is useless for node restoration. It is backed up correctly. Its only use is to recover channel funds through co-operative close. I would suggest this as a solution to the large size of the folder issue:
Note just because lightning is disabled does not mean it is not in use. Lightning must be kept disabled to keep the backup of the graph directory (for lnd) non toxic. |
This is specific to migration of a node, not backup of a node where the backup is immediately useless when the node starts up again. For this PR I am not specifcally addressing migration. Currently if somone wants to migrate easily they need to:
I agree just backing up the graph directory as a backup and continuing to use lightning is pointless. At least we know backups cannot currently be used for migrtation and that the essential graph directory is mssing from backups. I don't know if the backup size issue is also shared with CLN and eclair. Maybe the easiest option is to offer a single option to also fully backup whatever lightning node data is in use and to leave the lightning node disabled after this backup. Perhaps also include an option to compact data. I understand compaction takes a long time but the effect is dramatic for lnd, such as 60GB to below 1GB. I tested compressing uncompacted channel.db on au8b with gz. Size wemt down from 145MB to 46MB, still too little when scaled up. |
Perhaps the backup/restore policy for LND on btcpay should be for recovery of funds only and the restore documentation should present steps to recover funds (close channels). Perhaps a tool offered to assist with the process. In most cases this would be the safest option to instruct and advise on rather than than risking toxic channel state on an out of date backup. Then create up a separate btcpay-migrate.sh script with migration documentation that warns of the issues and increased size of backup if using LND. The migrate script would be similar to backup except that it would:
|
instead of copying all of btcpay-backup.sh to btcpay-migrate.sh and just change a few lines we could also consider adding flags so you can run "full LND backup" and another option "do not start containers after backup" it will be non breaking from current backup but add the option to do a full backup. As for CLN, as I said there is afaik no such thing as graph data in CLN and I migrated 2 servers successfully with the current backup script. This seems to be an LND only issue. |
I have just added a second commit to to exclude backing up lnd database files by default. To backup lnd database files the option --lnd has to be included. There is also a help option with -h or --help. As far as I can tell, these are the minimum amount of code changes required to keep everyone happy. This commit is untested as I am not sure if I have done enough. I would appreciate comments as to this regard, particularly if the following lines are suffciient to disable lnd from coming backup just before
I realise that if if it is important to propagate a changed environment value back up for BTCPAYGEN_LIGHTNING for scripting purposes with btcpay_backup.sh then this script should be forced to be sourced, if BTCPAYGEN_LIGHTNING is to be changed by the script. |
…e container start/stop async issues.
Tested version of commit above. Backup tested only, not migration The I am having with these lines below to disable lnd fully before a backup is taken in the btcpay-backup.sh script. The Maybe the easiest solution is to include a sleep? Also the btcpay-setup.sh script does a check with github for updates, which is not appropriate. Is there a more suitable solution to above issues?
|
The latest commit above has fixed the synchronisation problem. The only issue remaining, before further testing, is what is a better high level script command than belwow that does not pull updates?
|
Would it not be simpler to just have a 'migrate' option on the backup script that applies all the considerations when running backup in migration context. Otherwise we may end up with a bunch of flags and confusing documentation as more considerations arise. When the migrate flag is enabled on backup:
When attempting to bring btcpay UP on a system that was brought down for migration
|
Completely agree. From the admin perspective it is a great long term goal that requires a lot of testing and commitment. Why not request they be added in as future enhancements? At the moment there is no commitment to be able to migrate, just to do backups. The requirement is fully on admins to test they work for the server they were taken on and for any other appropriate purposes for which a backup is taken, which you have done. From the perspective of where I am are now at a development level, the only goal is to provide a workable solution for |
Looking at your backup script modification the flag You have to look at the note on the flag to see it is intended for migration.
And the flag should only be used in migration context because you are modifying the install to remove LND and prevent it from starting which is a migration specific step:
IMO it would be better to not alter the install with |
Alternatively a separate flag for lnd, btcpay restart. Eg. |
I will alter the flags to make them clearer. However, the intention is for 'migration purposes', nor for 'migration'. It is backup script, not a migration script. |
Everything restarts except LND if full LND backup is taken. Once LND restarts the backup is useless. What about the following flags? For backup: For restore: The restore should also refuse to restore LND if LND is currently enabled. |
Also what about this addiitonal flag for restore?
With this flag for backup? |
I would really prefer to see both of these in a new separate command |
I think we need the option to include the extra LND data for migration purposes , and the option to not restart BTCPAY services when the backup context is migration. I don't see how anyone would want any of the btcpayservices to restart immediately after a backup when the reason for backup is migration. I think flags should be --include-lnd-graph --no-restart Then in a migration context I would run As you rightly said before, this is not a migration script, it is on the admin to know what they are doing. If we want to protect the user from start up of a migrated install then a migrate script that employs a method to indicate the server was shut down for migration and warns about restart would be an appropriate way to prevent btcpay from being restarted without proper consideration. |
If will put that in but I will still to disable LND. Meaning that if there is a subsequebt restart LND won't be coming up without addiitonal action. |
Changes made as per request. Easiest way to view entire changes is from https://github.com/btcpayserver/btcpayserver-docker/pull/928/files |
Looks already very good 👏 I agree with @damanic that So imho, not restarting all the services is fine. If people restart the server or manually start the docker containers again then we have to assume that they know what they are doing? @dennisreimann would be interested in your opinion on that |
I think if |
If someone If someone is going to migrate lnd, shouldn't they know the consequences of messing it up by restarting without lnd disabled? However, who reads command output and warnings? So which is better?
I don't have strong views either way. Ultimately it is the operators responsibility to know what they are doing. Also why can't we have a simple command to just enable or diable features without affecting files? Re-enabling lnd with |
If someone leaves btcpay down on old server then there are no problems. It is common practice with web migration to restart old servers with a maintenace page showing. The equvalent practice with lnd means potential loss of funds. There isn't even a maintenace page option with BTCPay server. Maybe there should be. If someone is not going to bother reading warnings then I would regard leaving lnd disabled as good practice as it forces them to read something to fix the problem. If someone has read warnings then they know what to expect and so know what to do or what look up. So, in summary, rather than a bad idea, I see disabling lnd as a good idea. But that is just an opinion and I really don't care. Whatever choice is taken it can be changed later on. |
IMO warning and no restart should suffice for the backup script. To add additional protections btcpay could choose not to document the |
…n cause fund loss with a restore
OK, changes made. Dire warnings show up if
Warnings given twice. Warning at end.
|
The deleted line removes the failure to backup lnd database files channel.db and sphinxreplay.db.
Please see lightningnetwork/lnd#9070 for issue and discussion.
I suspect the next line should also be deleted for CLN: