Deploy Primary<->Secondary using LiteFS for replication #6
Comments
Spent a bit more time thinking about this problem and realized that I could effectively force connections to the primary container by putting the secondaries on the other side of the world, so requests route to the nearest region, which will be the primary. That sounds like fun, and it will be less work because we don't have to write the manual failover code, so I'm going to do that.
Reading through the issues, it looks like static leasing is the way to go currently. I will proceed in that direction.
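For reference, a minimal sketch of what a static-lease litefs.yml could look like, assuming a hypothetical `IS_PRIMARY` environment variable and a `PRIMARY_HOST` hostname on the Fly private network (neither is defined in this repo):

```yaml
# litefs.yml — static-lease sketch; values are assumptions
fuse:
  dir: "/litefs"            # where the application sees the SQLite files
data:
  dir: "/var/lib/litefs"    # LiteFS internal storage

lease:
  type: "static"            # pin the primary; no Consul election
  candidate: ${IS_PRIMARY}  # hypothetical: true on primary, false on secondary
  # Replicas stream changes from the primary at this address;
  # 20202 is LiteFS's default port.
  advertise-url: "http://${PRIMARY_HOST}:20202"
```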
We will use a volume snapshot to seed the secondary nodes, an environment variable to define which environment is primary, and imperative commands to scale up after the initial deploy. Deploys will incur outages because there will be only one node in the candidate region, and that node will be the primary.
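Roughly, the imperative steps could look like the following; the app names, volume name, snapshot ID, and region are all hypothetical placeholders:

```sh
# List snapshots of the primary's volume, then restore the most
# recent one into the secondary's region
fly volumes snapshots list vol_xxxxxxxxxxxx
fly volumes create headscale_data \
  --snapshot-id vs_xxxxxxxxxxxx \
  --region syd \
  --app headscale-secondary

# Scale the secondary app up after the initial deploy
fly scale count 1 --app headscale-secondary
```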
Lots of progress in #19. I still need to write a README with instructions on how to deploy using this code, and confirm that my theory of how failover works (swap the values in the primary and secondary regions) holds up.
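As a hedged sketch of that failover theory, the swap could amount to flipping two (hypothetical) app-level values and letting each app restart; `headscale-a`, `headscale-b`, and the variable names are placeholders:

```sh
# Promote headscale-b: it becomes the candidate, and both apps
# point their advertise URL at it (secrets changes trigger restarts)
fly secrets set IS_PRIMARY=true  PRIMARY_HOST=headscale-b.internal --app headscale-b
fly secrets set IS_PRIMARY=false PRIMARY_HOST=headscale-b.internal --app headscale-a
```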
The current deployment will be hard down and will incur data loss whenever Fly loses the host our app is running on: Fly will restore our persistent volume from the most recent snapshot, and anything written since that snapshot is gone.
Fly.io has recently released LiteFS to deliver distributed SQLite. LiteFS has automatic failover functionality backed by Consul, or it can use a static primary with secondaries. The application needs to be aware of LiteFS's primary-file mechanism in order to forward writes to the correct node so they get processed correctly. Modifying headscale to do this is beyond the scope of this effort, especially since the LiteFS team plans to release write-forwarding functionality soon, which won't require application modification.
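To illustrate the primary-file mechanism: LiteFS exposes a `.primary` file in its mount directory on replicas containing the primary's hostname, and omits it on the primary itself. A minimal Go sketch of the check an application would need follows; the mount path and forwarding port are assumptions, and this is not headscale code:

```go
// Sketch of using LiteFS's ".primary" file to decide where
// writes should go; litefsDir and the port are hypothetical.
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

const litefsDir = "/litefs" // hypothetical LiteFS FUSE mount

// primaryAddr returns "" when this node is the primary (the
// ".primary" file does not exist there); otherwise it returns
// the hostname of the current primary, where writes must go.
func primaryAddr() (string, error) {
	b, err := os.ReadFile(litefsDir + "/.primary")
	if errors.Is(err, os.ErrNotExist) {
		return "", nil // we are the primary; write locally
	}
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	host, err := primaryAddr()
	if err != nil {
		fmt.Fprintln(os.Stderr, "litefs check failed:", err)
		os.Exit(1)
	}
	if host == "" {
		fmt.Println("this node is primary; handle writes locally")
	} else {
		fmt.Printf("forward writes to http://%s:8080\n", host)
	}
}
```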
We won't use the automated failover functionality because I can't figure out how to force a connection to the primary machine given the current headscale + tailscale code. Instead, we will deploy two different apps in different regions, mark one as primary and the other as secondary, and see whether we can connect the volumes using Fly's private network functionality. If that works, we will then create a callable workflow which we can use to trigger a manual failover between the two. We will use some external DNS to route to the static primary app.
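That failover workflow might look something like this sketch, using `workflow_dispatch` for the manual trigger (a `workflow_call` trigger could be added to make it callable from other workflows); the app names and the variables from the earlier sketch are all hypothetical:

```yaml
# .github/workflows/failover.yml — hedged sketch, not actual repo code
name: failover
on:
  workflow_dispatch:
    inputs:
      new_primary:
        description: "App that should become primary"
        required: true
        type: choice
        options: [headscale-a, headscale-b]

jobs:
  failover:
    runs-on: ubuntu-latest
    env:
      FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
    steps:
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - name: Swap primary and secondary roles
        run: |
          NEW="${{ inputs.new_primary }}"
          if [ "$NEW" = "headscale-a" ]; then OLD=headscale-b; else OLD=headscale-a; fi
          flyctl secrets set IS_PRIMARY=true  PRIMARY_HOST="$NEW.internal" --app "$NEW"
          flyctl secrets set IS_PRIMARY=false PRIMARY_HOST="$NEW.internal" --app "$OLD"
```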