Deploy Primary<->Secondary using LiteFS for replication #6

Open
MarkIannucci opened this issue Jan 29, 2023 · 5 comments

@MarkIannucci
Owner

The current deployment will go hard down and lose data if Fly loses the host that our app is running on. Fly will restore our persistent volume from the most recent snapshot, so any writes made after that snapshot are lost.

Fly.io recently released LiteFS to deliver distributed SQLite. LiteFS offers automatic failover backed by Consul, or it can run with a static primary and secondaries. The application needs to be aware of LiteFS's primary file in order to forward writes to the primary node so they get processed correctly. Modifying headscale to do this is beyond the scope of this effort, especially since the LiteFS team plans to release write forwarding soon, which won't require application changes.
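For reference, here is a minimal sketch of what "being aware of the primary file" could look like in an application. It assumes the LiteFS behavior where a replica's mount directory contains a `.primary` file holding the primary's hostname, and the primary itself has no such file. The function names and the mount path in the usage note are hypothetical, and this is not something headscale does today.

```python
from pathlib import Path
from typing import Optional


def current_primary(mount_dir: str) -> Optional[str]:
    """Return the primary's hostname, or None if this node is the primary.

    Assumption: LiteFS writes a `.primary` file into the mount directory
    on replicas only, and the file contains the primary's hostname.
    """
    primary_file = Path(mount_dir) / ".primary"
    try:
        return primary_file.read_text().strip()
    except FileNotFoundError:
        # No .primary file: this node currently holds the primary lease.
        return None


def should_forward_write(mount_dir: str) -> bool:
    """True when a write must be sent to another node instead of applied locally."""
    return current_primary(mount_dir) is not None
```

An app would call something like `should_forward_write("/litefs")` (path is an assumption) before handling a write request and proxy the request to the returned hostname when it is a replica.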

We won't use the automated failover because I can't figure out how to force a connection to the primary machine given the current headscale + tailscale code. Instead, we will deploy two separate apps in different regions, mark one as primary and the other as secondary, and see if we can connect the volumes over the private network. If that works, we will then create a callable workflow that we can use to trigger a manual failover between the two. We will use external DNS to route to the static primary app.

@MarkIannucci
Owner Author

I spent a bit more time thinking about this problem and realized that I could effectively force connections to the primary container by placing the secondaries across the world. That sounds like fun, and it will be less work because we don't have to write the manual failover code, so I'm going to do that.

@MarkIannucci
Owner Author

I couldn't figure out how to get the app to consistently start in one region, which the previous approach implicitly required. See #13, #14.

Instead, we will use Consul but deploy with the lease candidate functionality.

@MarkIannucci
Owner Author

Reading through the issues, it looks like a static primary is the way to go for now. I will proceed in that direction.

@MarkIannucci
Owner Author

We will use a volume snapshot for the secondary nodes, an environment variable to define the primary environment, and imperative commands to scale up after the initial deploy. Deploys will cause outages because the candidate region will have only one node, which will be the primary.
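As a rough illustration of the environment-variable idea, a node could decide its role by comparing the region it is running in against the configured primary. This is only a sketch: `FLY_REGION` is the region variable Fly sets at runtime, while `PRIMARY_REGION` is an assumed name for the variable that defines the primary, and `node_role` is a hypothetical helper rather than code from this repo.

```python
import os
from typing import Mapping


def node_role(env: Mapping[str, str] = os.environ) -> str:
    """Return "primary" or "secondary" for the node described by `env`.

    Assumptions: FLY_REGION holds the region this node runs in, and
    PRIMARY_REGION (a hypothetical name) holds the region that should
    host the primary.
    """
    region = env.get("FLY_REGION")
    primary_region = env.get("PRIMARY_REGION")
    if not region or not primary_region:
        # Refuse to guess: a node with an unknown role should not start.
        raise ValueError("FLY_REGION and PRIMARY_REGION must both be set")
    return "primary" if region == primary_region else "secondary"
```

With this shape, changing which region is primary only means changing the value of the one variable and redeploying, at the cost of the outage described above.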

@MarkIannucci
Owner Author

Lots of progress in #19. I still need to write a README with instructions on how to deploy using this code, and confirm that my theory on how to fail over works (swap the values in the primary and secondary regions).
