
[Dual controller] Work to be done #560

Open
10 of 11 tasks
JeffreyDevloo opened this issue Feb 5, 2018 · 8 comments

@JeffreyDevloo
Contributor

JeffreyDevloo commented Feb 5, 2018

Feature description

The Dual Controller requires two or more AlbaNodes working together in a master-slave fashion. The goal is to fail over to one node if the other becomes unresponsive or if enough of its OSDs report failures.

First iteration

This will only contain happy path changes
The changes to be made:

Second iteration

Complete reinstall of a Dual Controller AlbaNode should be possible

  • Sync all information from one node of the Controller to the other
  • Prevent an ASD manager from being part of two Dual Controllers
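
The second point could be enforced with a simple membership check. The sketch below is only an illustration: `ClusterRegistry`, `ClusterMembershipError` and the node and cluster names are hypothetical and do not come from this repo or from alba-asdmanager.

```python
class ClusterMembershipError(Exception):
    """Raised when a node would end up in more than one cluster."""


class ClusterRegistry(object):
    """Hypothetical in-memory registry: every node maps to at most one cluster."""

    def __init__(self):
        self._cluster_of = {}  # node_id -> cluster name

    def add_node(self, cluster_name, node_id):
        current = self._cluster_of.get(node_id)
        # Re-adding a node to its own cluster is allowed; joining a second one is not.
        if current is not None and current != cluster_name:
            raise ClusterMembershipError(
                "node %s already belongs to cluster %s" % (node_id, current))
        self._cluster_of[node_id] = cluster_name

    def nodes_in(self, cluster_name):
        return sorted(n for n, c in self._cluster_of.items() if c == cluster_name)
```

A real implementation would need to store this relation persistently instead of in a dictionary.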

Every step should be registered under its own ticket. These steps will involve both this repo and the alba-asdmanager repo.

@wimpers

wimpers commented Feb 6, 2018

Change the ASD Manager to work actively and passively

The active/passive distinction is at the ASD level, not at the ASD manager level.

@JeffreyDevloo
Contributor Author

My intention was to change all calls to support active/passive routes. That is what I meant by that :)

@wimpers wimpers added this to the K milestone Feb 6, 2018
@wimpers

wimpers commented Feb 6, 2018

Implement new calls to start everything related to the passive OSDs on an AlbaNode
Mount
Deploy service

Can we not already have the service deployed but stopped? Any reason why you want to deploy it only now?

@JeffreyDevloo
Contributor Author

Deploying the service when the filesystem is not mounted might cause some issues. We can already generate the service file, though.

@dejonghb
Member

dejonghb commented Feb 6, 2018

IMO fstab should be prepared and entries should have the "noauto" option to prevent automounting at boot.

The mounting itself should be done when starting the service (in that service file or via dependencies in systemd). When mounting fails, the service cannot start.

When stopping the service, the device should be unmounted again.

Starting/stopping the service should only be done after checking that it is safe to do so.
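
A minimal sketch of how such entries could be generated, assuming a systemd host. The function names, the device path, the mount point and the binary path are hypothetical placeholders. `noauto` is the standard fstab option and `RequiresMountsFor=` is the standard systemd unit setting, which is still honored for `noauto` mounts. Unmounting on stop and the safety check are not covered here.

```python
def fstab_entry(device, mountpoint, fstype="xfs"):
    """Return an fstab line that is not mounted automatically at boot."""
    options = "defaults,noauto"
    # Fields: device, mount point, filesystem type, options, dump, fsck pass.
    return "{0} {1} {2} {3} 0 0".format(device, mountpoint, fstype, options)


def service_unit(description, mountpoint, exec_start):
    """Return systemd unit text for a service that depends on the mount point.

    RequiresMountsFor= adds Requires= and After= on the mount unit for the
    given path, so the service cannot start when mounting fails.
    """
    return "\n".join([
        "[Unit]",
        "Description={0}".format(description),
        "RequiresMountsFor={0}".format(mountpoint),
        "",
        "[Service]",
        "ExecStart={0}".format(exec_start),
        "",
    ])


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(fstab_entry("/dev/disk/by-id/example-disk", "/mnt/example-asd"))
    print(service_unit("Example ASD", "/mnt/example-asd", "/usr/bin/example-asd"))
```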

@JeffreyDevloo
Contributor Author

Estimations

First iteration

This will only contain happy path changes
The changes to be made:

  • Implement a class which can combine AlbaNodes (AlbaNodeCluster) - 1 day
  • Change all AlbaNode related code to work with the potential AlbaNodeCluster relation - 3 days
  • Change the ASD Manager to work actively and passively - 3 days
  • Display these changes in the GUI - 3 days
  • Implement new calls to shut down everything related to the OSDs on a specific AlbaNode (through IPMI in the first implementation) - 3 days
    • Stop services
    • Unmount
  • Implement new calls to start everything related to the passive OSDs on an AlbaNode - 3 days
    • Mount
    • Deploy service
    • Update OSD within Alba
  • Implement a heartbeat which fetches all OSD states from Alba (under a lock, to avoid choking Alba). When a certain threshold is reached for a specific cluster, create a persistent lock and fail over all active drives to the passive ones. The heartbeat exists to avoid waiting too long for the ovs-workers to process a task. - 1 day
  • Implement an option to move OSDs back to another AlbaNode (switching master-slave roles for that OSD) - 2 days
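
To make the heartbeat and role-switching items more concrete, here is a rough sketch under several assumptions. `NodeCluster`, `heartbeat`, the per-node failure count and the in-memory `failover_locked` flag are all hypothetical; the real AlbaNodeCluster, the Alba OSD state calls and the persistent lock are not modelled.

```python
class NodeCluster(object):
    """Hypothetical stand-in for the AlbaNodeCluster idea."""

    def __init__(self, name, osd_owners):
        # osd_owners: osd_id -> (active_node, passive_node)
        self.name = name
        self.osd_owners = dict(osd_owners)
        # Stands in for the persistent lock; a real one must survive restarts.
        self.failover_locked = False

    def swap_roles(self, osd_id):
        """Switch the master-slave roles for a single OSD."""
        active, passive = self.osd_owners[osd_id]
        self.osd_owners[osd_id] = (passive, active)


def heartbeat(cluster, osd_states, threshold):
    """Run one heartbeat pass and return the OSD ids that were failed over.

    osd_states maps osd_id -> True when healthy; a missing id counts as failed.
    """
    if cluster.failover_locked:
        return []  # a failover already happened, do not flap
    failures = {}
    for osd_id, (active, _passive) in cluster.osd_owners.items():
        if not osd_states.get(osd_id, False):
            failures[active] = failures.get(active, 0) + 1
    failing_nodes = set(n for n, count in failures.items() if count >= threshold)
    if not failing_nodes:
        return []
    cluster.failover_locked = True
    # Fail over every OSD that is active on a failing node, healthy or not.
    moved = sorted(osd_id for osd_id, (active, _p) in cluster.osd_owners.items()
                   if active in failing_nodes)
    for osd_id in moved:
        cluster.swap_roles(osd_id)
    return moved
```

Moving an OSD back afterwards would be a matter of calling `swap_roles` for that OSD again once the original node is healthy, after clearing the lock.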

Second iteration

Complete reinstall of a Dual Controller AlbaNode should be possible

  • Sync all information from one node of the Controller to the other - 1 week

@JeffreyDevloo
Contributor Author

Waiting for testing hardware

@JeffreyDevloo JeffreyDevloo added the state_hold Ticket is put on hold label May 4, 2018
@wimpers wimpers modified the milestones: M, Roadmap Sep 14, 2018
@JeffreyDevloo JeffreyDevloo modified the milestones: Backlog, Icebox Nov 6, 2018