Integrate tablet repair scheduler #4188

Open
asias opened this issue Jan 6, 2025 · 7 comments

@asias

asias commented Jan 6, 2025

In scylla commit 0d2583600d1325f2064a0d5d776bcf50660a5a42 (Merge 'Add tablet repair scheduler support' from Asias He), the tablet repair scheduler is implemented and a new tablet repair API is added. With this new API, repair requests are scheduled by Scylla core along with other tablet tasks, e.g., migration and rebuild. There is no longer any need for the management tool to schedule repair tasks on different nodes. One can call this API on any node to repair the tablets of a given table.

Note: Node and DC selection are not supported yet. We might support them later; there is a PR: scylladb/scylladb#21985.

     {        
         "path":"/storage_service/tablets/repair",
         "operations":[
            {
               "nickname":"repair_tablet",
               "method":"POST",
               "summary":"Repair a tablet",
               "type":"void",
               "produces":[
                  "application/json"
               ],
               "parameters":[
                  {
                     "name":"ks",
                     "description":"Keyspace name to repair",
                     "required":true,
                     "allowMultiple":false,
                     "type":"string",
                     "paramType":"query"
                  },
                  {
                     "name":"table",
                     "description":"Table name to repair",
                     "required":true,
                     "allowMultiple":false,
                     "type":"string",
                     "paramType":"query"
                  },
                  {
                     "name":"tokens",
                     "description":"Tokens owned by the tablets to repair. Multiple tokens can be provided using a comma-separated list. When set to the special word 'all', all tablets will be repaired",
                     "required":true,
                     "allowMultiple":false,
                     "type":"string",
                     "paramType":"query"
                  }
               ]
            }
         ]
      },
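
As a rough illustration of how a client might call the endpoint described above, the following Python sketch builds the POST request from the three query parameters in the spec. Only the path and the `ks`, `table`, and `tokens` parameters come from the definition above; the host, the port (10000 is assumed here to be the REST API port), and the helper name `build_tablet_repair_request` are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tablet_repair_request(host, ks, table, tokens="all", port=10000):
    """Build (but do not send) a POST request for /storage_service/tablets/repair.

    `tokens` is either a ready-made string (such as the special word "all")
    or an iterable of integer tokens, which is joined into the
    comma-separated list described in the spec.
    `host` and `port` are assumptions; adjust them for your deployment.
    """
    if not isinstance(tokens, str):
        tokens = ",".join(str(t) for t in tokens)
    # urlencode percent-encodes the commas in the token list.
    query = urlencode({"ks": ks, "table": table, "tokens": tokens})
    url = f"http://{host}:{port}/storage_service/tablets/repair?{query}"
    return Request(url, method="POST")


req = build_tablet_repair_request("127.0.0.1", "ks1", "t1", tokens=[-100, 42])
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. Since the API is described as synchronous for now, such a call would be expected to block until the repair finishes, so a client would likely want a generous timeout.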
@Michal-Leszczynski
Collaborator

Michal-Leszczynski commented Jan 15, 2025

@asias Is there any possibility of tracking the repair progress (e.g. Task Manager API)?
Normally, SM tracked repair progress and informed the user about it.
In this case, do we assume that the user does not need to know the repair progress and shouldn't take it into consideration, as the repair shouldn't impact cluster performance (and also because all tablets have tombstone_gc_mode=repair)?

EDIT: I just assumed it, but this API is async, right? We don't hang on the call until the repair is finished, right?

@asias
Author

asias commented Jan 17, 2025

> @asias Is there any possibility of tracking the repair progress (e.g. Task Manager API)? Normally, SM tracked repair progress and informed the user about it. In this case, do we assume that the user does not need to know the repair progress and shouldn't take it into consideration, as the repair shouldn't impact cluster performance (and also because all tablets have tombstone_gc_mode=repair)?
>
> EDIT: I just assumed it, but this API is async, right? We don't hang on the call until the repair is finished, right?

Currently it is a sync API; we will add support for an async API too. It is pretty trivial and is in my queue. It is sync for now because the task manager API for waiting on a task was not available when the tablet repair scheduler was merged.

When the API is asked to repair multiple tablets, it makes sense to show progress as the number of tablets that have finished. Currently, the task manager does not support this for tablet repair. @Deexie

As the initial integration, I think we can skip the detailed progress report and dcs/hosts selection. SM can still report some progress, e.g., n out of m tables have finished.
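
The coarse "n out of m tables" progress mentioned above could be sketched roughly as follows. This is only an illustration in Python, not how SM implements it: `repair_table` is a hypothetical callable standing in for one blocking repair request with tokens=all, and `report` is a hypothetical progress sink.

```python
def repair_tables(tables, repair_table, report=print):
    """Repair tables one at a time and report coarse progress.

    `repair_table(ks, table)` is a hypothetical callable that issues one
    blocking repair request with tokens=all and returns when it completes.
    `report` receives one human-readable message per finished table.
    """
    total = len(tables)
    for done, (ks, table) in enumerate(tables, start=1):
        repair_table(ks, table)
        report(f"{done} out of {total} tables have finished ({ks}.{table})")


# Usage with a stand-in that only records calls instead of contacting a node.
calls = []
messages = []
repair_tables(
    [("ks1", "t1"), ("ks1", "t2")],
    repair_table=lambda ks, table: calls.append((ks, table)),
    report=messages.append,
)
```

Progress is only updated when a whole table completes, which is the trade-off of skipping per-tablet progress in the initial integration.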

@Michal-Leszczynski
Collaborator

@asias thanks for the explanation!

> As the initial integration, I think we can skip the detailed progress report and dcs/hosts selection. SM can still report some progress, e.g., n out of m tables have finished.

So the suggestion is to repair all of the table's tablets in a single call (via the all keyword)?
SM could still repair tablets in batches in order to increase progress granularity, which has two benefits:

  • more accurate progress display for the user
  • more efficient resume in case of user pause or connection break or timeout

But I guess that batching tablets would result in degraded performance (for the same reason that batching tokens was worse than sending them all with ranges_parallel).
Is this correct?

@asias
Author

asias commented Jan 17, 2025

> @asias thanks for the explanation!
>
> > As the initial integration, I think we can skip the detailed progress report and dcs/hosts selection. SM can still report some progress, e.g., n out of m tables have finished.
>
> So the suggestion is to repair all of the table's tablets in a single call (via the all keyword)? SM could still repair tablets in batches in order to increase progress granularity, which has two benefits:
>
> * more accurate progress display for the user
> * more efficient resume in case of user pause or connection break or timeout
>
> But I guess that batching tablets would result in degraded performance (for the same reason that batching tokens was worse than sending them all with ranges_parallel). Is this correct?

The number of tablets of a given table changes over time because of merges and splits. It would be hard for the manager to track which tablets need to be repaired and to batch them.

Yes, if SM batches, the cluster may not be fully utilized for repair even though it could be repairing more tablets at once.

Once a tablet repair request is issued, it retries on its own if some tablets fail to repair. It would be best to also have a pause API for a given request, for the purpose of efficient resume.

@Michal-Leszczynski
Collaborator

Got it, so to summarize, SM will use this API only for repairing tablet tables:

  1. Full tablet table repair:
     • SM won't stop/resume tablet migration during the repair
     • SM will simply issue a single sync call to this API with tokens=all
  2. Partial tablet table repair (e.g. when repair is scheduled with --dc flag):
     • SM will stop/resume tablet migration during repair (as it is done today)
     • SM will issue a single sync call to this API with tokens=<token_list>

@asias
Author

asias commented Jan 21, 2025

> Got it, so to summarize, SM will use this API only for repairing tablet tables:
>
>   1. Full tablet table repair:
>      • SM won't stop/resume tablet migration during the repair

Yes, the new tablet repair API uses the tablet repair scheduler, which integrates well with tablet migrations, so there is no need to stop tablet migration during repair.

>      • SM will simply issue a single sync call to this API with tokens=all

Yes, repair with tokens=all, but we are going to add an async API very soon:

scylladb/scylladb#22418

>   2. Partial tablet table repair (e.g. when repair is scheduled with --dc flag):
>      • SM will stop/resume tablet migration during repair (as it is done today)
>      • SM will issue a single sync call to this API with tokens=<token_list>

The tablet repair API does not support DC or host selection yet.
For now, DC and host selection requires the old repair API, which does not use the tablet repair scheduler.

Once we have scylladb/scylladb#22417, we can switch partial repair to the tablet repair API with DC and host selection.

Make sense to you?
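
The split between the two APIs described above could be sketched as a small decision function. This is a hedged illustration in Python, not SM code: the function name and the returned labels are hypothetical and do not correspond to real endpoint names.

```python
def choose_repair_api(uses_tablets, dcs=None, hosts=None):
    """Return which repair API to use for a table, per the plan in this thread.

    The returned strings are hypothetical labels, not real endpoint names.
    """
    if not uses_tablets:
        # The tablet repair API is only meant for tablet tables.
        return "legacy_repair_api"
    if dcs or hosts:
        # DC/host selection is not supported by the tablet repair API yet,
        # so partial repair stays on the old API for now.
        return "legacy_repair_api"
    # Full tablet table repair: a single call with tokens=all.
    return "tablet_repair_api"


print(choose_repair_api(True))                # full repair of a tablet table
print(choose_repair_api(True, dcs=["dc1"]))   # partial repair with DC selection
```

When the old API is chosen for a tablet table, SM would presumably still stop and resume tablet migration around the repair, as it does today.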

@asias
Author

asias commented Jan 21, 2025

For pausing and resuming a "large" tablet repair request: scylladb/scylladb#22419
