[Improvement][Worker]Introduce detached Task and manage life cycle #16717

Closed
3 tasks done
hanhanzhang opened this issue Oct 21, 2024 · 9 comments
Labels: discussion, improvement, Stale

Comments

@hanhanzhang

Search before asking

  • I have searched the issues and found no similar feature request.

Description

We pass the -d (detached) option when deploying Flink tasks through DolphinScheduler. Once the Flink job has been submitted to the YARN cluster, DolphinScheduler no longer manages the task's lifecycle. We would like detached tasks to stay under DolphinScheduler's management as well, and we have implemented the following internally:

  1. Determine whether the task is detached. Once a detached task has been submitted to the external system, the Worker reports the task status (RUNNING) to the Master and releases the worker thread, which improves the execution throughput of the worker node.
  2. The Worker starts a thread that periodically checks the state of each detached task and reports the state to the Master when it is not the expected one.
  3. Transferring supervision of detached tasks when a Worker fails over has not been implemented yet.
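
The first two steps above could be sketched roughly as follows. This is a minimal illustration in Java, not the internal implementation: DetachedTaskMonitor, TaskState, stateProbe and masterReporter are all hypothetical names, and the probe stands in for whatever call queries the external system (for example YARN).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical types for illustration; none of them exist in DolphinScheduler.
enum TaskState { RUNNING, SUCCESS, FAILED, KILLED }

final class DetachedTaskMonitor {
    // taskId -> id of the job in the external system (assumed to be a plain string)
    private final Map<String, String> tracked = new ConcurrentHashMap<>();
    private final Function<String, TaskState> stateProbe;        // queries the external system
    private final BiConsumer<String, TaskState> masterReporter;  // sends a state to the Master

    DetachedTaskMonitor(Function<String, TaskState> stateProbe,
                        BiConsumer<String, TaskState> masterReporter) {
        this.stateProbe = stateProbe;
        this.masterReporter = masterReporter;
    }

    /** Step 1: called right after a detached submission; returns at once so the submit thread is free. */
    void register(String taskId, String externalAppId) {
        tracked.put(taskId, externalAppId);
        masterReporter.accept(taskId, TaskState.RUNNING);
    }

    /** Step 2: a single polling pass; intended to be invoked periodically by a scheduler thread. */
    void pollOnce() {
        for (Map.Entry<String, String> entry : tracked.entrySet()) {
            TaskState state = stateProbe.apply(entry.getValue());
            if (state != TaskState.RUNNING) {
                // The state is no longer the expected one: report it and stop tracking the task.
                masterReporter.accept(entry.getKey(), state);
                tracked.remove(entry.getKey());
            }
        }
    }

    int trackedCount() {
        return tracked.size();
    }
}
```

In a Worker, pollOnce could be driven by a ScheduledExecutorService, for example with scheduleWithFixedDelay(monitor::pollOnce, 0, 10, TimeUnit.SECONDS); the interval here is an arbitrary choice. Failover (step 3) is not covered by this sketch.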

We are sharing this change to hear the community's suggestions on it.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@hanhanzhang added the improvement and Waiting for reply labels Oct 21, 2024
@SbloodyS
Member

We have no plans to support Flink streaming tasks for now, since that would be a very large undertaking.

@SbloodyS added the discussion label and removed the Waiting for reply label Oct 21, 2024
@hanhanzhang
Author

We have no plans to support Flink streaming tasks for now, since that would be a very large undertaking.

Yes, I understand. But for long-running tasks (they don't have to be streaming tasks), the thread cannot be released after the task has been submitted: the submitting thread currently blocks until the Process finishes (a timeout is currently supported).
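
To make the blocking behavior concrete, here is a minimal sketch in Java under simplifying assumptions: a pool with a single thread stands in for the Worker's task threads, and a CountDownLatch stands in for waiting on the Process. BlockingSubmitDemo and its method are hypothetical names, not DolphinScheduler code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BlockingSubmitDemo {
    /** Returns how many tasks sit in the queue while one long-running task holds the only thread. */
    static int queuedWhileBlocked() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch externalJobDone = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                started.countDown();
                try {
                    externalJobDone.await(); // stands in for blocking on the Process
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();           // the only thread is now occupied
            pool.execute(() -> { });   // a new task can only wait in the queue
            return pool.getQueue().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the first task", e);
        } finally {
            externalJobDone.countDown(); // let the long-running task finish
            pool.shutdown();
        }
    }
}
```

The second task stays queued for as long as the first one waits, even though the waiting thread does almost no work itself.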

@SbloodyS
Member

Flink detached tasks are usually used in streaming mode, so I think they are the same.

@hanhanzhang
Author

Thanks. We do this because the worker thread stays occupied, so new tasks cannot be scheduled to the current node (which is consistent with the design). However, the tasks themselves are submitted to YARN, so the resource utilization of the worker itself is not high.

@SbloodyS
Member

I understand your purpose. However, this modification only fits your internal use case, not the common goals of the community.

If this is to be implemented, we should refactor the task SPI: add lifecycle management to the task SPI so that each task plugin can manage its own lifecycle, and then implement it in the Flink task plugin. The fundamental principle of task plugins is that they must not intrude on the worker.
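
As a rough illustration of what lifecycle management in a task SPI might look like, here is a minimal Java sketch. TaskLifecycle, LifecycleState and InMemoryDetachedPlugin are hypothetical names and do not reflect DolphinScheduler's actual task API; the toy plugin keeps state in memory instead of talking to Flink or YARN.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical SPI sketch: these types are illustrative only.
enum LifecycleState { RUNNING, FINISHED, FAILED, KILLED }

interface TaskLifecycle {
    /** Submits the task to the external system and returns its external id without waiting. */
    String submit();

    /** Asks the external system for the current state of a submitted task. */
    LifecycleState queryState(String externalId);

    /** Asks the external system to stop the task. */
    void cancel(String externalId);
}

/** Toy plugin standing in for a real one; the worker only ever sees the interface above. */
final class InMemoryDetachedPlugin implements TaskLifecycle {
    private final Map<String, LifecycleState> jobs = new ConcurrentHashMap<>();
    private int nextId = 0;

    @Override
    public synchronized String submit() {
        String id = "job-" + (nextId++);
        jobs.put(id, LifecycleState.RUNNING);
        return id;
    }

    @Override
    public LifecycleState queryState(String externalId) {
        LifecycleState state = jobs.get(externalId);
        if (state == null) {
            throw new IllegalArgumentException("unknown job: " + externalId);
        }
        return state;
    }

    @Override
    public void cancel(String externalId) {
        // Only a running job can be killed; a job in a final state keeps that state.
        jobs.computeIfPresent(externalId,
                (id, s) -> s == LifecycleState.RUNNING ? LifecycleState.KILLED : s);
    }
}
```

With an interface of this kind, the worker would depend only on TaskLifecycle, and each plugin would decide for itself how to submit, probe, and cancel its jobs.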

@hanhanzhang
Author

I agree. We are adding task state detection to the task API and releasing the Worker's task submit thread after the task has been submitted.

@SbloodyS
Member

SbloodyS commented Oct 22, 2024

I agree. We are adding task state detection to the task API and releasing the Worker's task submit thread after the task has been submitted.

Are you saying that you want to refactor the task API? If so, you should create a DSIP (#14102) issue first and put the full design details in it for further discussion.


This issue has been automatically marked as stale because it has had no activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

@github-actions bot added the Stale label Nov 22, 2024

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

@github-actions bot closed this as not planned Nov 29, 2024