Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gateway scheduler refactor prototype #1084

Closed
wants to merge 1 commit into from

Conversation

akihikokuroda
Copy link
Collaborator

Summary

Prototype for #1033

Recreate on top of current main branch.

Details and comments

The changes are:

  1. wrap the entry point sent to Ray. it sends a notification to the gateway when the entry point completes execution. It eliminates the polling of the ray job status.
  2. put a signal listener for Job db changes. It can check if a Job to be scheduled and it some resources to be cleaned up.

@akihikokuroda
Copy link
Collaborator Author

Should we go this direction or is there something better way? @IceKhan13 @Tansito @psschwei WDYT?

@Tansito
Copy link
Member

Tansito commented Nov 13, 2023

Thank you @akihikokuroda ! My idea with the proposal is to create a real queue and separate it from our database / application. I will try to have a diagram along this week to discuss it with you 🙏

@psschwei
Copy link
Collaborator

real queue

Do you mean setting up some application like RabbitMQ / Kafka ?

@Tansito
Copy link
Member

Tansito commented Nov 14, 2023

Do you mean setting up some application like RabbitMQ / Kafka ?

If I can be honest I don't have it clear yet but definitely I think Kafka would be too much, probably. I was thinking in something nearer to Celery maybe. My core idea would be what I commented in the discussion basically, move us to a event/task/message pattern and reduce the pressure in the database to manage the queue to use RabbitMQ or Redis. It's a good effort but needed for the next year for production. That's the mean reason because I want to start conversations around this.

Just to clarify, I'm totally against the overengineer. I would like to reuse most of the logic that we are using right now if it's possible and find the simplest solution but with that idea in mind.

@IceKhan13
Copy link
Member

IceKhan13 commented Nov 14, 2023

I would like to challenge you guys with statement that we do not need "real" queue. Reason for this:

  • we do not have bottleneck on enqueuing jobs
  • it will be much harder to implement fair-share with "real" queues
  • keeping model based approach makes it easier to work with entires of a "queue"

What we need is event based handling of running jobs + better logs handling. Currently we are doing polls to check state and logs.

Basically we need to avoid polling running jobs for status and logs (if real queue can help somehow with that, then I'm going to be happy). Polling DB to schedule new jobs is ok and not a problem.

I hope I'm wrong 😄

@Tansito
Copy link
Member

Tansito commented Nov 14, 2023

I hope I'm wrong 😄

Nope, I think you are totally right. Probably my first error was to use the word "queue" here. But yeah, what I want to pursue is to change our model to something like event/task/message. I don't have it decided yet which one could be the best approach, Celery, gRPC, Q2...

@akihikokuroda
Copy link
Collaborator Author

I feel the real Q is too much at least for now. This prototype is to remove the necessity of polling loop.

@akihikokuroda akihikokuroda marked this pull request as ready for review November 16, 2023 19:12
@akihikokuroda akihikokuroda marked this pull request as draft November 16, 2023 20:52
@akihikokuroda akihikokuroda marked this pull request as ready for review November 23, 2023 18:49
@psschwei
Copy link
Collaborator

psschwei commented May 3, 2024

@akihikokuroda @Tansito is this one we're still looking to land?

@Tansito
Copy link
Member

Tansito commented May 3, 2024

Definitely not a priority right now with all the things that we have 😅 , It can be closed by now.

@akihikokuroda
Copy link
Collaborator Author

I close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants