Generate unique queries in the query node #81

Open
wants to merge 3 commits into
base: production

besimray

Closes #55

The original PR was opened and reviewed in this fork.

What was done

  • Removed the synthetic generation from the control node
  • Removed periodic synthetic query generation
  • The query node now generates synthetic data on demand instead of fetching it from Redis
  • Each text prompt in a synthetic query is now unique
  • Images are cached in Redis. With on-demand generation the images are accessed far more often, so reading them from disk could become a bottleneck and it makes sense to keep them in memory. Because the query nodes are going to scale, the cache is kept in Redis so that all of them can access it
  • Tested with unit tests and by running the control and query nodes against Postgres and Redis. For the manual run, many things were commented out and fake tasks were created in the control node. Verified that Redis is used as intended by adding log statements to the get_random_image_b64 function
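
For reviewers, the image caching described above can be sketched as a cache-aside lookup. This is only an illustration, not the code in this PR: the signature given to get_random_image_b64, the key scheme, the read_image callback and the InMemoryStore stand-in are all assumptions. The stand-in exposes the same get/set calls that a redis-py client offers, so the sketch runs without a Redis server.

```python
import base64
import random
from typing import Callable, Dict, Optional, Sequence


class InMemoryStore:
    """Stand-in for a Redis client (assumption: only get/set are needed)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)

    def set(self, name: str, value: bytes) -> bool:
        self._data[name] = value
        return True


def get_random_image_b64(
    store,
    image_names: Sequence[str],
    read_image: Callable[[str], bytes],
    rng: Optional[random.Random] = None,
) -> str:
    """Return a random image as base64, touching the disk only on a cache miss.

    Hypothetical signature; the real function in the PR may differ.
    Assumes the store returns bytes (a Redis client without decode_responses).
    """
    chooser = rng if rng is not None else random
    name = chooser.choice(list(image_names))
    key = f"synth:image:{name}"  # hypothetical key scheme

    cached = store.get(key)
    if cached is not None:
        # Cache hit: every replica sharing the store skips the disk read.
        return cached.decode("ascii")

    # Cache miss: read once, encode once, publish for all replicas.
    encoded = base64.b64encode(read_image(name))
    store.set(key, encoded)
    return encoded.decode("ascii")
```

Storing the already encoded bytes means a hit costs one network round trip and no base64 work. With a real Redis client the same function should work unchanged as long as responses are returned as bytes.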

What could be improved

  • The Markov model is trained and loaded separately in each query node replica, because I did not realize the query node was meant to scale up. The model could be trained only once. We could also move synthetic generation out of the query node so that it can be scaled up or down independently, and then use Redis queues to pass the generated data to the query node
  • Profiling the current generation code would expose its bottlenecks
  • The masks are decoded and re-encoded each time. Caching the image size would at least avoid the decoding, and caching the mask itself would avoid the encoding as well
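
One possible shape for the "train only once" idea is to let the first replica train the model and publish it to the shared store, while later replicas just load it. The sketch below is an assumption-heavy illustration: the hand-rolled first-order word chain stands in for whatever Markov implementation the project actually uses, and MODEL_KEY, load_or_train and InMemoryStore are hypothetical names. The stand-in store mirrors the get/set calls of a redis-py client.

```python
import json
import random
from collections import defaultdict
from typing import Dict, Iterable, List

Model = Dict[str, List[str]]
MODEL_KEY = "synth:markov_model"  # hypothetical key name


class InMemoryStore:
    """Stand-in for a Redis client (assumption: only get/set are needed)."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, name):
        return self._data.get(name)

    def set(self, name, value):
        self._data[name] = value
        return True


def train_markov(corpus: Iterable[str]) -> Model:
    """Build a first-order chain: word -> list of observed next words."""
    model = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return dict(model)


def load_or_train(store, corpus: Iterable[str]) -> Model:
    """Reuse a model that another replica already stored; train only if absent."""
    cached = store.get(MODEL_KEY)
    if cached is not None:
        return json.loads(cached)  # accepts str or bytes
    model = train_markov(corpus)
    store.set(MODEL_KEY, json.dumps(model))
    return model


def generate_prompt(model: Model, rng: random.Random, max_words: int = 8) -> str:
    """Walk the chain from a random start word until it ends or hits max_words."""
    word = rng.choice(sorted(model))
    words = [word]
    while len(words) < max_words and word in model:
        word = rng.choice(model[word])
        words.append(word)
    return " ".join(words)
```

Two replicas starting at the same moment could still both train before either has written the key. That only costs duplicate work, but a lock or a dedicated training job would be needed to rule it out.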

Successfully merging this pull request may close these issues.

Synthetic data generation