Synthetic data generation #55

Open
namoray opened this issue Oct 11, 2024 · 1 comment · May be fixed by #81
Comments

@namoray (Collaborator) commented Oct 11, 2024

Instead of synthetic payloads being generated on the control node on a fixed schedule, payload generation should happen dynamically for each query. That way every synthetic query can be unique.

Synthetic payload generation will need to be really quick, or we might not be able to keep up. Imagine there are 10 image-to-image requests per second you must send off. We naturally can't fetch a new image for each payload, or do any sort of heavy processing, or we won't be able to keep up with demand. So some components might need to be cached, but every query should still be unique.

So instead of pulling a synthetic query payload from Redis, we could, for example, generate the payload on the query node directly. One thing to think about there: what if the dataset needed is a few hundred MB? Will that cause issues?
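
A minimal sketch of what per-query generation with cached heavy components could look like. All names here (`CachedAssets`, `build_image_to_image_payload`, the payload fields) are hypothetical illustrations, not the repo's actual API: the heavy part (the image pool) is loaded once at startup, and per-query uniqueness comes from cheap randomisation.

```python
import random
import time
from dataclasses import dataclass, field


@dataclass
class CachedAssets:
    """Heavy components fetched once at startup and reused across queries.

    In this sketch the 'images' are opaque byte blobs; in practice this could
    be a small pool of pre-fetched base images kept in memory.
    """
    images: list[bytes] = field(default_factory=list)

    def load(self, pool_size: int = 32) -> None:
        # Placeholder: a real implementation would download/decode real images.
        self.images = [random.randbytes(1024) for _ in range(pool_size)]


def build_image_to_image_payload(assets: CachedAssets) -> dict:
    """Build a unique payload cheaply: reuse a cached image, randomise the rest.

    Uniqueness comes from the random seed / prompt variation, which costs
    microseconds, rather than from fetching a brand-new image per query.
    """
    return {
        "init_image": random.choice(assets.images),
        "prompt": f"synthetic prompt {random.randint(0, 10**9)}",
        "seed": random.randint(0, 2**32 - 1),
        "created_at": time.time(),
    }


assets = CachedAssets()
assets.load()
payload = build_image_to_image_payload(assets)
```

Whether this scales to datasets of a few hundred MB is exactly the open question above: the pool either has to fit in each query node's memory or be sampled lazily.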

See https://sn19.ai/pdfs/sn19-deck.pdf for info on the mechanism.

The docs should guide you very well. For testing, you can probably get away with doing this without creating a wallet, since you don't need to interact with miners. You might need to comment various bits of the flow out to get it to work in that case, but that is probably more efficient.

@tripathiarpan20 (Collaborator) commented Oct 21, 2024

Adding a bit more context:

- The control node executes `continuously_fetch_synthetic_data_for_tasks` to update the synthetic data payload for every task in separate Redis keys.
- Meanwhile, on the query node, `message.query_payload = await putils.get_synthetic_payload(config.redis_db, task)` fetches the synthetic data payload from the associated task key in Redis.
- The same Redis instance is shared between the control node and the query node over the Docker network (see the sketch after this list).
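
For reference, a compressed sketch of that flow, assuming `redis.asyncio` (redis-py) and a `synthetic_data:{task}` key scheme; the key naming and refresh interval here are assumptions, not what the repo actually uses:

```python
import asyncio
import json

import redis.asyncio as redis  # pip install redis

REFRESH_INTERVAL_S = 10  # assumption; the real schedule lives in the control node


async def continuously_fetch_synthetic_data_for_tasks(
    r: redis.Redis, tasks: list[str]
) -> None:
    # Control-node side: periodically regenerate and overwrite each task's key.
    while True:
        for task in tasks:
            payload = {"task": task, "prompt": "..."}  # stand-in for real generation
            await r.set(f"synthetic_data:{task}", json.dumps(payload))
        await asyncio.sleep(REFRESH_INTERVAL_S)


async def get_synthetic_payload(r: redis.Redis, task: str) -> dict:
    # Query-node side: read whatever the control node last wrote for this task.
    raw = await r.get(f"synthetic_data:{task}")
    return json.loads(raw)
```

Both processes would connect to the same instance (e.g. `redis.asyncio.from_url("redis://redis:6379")` over the Docker network), so every query-node read competes with the control node's periodic writes, which is the slowdown described below.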

Although the above guarantees that synthetic data is readily fetchable by the query node (since the control node periodically refreshes the synthetic data keys in Redis), it slows down Redis DB operations.

The main idea is to generate the synthetic data directly on the query node and dispose of the `continuously_fetch_synthetic_data_for_tasks` event loop in the control node. However, the synthetic data generation needs to be quick: in production it must serve at least 6-8 synthetic requests per second (the total over all tasks).
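
As a rough feasibility check for that requirement, one could time a stand-in local generator against the 6-8 req/s budget. `build_payload_locally` below is hypothetical, not an existing function in the repo:

```python
import time


def build_payload_locally(task: str) -> dict:
    # Stand-in for the actual per-task generator; it must stay well under
    # ~125 ms per call to sustain 8 requests/second on a single worker.
    return {"task": task, "seed": time.time_ns()}


N = 1000
start = time.perf_counter()
for _ in range(N):
    build_payload_locally("image-to-image")
elapsed = time.perf_counter() - start
print(f"{N / elapsed:.0f} payloads/s (need >= 8)")
```

At the call site this would amount to replacing `await putils.get_synthetic_payload(config.redis_db, task)` with a direct in-process call.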

@besimray linked a pull request Oct 21, 2024 that will close this issue