
Synthetic data generates on demand rather than on schedule #63

Closed
wants to merge 6 commits into from
Conversation


@yaroslavyaroslav yaroslavyaroslav commented Oct 13, 2024

PR Overview

TLDR

This PR is intended to provide on-demand synthetic data generation, where "on demand" means in addition to each request received, rather than on a fixed schedule.

Implementation details

This PR consists of two parts:

  • The main part: the caching logic for heavy computational tasks, plus the scheduling logic changed from a schedule loop to enhancing the data of any given task received.
  • The workaround part, which is required to make this behavior feasible in an isolated environment (with nodes, miners, and all the related machinery disabled).

Main part of PR

Main part consists of the following changes:

  • refresh_sythetic_data.py has been (well, not completely yet, but it is supposed to be) rewritten to contain nothing but a cache warmup function.
  • schedule_synthetic_queries.py is where most of the changes have been made:
    • _fetch_tasks_in_chunks reads a chunk of new organic tasks to process, if any.
    • _enhance_task enhances the initial raw task with unique data. It is currently implemented only for chat and text-to-image tasks; since it is nothing more than a switch on task kind, it can be extended effortlessly.
    • _push_to_query_node pushes enhanced tasks to a query_node.
    • warmup_function implements the service warmup logic; it is a separate function for the sake of clarifying code intent.
    • schedule_synthetics_until_done has been rewritten to reflect the new approach: enhance any given amount of tasks with synthetic data.
  • execution_cycle.py — a warmup_function call has been added on service start.
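The flow described above can be sketched roughly as follows. This is a minimal runnable sketch, not the PR's actual code: the task shape and the enhancement bodies are assumptions, and only the function names (`_enhance_task`, `schedule_synthetics_until_done`) come from the PR description.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Any

# Task kinds mentioned in the PR; constants are illustrative.
CHAT = "chat"
TEXT_TO_IMAGE = "text-to-image"


@dataclass
class RawTask:
    # Hypothetical task shape for illustration only.
    kind: str
    payload: dict[str, Any]


def _enhance_task(task: RawTask) -> RawTask:
    """Attach unique synthetic data to a raw task.

    A switch on task kind, as the PR describes; supporting a new
    kind just means adding another branch.
    """
    if task.kind == CHAT:
        task.payload["prompt"] = f"synthetic-prompt-{random.randint(0, 10**6)}"
    elif task.kind == TEXT_TO_IMAGE:
        task.payload["seed"] = random.randint(0, 2**32 - 1)
    else:
        raise ValueError(f"unsupported task kind: {task.kind}")
    return task


async def schedule_synthetics_until_done(tasks: list[RawTask]) -> list[RawTask]:
    """Enhance every received task with synthetic data (no schedule loop)."""
    return [_enhance_task(t) for t in tasks]
```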

Work around part of PR

  • cli.py — a CLI tool that allows a developer to manually send an arbitrary amount of tasks of any kind to the control_node, to check the whole -> control_node -> query_node pipeline.
  • redis_constants.py — the CONTROL_NODE_QUEUE_KEY and CONTROL_TASK_CHUNK_SIZE constants have been added; they are used in cli.py and in the queue that the modified schedule_synthetics_until_done function reads raw tasks from.
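The queue mechanics described above can be sketched as follows. To keep the sketch runnable without a Redis server, an in-memory deque stands in for the Redis list (in the PR, cli.py would presumably push to Redis under CONTROL_NODE_QUEUE_KEY instead); the task payload shape and constant values are assumptions.

```python
import json
from collections import deque

# Assumed values mirroring the constants the PR adds to redis_constants.py.
CONTROL_NODE_QUEUE_KEY = "control_node_queue"
CONTROL_TASK_CHUNK_SIZE = 2

# In-memory stand-in for the Redis list keyed by CONTROL_NODE_QUEUE_KEY.
queues: dict[str, deque[str]] = {CONTROL_NODE_QUEUE_KEY: deque()}


def push_tasks(kind: str, amount: int) -> None:
    """What cli.py does conceptually: enqueue `amount` raw tasks of `kind`."""
    for i in range(amount):
        queues[CONTROL_NODE_QUEUE_KEY].append(json.dumps({"kind": kind, "id": i}))


def fetch_tasks_in_chunks() -> list[dict]:
    """Pop up to CONTROL_TASK_CHUNK_SIZE raw tasks from the queue, if any."""
    chunk: list[dict] = []
    q = queues[CONTROL_NODE_QUEUE_KEY]
    while q and len(chunk) < CONTROL_TASK_CHUNK_SIZE:
        chunk.append(json.loads(q.popleft()))
    return chunk
```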

Closes: #55

@yaroslavyaroslav yaroslavyaroslav marked this pull request as ready for review October 13, 2024 18:53
Collaborator

@namoray namoray left a comment

it's really hard to review this PR because a massive amount of code has been refactored which (a) did not need to be, and (b) is not relevant to the changes we're trying to make here

I'm also not sure what the PR description means, or how this solution solves the problem. I believe it changes the functionality of the whole system too much

@@ -43,25 +44,25 @@ async def _post_vali_stats(config: Config):
)


async def get_nodes_and_contenders(config: Config) -> list[Contender] | None:
async def get_nodes_and_contenders(config: Config) -> List[Contender] | None:
Collaborator

Why refactor the type hints? Prefer the lowercase
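For context on the reviewer's preference: since Python 3.9 (PEP 585), the built-in `list` can be subscripted directly in annotations, making `typing.List` redundant. A quick illustration:

```python
from typing import List  # pre-3.9 style, now unnecessary


def old_style(xs: List[int]) -> List[int]:
    # Works, but requires importing from typing.
    return xs


def new_style(xs: list[int]) -> list[int]:
    # Since Python 3.9 the builtin type itself is generic.
    return xs
```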

logger.info("Starting cycle...")
if config.refresh_nodes:
logger.info("First refreshing metagraph and storing the nodes")
Collaborator

Why is all this refactored?

@@ -70,29 +71,25 @@ async def main(config: Config) -> None:
time_to_sleep_if_no_contenders = 20
contenders = await get_nodes_and_contenders(config)

# date_to_delete = datetime(2024, 10, 13, 30)
Collaborator

Why is this refactored?

# await delete_task_data_older_than_date(connection, date_to_delete)
# await delete_contender_history_older_than(connection, date_to_delete)
# await delete_reward_data_older_than(connection, date_to_delete)
await warmup_function(config=config)
Collaborator

'warmup_function' is a bad name -> what is it doing?

remaining_requests: int

def __lt__(self, other: "TaskScheduleInfo"):
Collaborator

Why is this refactored?
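For context, a `__lt__` like the one in the diff above is typically what lets instances be ordered in a heap-based scheduler, since `heapq` compares items with `<`. A minimal sketch with assumed field names, not the project's actual class:

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TaskScheduleInfo:
    # Field names are illustrative; only next_schedule_time drives ordering.
    next_schedule_time: datetime
    remaining_requests: int = 0

    def __lt__(self, other: "TaskScheduleInfo") -> bool:
        # heapq uses `<`, so the earliest-scheduled task surfaces first.
        return self.next_schedule_time < other.next_schedule_time


now = datetime.now()
heap: list[TaskScheduleInfo] = []
heapq.heappush(heap, TaskScheduleInfo(now + timedelta(seconds=5)))
heapq.heappush(heap, TaskScheduleInfo(now))
soonest = heapq.heappop(heap)
```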

break

if i % 100 == 0:
Collaborator

Why has all this logic gone? We've completely changed how we schedule synthetic queries now, right? All this logic must remain
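For context, an `if i % 100 == 0` guard like the one in the diff above is a common pattern for running periodic housekeeping (logging progress, refreshing cached state) inside a long scheduling loop without paying that cost on every iteration. A generic sketch of the pattern, not the project's code:

```python
def run_schedule_loop(total_iterations: int) -> int:
    """Count how often periodic housekeeping fires during a long loop."""
    housekeeping_runs = 0
    for i in range(total_iterations):
        if i % 100 == 0:
            # e.g. log progress or refresh cached scheduling state here
            housekeeping_runs += 1
        # ... schedule one synthetic query per iteration ...
    return housekeeping_runs
```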
