Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Monitoring follower #119

Merged
merged 14 commits into from
Feb 17, 2025
Merged

feat: Monitoring follower #119

merged 14 commits into from
Feb 17, 2025

Conversation

usmanmani1122
Copy link
Contributor

@usmanmani1122 usmanmani1122 commented Jan 8, 2025

Description

This PR adds a new task spwanAcceptance which will run a follower for a chain and can be somewhat controlled by the validator on when to stop. More details in the updated README.

@usmanmani1122 usmanmani1122 self-assigned this Jan 8, 2025
Copy link

@Muneeb147 Muneeb147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@usmanmani1122
Can you point to the file(s) having actual changes and where loadgen code is removed?

For example one of them is: tasks/testnet.js where usage of makeLoadgenTask is removed.

Also let's have a follow-up issue/task to actually clean this up and remove redundant unused code.

Copy link

@Muneeb147 Muneeb147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to also expose monitor in package.json scripts just like we have for loadgen and runner?

"runner": "runner/bin/loadgen-runner",
    "loadgen": "agoric deploy loadgen/loop.js"
    "monitor": ...

@usmanmani1122
Copy link
Contributor Author

Do we need to also expose monitor in package.json scripts just like we have for loadgen and runner?

"runner": "runner/bin/loadgen-runner",
    "loadgen": "agoric deploy loadgen/loop.js"
    "monitor": ...

No, since monitor uses yarn version 4.3.5 (opposed to 1.22.xx in root workspace)

@usmanmani1122
Copy link
Contributor Author

@usmanmani1122 Can you point to the file(s) having actual changes and where loadgen code is removed?

For example one of them is: tasks/testnet.js where usage of makeLoadgenTask is removed.

Also let's have a follow-up issue/task to actually clean this up and remove redundant unused code.

Only monitor/src/main.js and monitor/src/tasks/testnet.js have any changes
Rest of the files are copied
Changes outside the monitor folder were made in an attempt to pass the CI but it is still failing and I think the CI tests won't pass without making changes in the CI pipeline
@muhammadahmadasifbhatti can you make the follow up task (or add the link here if already created)?

@muhammadahmadasifbhatti

@usmanmani1122 Can you point to the file(s) having actual changes and where loadgen code is removed?
For example one of them is: tasks/testnet.js where usage of makeLoadgenTask is removed.
Also let's have a follow-up issue/task to actually clean this up and remove redundant unused code.

Only monitor/src/main.js and monitor/src/tasks/testnet.js have any changes Rest of the files are copied Changes outside the monitor folder were made in an attempt to pass the CI but it is still failing and I think the CI tests won't pass without making changes in the CI pipeline @muhammadahmadasifbhatti can you make the follow up task (or add the link here if already created)?

I will create a task and add the link here. Can you please provide me with the JIRA project link?

@usmanmani1122
Copy link
Contributor Author

I will create a task and add the link here. Can you please provide me with the JIRA project link?

Use github issues

@usmanmani1122 usmanmani1122 force-pushed the usman/monitoring-follower branch from f50cdf7 to 3952279 Compare February 3, 2025 09:29
@usmanmani1122 usmanmani1122 reopened this Feb 3, 2025
@Muneeb147
Copy link

@usmanmani1122

This PR adds a monitoring follower (which is just the runner code copied with loadgen code commented out)

^ This is not happening now, right?

@usmanmani1122
Copy link
Contributor Author

@usmanmani1122

This PR adds a monitoring follower (which is just the runner code copied with loadgen code commented out)

^ This is not happening now, right?

Yes

@usmanmani1122 usmanmani1122 marked this pull request as ready for review February 8, 2025 22:14
@usmanmani1122 usmanmani1122 enabled auto-merge (squash) February 12, 2025 05:53
};

if (currentStage === 1)
await writeMessageToMessageFile(stageConsole, 'ready');

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestion:
Can we keep constants of messages?

const MESSAGES = {
READY: 'ready',
EXIT: 'exit code..'
...
}

tasks.push(spawnClient, spawnLoadgen);
}
if (!chainOnly) tasks.push(spawnClient, spawnLoadgen);
else if (integrateAcceptance) tasks.push(spwanAcceptance);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that we should only push spwanAcceptance when we have messageFilePath in env and fail early if not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flag should contain the path to the message file to avoid having 2 places to check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not a fan of lack of curly braces here. Does the prettier / lint config even allow this?

@turadg turadg mentioned this pull request Feb 13, 2025
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From integrating against a3p, I remember we needed to increase the trust period for state sync. Also I suspect that to support invoking the runner without an agoric install in the root, we'll need the patch I wrote to lookup for binaries in SDK_SRC. Both commits are in my standalone-runner branch

I would also love to see an end-to-end CI run with an updated Agoric/agoric-sdk#10816 (unless there is another agoric-sdk PR now?)

},
});
const messageFilePath = process.env.MESSAGE_FILE_PATH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would somewhat prefer to pass this path to the command line flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* @param {RegExp} [regex]
*/
const waitForMessageFromMessageFile = async (regex) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this one to the top where writeMessageToMessageFile is defined, and make it accept a console, Both these functions should actually use a new console:

const { console: acceptanceConsole } = makeConsole('acceptance', out, err);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tasks.push(spawnClient, spawnLoadgen);
}
if (!chainOnly) tasks.push(spawnClient, spawnLoadgen);
else if (integrateAcceptance) tasks.push(spwanAcceptance);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flag should contain the path to the message file to avoid having 2 places to check.

tasks.push(spawnClient, spawnLoadgen);
}
if (!chainOnly) tasks.push(spawnClient, spawnLoadgen);
else if (integrateAcceptance) tasks.push(spwanAcceptance);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not a fan of lack of curly braces here. Does the prettier / lint config even allow this?

logPerfEvent('finish', { stats: runStats });
if (integrateAcceptance)
writeMessageToMessageFile(topConsole, `exit code ${Number(!!error)}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical of using the script itself to write the outcome to the message file. My concern is that if the runner fails for some reason that doesn't go through this, we may get the a3p test stuck.

Now of course if this is an optimistic report and we have a backup one in the launcher, I'm ok with this, but would prefer a comment to that effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed
I have moved it to the host script which will invoke the runner

/**
* @type {Task}
*
* This task will always run in three stages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This task will always run in three stages
* This task should always run in three stages

It doesn't seem to hurt if more stages are run, we just immediately complete the stage (after chain catch up of course)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@usmanmani1122
Copy link
Contributor Author

usmanmani1122 commented Feb 14, 2025

From integrating against a3p, I remember we needed to increase the trust period for state sync. Also I suspect that to support invoking the runner without an agoric install in the root, we'll need the patch I wrote to lookup for binaries in SDK_SRC. Both commits are in my standalone-runner branch

I would also love to see an end-to-end CI run with an updated Agoric/agoric-sdk#10816 (unless there is another agoric-sdk PR now?)

Done
I have triggered a new build with the changes, will add the link here once it passes
A build before the latest changes can be found here:
https://github.com/Agoric/agoric-sdk/actions/runs/13134138797/job/36645647192

@usmanmani1122
Copy link
Contributor Author

usmanmani1122 commented Feb 14, 2025

Comment on lines +171 to +179
const cliHelpers = await import(srcHelpers)
.catch(() => import(libHelpers))
.catch((e) =>
process.env.SDK_SRC
? import(
`${process.env.SDK_SRC}/packages/agoric-cli/${helpersSource}`
)
: Promise.reject(e),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because SDK_SRC is option someone has to enable, I would expect it to take precedence.

also I would suggest refactoring to,

Suggested change
const cliHelpers = await import(srcHelpers)
.catch(() => import(libHelpers))
.catch((e) =>
process.env.SDK_SRC
? import(
`${process.env.SDK_SRC}/packages/agoric-cli/${helpersSource}`
)
: Promise.reject(e),
);
const cliHelpers = await getCliHelpers();

It's a shame we have to deep import instead of having real exports of that package, but even if we added good exports we'd still need backwards compatibility so it would only add another case to this code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past I've thought of adding a command to the agoric-cli to return these path. However I don't think it would help in this case since agoric would not be on the PATH. I'm open to suggestion on how to find entrypoint paths in agoric-sdk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because SDK_SRC is option someone has to enable, I would expect it to take precedence.

Yeah you're right. I originally implemented it as a fallback, but it should take precedence.

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits about curly brace style, and the unexplained removal of change to the cometbft config to restore the log_level to default.

Besides that looks good for this side.

Comment on lines +306 to +308
if (regex && !regex.test(fileContent))
console.warn('Ignoring unsupported file content: ', fileContent);
else return fileContent;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: curly braces.

configP2p.persistent_peers = peers.join(',');
configP2p.seeds = seeds.join(',');
configP2p.addr_book_strict = false;
delete config.log_level;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this line removed?

@@ -83,3 +83,123 @@ export interface OrchestratorTasks {
runClient(options: TaskSwingSetOptions): Promise<RunClientResult>;
runLoadgen(options: TaskBaseOptions): Promise<RunLoadgenResult>;
}

/* eslint-disable camelcase */
export type CometBFTConfig = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, where is this coming from?

@usmanmani1122 usmanmani1122 merged commit cbae159 into main Feb 17, 2025
3 checks passed
@usmanmani1122 usmanmani1122 deleted the usman/monitoring-follower branch February 17, 2025 05:29
@mhofman
Copy link
Member

mhofman commented Feb 17, 2025

Shoot this was on automerge and I didn't see it.

@usmanmani1122
Copy link
Contributor Author

Shoot this was on automerge and I didn't see it.

Yeah I also forgot to turn it off. Will address the remaining comments in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants