-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype of the remote boefje runner #3223
Conversation
No idea why this suddenly gave an issue, might have had to do with the system not having enough space
When rescheduling a task the task gets created from the p_item's attributes. This made an issue where rescheduled tasks weren't properly marked to be ran remote
TODO: remove debug logs
def pop_non_remote_item(self, queue: str) -> QueuePrioritizedItem | None: | ||
non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=False)]) | ||
response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its better to keep the pop_item name the same, but configure the self._session (eg, the schedulerclient) to have a different url on which it can find the scheduler for remote runners.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating a new url or changing the method's name, I have thought of adding a new parameter to pop_item
that would change the query. The change would look like something like this:
def pop_non_remote_item(self, queue: str) -> QueuePrioritizedItem | None:
non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=False)])
response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())
=>
def pop_item(self, queue: str, remote: bool = False) -> QueuePrioritizedItem | None:
non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=remote)])
response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())
(added a new parameter used for the filter and changed the name back)
Or do you think it'd be better to add a new URLs to the scheduler which would only pop non-remote tasks and remote tasks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its the other way round.
The Scheduler is located on one location. and has one url (possibly two if you where to use a local url and an external url).
It's the boefjes runners that are hosted somewhere else. They need to contact the scheduler for Jobs. to do so, they need to know the url for the scheduler (eg, remote accessible, or local url).
With that request for jobs they need to send any limitations they themselves know about. Eg, Can they reach network X, can they reach internet, Can they perform ipv6 lookups etc.
Which limitations they have, and where the scheduler can be found needs to be configurable per Boefje Runner in a config file they can read locally (or via Env vars)
The scheduler then returns this filtered list of jobs, and the boefje runner starts processing them.
Once done, the boefje needs to return it's raw files to Bytes. For that a url/path to bytes also needs to be present, much in the same way the url/path to the scheduler needs to be present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Scheduler is located on one location. and has one url (possibly two if you where to use a local url and an external url).
I don't understand what you mean with the scheduler potentially having 2 URLs.
Do you mean with "local url" and "external url" for example http://localhost:8004
and https://www.example.com/scheduler
?
With that request for jobs they need to send any limitations they themselves know about. Eg, Can they reach network X, can they reach internet, Can they perform ipv6 lookups etc.
Would the information about those jobs all be inside the same tasks-queue that the current boefje-runner uses?
(With "boefjes runner" you mean the BoefjeHandler
I assume).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, for some the scheduler can be found on localhost, or on a local IP, for others it might be located on https://somesite.com/scheduler A boefje runner needs one to contact the scheduler for jobs, and one to send raw files to bytes once the jobs are done.
The task-queue exists in the scheduler, the boefje runner just picks up jobs from the scheduler and executes them locally. A boefje runner does not have more than one job active per thread at the moment, so no queue exists there.
Cleaned the code for unneeded log statements. Removed unused remote-runner ( which is implemented elsewhere.
Since the boefje `remote-scanner` never runs locally. This image doesnt have to be created.
@@ -20,6 +20,7 @@ class Plugin(BaseModel): | |||
environment_keys: list[str] = Field(default_factory=list) | |||
related: list[str] | None = None | |||
enabled: bool = False | |||
remote: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.
Jobs for Nmap-tcp that are scheduled for an IP in a network|Soufyan-home should not be returned to the job-runner who's limitations only mention Network|Internet, they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home
N.B. This scoping on networks still needs to be build in the scheduler.
A simpler limitation/job filter could be the following:
The Nmap-TCP boefje can create two types of jobs for Internet connected IP's, one for ipv4 and one for ipv6 input objects.
To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol. The Boefje manifest then lists the following needed Traits: ipv4-connectivity, ipv6-connectivity.
The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.
This has been implemented this way because of the fact that tasks currently created with information from the boefje. (so my idea was to create boefjes that state that all tasks created for this type of boefje will have to be ran remotely).
they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home
Do you have an idea when these tasks should be created? Would the user manually have to create these tasks and mention in what network they should run?
To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol.
Do you want OpenKAT give the option to run the same boefje multiple times (with each their own settings)?
The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.
I don't understand what you mean with a boefje-runner only being able to reach ipv4 networks. Shouldn't the boefje decide what kind of IPAddress
it can take in?
Right now I have only focused on implementing a "remote" boefje that can request a job whenever that boefje is ready from the outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.
This has been implemented this way because of the fact that tasks currently created with information from the boefje. (so my idea was to create boefjes that state that all tasks created for this type of boefje will have to be ran remotely).
The way you implemented the Remote trait now, can work, but I would add a more general traits system to the boefje config and allow filtering on any/all of them.
they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home
Do you have an idea when these tasks should be created? Would the user manually have to create these tasks and mention in what network they should run?
Ideally the scheduler just makes jobs, one for each boefje+input-ooi combination. Where they are executed is not up to the user. The user can add their own jobs, but again they won't be able to steer which runner runs these jobs.
All boefjes runners wlll try to execute jobs from the queue as created by the scheduler, but since not all jobs can be excecuted from everywhere, we need to make sure we have enough logic in the scheduler to give the correct jobs to the correct runner based on what networks/hosts they can reach (scopes), and what traits they have (ipv4, ipv6)
To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol.Do you want OpenKAT give the option to run the same boefje multiple times (with each their own settings)?
The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.
I don't understand what you mean with a boefje-runner only being able to reach ipv4 networks. Shouldn't the boefje decide what kind of
IPAddress
it can take in?
A boefje like Nmap can consume all IP-addresses and should list both types as valid input-oois. However, if the machine running that specific boefje does not have ipv6 connectivity, running Nmap on an ipv6 address just spits out errors. The same would be true for trying to download an ipv6 version of a website.
Right now I have only focused on implementing a "remote" boefje that can request a job whenever that boefje is ready from the outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to understand it better, thank you 👍.
I will try to find a way of adding flags to tasks which can be filtered by the boefje runners.
It is still unclear how these tasks should be created with these flags. Should the nmap boefje specify that it will potentially run with IPv6 addresses or should the nmap boefje see that it's runner is not able to run with IPv6 addresses and skip those OOIs?
Changes
Please describe the essence of this PR in a few sentences. Mention any breaking changes or required configuration steps.
Issue link
Closes #3222
Demo
Please add some proof in the form of screenshots or screen recordings to show (off) new functionality, if there are interesting new features for end-users.
QA notes
Please add some information for QA on how to test the newly created code.
Code Checklist
.env
changes files if required and changed the.env-dist
accordingly.Checklist for code reviewers:
Checklist for QA:
make reset
.feature
orhotfix
works as intended.What works:
What doesn't work:
Bug or feature?: