Prototype of the remote boefje runner #3223

Souf149 · 2024-07-10T11:00:40Z

Changes

Please describe the essence of this PR in a few sentences. Mention any breaking changes or required configuration steps.

Issue link

Closes #3222

Demo

Please add some proof in the form of screenshots or screen recordings to show (off) new functionality, if there are interesting new features for end-users.

QA notes

Please add some information for QA on how to test the newly created code.

Code Checklist

All the commits in this PR are properly PGP-signed and verified.
This PR only contains functionality relevant to the issue.
I have written unit tests for the changes or fixes I made.
I have checked the documentation and made changes where necessary.
I have performed a self-review of my code and refactored it to the best of my abilities.

Tickets have been created for newly discovered issues.
For any non-trivial functionality, I have added integration and/or end-to-end tests.
I have informed others of any required .env changes files if required and changed the .env-dist accordingly.
I have included comments in the code to elaborate on what is not self-evident from the code itself, including references to issues and discussions online, or implicit behavior of an interface.

Checklist for code reviewers:

The code does not violate Model-View-Template and our other architectural principles.
The code prioritizes readability over performance where appropriate.
The code does not bypass authentication or security mechanisms.
The code does not introduce any dependency on a library that has not been properly vetted.
The code contains docstrings, comments, and documentation where needed.

Checklist for QA:

I have checked out this branch, and successfully ran a fresh make reset.
I confirmed that there are no unintended functional regressions in this branch:
I have managed to pass the onboarding flow
Objects and Findings are created properly
Tasks are created and completed properly
I confirmed that the PR's advertised feature or hotfix works as intended.
I checked the logs for errors and/or warnings and made issues where necessary

What works:

bullet point + screenshot (if useful) per tested functionality

What doesn't work:

bullet point + screenshot (if useful) per tested functionality

Bug or feature?:

bullet point + screenshot (if useful) if it is unclear whether something is a bug or an intended feature.

No idea why this suddenly gave an issue, might have had to do with the system not having enough space

When rescheduling a task the task gets created from the p_item's attributes. This made an issue where rescheduled tasks weren't properly marked to be ran remote

TODO: remove debug logs

underdarknl · 2024-07-10T11:15:42Z

boefjes/boefjes/clients/scheduler_client.py

+    def pop_non_remote_item(self, queue: str) -> QueuePrioritizedItem | None:
+        non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=False)])
+        response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())


I think its better to keep the pop_item name the same, but configure the self._session (eg, the schedulerclient) to have a different url on which it can find the scheduler for remote runners.

Instead of creating a new url or changing the method's name, I have thought of adding a new parameter to pop_item that would change the query. The change would look like something like this:

def pop_non_remote_item(self, queue: str) -> QueuePrioritizedItem | None: non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=False)]) response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())

=>

def pop_item(self, queue: str, remote: bool = False) -> QueuePrioritizedItem | None: non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=remote)]) response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())

(added a new parameter used for the filter and changed the name back)

Or do you think it'd be better to add a new URLs to the scheduler which would only pop non-remote tasks and remote tasks?

Its the other way round.
The Scheduler is located on one location. and has one url (possibly two if you where to use a local url and an external url).

It's the boefjes runners that are hosted somewhere else. They need to contact the scheduler for Jobs. to do so, they need to know the url for the scheduler (eg, remote accessible, or local url).
With that request for jobs they need to send any limitations they themselves know about. Eg, Can they reach network X, can they reach internet, Can they perform ipv6 lookups etc.
Which limitations they have, and where the scheduler can be found needs to be configurable per Boefje Runner in a config file they can read locally (or via Env vars)

The scheduler then returns this filtered list of jobs, and the boefje runner starts processing them.
Once done, the boefje needs to return it's raw files to Bytes. For that a url/path to bytes also needs to be present, much in the same way the url/path to the scheduler needs to be present.

The Scheduler is located on one location. and has one url (possibly two if you where to use a local url and an external url).

I don't understand what you mean with the scheduler potentially having 2 URLs.
Do you mean with "local url" and "external url" for example http://localhost:8004 and https://www.example.com/scheduler?

With that request for jobs they need to send any limitations they themselves know about. Eg, Can they reach network X, can they reach internet, Can they perform ipv6 lookups etc.

Would the information about those jobs all be inside the same tasks-queue that the current boefje-runner uses?
(With "boefjes runner" you mean the BoefjeHandler I assume).

Yup, for some the scheduler can be found on localhost, or on a local IP, for others it might be located on https://somesite.com/scheduler A boefje runner needs one to contact the scheduler for jobs, and one to send raw files to bytes once the jobs are done.
The task-queue exists in the scheduler, the boefje runner just picks up jobs from the scheduler and executes them locally. A boefje runner does not have more than one job active per thread at the moment, so no queue exists there.

Cleaned the code for unneeded log statements. Removed unused remote-runner ( which is implemented elsewhere.

Since the boefje `remote-scanner` never runs locally. This image doesnt have to be created.

underdarknl · 2024-07-10T12:59:51Z

boefjes/boefjes/katalogus/models.py

@@ -20,6 +20,7 @@ class Plugin(BaseModel):
    environment_keys: list[str] = Field(default_factory=list)
    related: list[str] | None = None
    enabled: bool = False
+    remote: bool = False


Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.

Jobs for Nmap-tcp that are scheduled for an IP in a network|Soufyan-home should not be returned to the job-runner who's limitations only mention Network|Internet, they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home

N.B. This scoping on networks still needs to be build in the scheduler.

A simpler limitation/job filter could be the following:
The Nmap-TCP boefje can create two types of jobs for Internet connected IP's, one for ipv4 and one for ipv6 input objects.
To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol. The Boefje manifest then lists the following needed Traits: ipv4-connectivity, ipv6-connectivity.
The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.

Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.

This has been implemented this way because of the fact that tasks currently created with information from the boefje. (so my idea was to create boefjes that state that all tasks created for this type of boefje will have to be ran remotely).

they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home

Do you have an idea when these tasks should be created? Would the user manually have to create these tasks and mention in what network they should run?

To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol.

Do you want OpenKAT give the option to run the same boefje multiple times (with each their own settings)?

The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.

I don't understand what you mean with a boefje-runner only being able to reach ipv4 networks. Shouldn't the boefje decide what kind of IPAddress it can take in?

Right now I have only focused on implementing a "remote" boefje that can request a job whenever that boefje is ready from the outside.

Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.

This has been implemented this way because of the fact that tasks currently created with information from the boefje. (so my idea was to create boefjes that state that all tasks created for this type of boefje will have to be ran remotely).

The way you implemented the Remote trait now, can work, but I would add a more general traits system to the boefje config and allow filtering on any/all of them.

they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home

Do you have an idea when these tasks should be created? Would the user manually have to create these tasks and mention in what network they should run?

Ideally the scheduler just makes jobs, one for each boefje+input-ooi combination. Where they are executed is not up to the user. The user can add their own jobs, but again they won't be able to steer which runner runs these jobs.
All boefjes runners wlll try to execute jobs from the queue as created by the scheduler, but since not all jobs can be excecuted from everywhere, we need to make sure we have enough logic in the scheduler to give the correct jobs to the correct runner based on what networks/hosts they can reach (scopes), and what traits they have (ipv4, ipv6)

To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol.

Do you want OpenKAT give the option to run the same boefje multiple times (with each their own settings)?

The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.

I don't understand what you mean with a boefje-runner only being able to reach ipv4 networks. Shouldn't the boefje decide what kind of IPAddress it can take in?

A boefje like Nmap can consume all IP-addresses and should list both types as valid input-oois. However, if the machine running that specific boefje does not have ipv6 connectivity, running Nmap on an ipv6 address just spits out errors. The same would be true for trying to download an ipv6 version of a website.

Right now I have only focused on implementing a "remote" boefje that can request a job whenever that boefje is ready from the outside.

I am starting to understand it better, thank you 👍.

I will try to find a way of adding flags to tasks which can be filtered by the boefje runners.

It is still unclear how these tasks should be created with these flags. Should the nmap boefje specify that it will potentially run with IPv6 addresses or should the nmap boefje see that it's runner is not able to run with IPv6 addresses and skip those OOIs?

Souf149 added 12 commits June 19, 2024 12:22

Made a prototype of the remote boefje runner

d4f4b90

Made a simple boefje which can be ran remotely

8ab0f5e

The remote boefje can now successfully connect

56a603a

Working version of OpenKAT with task containing remote

72651f3

Gave tasks in the postgres a remote attribute

56a0286

Removed unused package from remote-scanner

d7ac79e

No idea why this suddenly gave an issue, might have had to do with the system not having enough space

Added remote property inside p_item.data

b2fed3b

When rescheduling a task the task gets created from the p_item's attributes. This made an issue where rescheduled tasks weren't properly marked to be ran remote

Removed debugging logs

fce778d

Made it so that boefjes don't run on tasks containing remote

56c8594

Fixed the testing of test_api.py

cd8885d

TODO: remove debug logs

Fixed the tests so they run successfully with remote boefjes

702df58

Removed the docker build command for remote-scanner

293113d

Souf149 changed the title ~~Made a prototype of the remote boefje runner~~ Prototype of the remote boefje runner Jul 10, 2024

underdarknl reviewed Jul 10, 2024

View reviewed changes

Souf149 added 4 commits July 10, 2024 12:18

Cleaned up for pull request

8f2fbc6

Cleaned the code for unneeded log statements. Removed unused remote-runner ( which is implemented elsewhere.

Removed the docker image for remote-scanner

df76b8a

Since the boefje `remote-scanner` never runs locally. This image doesnt have to be created.

Unremoved blank line

3605ab4

Removed unneeded print statement

6163cab

underdarknl reviewed Jul 10, 2024

View reviewed changes

Added example remote task in boefje's tests WIP

925b570

Souf149 closed this Jul 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prototype of the remote boefje runner #3223

Prototype of the remote boefje runner #3223

Souf149 commented Jul 10, 2024 •

edited

Loading

underdarknl Jul 10, 2024

Souf149 Jul 10, 2024

underdarknl Jul 10, 2024

Souf149 Jul 10, 2024

underdarknl Jul 10, 2024

underdarknl Jul 10, 2024

Souf149 Jul 10, 2024

underdarknl Jul 10, 2024

Souf149 Jul 11, 2024

Prototype of the remote boefje runner #3223

Prototype of the remote boefje runner #3223

Conversation

Souf149 commented Jul 10, 2024 • edited Loading

Changes

Issue link

Demo

QA notes

Code Checklist

Checklist for code reviewers:

Checklist for QA:

What works:

What doesn't work:

Bug or feature?:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Souf149 commented Jul 10, 2024 •

edited

Loading