Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype of the remote boefje runner #3223

Closed
wants to merge 17 commits into from

Conversation

Souf149
Copy link
Contributor

@Souf149 Souf149 commented Jul 10, 2024

Changes

Please describe the essence of this PR in a few sentences. Mention any breaking changes or required configuration steps.

Issue link

Closes #3222

Demo

Please add some proof in the form of screenshots or screen recordings to show (off) new functionality, if there are interesting new features for end-users.

QA notes

Please add some information for QA on how to test the newly created code.


Code Checklist

  • All the commits in this PR are properly PGP-signed and verified.
  • This PR only contains functionality relevant to the issue.
  • I have written unit tests for the changes or fixes I made.
  • I have checked the documentation and made changes where necessary.
  • I have performed a self-review of my code and refactored it to the best of my abilities.
  • Tickets have been created for newly discovered issues.
  • For any non-trivial functionality, I have added integration and/or end-to-end tests.
  • I have informed others of any required .env changes files if required and changed the .env-dist accordingly.
  • I have included comments in the code to elaborate on what is not self-evident from the code itself, including references to issues and discussions online, or implicit behavior of an interface.

Checklist for code reviewers:

  • The code does not violate Model-View-Template and our other architectural principles.
  • The code prioritizes readability over performance where appropriate.
  • The code does not bypass authentication or security mechanisms.
  • The code does not introduce any dependency on a library that has not been properly vetted.
  • The code contains docstrings, comments, and documentation where needed.

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
  • I have managed to pass the onboarding flow
  • Objects and Findings are created properly
  • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.
  • I checked the logs for errors and/or warnings and made issues where necessary

What works:

  • bullet point + screenshot (if useful) per tested functionality

What doesn't work:

  • bullet point + screenshot (if useful) per tested functionality

Bug or feature?:

  • bullet point + screenshot (if useful) if it is unclear whether something is a bug or an intended feature.

@Souf149 Souf149 changed the title Made a prototype of the remote boefje runner Prototype of the remote boefje runner Jul 10, 2024
Comment on lines +98 to +100
def pop_non_remote_item(self, queue: str) -> QueuePrioritizedItem | None:
non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=False)])
response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to keep the pop_item name the same, but configure the self._session (eg, the schedulerclient) to have a different url on which it can find the scheduler for remote runners.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new url or changing the method's name, I have thought of adding a new parameter to pop_item that would change the query. The change would look like something like this:

def pop_non_remote_item(self, queue: str) -> QueuePrioritizedItem | None:
	non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=False)])
	response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())

=>

def pop_item(self, queue: str, remote: bool = False) -> QueuePrioritizedItem | None:
	non_remote_filter = QueuePopModel(filters=[Filter(column="remote", operator="==", value=remote)])
	response = self._session.post(f"/queues/{queue}/pop", json=non_remote_filter.model_dump())

(added a new parameter used for the filter and changed the name back)

Or do you think it'd be better to add a new URLs to the scheduler which would only pop non-remote tasks and remote tasks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its the other way round.
The Scheduler is located on one location. and has one url (possibly two if you where to use a local url and an external url).

It's the boefjes runners that are hosted somewhere else. They need to contact the scheduler for Jobs. to do so, they need to know the url for the scheduler (eg, remote accessible, or local url).
With that request for jobs they need to send any limitations they themselves know about. Eg, Can they reach network X, can they reach internet, Can they perform ipv6 lookups etc.
Which limitations they have, and where the scheduler can be found needs to be configurable per Boefje Runner in a config file they can read locally (or via Env vars)

The scheduler then returns this filtered list of jobs, and the boefje runner starts processing them.
Once done, the boefje needs to return it's raw files to Bytes. For that a url/path to bytes also needs to be present, much in the same way the url/path to the scheduler needs to be present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Scheduler is located on one location. and has one url (possibly two if you where to use a local url and an external url).

I don't understand what you mean with the scheduler potentially having 2 URLs.
Do you mean with "local url" and "external url" for example http://localhost:8004 and https://www.example.com/scheduler?

With that request for jobs they need to send any limitations they themselves know about. Eg, Can they reach network X, can they reach internet, Can they perform ipv6 lookups etc.

Would the information about those jobs all be inside the same tasks-queue that the current boefje-runner uses?
(With "boefjes runner" you mean the BoefjeHandler I assume).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, for some the scheduler can be found on localhost, or on a local IP, for others it might be located on https://somesite.com/scheduler A boefje runner needs one to contact the scheduler for jobs, and one to send raw files to bytes once the jobs are done.
The task-queue exists in the scheduler, the boefje runner just picks up jobs from the scheduler and executes them locally. A boefje runner does not have more than one job active per thread at the moment, so no queue exists there.

Souf149 added 4 commits July 10, 2024 12:18
Cleaned the code for unneeded log statements. Removed unused
remote-runner ( which is implemented elsewhere.
Since the boefje `remote-scanner` never runs locally. This image doesnt
have to be created.
@@ -20,6 +20,7 @@ class Plugin(BaseModel):
environment_keys: list[str] = Field(default_factory=list)
related: list[str] | None = None
enabled: bool = False
remote: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.

Jobs for Nmap-tcp that are scheduled for an IP in a network|Soufyan-home should not be returned to the job-runner who's limitations only mention Network|Internet, they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home

N.B. This scoping on networks still needs to be build in the scheduler.

A simpler limitation/job filter could be the following:
The Nmap-TCP boefje can create two types of jobs for Internet connected IP's, one for ipv4 and one for ipv6 input objects.
To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol. The Boefje manifest then lists the following needed Traits: ipv4-connectivity, ipv6-connectivity.
The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.

This has been implemented this way because of the fact that tasks currently created with information from the boefje. (so my idea was to create boefjes that state that all tasks created for this type of boefje will have to be ran remotely).

they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home

Do you have an idea when these tasks should be created? Would the user manually have to create these tasks and mention in what network they should run?

To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol.

Do you want OpenKAT give the option to run the same boefje multiple times (with each their own settings)?

The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.

I don't understand what you mean with a boefje-runner only being able to reach ipv4 networks. Shouldn't the boefje decide what kind of IPAddress it can take in?


Right now I have only focused on implementing a "remote" boefje that can request a job whenever that boefje is ready from the outside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not the boefje that is 'remote', its the job in a specific context (eg, on a given IP, or connected to a specific network) that needs/can be ran remotely.

This has been implemented this way because of the fact that tasks currently created with information from the boefje. (so my idea was to create boefjes that state that all tasks created for this type of boefje will have to be ran remotely).

The way you implemented the Remote trait now, can work, but I would add a more general traits system to the boefje config and allow filtering on any/all of them.

they should be returned to the boefje-runner that asks for jobs with the filter: network|Soufyan-home

Do you have an idea when these tasks should be created? Would the user manually have to create these tasks and mention in what network they should run?

Ideally the scheduler just makes jobs, one for each boefje+input-ooi combination. Where they are executed is not up to the user. The user can add their own jobs, but again they won't be able to steer which runner runs these jobs.
All boefjes runners wlll try to execute jobs from the queue as created by the scheduler, but since not all jobs can be excecuted from everywhere, we need to make sure we have enough logic in the scheduler to give the correct jobs to the correct runner based on what networks/hosts they can reach (scopes), and what traits they have (ipv4, ipv6)

To be able to succesfully run all Nmap-TCP jobs, the boefje runner needs to be able to access ipv4 and ipv6 'internet'.
I can see a way forward where there's two specific NMAP-TCP boefjes configured, one for each protocol.

Do you want OpenKAT give the option to run the same boefje multiple times (with each their own settings)?

The boefje-runner can then from its config file read that it can only reach ipv4 networks, it can reach the network Internet and produces a job-query for the scheduler to receive only jobs that have the ipv4-connectivity trait.

I don't understand what you mean with a boefje-runner only being able to reach ipv4 networks. Shouldn't the boefje decide what kind of IPAddress it can take in?

A boefje like Nmap can consume all IP-addresses and should list both types as valid input-oois. However, if the machine running that specific boefje does not have ipv6 connectivity, running Nmap on an ipv6 address just spits out errors. The same would be true for trying to download an ipv6 version of a website.

Right now I have only focused on implementing a "remote" boefje that can request a job whenever that boefje is ready from the outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am starting to understand it better, thank you 👍.

I will try to find a way of adding flags to tasks which can be filtered by the boefje runners.

It is still unclear how these tasks should be created with these flags. Should the nmap boefje specify that it will potentially run with IPv6 addresses or should the nmap boefje see that it's runner is not able to run with IPv6 addresses and skip those OOIs?

@Souf149 Souf149 closed this Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

There is currently no way of having a boefje gather information outside the OpenKAT instance.
2 participants