Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chrome with mitmproxy crawler and report #1888

Open
wants to merge 40 commits into
base: main
Choose a base branch
from
Open

Conversation

floort
Copy link

@floort floort commented Oct 9, 2023

No description provided.

@floort floort requested a review from a team as a code owner October 9, 2023 20:47
@Darwinkel
Copy link
Contributor

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
    • I have managed to pass the onboarding flow
    • Objects and Findings are created properly
    • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.

I just want to say I think it's awesome to see that the work we did for the containerized boefjes and the the new report generation is paying off! 🎉

What works:

  • Boefje and normalizer output looks good
  • Report generation works as intended
  • Tested on a bunch of ugly websites, looks like the retrieved information is correct (plus no crashes)

What doesn't work:

  • Image does not exist yet and requires running docker build -t lapje boefjes/images/lapje. I think we should modify and generalize masscan's workflow before we merge this (shouldn't be to much work).

  • Minor overflow bug on the reports page when dealing with long cookie values:
    (not a bug introduced by this PR though, we should fix this in a separate PR)
    image

Bug or feature?:

  • Chrome is currently not pinned. To prevent unexpected regressions like the ones we've seen in the past, we probably should. I don't think this is possible through Google's repository though. Why don't we use Debian's chromium package?

  • From the report page, it is not entirely clear why some records have no values/content, and only a hostname. Some more info (maybe in Lapje's katalogus detail page) about what Lapje does exactly, and how its objects should be interpreted, would be nice.
    image

@dekkers
Copy link
Contributor

dekkers commented Oct 31, 2023

I renamed the boefje and normalizer.

Image does not exist yet and requires running docker build -t lapje boefjes/images/lapje. I think we should modify and generalize masscan's workflow before we merge this (shouldn't be to much work).

I added the github workflow, but that currently fails because external contributors PRs can't push to the container registry. Should be fixed when the PR is merged.

Minor overflow bug on the reports page when dealing with long cookie values:
(not a bug introduced by this PR though, we should fix this in a separate PR)

This has been fixed.

Chrome is currently not pinned. To prevent unexpected regressions like the ones we've seen in the past, we probably should. I don't think this is possible through Google's repository though. Why don't we use Debian's chromium package?

Chrome is used because we are not sure that Debian's chromium doesn't have any modifications that impact the results or might get those in the future. I think getting accurate results using the browser that most people use is more important than the risk of breakage from newer versions.

From the report page, it is not entirely clear why some records have no values/content, and only a hostname. Some more info (maybe in Lapje's katalogus detail page) about what Lapje does exactly, and how its objects should be interpreted, would be nice.

The records have no value because they are external requests that don't set a cookie. The fact that there is an external request is already useful information.

@underdarknl
Copy link
Contributor

Multiple pages inside a HAR file can be dealt with in a separate issue:
#1996


logger = logging.getLogger(__name__)

OCI_IMAGE = "ghcr.io/minvws/nl-kat-chrome-crawler-mitmproxy:latest"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pin this to a version once we have this PR merged, and a version build and uploaded? While I understand the flexibility in building the image from the latest upstream chome version, I'd want to avoid a situation where the day to day usage of a boefje changes based on a new version of the used container if we can.

@stephanie0x00
Copy link
Contributor

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
    • I have managed to pass the onboarding flow
    • [] Objects and Findings are created properly
    • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.

What works:

  • Objects are created.
  • Report can be generated which shows a nice table with all the cookies found and its settings.

What doesn't work:

  • It appears that no Findings are currently being created when cookies are insufficient, e.g. when the Secure or HttpOnly flag is missing. This is something I'd expect as a minimum feature.
  • The output is stored as json/dictionairy-format under Objects. This does not match how other boefjes display their output. (See screenshot below)
  • It does not appear to crawl past the root of an URL, at least not for the various domains I've tried.
  • The explanation in the Katalogus should be improved. Currently the name of this boefje suggests that it crawls web applications for various paths (aka dirbuster-alike), it doesn't mention anything about crawling for cookies. (See screenshot below)

Bug or feature?:

Nothing found yet that hasn't been mentioned before.

Screenshots

mitproxy-objects

mitmproxy-katalogus

@underdarknl
Copy link
Contributor

Lets create a HAR file boefje from this PR, and move the normalisers to a separate issues / PR once we figure out what kind of objects we want in the graph (specifically, how fine grained we want the cookies)

@underdarknl underdarknl removed this from the OpenKAT v1.17 milestone Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
boefjes Issues related to boefjes
Projects
Status: Backlog / To do
Development

Successfully merging this pull request may close these issues.

7 participants