Add chrome with mitmproxy crawler and report #1888
base: main
Run headless; default to 3 visits; 1 concurrent browser; single URL as -u parameter instead of a file with multiple URLs
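The defaults listed in that commit message could be expressed roughly as follows. This is only a sketch, assuming the crawler script parses its arguments with `argparse`; apart from `-u`, every flag name here (`--url`, `--visits`, `--browsers`, `--no-headless`) is hypothetical and not taken from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the defaults named in the commit message."""
    parser = argparse.ArgumentParser(
        description="Crawl a single URL with a browser behind mitmproxy (sketch)."
    )
    # -u is named in the commit message; the long form --url is an assumption.
    parser.add_argument("-u", "--url", required=True, help="single URL to crawl")
    # Hypothetical flag names; only the default values come from the commit message.
    parser.add_argument("--visits", type=int, default=3, help="visits per URL")
    parser.add_argument("--browsers", type=int, default=1, help="concurrent browsers")
    # Headless is the default; --no-headless is a hypothetical opt-out switch.
    parser.add_argument(
        "--no-headless",
        dest="headless",
        action="store_false",
        help="show the browser window instead of running headless",
    )
    return parser
```

Calling `build_parser().parse_args(["-u", "https://example.com"])` yields a namespace with 3 visits, 1 browser and headless mode enabled.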
boefjes/images/lapje/sitecrawler/crawloutput/2023-10-08_134122.mproxy
I renamed the boefje and normalizer.
I added the GitHub workflow, but it currently fails because PRs from external contributors can't push to the container registry. This should be fixed once the PR is merged.
This has been fixed.
Chrome is used because we can't be sure that Debian's Chromium has no modifications that affect the results, or that it won't get such modifications in the future. I think getting accurate results with the browser that most people use is more important than the risk of breakage from newer versions.
The records have no value because they are external requests that don't set a cookie. The fact that there is an external request is already useful information.
Multiple pages inside a HAR file can be dealt with in a separate issue:
logger = logging.getLogger(__name__)

OCI_IMAGE = "ghcr.io/minvws/nl-kat-chrome-crawler-mitmproxy:latest"
Could we pin this to a version once this PR is merged and a version has been built and uploaded? While I understand the flexibility of building the image from the latest upstream Chrome version, I'd like to avoid a situation where the day-to-day behaviour of a boefje changes because a new version of the container was published, if we can.
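To make the pinning request concrete, here is a small sketch of a check that an image reference is pinned to a fixed tag or a digest rather than floating on `latest`. The helper `is_pinned` is hypothetical and not part of the PR; it only inspects the reference string and does not contact a registry.

```python
import re

# A digest reference ends in "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref: str) -> bool:
    """Return True if image_ref uses a digest or a tag other than 'latest'."""
    if DIGEST_RE.search(image_ref):
        return True
    # Only look at the last path segment, so a registry port such as
    # "localhost:5000/image" is not mistaken for a tag.
    name = image_ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return False  # no tag at all means an implicit "latest"
    tag = name.rsplit(":", 1)[1]
    return tag != "latest"
```

With this check, the current `OCI_IMAGE` value ending in `:latest` is reported as unpinned, while the same name with a concrete tag or an `@sha256:` digest is reported as pinned.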
Checklist for QA:
What works:
What doesn't work:
Bug or feature?: Nothing found yet that hasn't been mentioned before.
Screenshots:
Let's create a HAR file boefje from this PR, and move the normalisers to a separate issue / PR once we figure out what kind of objects we want in the graph (specifically, how fine-grained we want the cookies to be).
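As a rough illustration of what a later normaliser might pull out of the HAR file, the sketch below lists each request's host, whether it is external to the crawled site, and the names of any cookies the response set. It assumes the standard HAR 1.2 layout (`log.entries[].request.url` and `log.entries[].response.cookies`); the function names and the output shape are hypothetical and do not reflect the objects that will end up in the graph.

```python
from urllib.parse import urlsplit


def is_external(host: str, site_host: str) -> bool:
    """Treat anything that is not the site host or one of its subdomains as external."""
    return host != site_host and not host.endswith("." + site_host)


def summarize_har(har: dict, site_host: str) -> list:
    """Return one row per HAR entry: host, external flag, and cookie names set."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        host = urlsplit(entry["request"]["url"]).hostname or ""
        cookies = entry.get("response", {}).get("cookies", [])
        rows.append(
            {
                "host": host,
                "external": is_external(host, site_host),
                "cookie_names": [cookie["name"] for cookie in cookies],
            }
        )
    return rows
```

An external request that sets no cookie still produces a row with `external` set to `True` and an empty `cookie_names` list, which matches the earlier remark that the request itself is useful information.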