Move CDXJ file handling from bin/cli.js to WACZ class #120

Merged: 5 commits into main on Aug 15, 2024

@tw4l (Collaborator) commented Aug 14, 2024

Follow-up PR for issue #88

This change lets other applications that use js-wacz as a library, such as Browsertrix Crawler, pass in existing CDXJ files by setting the right values when initializing the WACZ class. They no longer have to duplicate code from the js-wacz CLI module to manually add each line of the provided CDXJ files to the right b-tree.

This will allow other applications like Browsertrix Crawler that
want to pass in existing CDXJ files to do so simply by
setting the right values in the WACZ class initialization rather
than having to duplicate code from the js-wacz CLI.
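
For readers unfamiliar with what "adding each line of the provided CDXJ files" involves, here is a minimal sketch of that merging step. It is not the js-wacz implementation: `mergeCDXJ` is a hypothetical helper, and a plain sorted array stands in for the b-tree mentioned above. The constructor call in the trailing comment assumes the `cdxj` option documented in types.js in this PR; the `input` and `output` option names and the import path are assumptions based on typical js-wacz usage.

```javascript
/**
 * Hypothetical helper, not part of js-wacz: merges the text of several
 * CDXJ files into one sorted, de-duplicated list of lines.
 * A sorted array stands in for the b-tree used by the library.
 *
 * @param {string[]} fileContents - Raw text of each CDXJ file.
 * @returns {string[]} Unique, non-empty CDXJ lines in sorted order.
 */
function mergeCDXJ (fileContents) {
  const seen = new Set()
  for (const text of fileContents) {
    for (const line of text.split(/\r?\n/)) {
      const trimmed = line.trim()
      if (trimmed !== '') seen.add(trimmed) // skip blank lines and duplicates
    }
  }
  return [...seen].sort()
}

// Two made-up CDXJ files that share one entry.
const merged = mergeCDXJ([
  'com,example)/b 20240814165800 {"url":"https://example.com/b"}\n',
  'com,example)/a 20240814165700 {"url":"https://example.com/a"}\n' +
    'com,example)/b 20240814165800 {"url":"https://example.com/b"}\n'
])
console.log(merged)

// With this PR, a library user would presumably not need a helper like the
// one above and could instead write something along these lines
// (assumed option names, untested):
//
//   import { WACZ } from '@harvard-lil/js-wacz'
//   const archive = new WACZ({
//     input: 'collection/*.warc.gz',
//     output: 'collection.wacz',
//     cdxj: 'collection/indexes/' // folder containing CDXJ files
//   })
//   await archive.process()
```

The point of the change is that this kind of line handling lives in the WACZ class, so callers only supply a folder path.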
@tw4l tw4l requested a review from matteocargnelutti August 14, 2024 16:58

@matteocargnelutti (Collaborator) left a comment

Hey there @tw4l ! Great PR, thank you.

I added docs for this new cdxj option in types.js and made a few other related edits - would you mind having a look?

Thank you 👋

types.js Outdated
* @property {?string} url - If set, will be added to datapackage.json as `mainPageUrl`.
* @property {?string} ts - If set, will be added to datapackage.json as `mainPageDate`. Can be any value that `Date()` can parse.
* @property {?string} title - If set, will be added to datapackage.json as `title`.
* @property {?string} description - If set, will be added to datapackage.json as `description`.
* @property {?string} signingUrl - If set, will be used to try and sign the resulting archive.
* @property {?string} signingToken - Access token to be used in combination with `signingUrl`.
* @property {?Object} datapackageExtras - If set, will be appended to datapackage.json under `extras`.
* @property {?string} cdxj - If set, skips indexing and allows for passing CDXJ files "as is". Path to a folder containing CDXJ files. Allows

@tw4l (Collaborator, Author) commented Aug 15, 2024

Thanks for adding these! Looks like maybe some text got cut off at the end here?

@matteocargnelutti (Collaborator)

Woops, thanks!

@matteocargnelutti (Collaborator)

@tw4l Merging now to make upcoming work on #119 easier :) thanks again!

@matteocargnelutti matteocargnelutti merged commit 5014fff into main Aug 15, 2024
2 checks passed