When retrieving a remote index for replay, validate the cdxj syntax #86

Open
machawk1 opened this issue Jan 19, 2017 · 5 comments

Comments

@machawk1
Member

Use https://github.com/oduwsdl/ORS/wiki/CDXJ as a basis for the validation. Note that there is a newer version of the format that uses a different symbol (# ?) at the start of the metadata line. @ibnesayeed, does a library or an explicit BNF exist for this? This might work better as its own Python library so that it can be used elsewhere, too.

@machawk1
Member Author

I'm guessing one does not exist, given that a search for "cdxj" on PyPI directs the user to ipwb.

@machawk1
Member Author

Or rather, the functionality of pywb could be reused since, y'know, we are already using that very module.

@machawk1
Member Author

On a second look, it does not appear that pywb has any sort of CDXJ validation. There is a parse_cdxj() that might throw an exception we can catch.
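Since no validator or formal grammar exists yet, a minimal sketch of a line-level check could look like the following. It is not based on pywb or on any published CDXJ grammar. It assumes that a record line has the form `<key> <14-digit datetime> <JSON object>` and that metadata lines start with a single marker symbol. The set of marker symbols, the regular expression, and the function names `is_valid_cdxj_line` and `is_valid_cdxj` are all hypothetical and would need to be checked against the CDXJ wiki page.

```python
import json
import re

# Assumption: metadata lines begin with one of these symbols. The thread is
# unsure which symbol the newer CDXJ version uses, so several are accepted.
METADATA_PREFIXES = ("!", "@", "#")

# Assumption: a record line is "<key> <14-digit datetime> <JSON object>".
RECORD_RE = re.compile(r"^(\S+)\s+(\d{14})\s+(\{.*\})\s*$")


def is_valid_cdxj_line(line):
    """Return True if a single line looks like CDXJ under the assumptions above."""
    line = line.strip()
    if not line:
        return True  # blank lines are tolerated in this sketch
    if line.startswith(METADATA_PREFIXES):
        return True  # metadata content is not inspected further
    match = RECORD_RE.match(line)
    if match is None:
        return False
    try:
        payload = json.loads(match.group(3))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(payload, dict)


def is_valid_cdxj(text):
    """Return True only if every line of the index passes the line check."""
    return all(is_valid_cdxj_line(line) for line in text.splitlines())
```

A remote index could be passed through `is_valid_cdxj()` after retrieval and rejected before replay if it fails. Checking line by line keeps the sketch independent of pywb, at the cost of not verifying sort order or the contents of the JSON block.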

@machawk1 machawk1 self-assigned this Jan 22, 2017
@machawk1
Member Author

Started on this and wrote tests in branch issue-86, but broke something such that even the sample CDXJ won't validate. Will fix this before merging.

@ibnesayeed
Member

No, we don't have any (E)BNF for CDXJ yet, but we will.

@machawk1 machawk1 changed the title from "When retrieving a remote index for replay, validate that cdxj syntax" to "When retrieving a remote index for replay, validate the cdxj syntax" Jan 30, 2017
@machawk1 machawk1 removed their assignment Aug 29, 2018