WARC 1.1 timestamp precision support #46

Merged: ikreymer merged 3 commits into develop from warc-1.1-date on Oct 9, 2018
ikreymer (Member) commented Oct 7, 2018

  • Add support for writing WARC/1.1 records specified via warc_version=WARC/1.1 or warc_version=1.1

  • When using WARC/1.1, use millisecond precision for WARC-Date

  • timeutils: datetime_to_iso_date() takes a use_millis boolean that controls whether the milliseconds are included.

  • timeutils: iso_date_to_datetime() now also parses the microsecond fraction, if available.

  • tests: add tests for WARC/1.1, additional test for open()

  • Update README with WARC 1.1 info

Further addresses #37
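
For readers who want to see the date handling described above in isolation, here is a minimal stdlib-only sketch. The names to_iso_date and from_iso_date are hypothetical stand-ins, not the timeutils functions from this PR, and the fraction handling is an assumption: Python's %f always writes six digits, so this sketch emits microseconds rather than truncating to milliseconds.

```python
from datetime import datetime

# Hypothetical helpers for illustration only; not the warcio timeutils API.
SECONDS_FMT = '%Y-%m-%dT%H:%M:%SZ'
FRACTION_FMT = '%Y-%m-%dT%H:%M:%S.%fZ'


def to_iso_date(dt, use_fraction=False):
    """Format a datetime as an ISO 8601 UTC string.

    With use_fraction=True a six-digit fractional part is appended
    (assumption: %f precision, i.e. microseconds).
    """
    return dt.strftime(FRACTION_FMT if use_fraction else SECONDS_FMT)


def from_iso_date(text):
    """Parse an ISO 8601 UTC string, with or without a fractional part.

    Returns a naive datetime; strptime's %f accepts one to six digits.
    """
    for fmt in (FRACTION_FMT, SECONDS_FMT):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError('unrecognized ISO date: %r' % text)


if __name__ == '__main__':
    stamp = datetime(2018, 10, 7, 18, 20, 0, 123456)
    print(to_iso_date(stamp))                     # second precision
    print(to_iso_date(stamp, use_fraction=True))  # with fractional seconds
    print(from_iso_date('2018-10-07T18:20:00.123Z'))
```

Trying the fractional format first and falling back to the plain one keeps the parser tolerant of both WARC/1.0-style and WARC/1.1-style dates.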

ikreymer and others added 2 commits October 6, 2018 22:57
…addresses #37) (reading already possible)

- use full millis precision for WARC-Date when using WARC/1.1
- timeutils: iso_date_to_datetime() supports parsing millis param
- timeutils: datetime_to_iso_date() supports 'use_millis' param which includes a millis fraction (as per ISO 8601)
- record_http: pass extra args to base warcwriter, supports 'warc_version' param
- warc version: can be '1.0' or '1.1', converted to 'WARC/1.0' and 'WARC/1.1' respectively
- tests: test warc 1.1 writing directly, through record_http, also add test for utils.open()
- warcwriter: curr_warc_date() returns second precision (default) or millis precision, based on the current WARC version
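
The version conversion and the precision switch listed in this commit message can be sketched roughly as follows. This is an illustrative sketch, not the warcwriter code: normalize_warc_version and current_warc_date are hypothetical names, the fallback to WARC/1.0 when no version is given is an assumption, and the fractional part uses Python's six-digit %f rather than a strict three-digit millisecond field.

```python
from datetime import datetime, timezone


def normalize_warc_version(version=None):
    """Turn '1.0' or '1.1' into 'WARC/1.0' or 'WARC/1.1'.

    Already-prefixed values pass through unchanged. Falling back to
    'WARC/1.0' when nothing is given is an assumption of this sketch.
    """
    if not version:
        return 'WARC/1.0'
    version = str(version)
    if version.startswith('WARC/'):
        return version
    return 'WARC/' + version


def current_warc_date(version=None):
    """Return the current UTC time as a WARC-Date string.

    Second precision unless the version is WARC/1.1, in which case a
    fractional part is included.
    """
    now = datetime.now(timezone.utc)
    if normalize_warc_version(version) == 'WARC/1.1':
        return now.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
    return now.strftime('%Y-%m-%dT%H:%M:%SZ')


if __name__ == '__main__':
    print(normalize_warc_version('1.1'))  # WARC/1.1
    print(current_warc_date('1.0'))
    print(current_warc_date('WARC/1.1'))
```

Normalizing the version once, up front, means the precision check only ever has to compare against the full 'WARC/1.1' string.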
@ikreymer ikreymer requested a review from N0taN3rd October 7, 2018 18:20
coveralls commented Oct 7, 2018

Coverage increased (+0.002%) to 99.844% when pulling e650f56 on warc-1.1-date into 3f6e2d7 on develop.

@ikreymer ikreymer merged commit e8b5219 into develop Oct 9, 2018
@ikreymer ikreymer deleted the warc-1.1-date branch October 9, 2018 20:18