Support for WARCs based on version 1.1 of the spec? #37
Comments
By selectively importing parts of the warcio API, I can persuade the code below to process examples from the scenario above:

    from warcio.archiveiterator import ArchiveIterator
    from warcio.recordloader import ArcWarcRecordLoader

    # Work-around: add the 1.1 version string to the loader's accepted versions.
    ArcWarcRecordLoader.WARC_TYPES.append('WARC/1.1')
    warc11 = '(pathtomywarc)'
    with open(warc11, 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == 'response':
                print(record.rec_headers.get_header('WARC-Date'))

...but this is a dirty hack and does not account for the other features of the 1.1 spec. With that in mind, my question above (plans?) still stands. I am hoping to finally get around to integrating warcio into ipwb for oduwsdl/ipwb#380 and oduwsdl/ipwb#374.
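To illustrate why appending to the list of accepted versions is enough to get 1.1 records read, here is a minimal sketch of a version-line check. It is not warcio's actual code: `read_version_line`, `is_supported`, and `SAMPLE_RECORD` are hypothetical names made up for this example, and the only assumption is that a reader compares the first line of each record against a list of known version strings.

```python
import io

# Hypothetical sample record header; the first line carries the WARC version.
SAMPLE_RECORD = (
    b"WARC/1.1\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Date: 2018-05-14T20:41:33.123456Z\r\n"
    b"\r\n"
)


def read_version_line(stream):
    """Return the first line of a record (the version line) as text."""
    return stream.readline().decode("utf-8").rstrip("\r\n")


def is_supported(stream, accepted_versions):
    """Check the record's version line against a list of accepted versions."""
    return read_version_line(stream) in accepted_versions


# A reader that only knows WARC/1.0 rejects the record...
print(is_supported(io.BytesIO(SAMPLE_RECORD), ["WARC/1.0"]))
# ...and accepts it once 'WARC/1.1' is added to the list.
print(is_supported(io.BytesIO(SAMPLE_RECORD), ["WARC/1.0", "WARC/1.1"]))
```

As the comment notes, a check like this only gets the record past the version test; it does nothing for the other changes in the 1.1 spec, such as sub-second `WARC-Date` values.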
Similar issue:
The mentioned work-around (add |
@sebastian-nagel yeah, I think you're right. While we've been cautious to start writing 1.1 WARCs, we should definitely support reading
…sing of WARCs using the latest WARC spec Bumped version to 1.5.4 per this change. Fixes webrecorder#37
Any news on WARC 1.1 support?
…addresses #37) (reading already possible)
- use full millis precision for WARC-Date when using WARC/1.1
- timeutils: iso_date_to_datetime() supports parsing millis param
- timeutils: datetime_to_iso_date() supports 'use_millis' param which includes a millis fraction (as per ISO 8601)
- record_http: pass extra args to base warcwriter, supports 'warc_version' param
- warc version: can be '1.0' or '1.1', converted to 'WARC/1.0' and 'WARC/1.1' respectively
- tests: test warc 1.1 writing directly, through record_http, also add test for utils.open()
- warcwriter: curr_warc_date() returns second precision (default) or millis precision, based on the current WARC version
* warc/1.1 support! add ability to more easily write WARC/1.1 records (addresses #37) (reading already possible)
- use microsecond precision for WARC-Date when using WARC/1.1
- timeutils: iso_date_to_datetime() supports parsing microsecond param
- timeutils: datetime_to_iso_date() supports 'use_micros' param which includes a microsecond fraction (as per ISO 8601)
- record_http: pass extra args to base warcwriter, supports 'warc_version' param
- warc version: can be '1.0' or '1.1', converted to 'WARC/1.0' and 'WARC/1.1' respectively
- tests: test warc 1.1 writing directly, through record_http, also add test for utils.open()
- warcwriter: curr_warc_date() returns second precision (default) or microsecond precision, based on the current WARC version
- Update README to mention WARC/1.1 support
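The date handling described in this commit message can be sketched with the standard library alone. This is a rough illustration, not the code from the commit: `normalize_version`, `datetime_to_warc_date`, `warc_date_to_datetime`, and `date_for_version` are hypothetical helpers, and the sketch assumes datetimes are already in UTC and that fractions have at most six digits (the limit of `%f` in `strptime`).

```python
from datetime import datetime, timezone


def normalize_version(warc_version):
    """Turn '1.0' / '1.1' into 'WARC/1.0' / 'WARC/1.1'; leave full forms alone."""
    if warc_version.startswith("WARC/"):
        return warc_version
    return "WARC/" + warc_version


def datetime_to_warc_date(dt, use_micros=False):
    """Format a UTC datetime, optionally with a microsecond fraction."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if use_micros else "%Y-%m-%dT%H:%M:%SZ"
    return dt.strftime(fmt)


def warc_date_to_datetime(value):
    """Parse a date string with or without a fractional-seconds part."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def date_for_version(dt, warc_version="1.0"):
    """Second precision by default, microsecond precision for WARC/1.1."""
    use_micros = normalize_version(warc_version) == "WARC/1.1"
    return datetime_to_warc_date(dt, use_micros=use_micros)


moment = datetime(2018, 5, 14, 20, 41, 33, 123456, tzinfo=timezone.utc)
print(date_for_version(moment, "1.0"))  # 2018-05-14T20:41:33Z
print(date_for_version(moment, "1.1"))  # 2018-05-14T20:41:33.123456Z
```

Keeping second precision as the default means existing 1.0 output is unchanged, and the finer precision only appears when 1.1 is explicitly requested.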
Support for reading and writing WARC 1.1 was added in warcio 1.6.0.
I am creating some test cases for https://github.com/oduwsdl/ipwb and want to use the feature of the WARC/1.1 specification that allows for WARC-Date precision on the sub-second scale.
The sample WARCs I have generated process fine with warcio unless I use WARC/1.1 as the first line of a WARC record. Are there plans to allow records using this version of the spec to be processed by warcio?