Add information in the README about the ability to pipe output of indexer directly to the replay system #110

machawk1 · 2017-02-15T17:54:44Z

For example, one can:

ipwb index myCapture.warc | ipwb replay

Replay will read stdin if a CDXJ is not specified and thus process the CDXJ resulting from the ipwb indexer immediately instead of relying on the contents of a file.
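As a rough illustration of that fallback (not ipwb's actual code), the sketch below reads CDXJ lines from a file when a path is supplied and from standard input otherwise. The function name `read_index_lines` and its parameters are hypothetical and exist only for this example.

```python
import sys


def read_index_lines(index_path=None, stream=None):
    """Return non-empty index lines from a file, or from a stream if no path is given.

    Hypothetical helper: `index_path` and `stream` are illustrative names,
    not part of ipwb's interface.
    """
    if index_path is not None:
        # A CDXJ file was specified, so read the index from disk.
        with open(index_path, encoding="utf-8") as index_file:
            return [line.rstrip("\n") for line in index_file if line.strip()]
    # No file specified: fall back to stdin (or an explicitly supplied stream),
    # which is what makes `indexer | replay` style piping possible.
    source = sys.stdin if stream is None else stream
    return [line.rstrip("\n") for line in source if line.strip()]
```

Note that either branch returns the whole index as a list, so everything read from the pipe stays in memory for as long as the caller holds on to it.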


ibnesayeed · 2017-02-15T18:07:43Z

I am not comfortable with this method and perhaps would not advertise it, even if it is supported for small tests. With binary search across multiple CDXJ files to serve a TimeMap or Memento, it would become an overhead to maintain and won't scale well.

machawk1 · 2017-02-15T18:11:54Z

I have a few use cases where this feature is handy for small collections. Can you provide an example (w/ sample data) where this would not scale so we can account for these circumstances?

This ticket is about documenting how to use an existing feature. If others have small collections, it would be informative to let them know that this feature is available.

ibnesayeed · 2017-02-15T18:33:41Z

The real problem is that replay allows the index to be read from STDIN, which is essential to support pipes. Although it looks like a handy feature, it won't scale well. Piping is handy and efficient when the consumer end of the pipe processes the data as it arrives and is then done with it. In this case, though, the index data supplied through the pipe will persist for the lifetime of the replay process and will be looked up (scanned) each time a request hits the replay. This persistence will happen in memory, not on disk, which means that for any fairly large dataset the system can run out of memory very quickly.
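To make the contrast concrete, here is a minimal sketch, assuming a sorted, newline-delimited index whose lines begin with a lookup key followed by a space. It is not how ipwb implements lookups; `load_into_memory`, `lookup_in_memory`, and `lookup_on_disk` are hypothetical names. The first pair keeps every piped line resident for the life of the process, while the last one binary-searches the file by byte offset and reads only a handful of lines per request.

```python
import bisect
import os


def load_into_memory(stream):
    """Read every line from a pipe-like stream and keep it (memory grows with input)."""
    return sorted(line.rstrip("\n") for line in stream if line.strip())


def lookup_in_memory(lines, key):
    """Find the first retained line that starts with `key` followed by a space."""
    prefix = key + " "
    i = bisect.bisect_left(lines, prefix)
    if i < len(lines) and lines[i].startswith(prefix):
        return lines[i]
    return None


def _line_after(index_file, offset):
    """Return the first complete line that begins after `offset` (or at 0)."""
    index_file.seek(offset)
    if offset > 0:
        index_file.readline()  # discard the line we landed in the middle of
    return index_file.readline()


def lookup_on_disk(path, key):
    """Binary-search a sorted file by byte offset; nothing is kept between calls."""
    prefix = key.encode("utf-8") + b" "
    with open(path, "rb") as index_file:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line is >= prefix (or EOF).
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_after(index_file, mid)
            if line and line < prefix:
                lo = mid + 1
            else:
                hi = mid
        line = _line_after(index_file, lo)
    if line.startswith(prefix):
        return line.decode("utf-8").rstrip("\n")
    return None
```

With piped input only the in-memory variant is possible, because a pipe cannot be seeked; the on-disk variant needs the index to exist as a file.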

I am afraid that once advertised, it will be difficult to step back. Hence, even if you want to keep this feature for some time to make the tests handy, you should not document it as it might go away soon.

machawk1 added the Documentation label Feb 15, 2017

machawk1 added this to the 2.0 (Extended more featureful implementation) milestone Feb 15, 2017
