Replay will read stdin if a CDXJ is not specified and thus process the CDXJ resulting from the ipwb indexer immediately instead of relying on the contents of a file.
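To make the described behavior concrete, here is a minimal Python sketch of the "fall back to stdin when no CDXJ is given" pattern. It is not ipwb's actual code: the function name `read_index_lines` and its parameters are hypothetical and chosen only for illustration.

```python
import sys
from typing import List, Optional, TextIO


def read_index_lines(index_path: Optional[str] = None,
                     stdin: TextIO = sys.stdin) -> List[str]:
    """Return CDXJ lines from index_path, or from stdin when no path is given.

    Hypothetical helper: it only illustrates the fallback described above.
    """
    if index_path is None:
        # No index file named: consume whatever the upstream command piped in.
        return [line.rstrip("\n") for line in stdin if line.strip()]
    with open(index_path, encoding="utf-8") as index_file:
        return [line.rstrip("\n") for line in index_file if line.strip()]
```

Note that in the stdin branch the whole index is materialized as a list, because a pipe can only be read once.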
I am not comfortable with this method and would perhaps not advertise it, even if it is supported for small-scale testing. With binary search across multiple CDXJ files to serve a TimeMap or Memento, it would become an overhead to maintain and won't scale well.
I have a few use cases where this feature is handy for small collections. Can you provide an example (w/ sample data) where this would not scale so we can account for these circumstances?
This ticket is about documenting how to use an existing feature. If others have small collections, it would be informative to let them know that this feature is available.
The real problem is that replay allows the index to be read from STDIN, which is essential to support pipes. Although it looks like a handy feature, it won't scale well. Piping is handy and efficient when the consumer end of the pipe processes the data as it arrives and is then done with it. In this case, though, the index data supplied through the pipe will persist for the lifetime of the replay process and will be looked up (scanned) each time a request hits the replay. The data persists in memory, not on disk, which means that for any fairly large dataset the system can run out of memory very quickly.
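The trade-off can be sketched in Python as follows. This is an illustration under stated assumptions, not ipwb's implementation: both function names are hypothetical, the index is assumed to be sorted in byte order, and lookups match on a key prefix. The in-memory variant needs every line resident for the life of the process, while the on-disk variant reads only a handful of lines per request.

```python
import bisect
import os
from typing import BinaryIO, List, Optional


def lookup_in_memory(lines: List[str], key: str) -> Optional[str]:
    """Binary search a sorted list of index lines that is held entirely in RAM."""
    i = bisect.bisect_left(lines, key)
    if i < len(lines) and lines[i].startswith(key):
        return lines[i]
    return None


def _line_at_or_after(f: BinaryIO, offset: int) -> bytes:
    """Return the first full line starting at or after offset (b'' at EOF)."""
    if offset == 0:
        f.seek(0)
    else:
        # Step back one byte, then discard the remainder of that line, so
        # the next read starts on a line boundary at or after offset.
        f.seek(offset - 1)
        f.readline()
    return f.readline()


def lookup_on_disk(path: str, key: str) -> Optional[str]:
    """Binary search a sorted index file by byte offset, without loading it."""
    target = key.encode("utf-8")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at_or_after(f, mid)
            if not line or line >= target:
                hi = mid  # the first candidate line starts at or before mid
            else:
                lo = mid + 1
        line = _line_at_or_after(f, lo)
    if line.startswith(target):
        return line.decode("utf-8").rstrip("\n")
    return None
```

With an index piped through stdin, only the first approach is available, since a pipe cannot be seeked; that is the source of the memory concern raised above.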
I am afraid that once it is advertised, it will be difficult to step back. Hence, even if you want to keep this feature for some time to keep testing convenient, you should not document it, as it might go away soon.