IoReader enhancements #80
Conversation
This should trigger if anything goes wrong with the management of buffers for infinite lists during re-initialization (e.g. if truncating the buffer is missed).
Looks good to me! But I found the code complicated to understand. Would you mind adding some comments to the reader?
```rust
) -> Result<Reference<'de>> {
    assert!(scratch.len() == scratch_offset);
```
Is this always the case? If it is, we should just remove the `scratch_offset` parameter.
Yes, at least while the Deserializer uses it as it does now.
I haven't changed the internal API part yet because there are two ways of going forward here (drop `scratch_offset`, or drop the `buf.clear()` in the reader and make `scratch_offset` a boolean "start anew" flag), see d1903ff -- but I think those are better explored in a separate change.
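For illustration, the two candidate shapes of the internal API might look roughly like this (a hypothetical sketch only; `Reference` stands in for the crate's actual borrowed-or-copied return type):

```rust
/// Hypothetical stand-in for the crate's borrowed-or-copied return type.
pub enum Reference<'a> {
    Borrowed(&'a [u8]),
    Copied,
}

pub trait IoReadSketch {
    type Error;

    /// Option 1: drop `scratch_offset` entirely; the vec's length already
    /// carries the same information.
    fn read_v1<'a>(
        &mut self,
        n: usize,
        scratch: &'a mut Vec<u8>,
    ) -> Result<Reference<'a>, Self::Error>;

    /// Option 2: drop the unconditional `buf.clear()` in the reader and
    /// reduce the parameter to a boolean "start anew" flag.
    fn read_v2<'a>(
        &mut self,
        n: usize,
        scratch: &'a mut Vec<u8>,
        start_anew: bool,
    ) -> Result<Reference<'a>, Self::Error>;
}
```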
Ok cool - it seems fine to follow up in a separate change.
This adds the comments requested in pyfisch#80 (comment)
I've pushed the requested additional comments, along with a typo fix, as fixups, which I'd squash once the PR is otherwise good to go (or earlier if you prefer that workflow). There's an additional commit on top now that does away with the "to_read be only up to 16k" loop, because I've read up on the `read_to_end` function that's now being used, and it already does something similar internally. The requested size, truncated to 16k, is still requested upfront to utilize the knowledge we have of the expected size, but if the actual size is larger, `read_to_end` will do about as well as the loop did.
Good work. I think this is good to go. Do you have any more comments @sfackler?
```rust
scratch_offset = scratch.len();
if let Some(ch) = self.ch.take() {
    scratch.push(ch);
    n -= 1;
```
Can we ever call this method with `n == 0`?
LGTM other than the possible need for a check for `n == 0`.
The provided `read_exact` method already does exactly what `read_into` would do, including looping while `ErrorKind::Interrupted` is returned.
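A minimal sketch of what that boils down to (the function name and error handling here are assumptions, not the crate's exact code):

```rust
use std::io::{self, Read};

// Sketch only: `read_exact` loops until `buf` is filled and transparently
// retries on `ErrorKind::Interrupted`; a short read surfaces as
// `ErrorKind::UnexpectedEof`, which the caller can wrap into the crate's
// own EOF error.
fn read_into<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    reader.read_exact(buf)
}
```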
The `IoRead::read` implementation used to manage the length of the vector and its content independently, necessitating a zero-initialization. In this change, only the capacity of the vec is set in advance, and the vec is fed the input data through a taken reader's (`Read::take`) `read_to_end` method, which populates the vector and updates its length in (optimized) lockstep; a sketch follows below. As a result, the buffer has the nice property of always being the right length, so `scratch_offset` has become somewhat redundant -- but it has not yet been removed, because there are two directions to go: either

* do away with `scratch_offset` (as the information is already transferred in the length of the vec), or
* move the buf (whose management is currently a shared responsibility between deserializer and reader) completely into the reader, so `.read(n, scratch, scratch_offset)` becomes `.clear_buffer(); .read_into_buffer(n)`, and the fully populated buffer can then be fetched as `.read_buffer<'a>() -> &'a [u8]`.

Such a scheme would be helpful in implementations for no_std environments (either without alloc or with fallible allocation), and would help reduce the number of `#[cfg()]` attributes around the deserializer's infinite string handling there.
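The core of the new read path can be sketched like this (a simplified model under assumed names; the crate's actual error handling differs):

```rust
use std::io::{self, Read};

// Sketch: let `read_to_end` on a length-limited `Take` append directly
// into the vec, so its length and contents stay in lockstep and no
// zero-initialized area is ever exposed.
fn read_to_scratch<R: Read>(reader: &mut R, n: usize, scratch: &mut Vec<u8>) -> io::Result<()> {
    // `by_ref()` keeps the original reader in place; `take(n)` caps the read.
    let transferred = reader.by_ref().take(n as u64).read_to_end(scratch)?;
    if transferred < n {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "eof while reading an expected number of bytes",
        ));
    }
    Ok(())
}
```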
The splitting of reads into 16k chunks to avoid resource exhaustion has become obsolete since `.take()` / `.read_to_end()` is used: its implementation already takes care of allocating in sane chunk sizes. The `.reserve()` call for small values ensures that we still use the available knowledge in the typical cases. (Given the indentation changes, this commit is best viewed with --ignore-all-space.)
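Concretely, the upfront hint amounts to something like this (a sketch; the function name is hypothetical, and the constant mirrors the old chunking limit):

```rust
/// Sketch of the allocation hint described above.
fn reserve_hint(scratch: &mut Vec<u8>, n: usize) {
    // Trust small, typical length claims fully, but cap what a hostile
    // length prefix can make us allocate before any data has arrived;
    // `read_to_end` grows the vec further in sane chunks if needed.
    const MAX_UPFRONT: usize = 16 * 1024;
    scratch.reserve(n.min(MAX_UPFRONT));
}
```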
Those are pointless to add in real data, but they trigger code paths that are otherwise rarely exercised.
Force-pushed from a9ec6ae to 6814cb9
Squashed into sensible commits (i.e. all fixups resolved). The `n == 0` case was a good point; it's now fixed, commented, and tested for. (Though the peek-and-then-becomes-0 branch is something I wouldn't know how to trigger without testing deep into the internals.)
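For readers following along, the guard has roughly this shape (a sketch under the assumption that `n` counts the bytes still to read and `ch` holds a previously peeked character; not the crate's exact code):

```rust
fn feed_scratch(n: &mut usize, ch: &mut Option<u8>, scratch: &mut Vec<u8>) {
    if *n == 0 {
        // Nothing requested: without an early return, consuming a pending
        // peeked character below would underflow `n`.
        return;
    }
    if let Some(c) = ch.take() {
        scratch.push(c);
        *n -= 1; // safe: n >= 1 here
    }
    // ... read the remaining `n` bytes from the underlying reader ...
}
```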
Rebasing the other branch, I noticed that the promised fix for `n == 0` got lost somewhere during the squashing or merging; I've picked it from the reflogs and made it the first commit in the upcoming series for #81 (where I'm ironing out a few readability issues).
That case is unlikely to cause trouble (it only would if a peek were to occur within an indefinite byte/string), but it is still a pitfall for future developers. See-Also: pyfisch#80 (review)
These patches are by-catch of my work on #79, but should stand on their own; in summary:

* Simplify `.read_into()` by using a provided method of the Read trait that does mostly the same (what's left is the EOF error wrapping).
* In `.read()`, use `Read::take()` (with `Read::by_ref()` as in its example, to keep the original reader in place) to directly put the read data into the buffer vector without zeroing out a fresh area before.

I haven't run any benchmarks on them, but I expect the read_into change to result in the same machine code under optimization, and the read change to negligibly remove a cheap memory clearing -- but the former reduces code complexity, and the latter uses the vec's length cursor (which is available anyway) to more clearly express which portion of the scratch buffer is usable.