IoReader enhancements #80
Conversation
This should trigger if anything goes wrong with the management of buffers for infinite lists during re-initialization (e.g. if truncating the buffer is missed).
Looks good to me! But I found the code complicated to understand. Would you mind adding some comments to the reader?
```rust
) -> Result<Reference<'de>> {
    assert!(scratch.len() == scratch_offset);
```
Is this always the case? If it is, we should just remove the `scratch_offset` parameter.
Yes, at least while the Deserializer uses it as it does now.
I haven't changed the internal API part yet because there are two ways of going forward here (drop `scratch_offset`, or drop the `buf.clear()` in the reader and make `scratch_offset` a boolean "start anew" flag), see d1903ff -- but I think those are better explored in a separate change.
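For illustration, the two candidate shapes of the internal API might look roughly like this (a hypothetical sketch only; `Reference` stands in for the crate's actual borrowed-or-copied return type):

```rust
/// Hypothetical stand-in for the crate's borrowed-or-copied return type.
pub enum Reference<'a> {
    Borrowed(&'a [u8]),
    Copied,
}

pub trait IoReadSketch {
    type Error;

    /// Option 1: drop `scratch_offset` entirely; the vec's length already
    /// carries the same information.
    fn read_v1<'a>(
        &mut self,
        n: usize,
        scratch: &'a mut Vec<u8>,
    ) -> Result<Reference<'a>, Self::Error>;

    /// Option 2: drop the unconditional `buf.clear()` in the reader and
    /// reduce the parameter to a boolean "start anew" flag.
    fn read_v2<'a>(
        &mut self,
        n: usize,
        scratch: &'a mut Vec<u8>,
        start_anew: bool,
    ) -> Result<Reference<'a>, Self::Error>;
}
```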
Ok cool - it seems fine to follow up in a separate change.
This adds the comments requested in pyfisch#80 (comment)
I've pushed the requested additional comments, along with a typo fix, as fixups, which I'd squash once the PR is otherwise good to go (or earlier if you prefer that workflow). There's an additional commit on top now that does away with the "to_read be only up to 16k" loop, because I've read up on the `read_to_end` function that's now being used, and it already does something similar internally. The requested size, truncated to 16k, is still requested upfront to utilize the knowledge we have of the expected size, but if the actual size is larger, `read_to_end` will do about as well as the loop did.
Good work. I think this is good to go. Do you have any more comments @sfackler?
```rust
scratch_offset = scratch.len();
if let Some(ch) = self.ch.take() {
    scratch.push(ch);
    n -= 1;
```
Can we ever call this method with `n == 0`?
LGTM other than the possible need for a check for `n == 0`.
The provided `read_exact` method already does exactly what `read_into` would do, including looping while `ErrorKind::Interrupted` is returned.
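A minimal sketch of what that boils down to (the function name and error handling here are assumptions, not the crate's exact code):

```rust
use std::io::{self, Read};

// Sketch only: `read_exact` loops until `buf` is filled and transparently
// retries on `ErrorKind::Interrupted`; a short read surfaces as
// `ErrorKind::UnexpectedEof`, which the caller can wrap into the crate's
// own EOF error.
fn read_into<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    reader.read_exact(buf)
}
```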
The `IoRead::read` implementation used to manage the length of the vector and its content independently, necessitating a zero-initialization. In this change, only the capacity of the vec is set in advance, and the vec is fed the input data through a taken reader's (`Read::take`) `read_to_end` method, which populates the vector and updates its length in (optimized) lockstep; a sketch follows below. As a result, the buffer has the nice property of always being the right length, so `scratch_offset` has become somewhat redundant -- but it has not yet been removed, because there are two directions to go: either

* do away with `scratch_offset` (as the information is already transferred in the length of the vec), or
* move the buf (whose management is currently a shared responsibility between deserializer and reader) completely into the reader, so `.read(n, scratch, scratch_offset)` becomes `.clear_buffer(); .read_into_buffer(n)`, and the fully populated buffer can then be fetched as `.read_buffer<'a>() -> &'a [u8]`.

Such a scheme would be helpful in implementations for no_std environments (either without alloc or with fallible allocation), and would help reduce the number of `#[cfg()]` attributes around the deserializer's infinite string handling there.
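The core of the new read path can be sketched like this (a simplified model under assumed names; the crate's actual error handling differs):

```rust
use std::io::{self, Read};

// Sketch: let `read_to_end` on a length-limited `Take` append directly
// into the vec, so its length and contents stay in lockstep and no
// zero-initialized area is ever exposed.
fn read_to_scratch<R: Read>(reader: &mut R, n: usize, scratch: &mut Vec<u8>) -> io::Result<()> {
    // `by_ref()` keeps the original reader in place; `take(n)` caps the read.
    let transferred = reader.by_ref().take(n as u64).read_to_end(scratch)?;
    if transferred < n {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "eof while reading an expected number of bytes",
        ));
    }
    Ok(())
}
```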
The splitting of reads into 16k chunks to avoid resource exhaustion has become obsolete since `.take()` / `.read_to_end()` is used: its implementation already takes care of allocating in sane chunk sizes. The `.reserve()` call for small values ensures that we still use the available knowledge in the typical cases. (Given the indentation changes, this commit is best viewed with --ignore-all-space.)
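Concretely, the upfront hint amounts to something like this (a sketch; the function name is hypothetical, and the constant mirrors the old chunking limit):

```rust
/// Sketch of the allocation hint described above.
fn reserve_hint(scratch: &mut Vec<u8>, n: usize) {
    // Trust small, typical length claims fully, but cap what a hostile
    // length prefix can make us allocate before any data has arrived;
    // `read_to_end` grows the vec further in sane chunks if needed.
    const MAX_UPFRONT: usize = 16 * 1024;
    scratch.reserve(n.min(MAX_UPFRONT));
}
```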
Those are pointless to add in real data, but they trigger code paths that are otherwise rarely exercised.
Force-pushed from a9ec6ae to 6814cb9
Squashed into sensible commits (i.e. all fixups resolved). The `n == 0` case was a good point; it's now fixed, commented, and tested for. (Though the peek-and-then-becomes-0 branch is something I wouldn't know how to trigger without testing deep into the internals.)
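For readers following along, the guard has roughly this shape (a sketch under the assumption that `n` counts the bytes still to read and `ch` holds a previously peeked character; not the crate's exact code):

```rust
fn feed_scratch(n: &mut usize, ch: &mut Option<u8>, scratch: &mut Vec<u8>) {
    if *n == 0 {
        // Nothing requested: without an early return, consuming a pending
        // peeked character below would underflow `n`.
        return;
    }
    if let Some(c) = ch.take() {
        scratch.push(c);
        *n -= 1; // safe: n >= 1 here
    }
    // ... read the remaining `n` bytes from the underlying reader ...
}
```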
Rebasing the other branch, I noticed that the promised fix for `n == 0` got lost somewhere during the squashing or merging; I've picked it from the reflogs and made it the first commit in the upcoming series for #81 (where I'm ironing out a few readability issues).
That case is unlikely to cause trouble (it only would if a peek were to occur within an indefinite byte/string), but it is still a pitfall for future developers. See-Also: pyfisch#80 (review)
These patches are by-catch of my work on #79, but should stand on their own; in summary:

* Simplify `.read_into()` by using a provided method of the Read trait that does mostly the same (what's left is the EOF error wrapping).
* In `.read()`, use `Read::take()` (with `Read::by_ref()` as in its example, to keep the original reader in place) to directly put the read data into the buffer vector without zeroing out a fresh area before.

I haven't run any benchmarks on them, but I expect the read_into change to result in the same machine code under optimization, and the read change to negligibly remove a cheap memory clearing -- but the former reduces code complexity, and the latter uses the vec's length cursor (which is available anyway) to more clearly express which portion of the scratch buffer is usable.