Skip to content
This repository has been archived by the owner on Aug 15, 2021. It is now read-only.

IoReader enhancements #80

Merged
merged 6 commits into from
Nov 23, 2018

Conversation

chrysn
Copy link
Contributor

@chrysn chrysn commented Nov 16, 2018

These patches are by-catch of my work on #79, but should can stand on their own; in summary:

  • Add more test cases for indefinite length strings (to catch what I think would be the most likely to go wrong with the following changes)
  • Simplify .read_into() by using a provided method of the Read trait that does mostly the same (what's left is EOF error wrapping)
  • In .read(), use Read::take() (with Read::by_ref() as in its example to keep the original reader in place) to directly put the read data into the buffer vector without zeroing out a fresh area before

I haven't run any benchmarks on them, but I expect the read_into change to result in the same machine under optimization, and the read change to negligibly remove a cheap memory clearing -- but the former reduces code complexity, and the latter uses the vec's length cursor (which is available anyway) to more clearly express which portion of the scratch buffer is usable.

This should trigger if anything goes wrong with the management of
buffers for infinite lists during the re-initialization (eg. truncating
the buffer is missed).
@pyfisch
Copy link
Owner

pyfisch commented Nov 16, 2018

Looks good to me! But I found the code complicated to understand. Would mind to add some comments to the reader?

) -> Result<Reference<'de>> {
assert!(scratch.len() == scratch_offset);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always the case? If it is, we should just remove the scratch_offset parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at least while the Deserializer uses it as it does now.

I haven't changed the internal API part yet because there are two ways of going forward here (drop scratch_offset, or drop the buf.clear() in the reader and make scratch_offset a boolean "start anew"), see d1903ff -- but I think those are better explored in a separate change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool - it seems fine to follow up in a separate change.

chrysn added a commit to chrysn-pull-requests/cbor that referenced this pull request Nov 17, 2018
@chrysn
Copy link
Contributor Author

chrysn commented Nov 17, 2018

I've pushed the requested additional comments along with a typo as fixups, which I'd squash once the PR is good to go otherwise (or earlier if you prefer that workflow).

There's an additional commit on top now that does away with the "to_read be only up to 16k" loop, because I've read up on the read_to_end function that's now being used, and that already does something similar internally. The requested size truncated to 16k is still requested upfront to utilize the knowledge we have of the expected size, but if it's actually more, read_to_end will do about as well as the loop had.

@pyfisch
Copy link
Owner

pyfisch commented Nov 19, 2018

Good work. I think this is good to go.

Do you have any more comments @sfackler?

scratch_offset = scratch.len();
if let Some(ch) = self.ch.take() {
scratch.push(ch);
n -= 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we ever call this method with n == 0?

@sfackler
Copy link
Collaborator

LGTM other than the possible need for a check for n == 0.

The provided read_exact method already does exactly what read_into would
do, including looping while ErrorKind::Interrupted is returned.
The IoRead::read implementation used to manage the length of the vector
and its content independently, necessitating a zero-initialization. In
this change, only the capacity of the vec is set in advance, and the vec
is fed the input data from a taken read slice's read_to_end method,
which populates the vector and updates its length in (optimized)
lockstep.

As a result, the buffer has the nice property of always being the right
length, so scratch_offset has become a bit redundant -- but not yet been
removed, because there's two directions to go: Either

* do away with scratch_offset (as the information is already transfered
  in the length of the vec), or

* to move the buf (whose management is currently shared responsibility
  between deserializer and reader) completely into the reader, so
  `.read(n, scratch, scratch_offset)` becomes `.clear_buffer();
  .read_into_buffer(n)`, and the fully populated buffer can then be
  fetched as `.read_buffer<'a>() -> &'a [u8]`.

  Such a scheme would be helpful in implementations for no_std
  environments (either without alloc or with fallible allocation), and
  help reduce the number of #[cfg()] around the deserializer's infinite
  string handling there.
The splitting of reads into 16k chunks to avoid resource exhaustion has
become obsolete since .take() / .read_to_end() is used, as its
implementation already takes care of allocating in sane chunk sizes.

The .reserve() call for the small values ensures that we use the
available knowledge in the typical cases.

(Given indenting changes, this commit is best viewed with
--ignore-all-space).
Those are pointless to add in real data, but trigger code paths
otherwise less trod.
@chrysn chrysn force-pushed the ioreader-enhancements branch from a9ec6ae to 6814cb9 Compare November 21, 2018 15:46
@chrysn
Copy link
Contributor Author

chrysn commented Nov 21, 2018

Squashed into sensible commits (ie. all fixups resolved).

The n==0 was a good point, it's now fixed, commented and tested for. (Though the peek-and-then-becomes-0 branch is something I wouldn't know how to trigger w/o testing deep into the internals.)

@pyfisch pyfisch merged commit e1205e1 into pyfisch:master Nov 23, 2018
@chrysn chrysn deleted the ioreader-enhancements branch November 23, 2018 14:35
@chrysn
Copy link
Contributor Author

chrysn commented Nov 23, 2018

Rebasing the other branch, I noted that the promised fix for n == 0 got lost somehow during the squashing or merging; I've picked it from the reflogs and have it as the first commit in the next series upcoming for #81 (where I'm ironing out a few readability issues).

chrysn added a commit to chrysn-pull-requests/cbor that referenced this pull request Nov 23, 2018
That case is unlikely to cause trouble (it only would if there were a
peek to occur within a indefinite byte/string), but is still a pitfall
to future developers.

See-Also: pyfisch#80 (review)
@chrysn chrysn mentioned this pull request Dec 19, 2018
pyfisch pushed a commit that referenced this pull request Dec 19, 2018
That case is unlikely to cause trouble (it only would if there were a
peek to occur within a indefinite byte/string), but is still a pitfall
to future developers.

See-Also: #80 (review)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants