Endianess enum for runtime cases #95

Open
Xion opened this issue Sep 7, 2017 · 28 comments

@Xion

Xion commented Sep 7, 2017

From a discussion on #rust-beginners, there seems to be a need for a ByteOrder implementation that dispatches to LE/BE based on runtime information.

Essentially something like this:

enum Endianess { Little, Big }
impl ByteOrder for Endianess {
    // boilerplate methods with `match` that dispatch to LE or BE
}

let endianess = get_endianess_at_runtime();
endianess.read_i32(&some_bytes);

The byteorder docs don't seem to say that the crate is focused solely on static/type-level checking, so I'm guessing this would be in scope for the library.

Of course this isn't strictly necessary, as you can probably just write reading/writing code generically and simply move the LE/BE decision to a higher level, but it may simplify some use cases regardless.
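
A minimal sketch of the idea, using only the standard library's `from_le_bytes`/`from_be_bytes` rather than byteorder itself. The `Endianness` type and its inherent methods are hypothetical names, not part of the crate:

```rust
/// Byte order chosen at run time (hypothetical type, not part of byteorder).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

impl Endianness {
    /// Reads an `i32` from the first 4 bytes of `buf`.
    /// Panics if `buf` is shorter than 4 bytes.
    pub fn read_i32(self, buf: &[u8]) -> i32 {
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(&buf[..4]);
        match self {
            Endianness::Little => i32::from_le_bytes(bytes),
            Endianness::Big => i32::from_be_bytes(bytes),
        }
    }

    /// Reads a `u16` from the first 2 bytes of `buf`.
    /// Panics if `buf` is shorter than 2 bytes.
    pub fn read_u16(self, buf: &[u8]) -> u16 {
        let mut bytes = [0u8; 2];
        bytes.copy_from_slice(&buf[..2]);
        match self {
            Endianness::Little => u16::from_le_bytes(bytes),
            Endianness::Big => u16::from_be_bytes(bytes),
        }
    }
}
```

The byte order is an ordinary value here, so it can be stored in a struct field or returned from a header parser. The cost is one `match` per read.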

@BurntSushi
Owner

I would like to see use cases for this. How many people using byteorder have had to invent this abstraction (or something similar) on their own?

@Kixunil

Kixunil commented Sep 10, 2017

I had to yesterday, though I did it for code generation, not at actual run time.

@Xion the ByteOrder trait doesn't accept &self, so there's no way to dispatch. One would have to add another trait, which does.

@Kixunil

Kixunil commented Sep 20, 2017

I've found another interesting use case. When implementing (de)serialization manually (e.g. to support large numbers), this would be beneficial:

trait ByteOrder {
    const ENDIANESS: Endianess;

    // The rest of the ByteOrder trait
}

fn read_from_bytes<BO: ByteOrder>(bytes: &[u8]) -> MyType {
    // Since the condition is a constant expression, the compiler can eliminate the unreachable branch.
    if BO::ENDIANESS == Endianess::BigEndian {
        // big endian implementation
    } else {
        // little endian implementation
    }
}

This allows more powerful generic programming. While a run-time enum is not strictly necessary (one could use const IS_BIG_ENDIAN: bool), I believe it's much clearer than a bool or any other type.
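
A sketch of how such an associated constant could be used to decode a type the library knows nothing about. Everything here is a stand-in: `Order` mimics a ByteOrder-like trait, `Endianness` and `U256` are hypothetical types, and only standard-library conversions are used:

```rust
/// Byte order as a value (hypothetical; byteorder has no such enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

/// Stand-in for a ByteOrder-like trait extended with an associated constant.
pub trait Order {
    const ENDIANNESS: Endianness;
}

pub struct LittleEndian;
pub struct BigEndian;

impl Order for LittleEndian {
    const ENDIANNESS: Endianness = Endianness::Little;
}

impl Order for BigEndian {
    const ENDIANNESS: Endianness = Endianness::Big;
}

/// A 256-bit unsigned integer stored as four limbs, least significant first.
#[derive(Debug, PartialEq, Eq)]
pub struct U256(pub [u64; 4]);

/// Decodes 32 bytes into a `U256` in the byte order selected by `O`.
pub fn read_u256<O: Order>(bytes: &[u8; 32]) -> U256 {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // `O::ENDIANNESS` is a constant in each monomorphized copy of this
        // function, so the optimizer can drop the arm that is never taken.
        match O::ENDIANNESS {
            // Little-endian: the first chunk is the least significant limb.
            Endianness::Little => limbs[i] = u64::from_le_bytes(word),
            // Big-endian: the first chunk is the most significant limb.
            Endianness::Big => limbs[3 - i] = u64::from_be_bytes(word),
        }
    }
    U256(limbs)
}
```

Callers still pick the order statically, for example `read_u256::<BigEndian>(&bytes)`. The constant only gives generic code a readable way to branch on it.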

@BurntSushi
Owner

@Kixunil Could you elaborate a bit more on your use case? I don't think I understand it.

@Kixunil

Kixunil commented Sep 23, 2017

Look at the FromBytesOrdered trait in my struct_deser crate. It's easy to implement such a trait for all primitive types, but how would one implement it for one's own type (e.g. u256, a realistic cryptography scenario)? There's no clean way to dispatch the implementation based on the type except for this hack:

if BO::read_u16(&[42, 0]) == 42 {
    // Little endian: the first byte is the least significant
} else {
    // Big endian
}

While this works, it's less straightforward, harder to understand, and the compiler must work harder to optimize it. At first I even thought it was impossible; I only came up with this hack after more thinking. I wouldn't be surprised if someone else didn't see it and gave up.

Is that more clear now?
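
For reference, a small sketch of the probe technique. The bytes `[42, 0]` decode to 42 in little-endian order and to 42 * 256 = 10752 in big-endian order. The `Order` trait below is a stand-in for byteorder's static-method style, and `is_little_endian` is a made-up helper name:

```rust
/// Stand-in for the static-method style of byteorder's `ByteOrder` trait.
pub trait Order {
    fn read_u16(buf: &[u8; 2]) -> u16;
}

pub struct LittleEndian;
pub struct BigEndian;

impl Order for LittleEndian {
    fn read_u16(buf: &[u8; 2]) -> u16 {
        u16::from_le_bytes(*buf)
    }
}

impl Order for BigEndian {
    fn read_u16(buf: &[u8; 2]) -> u16 {
        u16::from_be_bytes(*buf)
    }
}

/// Returns true when `O` decodes little-endian, by probing a known byte pair.
pub fn is_little_endian<O: Order>() -> bool {
    // [42, 0] is 42 when the first byte is the least significant one,
    // and 42 * 256 = 10752 when it is the most significant one.
    O::read_u16(&[42, 0]) == 42
}
```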

@clarfonthey

Are there any supported Rust targets that have runtime endianness?

@Enet4
Contributor

Enet4 commented Oct 6, 2017

I would certainly be interested in this. My published nifti crate has a runtime endianness type because NIfTI files can be stored in either of the two byte orders. In a similar fashion to the one mentioned by @Kixunil, we are supposed to detect the byte order by reading and inspecting an integer in the file's header. If it's outside the expected range, then the byte order needs to be swapped on nearly all other values in the file. Rather than swapping everything in one go, values in the NIfTI volume are only byte-swapped when necessary, so this information needs to be retained at runtime.
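
A rough sketch of this kind of detection, not the nifti crate's actual code. It assumes a hypothetical header whose first four bytes hold a known constant (348 is used purely for illustration) and tries both interpretations:

```rust
/// Byte order detected at run time (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

/// Illustrative value that the first 4-byte header field is assumed to hold.
pub const EXPECTED_HEADER_SIZE: i32 = 348;

/// Returns the byte order under which the first header field has the
/// expected value, or `None` if neither order matches or the input is short.
pub fn detect_order(header: &[u8]) -> Option<Endianness> {
    let mut field = [0u8; 4];
    field.copy_from_slice(header.get(..4)?);
    if i32::from_le_bytes(field) == EXPECTED_HEADER_SIZE {
        Some(Endianness::Little)
    } else if i32::from_be_bytes(field) == EXPECTED_HEADER_SIZE {
        Some(Endianness::Big)
    } else {
        None
    }
}
```

The returned value would then be kept alongside the reader so that later fields can be decoded in the same order.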

My WIP library dicom also has an Endianness type, as the DICOM standard specifies multiple attribute encoding formats (called transfer syntaxes), each in either little endian or big endian. One of my lower layers of abstraction here, BasicDecoder, roughly resembles the behaviour of a runtime Endianness construct.

@matthieu-m

I have worked with a number of serialization frameworks which serialized in "natural order". This is rather handy as serializing an array of i32 is a single memcpy call.

On the other hand, it means that at deserialization time the program needs to be able to handle both little- and big-endian files, regardless of the endianness of the actual hardware on which it runs.
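
A sketch of what the reading side might look like under that scheme: interpret the bytes in host order and swap only when the file's order differs from the host's. The function name and the `file_is_little_endian` flag are assumptions for illustration:

```rust
/// Decodes `i32` values that were written in the producer's native order.
/// `file_is_little_endian` is a hypothetical flag taken from the file header.
/// Trailing bytes that do not form a full 4-byte value are ignored.
pub fn decode_i32s(raw: &[u8], file_is_little_endian: bool) -> Vec<i32> {
    // True when this program was compiled for a little-endian target.
    let host_is_little_endian = cfg!(target_endian = "little");
    let needs_swap = file_is_little_endian != host_is_little_endian;

    raw.chunks_exact(4)
        .map(|chunk| {
            let mut word = [0u8; 4];
            word.copy_from_slice(chunk);
            // Interpret in host order first; swap only on a mismatch.
            let value = i32::from_ne_bytes(word);
            if needs_swap {
                value.swap_bytes()
            } else {
                value
            }
        })
        .collect()
}
```

When the file and the host agree, no swapping happens at all, which is the property that makes "natural order" formats attractive.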

@BurntSushi
Owner

Maybe it's because I'm tired, but I've read this whole thread, and I still don't understand what folks are asking for here. I'd appreciate it if I wasn't sent to some crate to look at a nearly undocumented trait for a use case. Can someone lay this out for me in plain terms with an example? It would help to show the code that is needed today, and then to show the improvement if byteorder had this feature. Thanks.

@Enet4
Contributor

Enet4 commented Nov 3, 2017

I can only speak for myself here, but a runtime endianness object Endianness enables reading and writing words of data with a byte order that is only decided at run time. For the sake of simplicity, consider this iterator of floats read from a generic input source:

struct FloatReader<R> {
    order: Endianness,
    src: R,
}

impl<R: Read> Iterator for FloatReader<R> {
    type Item = IoResult<f32>;

    fn next(&mut self) -> Option<IoResult<f32>> {
        match self.order.read_f32(&mut self.src) {
            Ok(v) => Some(Ok(v)),
            Err(ref e) if e.kind() == ErrorKind::UnexpectedEof => None,
            e @ Err(_) => Some(e),
        }
    }
}

The decision of whether the floats are in LE or BE would be made by a producer of some sort, potentially based on a header portion of the raw data.

fn produce_reader<R: Read>(src: R) -> IoResult<FloatReader<R>> {
    // Placeholder: decide the byte order here, e.g. from a header in `src`.
    let e: Endianness = unimplemented!();
    Ok(FloatReader {
        order: e,
        src,
    })
}
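
For readers who want to try this out, the two snippets can be assembled into a std-only sketch. The `Endianness::read_f32` method and the one-byte header (0 for little-endian, anything else for big-endian) are assumptions made for the example, not an existing byteorder API or a real file format:

```rust
use std::io::{self, ErrorKind, Read};

/// Byte order chosen at run time (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

impl Endianness {
    /// Reads one `f32` from `src` in this byte order.
    pub fn read_f32<R: Read>(self, src: &mut R) -> io::Result<f32> {
        let mut buf = [0u8; 4];
        src.read_exact(&mut buf)?;
        Ok(match self {
            Endianness::Little => f32::from_le_bytes(buf),
            Endianness::Big => f32::from_be_bytes(buf),
        })
    }
}

pub struct FloatReader<R> {
    order: Endianness,
    src: R,
}

impl<R: Read> Iterator for FloatReader<R> {
    type Item = io::Result<f32>;

    fn next(&mut self) -> Option<io::Result<f32>> {
        match self.order.read_f32(&mut self.src) {
            Ok(v) => Some(Ok(v)),
            // Running out of input (including a truncated final value)
            // ends the iteration instead of surfacing an error.
            Err(ref e) if e.kind() == ErrorKind::UnexpectedEof => None,
            Err(e) => Some(Err(e)),
        }
    }
}

/// Reads the assumed one-byte header and builds a reader for the rest.
pub fn produce_reader<R: Read>(mut src: R) -> io::Result<FloatReader<R>> {
    let mut flag = [0u8; 1];
    src.read_exact(&mut flag)?;
    let order = if flag[0] == 0 {
        Endianness::Little
    } else {
        Endianness::Big
    };
    Ok(FloatReader { order, src })
}
```

Given the input bytes `[1, 0x3F, 0x80, 0, 0, 0x40, 0, 0, 0]`, the header selects big-endian and the iterator yields 1.0 and then 2.0.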

The main difference between what byteorder has today and this Endianness type is in how they are materialized. Both LittleEndian and BigEndian are meant to be used only as types. One could consider enclosing an instance of either one in a Box and calling that a run-time byte order decision, but that cannot be fed to the ReadBytesExt API.

"show the improvement if byteorder had this feature"

In the end, we could settle for each of us having our own util module with more than 100 lines of Endianness boilerplate. But keeping this in a well-maintained crate is always a plus.

@BurntSushi
Owner

@Enet4 Thank you so much for taking the time to lay that out for me. That made it much easier to understand this ticket. :-)

@shepmaster

Add one more request to the pile. I have almost the same use case: reading a data file that contains a header which determines whether the subsequent data is little- or big-endian. I'll probably copy-and-paste @Enet4's implementation, but it would be nice to have here 😇

@taralx

taralx commented Dec 13, 2017

+1 to this. I'm handling old mainframe datasets, and there's a bit that says whether the structures are big-endian or little-endian.

This is what I have:

struct WrapOrder<O: ByteOrder> {
    marker: std::marker::PhantomData<O>,
}

trait WrappedOrder {
    fn read_u16(&self, buf: &[u8]) -> u16;
    fn read_u32(&self, buf: &[u8]) -> u32;
}

impl<O: ByteOrder> WrappedOrder for WrapOrder<O> {
    fn read_u16(&self, buf: &[u8]) -> u16 { O::read_u16(buf) }
    fn read_u32(&self, buf: &[u8]) -> u32 { O::read_u32(buf) }
}
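
A sketch of how a wrapper like this might be selected at run time through a trait object. To keep the example self-contained, the `ByteOrder` trait and the two marker types below are stand-ins for the ones in byteorder, and `order_for` is a hypothetical helper:

```rust
use std::marker::PhantomData;

/// Stand-in for byteorder's `ByteOrder`: static methods, no `self`.
pub trait ByteOrder {
    fn read_u16(buf: &[u8]) -> u16;
}

pub struct LittleEndian;
pub struct BigEndian;

impl ByteOrder for LittleEndian {
    fn read_u16(buf: &[u8]) -> u16 {
        u16::from_le_bytes([buf[0], buf[1]])
    }
}

impl ByteOrder for BigEndian {
    fn read_u16(buf: &[u8]) -> u16 {
        u16::from_be_bytes([buf[0], buf[1]])
    }
}

/// Object-safe trait taking `&self`, so it can sit behind `dyn`.
pub trait WrappedOrder {
    fn read_u16(&self, buf: &[u8]) -> u16;
}

pub struct WrapOrder<O: ByteOrder> {
    marker: PhantomData<O>,
}

impl<O: ByteOrder> WrappedOrder for WrapOrder<O> {
    fn read_u16(&self, buf: &[u8]) -> u16 {
        O::read_u16(buf)
    }
}

/// Picks the implementation from a run-time flag (hypothetical helper).
pub fn order_for(big_endian: bool) -> Box<dyn WrappedOrder> {
    if big_endian {
        Box::new(WrapOrder::<BigEndian> { marker: PhantomData })
    } else {
        Box::new(WrapOrder::<LittleEndian> { marker: PhantomData })
    }
}
```

Compared with an enum and a `match`, this costs a heap allocation and a virtual call per read, but it reuses the existing static implementations unchanged.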

@Enet4
Contributor

Enet4 commented Dec 13, 2017

I did some research on the crates available for this, and none of them seems to cover this use case particularly well. What do you folks think of creating a new crate to cover this run-time endianness artifact? Later on, upon an appropriate agreement, it could be re-exported here.

@Kixunil

Kixunil commented Feb 10, 2018

I just ran into a real-world case where this is useful: the TIFF file format. It has a marker at the beginning telling the reader whether it's big-endian or little-endian. Every single value in the file is then ordered accordingly.
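
As an illustration, a sketch of reading that marker for classic TIFF, where the file starts with "II" (little-endian) or "MM" (big-endian) followed by the number 42 in the indicated order. The type and function names are made up for the example:

```rust
/// Byte order detected at run time (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

/// Inspects the first four bytes of a classic TIFF file.
/// Returns `None` if the marker or the magic number 42 does not match.
pub fn tiff_byte_order(header: &[u8]) -> Option<Endianness> {
    let order = match header.get(..2)? {
        b"II" => Endianness::Little,
        b"MM" => Endianness::Big,
        _ => return None,
    };
    // The magic number is itself stored in the order the marker announces.
    let magic = [*header.get(2)?, *header.get(3)?];
    let value = match order {
        Endianness::Little => u16::from_le_bytes(magic),
        Endianness::Big => u16::from_be_bytes(magic),
    };
    if value == 42 {
        Some(order)
    } else {
        None
    }
}
```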

@palfrey

palfrey commented May 26, 2018

Another use case: I want to make a "generic" Diesel backend (so I can swap it between a real and test one), but the Backend trait includes ByteOrder, so I can't make this without limiting it to only one byte ordering. If it wasn't for the Sealed trait I'd just build my own...

@yageek

yageek commented Jul 16, 2018

Could we help to solve this issue? Is anything planned or in progress?

@Enet4
Contributor

Enet4 commented Jul 16, 2018

@yageek I believe one of the main concerns is that there is no general consensus on how the API should work. A run-time Endianness API such as the one I proposed above would not be fully compatible with some parts of the existing API (e.g. ReadBytesExt receives the byte order as a type parameter only, not as a method parameter). For the time being, it might be best to nurture this idea in a dedicated repository and crate. I'll try my best to attend to this concern and keep this thread up to date.

@yageek

yageek commented Jul 16, 2018

@Enet4 I’m glad you’re working on it. I saw that nightly includes some primitives dealing with endianness too. Would it make sense to base the current work on those instead of using this crate?

@BurntSushi
Owner

Which things did you see in nightly?

@yageek

yageek commented Jul 16, 2018

I think I saw some from_be/to_be/from_le/to_le functions marked as nightly in the documentation of the primitive integer types. I did not check the other primitives.

@shepmaster

Those are stable methods.

@BurntSushi
Owner

Yeah byteorder already uses those. I'm not aware of anything stable or about to be stable that has any significance to this particular issue.

@yageek

yageek commented Jul 17, 2018

I mixed up with the from_bytes methods :( My bad!

@Enet4
Contributor

Enet4 commented Jul 17, 2018

from_bytes seems to be for converting a fixed size array into a primitive integral value, which this crate already does in its own way. It does not aim to address this feature request either.
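
For context, a short sketch of the standard-library conversions as they exist in stable Rust today: the integer-based `from_be`/`to_be` family and the array-based `from_be_bytes`/`from_le_bytes`/`from_ne_bytes` family. In all of them the byte order is fixed in the code, so none of them selects the order from a run-time value. The helper name is made up:

```rust
/// Decodes a 4-byte big-endian field in two equivalent ways (illustrative helper).
pub fn decode_be_u32(bytes: [u8; 4]) -> (u32, u32) {
    // Array-based conversion: the byte order is named in the method.
    let from_array = u32::from_be_bytes(bytes);
    // Integer-based conversion: reinterpret the bytes in host order,
    // then convert that integer from big-endian to host order.
    let from_integer = u32::from_be(u32::from_ne_bytes(bytes));
    (from_array, from_integer)
}
```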

@Enet4
Contributor

Enet4 commented Jul 22, 2018

A tiny update: I pushed byteorder_runtime with a proof of concept to a new repo. It lacks tests and still has many open API-related issues, but I would suggest directing all related discussion there.


Renamed the concept crate to byteordered

@khuey

khuey commented May 20, 2020

The X11 protocol also has byte order that is selectable at runtime.

@maow-dev

maow-dev commented Jan 4, 2025

This issue is quite old by now but I would like to add that the ELF file format selects byte order at runtime: https://refspecs.linuxfoundation.org/elf/elf.pdf
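
To illustrate, a sketch of reading the byte order from an ELF identification array, where the byte at index `EI_DATA` (5) is 1 for little-endian (`ELFDATA2LSB`) and 2 for big-endian (`ELFDATA2MSB`). The `Endianness` type and the function name are hypothetical:

```rust
/// Byte order detected at run time (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

/// Index of the data-encoding byte within `e_ident`.
const EI_DATA: usize = 5;
/// Two's complement, little-endian.
const ELFDATA2LSB: u8 = 1;
/// Two's complement, big-endian.
const ELFDATA2MSB: u8 = 2;

/// Returns the byte order declared in an ELF header, or `None` if the
/// magic number is missing, the input is too short, or the encoding is unknown.
pub fn elf_byte_order(ident: &[u8]) -> Option<Endianness> {
    if ident.get(..4)? != b"\x7FELF" {
        return None;
    }
    match *ident.get(EI_DATA)? {
        ELFDATA2LSB => Some(Endianness::Little),
        ELFDATA2MSB => Some(Endianness::Big),
        _ => None,
    }
}
```

Every multi-byte field after `e_ident` would then be decoded in the returned order.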
