Refactor Codec/Serializers to reduce new object construction #164
Comments
A few thoughts to share:

Generally, the high-level structs seem to have to declare exactly which codec they require for each field, so I don't really see much problem with the defaults (re: BigInteger <-> Uint64).

I was messing around a while back with custom annotations, which allowed the use of reflection/Proxy to auto-generate codecs for serializing/deserializing Immutable structs. See sketch here: sublimator@b086d2e#diff-e71e8189355a88ef639759191518ecfaR72 Obviously not very performant; I think the auto-generated sequence codecs were around 35% slower.

`AnInterface.builder().from(DecoderProxy.from(AnInterface.class, ...)).build()`

Whatever way it's implemented, it might be nice to have a separate package of annotations one could apply that the codec framework could understand and use to auto-generate codecs, which could then be replaced with hand-written codecs as and when performance is required. It would aid in documenting the structs, and the auto-generated codecs could be compared against the hand-written ones.

I agree the stateful codecs are a bit less than ideal. On the topic of performance, there's a lot of concurrent map usage inside the context, which could perhaps be better solved with builders/immutable maps and copies when registering? I think I recall @adrianhopebailie at one time expressing that a simpler Reader/Writer pattern could be more performant?
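To make the annotation idea above concrete, here is a minimal, hypothetical sketch of how a reflective "sequence codec" could walk annotated getters in declared field order. Every name in it (`AsnField`, `PreparePacketView`, `describe`) is invented for illustration; it is not the `DecoderProxy` code from the linked diff, and the decoding side (e.g. via `java.lang.reflect.Proxy`) is omitted.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Comparator;
import java.util.stream.Stream;

final class AnnotationCodecSketch {

  /** Hypothetical annotation declaring a getter's position in an ASN.1 SEQUENCE. */
  @Retention(RetentionPolicy.RUNTIME)
  @interface AsnField {
    int order();
  }

  /** A struct-like interface whose getters carry their field order. */
  interface PreparePacketView {
    @AsnField(order = 0) long amount();
    @AsnField(order = 1) String destination();
  }

  /**
   * Walks the annotated getters in declared order via reflection -- the same
   * traversal an auto-generated sequence codec would use to encode fields.
   */
  static String describe(Object target, Class<?> view) {
    return Stream.of(view.getMethods())
        .filter(m -> m.isAnnotationPresent(AsnField.class))
        .sorted(Comparator.comparingInt((Method m) -> m.getAnnotation(AsnField.class).order()))
        .map(m -> {
          try {
            return m.getName() + "=" + m.invoke(target);
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
          }
        })
        .reduce((a, b) -> a + ", " + b)
        .orElse("");
  }

  public static void main(String[] args) {
    PreparePacketView packet = new PreparePacketView() {
      @Override public long amount() { return 100L; }
      @Override public String destination() { return "g.example.bob"; }
    };
    // Prints: amount=100, destination=g.example.bob
    System.out.println(describe(packet, PreparePacketView.class));
  }
}
```

The reflective walk is the likely source of the ~35% slowdown mentioned above; a hand-written codec would replace exactly that loop while keeping the annotations as documentation.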
Yeh, this framework was definitely designed with "align to ASN.1" in mind. The goal was a separation of the concepts: you could have an ASN.1 modelling framework that allowed someone to write codecs to convert between business objects and an ASN.1 representation, and then separately encode and decode these using a specific set of encoding rules. Technically you should be able to write a different set of serializers for DER, BER etc. if you needed an alternative encoding for the same ASN.1-defined objects.

The overhead of creating new objects each time was intentional, as it made things simpler to implement (thread-safe without the need to enforce immutability), and I figured the "framework" approach was not the right tool for a high-perf system anyway. If you want performance you probably want use-case-specific readers/writers that don't need to translate your business objects into an intermediate ASN.1 form.

If we get to that, then I think we want to look at a variety of other techniques for improving performance, like re-using buffers for ILP packets and doing in-place replacement of fixed-length data (like amounts and expiries). In a high-perf ILP connector you could read a prepare packet, modify it, and forward it on with a new amount and expiry without ever needing to fully deserialize it. Maybe we can hold off on refactoring the codecs until we want to try something like that and ditch the framework altogether?
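For illustration only, a sketch of the in-place idea mentioned above: overwrite the fixed-length amount of a Prepare packet directly in its buffer rather than deserializing and re-serializing the whole packet. The `AMOUNT_OFFSET` here is a made-up value for a fabricated buffer; in a real ILP packet the offset depends on the one-byte type field and the variable-length OER length prefix that precede the data.

```java
import java.nio.ByteBuffer;

final class InPlaceAmountRewrite {

  // Hypothetical offset of the 8-byte UInt64 amount within this fabricated buffer;
  // a real packet's offset depends on the type byte and the OER length prefix.
  private static final int AMOUNT_OFFSET = 3;

  /** Overwrites the amount in place; nothing else in the buffer is touched or copied. */
  static void rewriteAmount(ByteBuffer preparePacket, long newAmount) {
    preparePacket.putLong(AMOUNT_OFFSET, newAmount); // absolute put: position/limit unchanged
  }

  public static void main(String[] args) {
    ByteBuffer packet = ByteBuffer.allocate(64);       // stand-in for a received prepare packet
    packet.putLong(AMOUNT_OFFSET, 1_000L);             // original amount
    rewriteAmount(packet, 990L);                       // e.g. a connector deducting its spread
    System.out.println(packet.getLong(AMOUNT_OFFSET)); // 990
  }
}
```

The expiry could be rewritten the same way, since it is also a fixed-length field; everything else in the buffer would be forwarded untouched.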
Yeah, they'll rewrite themselves if something actually uses them and needs the perf, right?
> thread-safe without the need to enforce immutability

Not sure if I really follow :) If codecs were stateless it wouldn't really matter, right? Ins and outs?
There are three problems with the codec/serializer framework as it stands now.
First, the codecs rely upon holding "state" until the Serializer can write (or the inverse in the case of a read). The net effect is that every serialization operation involves at least 1 new object construction (to create a new codec), but in general it's much worse than this. For example, consider the `AsnInterledgerPreparePacketDataCodec`, which is engaged for every `prepare` packet. Here, we have at least 5 new object instantiations for every deserialize operation. So, if we process 5m payments (prepare[5] + fulfill[2]), we would expect at least 35 million new object creations, which wastes memory, CPU cycles, etc.

Second, the current system attempts to automatically map from Java objects to ASN.1 OER. For the most part this is a fine default, but it creates complexity inside the framework when we need custom mappings. For example, the `CodecContextFactory` currently maps a `BigInteger` to `AsnUint64Codec`, which makes sense for the ILP codecs. However, some CCP codecs require `BigInteger` to map to `AsnUintCodec` instead. This is not a showstopper -- we might simply try to create a more intelligent codec -- but such a codec would likely not suit all use-cases. A better system would be one where a developer has more fine-grained control over the Codec/Serializer that should be used, regardless of what the default factory has configured (a rough sketch appears after the third point below).

Third is clarity of code. There appears to be overlap between the Serializers and the Codecs. For example, it's nearly impossible to create a good unit test of a serializer because it's tightly coupled to its codec (a proper unit test here can be quite complicated, typically because creating a codec-mock is non-trivial). Additionally, it can be unclear which layer is responsible for what.
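As a rough sketch of the fine-grained control described in the second point above (all names and shapes here are hypothetical, not the current `CodecContextFactory` API): the caller chooses the codec for a given field instead of relying on a type-keyed default.

```java
import java.math.BigInteger;

final class FieldCodecSketch {

  /** Hypothetical minimal codec abstraction (encode only, for brevity). */
  interface AsnCodec<T> {
    byte[] encode(T value);
  }

  /** The kind of default a factory might register for BigInteger: fixed 8-byte UInt64. */
  static final AsnCodec<BigInteger> UINT64 = value -> {
    byte[] out = new byte[8];
    byte[] raw = value.toByteArray();
    // Right-align the magnitude into the fixed-width buffer (value must fit in 64 bits).
    System.arraycopy(raw, Math.max(0, raw.length - 8),
        out, Math.max(0, 8 - raw.length), Math.min(8, raw.length));
    return out;
  };

  /** What a CCP field might ask for instead: a variable-length integer (sign handling omitted). */
  static final AsnCodec<BigInteger> VAR_UINT = BigInteger::toByteArray;

  public static void main(String[] args) {
    BigInteger amount = BigInteger.valueOf(1000);
    // The field (caller) picks its codec explicitly rather than the factory picking by type.
    System.out.println(UINT64.encode(amount).length);   // 8
    System.out.println(VAR_UINT.encode(amount).length); // 2
  }
}
```

With something like this, ILP fields could keep the fixed-width default while a CCP field opts into the variable-length encoding, without touching the factory's global registration.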
This ticket proposes to solve both the performance and clarity issues by combining the codecs and serializers into a single layer, with an API that supports simple encoding/decoding functionality (like the current API) but also exposes fine-grained `read...` and `write...` methods, similar to the direct access to read/write methods that an `InputStream`/`OutputStream` provides.
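One possible shape for that combined layer, sketched below purely for discussion (none of these names exist in the codebase today): a single stateless codec per packet type that exposes both a high-level encode/decode pair and fine-grained stream-oriented primitives, so nothing new is allocated per operation beyond the value being decoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;

/**
 * Illustrative shape for a combined codec/serializer layer: a single stateless
 * object per packet type, safe to share across threads, exposing both a
 * high-level encode/decode pair and fine-grained read/write primitives.
 */
interface IlpCodec<T> {

  // High-level API, comparable to what the current framework offers.
  T decode(InputStream in) throws IOException;

  void encode(T value, OutputStream out) throws IOException;

  // Fine-grained primitives, usable directly when only part of a packet matters.
  BigInteger readUInt64(InputStream in) throws IOException;

  void writeUInt64(BigInteger value, OutputStream out) throws IOException;

  byte[] readOctetString(InputStream in) throws IOException;

  void writeOctetString(byte[] value, OutputStream out) throws IOException;
}
```

A caller that only needs the amount could then call `readUInt64` against the stream directly instead of materializing the whole packet, while the high-level methods keep the current ergonomics for everyone else.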