Refactor Codec/Serializers to reduce new object construction #164
Comments
A few thoughts to share:

Generally, the high-level structs seem to have to declare exactly which codec they require for each field, so I don't really see much problem with the defaults (re: BigInteger <-> Uint64).

I was messing around a while back with custom annotations, which allowed the use of reflection/Proxy to auto-generate codecs for serializing/deserializing Immutable structs. See sketch here: sublimator@b086d2e#diff-e71e8189355a88ef639759191518ecfaR72 Obviously not very performant; I think the auto-generated sequence codecs were around 35% slower.

`AnInterface.builder().from(DecoderProxy.from(AnInterface.class, ...)).build()`

Whatever way it's implemented, it might be nice to have a separate package of annotations one could apply that the codec framework could understand and use to auto-generate codecs, which could then be replaced with hand-written codecs as and when performance is required. It would aid in documenting the structs, and the auto-generated codecs could be compared against the hand-written ones.

I agree the stateful codecs are a bit less than ideal. On the topic of performance, there's a lot of concurrent map usage inside the context, which could perhaps be better solved with builders/immutable maps and copies when registering? I think I recall @adrianhopebailie at one time expressing that a simpler Reader/Writer pattern could be more performant?
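To make the annotation idea above concrete, here is a minimal, hypothetical sketch of how a reflective "sequence codec" could walk annotated getters in declared field order. Every name in it (`AsnField`, `PreparePacketView`, `describe`) is invented for illustration; it is not the `DecoderProxy` code from the linked diff, and the decoding side (e.g. via `java.lang.reflect.Proxy`) is omitted.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Comparator;
import java.util.stream.Stream;

final class AnnotationCodecSketch {

  /** Hypothetical annotation declaring a getter's position in an ASN.1 SEQUENCE. */
  @Retention(RetentionPolicy.RUNTIME)
  @interface AsnField {
    int order();
  }

  /** A struct-like interface whose getters carry their field order. */
  interface PreparePacketView {
    @AsnField(order = 0) long amount();
    @AsnField(order = 1) String destination();
  }

  /**
   * Walks the annotated getters in declared order via reflection -- the same
   * traversal an auto-generated sequence codec would use to encode fields.
   */
  static String describe(Object target, Class<?> view) {
    return Stream.of(view.getMethods())
        .filter(m -> m.isAnnotationPresent(AsnField.class))
        .sorted(Comparator.comparingInt((Method m) -> m.getAnnotation(AsnField.class).order()))
        .map(m -> {
          try {
            return m.getName() + "=" + m.invoke(target);
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
          }
        })
        .reduce((a, b) -> a + ", " + b)
        .orElse("");
  }

  public static void main(String[] args) {
    PreparePacketView packet = new PreparePacketView() {
      @Override public long amount() { return 100L; }
      @Override public String destination() { return "g.example.bob"; }
    };
    // Prints: amount=100, destination=g.example.bob
    System.out.println(describe(packet, PreparePacketView.class));
  }
}
```

The reflective walk is the likely source of the ~35% slowdown mentioned above; a hand-written codec would replace exactly that loop while keeping the annotations as documentation.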
Yeh, this framework was definitely designed with "align to ASN.1" in mind. The goal was a separation of the concepts: you could have an ASN.1 modelling framework that allowed someone to write codecs to convert between business objects and an ASN.1 representation, and then separately encode and decode these using a specific set of encoding rules. Technically you should be able to write a different set of serializers for DER, BER etc. if you needed an alternative encoding for the same ASN.1-defined objects.

The overhead of creating new objects each time was intentional, as it made things simpler to implement (thread-safe without the need to enforce immutability), and I figured the "framework" approach was not the right tool for a high-perf system anyway. If you want performance you probably want use-case-specific readers/writers that don't need to translate your business objects into an intermediate ASN.1 form.

If we get to that, then I think we want to look at a variety of other techniques for improving performance, like re-using buffers for ILP packets and doing in-place replacement of fixed-length data (like amounts and expiries). In a high-perf ILP connector you could read a prepare packet, modify it, and forward it on with a new amount and expiry without ever needing to fully deserialize it. Maybe we can hold off on refactoring the codecs until we want to try something like that and ditch the framework altogether?
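For illustration only, a sketch of the in-place idea mentioned above: overwrite the fixed-length amount of a Prepare packet directly in its buffer rather than deserializing and re-serializing the whole packet. The `AMOUNT_OFFSET` here is a made-up value for a fabricated buffer; in a real ILP packet the offset depends on the one-byte type field and the variable-length OER length prefix that precede the data.

```java
import java.nio.ByteBuffer;

final class InPlaceAmountRewrite {

  // Hypothetical offset of the 8-byte UInt64 amount within this fabricated buffer;
  // a real packet's offset depends on the type byte and the OER length prefix.
  private static final int AMOUNT_OFFSET = 3;

  /** Overwrites the amount in place; nothing else in the buffer is touched or copied. */
  static void rewriteAmount(ByteBuffer preparePacket, long newAmount) {
    preparePacket.putLong(AMOUNT_OFFSET, newAmount); // absolute put: position/limit unchanged
  }

  public static void main(String[] args) {
    ByteBuffer packet = ByteBuffer.allocate(64);       // stand-in for a received prepare packet
    packet.putLong(AMOUNT_OFFSET, 1_000L);             // original amount
    rewriteAmount(packet, 990L);                       // e.g. a connector deducting its spread
    System.out.println(packet.getLong(AMOUNT_OFFSET)); // 990
  }
}
```

The expiry could be rewritten the same way, since it is also a fixed-length field; everything else in the buffer would be forwarded untouched.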
Yeah, they'll rewrite themselves if something actually uses them and needs the perf, right?
> thread-safe without the need to enforce immutability

Not sure if I really follow :) If codecs were stateless it wouldn't really matter, right? Ins and outs?
There are three problems with the codec/serializer framework as it stands now.
First, the codecs rely upon holding "state" until the Serializer can write (or the inverse in the case of a read). The net effect is that every serialization operation involves at least 1 new object construction (to create a new codec), but in general it's much worse than this. For example, consider the `AsnInterledgerPreparePacketDataCodec`, which is engaged for every `prepare` packet. Here, we have at least 5 new object instantiations for every deserialize operation. So, if we process 5m payments (prepare[5] + fulfill[2]), we would expect at least 35 million new object creations, which wastes memory, CPU cycles, etc.

Second, the current system attempts to automatically map from Java objects to ASN.1 OER. For the most part this is a fine default, but it creates complexity inside the framework when we need custom mappings. For example, the `CodecContextFactory` currently maps a `BigInteger` to `AsnUint64Codec`, which makes sense for the ILP codecs. However, some CCP codecs require `BigInteger` to map to `AsnUintCodec` instead. This is not a showstopper -- we might simply try to create a more intelligent codec -- but such a codec would likely not suit all use-cases. A better system would be one where a developer has more fine-grained control over the Codec/Serializer that should be used, regardless of what the default factory has configured (a rough sketch appears after the third point below).

Third is clarity of code. There appears to be overlap between the Serializers and the Codecs. For example, it's nearly impossible to create a good unit test of a serializer because it's tightly coupled to its codec (a proper unit test here can be quite complicated, typically because creating a codec-mock is non-trivial). Additionally, it can be unclear which layer is responsible for what.
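As a rough sketch of the fine-grained control described in the second point above (all names and shapes here are hypothetical, not the current `CodecContextFactory` API): the caller chooses the codec for a given field instead of relying on a type-keyed default.

```java
import java.math.BigInteger;

final class FieldCodecSketch {

  /** Hypothetical minimal codec abstraction (encode only, for brevity). */
  interface AsnCodec<T> {
    byte[] encode(T value);
  }

  /** The kind of default a factory might register for BigInteger: fixed 8-byte UInt64. */
  static final AsnCodec<BigInteger> UINT64 = value -> {
    byte[] out = new byte[8];
    byte[] raw = value.toByteArray();
    // Right-align the magnitude into the fixed-width buffer (value must fit in 64 bits).
    System.arraycopy(raw, Math.max(0, raw.length - 8),
        out, Math.max(0, 8 - raw.length), Math.min(8, raw.length));
    return out;
  };

  /** What a CCP field might ask for instead: a variable-length integer (sign handling omitted). */
  static final AsnCodec<BigInteger> VAR_UINT = BigInteger::toByteArray;

  public static void main(String[] args) {
    BigInteger amount = BigInteger.valueOf(1000);
    // The field (caller) picks its codec explicitly rather than the factory picking by type.
    System.out.println(UINT64.encode(amount).length);   // 8
    System.out.println(VAR_UINT.encode(amount).length); // 2
  }
}
```

With something like this, ILP fields could keep the fixed-width default while a CCP field opts into the variable-length encoding, without touching the factory's global registration.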
This ticket proposes to solve both the performance and clarity issues by combining the codecs and serializers into a single layer, with an API that supports simple encoding/decoding functionality (like the current API) but also exposes fine-grained `read...` and `write...` methods, similar to the direct access to read/write methods that an `InputStream`/`OutputStream` provides.
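One possible shape for that combined layer, sketched below purely for discussion (none of these names exist in the codebase today): a single stateless codec per packet type that exposes both a high-level encode/decode pair and fine-grained stream-oriented primitives, so nothing new is allocated per operation beyond the value being decoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;

/**
 * Illustrative shape for a combined codec/serializer layer: a single stateless
 * object per packet type, safe to share across threads, exposing both a
 * high-level encode/decode pair and fine-grained read/write primitives.
 */
interface IlpCodec<T> {

  // High-level API, comparable to what the current framework offers.
  T decode(InputStream in) throws IOException;

  void encode(T value, OutputStream out) throws IOException;

  // Fine-grained primitives, usable directly when only part of a packet matters.
  BigInteger readUInt64(InputStream in) throws IOException;

  void writeUInt64(BigInteger value, OutputStream out) throws IOException;

  byte[] readOctetString(InputStream in) throws IOException;

  void writeOctetString(byte[] value, OutputStream out) throws IOException;
}
```

A caller that only needs the amount could then call `readUInt64` against the stream directly instead of materializing the whole packet, while the high-level methods keep the current ergonomics for everyone else.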