-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pallas Evolutions #599
Comments
I would love if #[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)]
pub struct KeepRaw<'b, T> {
raw: Cow<'b, [u8]>,
inner: T,
}
impl<T> KeepRaw<'_, T> {
pub fn to_owned(self) -> KeepRaw<'static, T> {
KeepRaw {
raw: Cow::Owned(self.raw.into_owned()),
inner: self.inner,
}
}
} |
In theory, we could also snapshot the final versions of these crates before a hard fork and publish them as |
I believe there are use-cases for something like pallas-traverse and multi-era representations; especially if you're building indexers and apps that use historical data. |
@KtorZ I agree with all of the pain points and with some of the proposed solutions, but I have counter-proposals for a few items. As a prelude to my comments, lets keep in mind that we have a short-term goal of stabilizing the api for a v1 and leave drastic api changes for a v2. Owned memoizationThe On a longer term, the approach I want to explore is having a heap-based, semantic cache for block bytes. This means having something like a process-wide map of block_hash -> Arc<Box<block_bytes>> that sub-structures can reference freely. The KeepRaw would have an option to point to this cache, to owned bytes or to a slice in the stack. This would probably go in a v2. De-couple Rust from CDDL/CBORI would split the requirements in the following way:
My understanding is that you're advocating to fulfill each requirement using the following:
My concern is that I don't see the value of having both pallas-latest and pallas-traverse. To me, requirements 2 & 3 can be fulfilled by the same api. What's the downside of pallas-traverse? If the downside is having Rust lifetimes, we're fixing that already by having the owned bytes option. My proposal is to simplify pallas-primitives by removing a lot of unnecessary complexity but still focus exclusively on requirement 1:
I acknowledge that I might be missing something here, I'm happy to have an in-depth discussion to explore this particular topic. Full-deserializationThe fact that addresses and similar structs are kept as bytes is a design decision:
Regarding your concerns:
My counter proposal: we remain with the lazy parsing approach but we add memoization in the ergonomic api layer (either pallas-traverse or pallas-latest). In this way, the overall performance hit is still paid once, but only by those who need it. Stronger type safetyAgreed, there's a half-baked attempt to force these struct-level business invariants in the primitives crate. We started by adding the validation logic at deserialization-time, but we immediately hit reality: testnet data was not being compliant so our pipelines halted. This is the exact same issue as with eager address parsing: the enforcement of any business invariant hidden inside deserialization code is too constraining. My proposal here is to have primitives that provide the semantic context for the data, but enforce the business invariants as part of the validation code that happens during phase-1 or similar. These struct might even have the validation code embedded in them (not to my liking but willing to compromise), but the check would be enforced imperatively by the user after the deserialization phase. pallas-primitives extensionsNo counter proposal here. This needs to happen, we just need to decide if is a change reasonable for v1 or if it should wait for v2. |
Thanks for the reply @scarmuega;
I believe they address two different concerns, and thus deserves two different APIs (because of different trade-offs). The problem with pallas-traverse aren't the lifetimes actually, but the need to constantly have to deal with multi-era objects. For example, if I parse a Conway block, which I know to be a Conway block; it doesn't make sense for me to lift it into a Cardano block and later in the app, have to deal with the unmatched eras cases because I only care about Conway. Pallas-traverse is great if you're building solutions that needs to span over multiple eras (e.g. chain indexers) where you do want logic that must handle multiple eras in an agnostic way. So, I need an API where I can fix an era and work with it. And so far, the only option is to keep things at the level of Sprinkling some Rust-idiomatic Iterators & whatnot to
Allow me to slightly disagree here; while not CBOR, the format of addresses is clearly specified. And things like rewards accounts are in fact not only bytes. Often enough, they are CBOR-in-CBOR bytes, and they require by design another level of serializations.
It depends where you draw the line I guess. You say CBOR-decoding, I say binary-decoding. That it's a mix of CBOR, Address and Flat shouldn't be much a concern of the consumer. It's about going from serialized bytes to a higher level Rust representation. Now, I understand that this may be a different design goal. But then, I am curious of how we can resolve that. Perhaps a useful case for the minicbor decoder contexts? Allowing different deserialization behaviours specified by the caller?
I assume you're referring to addresses with extra bytes at the end, back when the ledger wouldn't check that all bytes in an addresses had been consumed? If that's the only case then, that should be easily addressable by the library (allowing or not extra bytes). I am pretty certain though that outside of those and semantically invalid pointer addresses (though still well-formed from a binary perspective), there no addresses not abiding by CIP-19; Would you be okay if I gave a try to the suggestion above perhaps (i.e. let the caller decides whether to perform a full deserialization or a partial one; while leaving the default to be partial/incremental)? |
(I'll be splitting topics moving forward)
Not sure what you mean. Can you give me an example? The fact that the internal representation of the multiera object is sum type is hidden from the user. For example, lets say you want to interact with a Conway block to iterate over each tx, each output and each asset: // since you know you're dealing with conway, the runtime overhead here is
// roughly the same as using a newtype.
let block = MultiEraBlock::from(conway_block);
for tx in block.txs() {
for txo in tx.outputs() {
let coin = txo.value().coin();
for multi_asset in txo.value().assets() {
let policy = multi_asset.policy();
for asset in multi_asset.assets() {
let name = asset.name();
let coin = asset.coin();
}
}
}
} What's wrong with this api? where is the multiera bothering you? What would the difference be compared to a pallas-latest? |
The difference lies here: let block = MultiEraBlock::from(conway_block); I have a
So all-in-all, Pallas-traverse is adding runtime overhead and forcing the use of My suggestion would be to leverage traits and static dispatch as much as possible to avoid any runtime overhead (which is where Rust really shines IMO). And the good thing is that we don't even have to think too hard about how to break down the API since it's already been done in the Haskell codebase, e.g.: |
There's no dynamic dispatch in pallas-traverse, I think you're confusing terms. However, there are some heap allocations that were added to keep clippy happy, but these could be removed. Using downcasting methods such as If your concern is dealing with the latest era, returning placeholder structs for unsupported eras shouldn't be a problem . If you somehow enter one of these placeholder code branch, returning empty vecs is cheap since they don't allocate anything in the heap. I agree that losing resolution in the api surface (your "fee" example) is not good. The policy should be that the api surface must reflect the natural shape of the latest era. These cases should be refactored. Having a trait as a mechanism to represent the desired api surface will introduce the same overhead of a function call that you're describing in pallas-traverse. At the end of the day, I believe that the only real overhead of pallas-traverse is the pattern matching. So, we have these options:
|
I am not; I didn't say that pallas-traverse had dynamic dispatch; I said that it's how one would typically deal with pallas-traverse when using it. But, whatever 😅...
As said earlier, I think this makes sense because they would serve different purposes.
It is unfortunately not so easy I think. It's usually painful because of And don't get me wrong, I am not saying that the pallas-traverse API is wrong. It is perfectly fine in the context of exploring the chain and processing data from it. But it isn't from a perspective of validation, carrying guarantees obtained from deserialising.
I would argue that this isn't mutually exclusive with the first and second options, and could be done where we see fit (and I assume you also mean it this way 👍). |
Sorry, I don't see how this answer addresses my previous concerns, but I'm ok if you want to try prototyping this approach within Amaru and then move it upstream when we have a tangible example. Moving forward on this topic:
|
Hello! It's been a while now that I've been using Pallas for a variety of projects and got a good grasp of it. I'd like to propose a few changes (which I am volunteering to implement as well, but would like to open a discussion first). Some of them are potential breaking changes, though they feel necessary to me to bring Pallas to the next level.
Note
I am making a single issue now to discuss those points, but I am open to splitting it into different issues if that makes it easier.
Owned memoization ?
pallas-primitives comes with various
Minted
wrappers used to keep a reference to original bytes from which values are deserialized. For example:pallas/pallas-primitives/src/babbage/model.rs
Line 265 in adab71f
pallas/pallas-primitives/src/babbage/model.rs
Lines 362 to 363 in adab71f
pallas/pallas-primitives/src/babbage/model.rs
Lines 716 to 720 in adab71f
Fundamentally, those are mostly generic structures that leverage the
KeepRaw
type:pallas/pallas-codec/src/utils.rs
Lines 1107 to 1111 in adab71f
The main downside of this structure is that it comes with a lifetime and doesn't own the bytes. This has proven challenging in numerous places when drilling through the structure and passing data around. However, it does solve an important problem: it avoids re-serialisation altogether, which can be quite error-prone in the context of Cardano since the CBOR encoders/decoders are non-canonical and because the slightest byte divergence will invalidate all hashes and signatures based on the original data.
Given that it's also a prevalent type likely used by many application downstream, simply removing them is probably not a good idea. One immediate naive suggestion would be to introduce a new
Memoize
type similar toKeepRaw
, but that takes ownership of the bytes. Although, the problem with such type is that it can lead to a quadratic memory size growth (since the size of each sub-element and their children will now be stored twice). I am not sure it's much of a big deal though, because we do not have many of such types needing to own their serialization bytes.In addition, for many of them, we can also store hashes and sizes instead of bytes since it is often what we actually want (e.g. for blocks or transactions). So there are perhaps subtle variations of
KeepRaw
that would be more practical than keeping full bytes.De-couple Rust from CDDL/CBOR ?
pallas-primitives currently keeps a Rust representation of all types that is mimicking the on-the-wire format. In a sense, pallas-primitives is Rust eDSL to writing CBOR specific to Cardano.
I understand very much that
pallas-traverse
is meant to be used as an API layer on top of pallas-primitives that ensures many of the annoying CBOR-specifics are hidden away from users. Yet,pallas-traverse
forces one to think in terms of multi-era objects, whereas in many occasions, one truly only care about a single era.So I see two complementary options here:
Have
pallas-primitives
come with some extra quality-of-life additions, for example, more dogmatic iterators on all iterable structures or, accessors for most common fields (e.g. pulling out Ada value out of an output). Of course, I totally get the desire to keep pallas-primitive as "raw" and as close to the wire format as possible. So it boils down to finding the right balance, and months of usage has convinced me that it isn't there -- so I am willing to have a go at it.Introduce perhaps a new
pallas-latest
crate that provides this layer on top of pallas-primitives, with a focus on a single era (the latest) and with more a opinionated interface for Pallas' internals.I see both as complementary, and designing both at the same time can perhaps help iterating more quickly, as we can move out of
pallas-primitives
intopallas-latest
was is deemed too high level forpallas-primitives
Full-deserialisation
Some data-structure in pallas-primitives do not deserialize their internals fully. This is at least the case for types holding addresses and reward accounts. Those are deserialised to
Bytes
instead of structured objects, which has three main issues:It prevents an early failing in case where one these nested type is in fact ill-formed. One needs a second round of deserialisation to fully validate a deserialised object.
It comes with a performance hit since we need to traverse the structures an additional time to deserialise those leftovers.
It's overall unpractical, since those elements are not generic we can't simply replace them once parsed in the original object (e.g. a transaction output). And so, every access requires to re-deserialise (and potentially fail) the said fields.
So my proposal, is to ensure that those types are properly deserialized in full, such that a successful deserialisation yields a complete type.
Stronger type safety
pallas-codec
introduces many helpers to capture invariants from deserialised objects, which is good. What is more problematic is that many of those invariants can actually be bypassed. For example, aNonEmptyKeyValuePairs
doesn't enforce non-emptiness whatsoever, and provide public access to its constructors:pallas/pallas-codec/src/utils.rs
Lines 194 to 209 in adab71f
Another example is
Set
, which doesn't expose its internal constructors, but still allow conversion from a plain vector without any maintained guarantees:pallas/pallas-codec/src/utils.rs
Lines 705 to 725 in adab71f
From what I see on some other structures, such as
NonEmptySet
, I believe there's a desire to have those invariants enforced, and I put it on the rapid and organic changes that happened in the crate due to hard fork constraints to not have enforced them across the board. In particular, many structure do distinguish between definite/indefinite CBOR, which introduces quite a lot of duplications and inconsistencies.So I see room to perhaps unify this aspect with a more holistic and generic Def/Indef wrapper while avoiding repetition for each sub-type internals. This is, however, yet another breaking change.
At the very least, I would like to be more thorough about which constructors are exposed as well as the From/Into trait instances so that invariants are actually always enforced, ensuring that if one sees a given type with constraints, it comes with stronger guarantees.
pallas-primitives extensions ?
This is already partially discussed in #592, but there are various types from within
pallas-network
,pallas-traverse
andpallas-addresses
that would highly benefit from being moved down topallas-primitives
. Having them inpallas-network
forces consuming crates to introduce undesired network-specific dependencies, and has led to inconsistency in several occasions (with similar types being defined in multiple places, though in a slightly different way).From the top of my head, I can think of at least
Point
,PoolParams
, but is generally true of any type that doesn't appear in a block but is typically found elsewhere (e.g. in a state-query protocol).The text was updated successfully, but these errors were encountered: