
Pallas Evolutions #599

Open
KtorZ opened this issue Feb 13, 2025 · 10 comments
Comments

@KtorZ
Member

KtorZ commented Feb 13, 2025

Hello! I've been using Pallas for a variety of projects for a while now and have a good grasp of it. I'd like to propose a few changes (which I am volunteering to implement as well, but I would like to open a discussion first). Some of them are potential breaking changes, though they feel necessary to me to bring Pallas to the next level.

Note

I am making a single issue now to discuss those points, but I am open to splitting it into different issues if that makes it easier.

Owned memoization?

pallas-primitives comes with various Minted wrappers used to keep a reference to the original bytes from which values are deserialized. Fundamentally, those are mostly generic structures that leverage the KeepRaw type:

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)]
pub struct KeepRaw<'b, T> {
    raw: &'b [u8],
    inner: T,
}

The main downside of this structure is that it comes with a lifetime and doesn't own the bytes. This has proven challenging in numerous places when drilling through the structure and passing data around. However, it does solve an important problem: it avoids re-serialisation altogether, which can be quite error-prone in the context of Cardano since the CBOR encoders/decoders are non-canonical and because the slightest byte divergence will invalidate all hashes and signatures based on the original data.

Given that it's also a prevalent type likely used by many applications downstream, simply removing them is probably not a good idea. One immediate, naive suggestion would be to introduce a new Memoize type similar to KeepRaw, but one that takes ownership of the bytes. The problem with such a type, however, is that it can lead to quadratic memory growth (since the serialization of each sub-element and its children would now be stored twice). I am not sure it's a big deal though, because we do not have many such types needing to own their serialization bytes.

In addition, for many of them, we can also store hashes and sizes instead of bytes since it is often what we actually want (e.g. for blocks or transactions). So there are perhaps subtle variations of KeepRaw that would be more practical than keeping full bytes.
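To make the idea concrete, here is a minimal sketch of what an owned counterpart to KeepRaw could look like, together with a digest-only variant for cases like blocks or transactions. All names are hypothetical, and DefaultHasher stands in for a real blake2b digest purely for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical owned counterpart to KeepRaw: the wrapper owns its
// serialization bytes, trading extra memory for lifetime-free ergonomics.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Memoized<T> {
    raw: Vec<u8>,
    inner: T,
}

impl<T> Memoized<T> {
    pub fn new(raw: Vec<u8>, inner: T) -> Self {
        Memoized { raw, inner }
    }

    /// The original bytes, guaranteed to round-trip hashes and signatures.
    pub fn raw_bytes(&self) -> &[u8] {
        &self.raw
    }

    pub fn inner(&self) -> &T {
        &self.inner
    }
}

// Variant keeping only a digest and a size, for cases where the hash and
// length are what callers actually need, avoiding the quadratic blow-up.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MemoizedDigest<T> {
    digest: u64, // a real implementation would store a 32-byte blake2b hash
    size: usize,
    inner: T,
}

impl<T> MemoizedDigest<T> {
    pub fn from_raw(raw: &[u8], inner: T) -> Self {
        let mut h = DefaultHasher::new();
        raw.hash(&mut h);
        MemoizedDigest { digest: h.finish(), size: raw.len(), inner }
    }
}

fn main() {
    let m = Memoized::new(vec![0x82, 0x01, 0x02], (1u8, 2u8));
    assert_eq!(m.raw_bytes(), &[0x82, 0x01, 0x02]);
    let d = MemoizedDigest::from_raw(m.raw_bytes(), *m.inner());
    assert_eq!(d.size, 3);
}
```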

De-couple Rust from CDDL/CBOR?

pallas-primitives currently keeps a Rust representation of all types that mimics the on-the-wire format. In a sense, pallas-primitives is a Rust eDSL for writing CBOR specific to Cardano.

I understand very much that pallas-traverse is meant to be used as an API layer on top of pallas-primitives that ensures many of the annoying CBOR-specifics are hidden away from users. Yet, pallas-traverse forces one to think in terms of multi-era objects, whereas on many occasions, one truly only cares about a single era.

So I see two complementary options here:

  1. Have pallas-primitives come with some extra quality-of-life additions, for example, more idiomatic iterators on all iterable structures, or accessors for the most common fields (e.g. pulling the Ada value out of an output). Of course, I totally get the desire to keep pallas-primitives as "raw" and as close to the wire format as possible. So it boils down to finding the right balance, and months of usage have convinced me that it isn't there -- so I am willing to have a go at it.

  2. Introduce perhaps a new pallas-latest crate that provides this layer on top of pallas-primitives, with a focus on a single era (the latest) and with a more opinionated interface to Pallas' internals.

I see both as complementary, and designing both at the same time can perhaps help us iterate more quickly, as we can move out of pallas-primitives and into pallas-latest what is deemed too high-level for pallas-primitives.

Full-deserialisation

Some data structures in pallas-primitives do not deserialize their internals fully. This is at least the case for types holding addresses and reward accounts. Those are deserialised to Bytes instead of structured objects, which has three main issues:

  1. It prevents failing early in the case where one of these nested types is in fact ill-formed. One needs a second round of deserialisation to fully validate a deserialised object.

  2. It comes with a performance hit since we need to traverse the structures an additional time to deserialise those leftovers.

  3. It's overall impractical: since those elements are not generic, we can't simply replace them once parsed in the original object (e.g. a transaction output). And so, every access requires re-deserialising (and potentially failing on) said fields.

So my proposal is to ensure that those types are properly deserialized in full, such that a successful deserialisation yields a complete type.

Stronger type safety

pallas-codec introduces many helpers to capture invariants from deserialised objects, which is good. What is more problematic is that many of those invariants can actually be bypassed. For example, a NonEmptyKeyValuePairs doesn't enforce non-emptiness whatsoever, and provides public access to its constructors:

/// Custom collection to ensure ordered pairs of values (non-empty)
///
/// Since the ordering of the entries requires a particular order to maintain
/// canonicalization for isomorphic decoding / encoding operators, we use a Vec
/// as the underlying struct for storage of the items (as opposed to a BTreeMap
/// or HashMap).
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[serde(try_from = "Vec::<(K, V)>", into = "Vec::<(K, V)>")]
pub enum NonEmptyKeyValuePairs<K, V>
where
    K: Clone,
    V: Clone,
{
    Def(Vec<(K, V)>),
    Indef(Vec<(K, V)>),
}

Another example is Set, which doesn't expose its internal constructors, but still allows conversion from a plain vector without any maintained guarantees:

pub struct Set<T>(Vec<T>);

impl<T> Set<T> {
    pub fn to_vec(self) -> Vec<T> {
        self.0
    }
}

impl<T> Deref for Set<T> {
    type Target = Vec<T>;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<T> From<Vec<T>> for Set<T> {
    fn from(value: Vec<T>) -> Self {
        Set(value)
    }
}

From what I see on some other structures, such as NonEmptySet, I believe there's a desire to have those invariants enforced, and I attribute the lack of enforcement across the board to the rapid and organic changes that happened in the crate under hard-fork constraints. In particular, many structures distinguish between definite/indefinite CBOR, which introduces quite a lot of duplication and inconsistency.

So I see room to perhaps unify this aspect with a more holistic and generic Def/Indef wrapper while avoiding repetition in each sub-type's internals. This is, however, yet another breaking change.

At the very least, I would like to be more thorough about which constructors are exposed, as well as the From/Into trait instances, so that invariants are actually always enforced, ensuring that if one sees a given type with constraints, it comes with stronger guarantees.
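As an illustration of the smart-constructor discipline being argued for here, a minimal sketch (with hypothetical names, not the actual pallas-codec API) where the only way to build the type is a fallible conversion that checks the invariant:

```rust
use std::convert::TryFrom;

// Invariant-preserving non-empty collection: the inner Vec stays private and
// the only way in is TryFrom, so non-emptiness cannot be bypassed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NonEmptyPairs<K, V>(Vec<(K, V)>);

#[derive(Debug, PartialEq)]
pub struct EmptyError;

impl<K, V> TryFrom<Vec<(K, V)>> for NonEmptyPairs<K, V> {
    type Error = EmptyError;

    fn try_from(pairs: Vec<(K, V)>) -> Result<Self, EmptyError> {
        if pairs.is_empty() {
            Err(EmptyError)
        } else {
            Ok(NonEmptyPairs(pairs))
        }
    }
}

impl<K, V> NonEmptyPairs<K, V> {
    /// Safe by construction: there is always a first element.
    pub fn first(&self) -> &(K, V) {
        &self.0[0]
    }
}

fn main() {
    // The invariant is checked once, at the boundary.
    assert!(NonEmptyPairs::<u8, u8>::try_from(vec![]).is_err());
    let pairs = NonEmptyPairs::try_from(vec![(1u8, 2u8)]).unwrap();
    assert_eq!(pairs.first(), &(1, 2));
}
```

With this shape, a blanket `From<Vec<_>>` (as on Set today) simply cannot exist, which is exactly the guarantee being asked for.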

pallas-primitives extensions?

This is already partially discussed in #592, but there are various types within pallas-network, pallas-traverse and pallas-addresses that would highly benefit from being moved down to pallas-primitives. Having them in pallas-network forces consuming crates to introduce undesired network-specific dependencies, and has led to inconsistencies on several occasions (with similar types being defined in multiple places, though in slightly different ways).

Off the top of my head, I can think of at least Point and PoolParams, but this is generally true of any type that doesn't appear in a block but is typically found elsewhere (e.g. in a state-query protocol).

@SupernaviX
Contributor

SupernaviX commented Feb 13, 2025

I would love it if KeepRaw held a Cow<'b, [u8]>. It could keep the same API and avoid unnecessary copies, but it could own its data when it made sense to own its data.

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)] 
pub struct KeepRaw<'b, T> {
    raw: Cow<'b, [u8]>,
    inner: T,
}

impl<T> KeepRaw<'_, T> {
    pub fn to_owned(self) -> KeepRaw<'static, T> {
        KeepRaw {
            raw: Cow::Owned(self.raw.into_owned()),
            inner: self.inner,
        }
    }
}

@Quantumplation
Collaborator

Introduce perhaps a new pallas-latest crate that provides this layer on top of pallas-primitives, with a focus on a single era (the latest) and with a more opinionated interface to Pallas' internals.

In theory, we could also snapshot the final versions of these crates before a hard fork and publish them as pallas-{era}; we don't commit to maintaining the code for all of them, but they're there if you really want to parse previous era CBORs

@KtorZ
Member Author

KtorZ commented Feb 13, 2025

I believe there are use-cases for something like pallas-traverse and multi-era representations; especially if you're building indexers and apps that use historical data.

@scarmuega
Member

@KtorZ I agree with all of the pain points and with some of the proposed solutions, but I have counter-proposals for a few items.

As a prelude to my comments, let's keep in mind that we have a short-term goal of stabilizing the api for a v1 and leaving drastic api changes for a v2.

Owned memoization

The Cow approach that @SupernaviX mentioned would indeed allow us to keep the same api while achieving the same goal. This is something that could be introduced in v1 without impacting the api surface.

On a longer term, the approach I want to explore is having a heap-based, semantic cache for block bytes. This means having something like a process-wide map of block_hash -> Arc<Box<block_bytes>> that sub-structures can reference freely. The KeepRaw would have an option to point to this cache, to owned bytes or to a slice in the stack. This would probably go in a v2.
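That cache idea could be sketched roughly as follows, using only the standard library. This is hypothetical and simplified: a real design would use an actual 32-byte blake2b block hash, an eviction policy, and likely finer-grained slices into the cached bytes:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Illustrative stand-in for a block hash.
type BlockHash = [u8; 32];

// Process-wide map of block hash -> reference-counted block bytes that
// sub-structures can point into without carrying lifetimes.
fn block_cache() -> &'static Mutex<HashMap<BlockHash, Arc<Vec<u8>>>> {
    static CACHE: OnceLock<Mutex<HashMap<BlockHash, Arc<Vec<u8>>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

fn insert_block(hash: BlockHash, bytes: Vec<u8>) -> Arc<Vec<u8>> {
    let arc = Arc::new(bytes);
    block_cache().lock().unwrap().insert(hash, arc.clone());
    arc
}

fn get_block(hash: &BlockHash) -> Option<Arc<Vec<u8>>> {
    block_cache().lock().unwrap().get(hash).cloned()
}

fn main() {
    let hash = [7u8; 32];
    insert_block(hash, vec![0xa0, 0xff]);
    // Cloning the Arc is cheap; the bytes themselves are never copied.
    let bytes = get_block(&hash).expect("cached");
    assert_eq!(bytes.as_slice(), &[0xa0, 0xff]);
}
```

A KeepRaw variant could then hold either a borrowed slice, owned bytes, or an `Arc` handle into this cache, as described above.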

De-couple Rust from CDDL/CBOR

I would split the requirements in the following way:

  1. we need an era-specific, protocol-specific CBOR codec for all structures shared through the wire.
  2. we need a way to traverse block data across eras for historical data processing (this won't go away)
  3. we need a way to inspect block / tx data from latest era with the goal of running consensus / ledger logic
  4. we need a way to construct tx data for the latest era with the goal of submitting new txs.

My understanding is that you're advocating to fulfill each requirement using the following:

  1. pallas-primitives
  2. pallas-traverse
  3. pallas-latest
  4. pallas-latest

My concern is that I don't see the value of having both pallas-latest and pallas-traverse. To me, requirements 2 & 3 can be fulfilled by the same api. What's the downside of pallas-traverse? If the downside is having Rust lifetimes, we're fixing that already by having the owned bytes option.

My proposal is to simplify pallas-primitives by removing a lot of unnecessary complexity but still focus exclusively on requirement 1:

  • remove all of the cbor-nuanced structures (MaybeIndefArray, KeyValuePairs, etc) and replace them with std types. These were introduced initially when I wanted to avoid keeping the cbor around. Having both approaches co-existing doesn't make much sense.
  • remove the PseudoXXX intermediate structures and unify both versions of the dual structs (minted vs non-minted).

I acknowledge that I might be missing something here, I'm happy to have an in-depth discussion to explore this particular topic.

Full-deserialization

The fact that addresses and similar structs are kept as bytes is a design decision:

  • The CDDL defines them as bytes and, as stated before, the focus of primitives is fulfilling wire-level serialization.
  • Address parsing and cbor decoding are two very different concerns, coupling the two is a code smell.

Regarding your concerns:

  • Not failing early is a feature, not a bug. Block deserialization happens as an atomic operation, any fallible operation nested in the deserialization code could prevent the whole block from being interpreted. As an example, there are addresses on mainnet history that don't follow CIP-19. We've now managed to accommodate for those edge-cases in pallas-addresses, but it was a huge issue having many systems halting unexpectedly without any way to isolate the offending data.
  • I think that your performance concern is the exact opposite. Paying the price of decoding addresses at block deserialization is not something that everybody would appreciate. I understand where you're coming from, Amaru will probably need to parse every address present in the block, but that's not the case for many of the other Pallas consumers (eg: Oura, Scrolls, etc).

My counter proposal: we remain with the lazy parsing approach but we add memoization in the ergonomic api layer (either pallas-traverse or pallas-latest). In this way, the overall performance hit is still paid once, but only by those who need it.
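The lazy-parsing-plus-memoization idea could look something like this sketch, where the raw bytes are kept at deserialization time and the parse happens, and is cached, on first access. Names and the "parsing" itself (just reading a header byte) are illustrative only:

```rust
use std::cell::OnceCell;

// An output field kept as raw bytes at deserialization time; parsing is
// deferred and memoized, so the cost is paid once, only by those who need it.
#[derive(Debug)]
struct LazyAddress {
    raw: Vec<u8>,
    // Here we only "parse" the header byte; a real version would hold a
    // structured address type.
    parsed: OnceCell<Result<u8, String>>,
}

impl LazyAddress {
    fn new(raw: Vec<u8>) -> Self {
        LazyAddress { raw, parsed: OnceCell::new() }
    }

    fn header(&self) -> &Result<u8, String> {
        self.parsed.get_or_init(|| {
            self.raw
                .first()
                .copied()
                .ok_or_else(|| "empty address".to_string())
        })
    }
}

fn main() {
    let addr = LazyAddress::new(vec![0x61, 0xaa]);
    assert_eq!(addr.header(), &Ok(0x61)); // parsed on first call
    assert_eq!(addr.header(), &Ok(0x61)); // served from the cache
    assert!(LazyAddress::new(vec![]).header().is_err()); // failure is local
}
```

Note that a parse failure here stays local to the field access rather than poisoning the whole block deserialization, which matches the "not failing early is a feature" point above.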

Stronger type safety

Agreed, there's a half-baked attempt to force these struct-level business invariants in the primitives crate. We started by adding the validation logic at deserialization-time, but we immediately hit reality: testnet data was not being compliant so our pipelines halted. This is the exact same issue as with eager address parsing: the enforcement of any business invariant hidden inside deserialization code is too constraining.

My proposal here is to have primitives that provide the semantic context for the data, but enforce the business invariants as part of the validation code that happens during phase-1 or similar. These struct might even have the validation code embedded in them (not to my liking but willing to compromise), but the check would be enforced imperatively by the user after the deserialization phase.

pallas-primitives extensions

No counter proposal here. This needs to happen, we just need to decide whether it's a change reasonable for v1 or if it should wait for v2.

@KtorZ
Member Author

KtorZ commented Feb 14, 2025

Thanks for the reply @scarmuega;

My concern is that I don't see the value of having both pallas-latest and pallas-traverse. To me, requirements 2 & 3 can be fulfilled by the same api. What's the downside of pallas-traverse? If the downside is having Rust lifetimes, we're fixing that already by having the owned bytes option.

I believe they address two different concerns, and thus deserve two different APIs (because of different trade-offs). The problem with pallas-traverse isn't actually the lifetimes, but the need to constantly deal with multi-era objects. For example, if I parse a Conway block, which I know to be a Conway block, it doesn't make sense for me to lift it into a Cardano block and, later in the app, have to deal with the unmatched-era cases because I only care about Conway.

Pallas-traverse is great if you're building solutions that need to span multiple eras (e.g. chain indexers), where you do want logic that handles multiple eras in an agnostic way.

So, I need an API where I can fix an era and work with it. And so far, the only option is to keep things at the level of pallas-primitives, which can be impractical because it's not designed for that: it's designed to be a Rust-CBOR representation.

Sprinkling some Rust-idiomatic Iterators and whatnot onto pallas-primitives would certainly make it more bearable, and perhaps alleviate the need for an extra layer like pallas-latest, so I would certainly like to start there. And then, see if we truly need that layer.

The CDDL defines them as bytes and, as stated before, the focus of primitives is fulfilling wire-level serialization.

Allow me to slightly disagree here; while not CBOR, the format of addresses is clearly specified. And things like reward accounts are in fact not only bytes. Often enough, they are CBOR-in-CBOR bytes, and they require by design another level of serialization.

Address parsing and cbor decoding are two very different concerns, coupling the two is a code smell.

It depends where you draw the line, I guess. You say CBOR-decoding, I say binary-decoding. That it's a mix of CBOR, Address and Flat shouldn't be much of a concern to the consumer. It's about going from serialized bytes to a higher-level Rust representation.

Now, I understand that this may be a different design goal. But then, I am curious how we can resolve that. Perhaps this is a useful case for the minicbor decoder contexts? Allowing different deserialization behaviours specified by the caller?
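The caller-selected behaviour could look something like this sketch, in the spirit of minicbor's generic decoder contexts but written here with only the standard library; the `Depth` context, the toy `Address`, and `decode_address` are all hypothetical names, and the "parsing" is reduced to splitting off a header byte:

```rust
// A decoding context the caller supplies to choose how deep to deserialize.
#[derive(Clone, Copy, PartialEq)]
enum Depth {
    Partial, // keep nested blobs as raw bytes (the current default)
    Full,    // eagerly parse, failing early on ill-formed data
}

#[derive(Debug, PartialEq)]
enum Address {
    Raw(Vec<u8>),
    Parsed { header: u8, payload: Vec<u8> },
}

fn decode_address(bytes: &[u8], ctx: Depth) -> Result<Address, String> {
    match ctx {
        // Partial mode never fails: malformed data is carried along as bytes.
        Depth::Partial => Ok(Address::Raw(bytes.to_vec())),
        // Full mode validates up front and rejects ill-formed input.
        Depth::Full => {
            let (&header, payload) = bytes.split_first().ok_or("empty address")?;
            Ok(Address::Parsed { header, payload: payload.to_vec() })
        }
    }
}

fn main() {
    let bytes = [0x01, 0xde, 0xad];
    assert_eq!(
        decode_address(&bytes, Depth::Partial).unwrap(),
        Address::Raw(bytes.to_vec())
    );
    assert!(matches!(
        decode_address(&bytes, Depth::Full),
        Ok(Address::Parsed { header: 0x01, .. })
    ));
    assert!(decode_address(&[], Depth::Full).is_err());
}
```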

As an example, there are addresses on mainnet history that don't follow CIP-19.

I assume you're referring to addresses with extra bytes at the end, from back when the ledger wouldn't check that all bytes in an address had been consumed? If that's the only case, then it should be easily addressable by the library (allowing extra bytes or not). I am pretty certain though that, outside of those and semantically invalid pointer addresses (which are still well-formed from a binary perspective), there are no addresses not abiding by CIP-19.

Would you be okay if I gave the suggestion above a try (i.e. let the caller decide whether to perform a full deserialization or a partial one, while leaving the default partial/incremental)?

@scarmuega
Member

(I'll be splitting topics moving forward)

it doesn't make sense for me to lift it into a Cardano block and later in the app, have to deal with the unmatched eras cases because I only care about Conway.

Not sure what you mean. Can you give me an example?

The fact that the internal representation of the multiera object is a sum type is hidden from the user. For example, let's say you want to interact with a Conway block to iterate over each tx, each output and each asset:

// since you know you're dealing with conway, the runtime overhead here is
// roughly the same as using a newtype.
let block = MultiEraBlock::from(conway_block);

for tx in block.txs() {
    for txo in tx.outputs() {
        let coin = txo.value().coin();

        for multi_asset in txo.value().assets() {
            let policy = multi_asset.policy();

            for asset in multi_asset.assets() {
                let name = asset.name();
                let coin = asset.coin();
            }
        }
    }
}

What's wrong with this api? Where is the multiera bothering you? What would the difference be compared to a pallas-latest?

@KtorZ
Member Author

KtorZ commented Feb 18, 2025

What's wrong with this api? Where is the multiera bothering you? What would the difference be compared to a pallas-latest?

The difference lies here:

let block = MultiEraBlock::from(conway_block);

I have a conway_block, which I want to manipulate as such: a conway block. But I am forced to lift it into a MultiEraBlock which at this point, makes the type more general than it was (whereas in practice, I want to keep my type more specific). This has a few consequences:

  1. All field accesses now become function calls and, internally, bear the cost of a pattern match. While this is perhaps optimized away by the compiler, I am not versed enough in Rust internals to make a confident statement about it. So, it comes with a performance cost.

  2. Because the type is now more general, I lose knowledge about things that were true before. For example, fees now returns an Option<u64> instead of a u64. Another example is accessing governance proposals, where I would get a vector of MultiEraProposal, which is empty for all eras but Conway, and comes with an as_conway method to optionally turn them back into Conway proposals. Other options will create boxes and dynamic dispatch, forcing re-allocation on the heap.

So, all in all, pallas-traverse adds runtime overhead and forces the use of unwrap and the handling of errors that wouldn't exist to begin with if I had just preserved my original type.

My suggestion would be to leverage traits and static dispatch as much as possible to avoid any runtime overhead (which is where Rust really shines IMO). And the good thing is that we don't even have to think too hard about how to break down the API since it's already been done in the Haskell codebase, e.g.:
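A minimal sketch of what that trait-based, statically-dispatched surface could look like (all names hypothetical, not an existing Pallas API): era-specific types keep their precise shapes, e.g. a fee that is always present, and generic code is monomorphized per era with no enum lifting:

```rust
// An era-agnostic capability, implemented by each era's concrete types.
trait HasFee {
    fn fee(&self) -> u64; // always present: no Option, no pattern match
}

// Hypothetical stand-in for a Conway-era transaction.
struct ConwayTx {
    fee: u64,
}

impl HasFee for ConwayTx {
    fn fee(&self) -> u64 {
        self.fee
    }
}

// Monomorphized per era at compile time: static dispatch, no heap boxing.
fn total_fees<T: HasFee>(txs: &[T]) -> u64 {
    txs.iter().map(|tx| tx.fee()).sum()
}

fn main() {
    let txs = vec![ConwayTx { fee: 10 }, ConwayTx { fee: 32 }];
    assert_eq!(total_fees(&txs), 42);
}
```

Callers who only care about Conway never leave the Conway types, while multi-era consumers could still implement the same traits on a sum type.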

@scarmuega
Member

There's no dynamic dispatch in pallas-traverse, I think you're confusing terms.

However, there are some heap allocations that were added to keep clippy happy, but these could be removed.

Using downcasting methods such as as_conway is only intended as an escape hatch. The api surface of these MultiEra structs should be comprehensive (meaning, we should add as many getters as required).

If your concern is dealing with the latest era, returning placeholder structs for unsupported eras shouldn't be a problem. If you somehow enter one of these placeholder code branches, returning empty vecs is cheap since they don't allocate anything on the heap.

I agree that losing resolution in the api surface (your "fee" example) is not good. The policy should be that the api surface must reflect the natural shape of the latest era. These cases should be refactored.

Having a trait as a mechanism to represent the desired api surface will introduce the same overhead of a function call that you're describing in pallas-traverse. At the end of the day, I believe that the only real overhead of pallas-traverse is the pattern matching.

So, we have these options:

  • Create a new api surface based on traits. This will force all downstream implementations to use generics functions. We'll need to maintain 2 very similar api surfaces. The benefit is avoiding pattern matching on getters.
  • Adjust traverse to avoid losing resolution. We pay the price of pattern matching overhead. The benefit is that we don't force generics downstream and that we can focus on a single api surface.
  • Promote downcasting into era-specific primitives. Simplify data structures in pallas-primitives so that they are more ergonomic. Implementations that have very specific performance needs will need to link to a particular era (aliases might help here). The benefit is that we force neither generics nor extra pattern matching downstream.

@KtorZ
Member Author

KtorZ commented Feb 18, 2025

@scarmuega There's no dynamic dispatch in pallas-traverse, I think you're confusing terms.

I am not; I didn't say that pallas-traverse had dynamic dispatch; I said that it's how one would typically deal with pallas-traverse when using it. But, whatever 😅...

@scarmuega: So, we have these options:

  • Create a new api surface based on traits. This will force all downstream implementations to use generics functions.

As said earlier, I think this makes sense because they would serve different purposes.

  • Adjust traverse to avoid losing resolution.

It is unfortunately not so easy, I think. It's usually painful because of Byron, but also other eras; and I don't think we're willing to drop previous eras that get in the way, since handling them is the very purpose of pallas-traverse. To give another example, in Conway, many structures have been made non-empty (e.g. inputs, withdrawals, ...), while pallas-traverse will default back to a vector because it needs an identity element for the default cases.

And don't get me wrong, I am not saying that the pallas-traverse API is wrong. It is perfectly fine in the context of exploring the chain and processing data from it. But it isn't from the perspective of validation, where one wants to carry the guarantees obtained from deserialising.

  • Promote downcasting into era-specific primitives

I would argue that this isn't mutually exclusive with the first and second options, and could be done where we see fit (and I assume you also mean it this way 👍).

@scarmuega
Member

As said earlier, I think this makes sense because they would serve different purposes.

Sorry, I don't see how this answer addresses my previous concerns, but I'm ok if you want to try prototyping this approach within Amaru and then move it upstream when we have a tangible example.

Moving forward on this topic:

  • we'll be simplifying the pallas-primitives structures to make the downcasting approach more ergonomic
  • we'll keep pallas-traverse with its current focus but refactor some parts to avoid unnecessary overhead
