Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[da-vinci][common] DVC consumer for materialized view (batch only) #1466

Merged
merged 4 commits into from
Jan 31, 2025

Conversation

xunyin8
Copy link
Contributor

@xunyin8 xunyin8 commented Jan 22, 2025

[da-vinci][common] DVC consumer for materialized view (batch only)

Limiting the change to support batch only materialized view to keep the PR small. Hybrid DVC consumer support and related features such as heartbeat will be added in a separate PR

  1. New DaVinciClientFactory APIs for creating DVC for a given view.

  2. Defined a new "storeName" rule for views to be used for metrics reporting and DVC client. See VeniceView.getViewStoreName for details.

  3. Introduced NativeMetadataRepositoryViewAdapter and HelixReadOnlyStoreViewConfigRepositoryAdapter to provide read-only interface to access various store metadata for both regular Venice stores and store views with the VeniceView.getViewStoreName.

There is some issue with chunking support on the read path. When chunking is enabled the view topic keys are doubly wrapped by serializeNonChunkedKey. This is because during NR pass-through mode the view writer is essentially trying to chunk the chunk. The tactical fix now is to unwrap the key with chunked suffix bytes appended and pass it to the view writer to be wrapped again and sent to the correct partition. This only works with non-large messages. I.e. chunking is enabled but nothing is actually getting chunked. Large messages will require a proper fix.

How was this PR tested?

Unit tests and integration test

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

@xunyin8 xunyin8 force-pushed the MaterializedViewDVCConsumer branch from 42b48dd to e546936 Compare January 23, 2025 00:07
@gaojieliu
Copy link
Contributor

Hi @xunyin8
I have a big question about how the adapting layer looks like and I read your PR and I think you are trying to plugin the adapting layer in a more spread fashion where sometimes we use the original store name and some other times, we use the combined name by concatenating store name and view name.
There are two points I can think of:

  1. The code would become less maintainable as we need to make a decision at different places to use either the original store name or the concatenated name.
  2. In theory, this solution is not 100% safe as some store name can be identical to the concatenated name, and if the customer subscribes both, it will encounter weird issues.

Here is one idea in my mind:

  1. Make sure the concatenated name unique in Venice platform, and it means we need to guarantee the uniqueness among store names and concatenated view names.
  2. In DaVinci world, we can treat the concatenated view name as the store name, and we can build a new adapting layer in the common repositories, such as StoreRepository, SchemaRepository and they need to handle both view and store request. For example, for the passed param, it can detect whether it is a view or a store and if it is a view, it will find out the corresponding store name and take the corresponding actions.
    With this way, it will be very safe to avoid conflict and we only need to change very few places to make it work everywhere and the idea is similar to HelixReadOnlySchemaRepositoryAdapter, which can handle both system stores and user stores in a transparent way.
  3. Less error prone as the adapting logics are only in several important repositories, nowhere else (in theory).

If we build such capability, the change in DVRT or CDC will be fairly straightforward.

WDYT?

@xunyin8
Copy link
Contributor Author

xunyin8 commented Jan 26, 2025

Member

Hi @xunyin8 I have a big question about how the adapting layer looks like and I read your PR and I think you are trying to plugin the adapting layer in a more spread fashion where sometimes we use the original store name and some other times, we use the combined name by concatenating store name and view name. There are two points I can think of:

  1. The code would become less maintainable as we need to make a decision at different places to use either the original store name or the concatenated name.
  2. In theory, this solution is not 100% safe as some store name can be identical to the concatenated name, and if the customer subscribes both, it will encounter weird issues.

Here is one idea in my mind:

  1. Make sure the concatenated name unique in Venice platform, and it means we need to guarantee the uniqueness among store names and concatenated view names.
  2. In DaVinci world, we can treat the concatenated view name as the store name, and we can build a new adapting layer in the common repositories, such as StoreRepository, SchemaRepository and they need to handle both view and store request. For example, for the passed param, it can detect whether it is a view or a store and if it is a view, it will find out the corresponding store name and take the corresponding actions.
    With this way, it will be very safe to avoid conflict and we only need to change very few places to make it work everywhere and the idea is similar to HelixReadOnlySchemaRepositoryAdapter, which can handle both system stores and user stores in a transparent way.
  3. Less error prone as the adapting logics are only in several important repositories, nowhere else (in theory).

If we build such capability, the change in DVRT or CDC will be fairly straightforward.

WDYT?

This makes sense, I was hoping to be able to keep the adapting layer only in StoreBackend and ReadOnlyVersion. However, as you mentioned and probably noticed while reviewing the PR other components of DVC and ingestion path also requires differentiating a store view from a store hence the adapter keeps leaking. I think we can enforce (1) fairly easily during view/store creation and (2) could introduce more initial change but let me give it a try and it may be able to solve some of the known issue (e.g. chunking not working) cleanly since the suggested solution is more bullet proof.

@xunyin8 xunyin8 force-pushed the MaterializedViewDVCConsumer branch 2 times, most recently from 6460bcc to a7ff6e2 Compare January 27, 2025 08:17
@xunyin8 xunyin8 requested a review from gaojieliu January 27, 2025 08:20
Copy link
Contributor

@gaojieliu gaojieliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some more comments about the adapting layer?

BTW, can you remind me of the place where we can guarantee the uniqueness of view names?
One way we can do is to make sure the store name doesn't contain the special separator being used to concatenate the store name and view name.

@xunyin8
Copy link
Contributor Author

xunyin8 commented Jan 28, 2025

Left some more comments about the adapting layer?

BTW, can you remind me of the place where we can guarantee the uniqueness of view names? One way we can do is to make sure the store name doesn't contain the special separator being used to concatenate the store name and view name.

Currently we don't check uniqueness of view names across different stores and I think that's fine. The way we make sure view store names can be processed is with added check to ensure store names and view names do not contain view separator:
See VeniceHelixAdmin#checkStoreNameConflict

    if (VeniceView.isViewStore(storeName)) {
      throw new VeniceException(
          "Store name: " + storeName + " clashes with the internal usage, please remove the prefix: "
              + VeniceView.VIEW_STORE_PREFIX);
    }

and validateAndDecorateStoreViewConfig

      if (viewName.contains(VIEW_NAME_SEPARATOR)) {
        throw new VeniceException(
            String.format("View name cannot contain view name separator: %s", VIEW_NAME_SEPARATOR));
      }

@xunyin8 xunyin8 force-pushed the MaterializedViewDVCConsumer branch 2 times, most recently from 0968db3 to c8ff803 Compare January 30, 2025 08:32
Copy link
Contributor

@gaojieliu gaojieliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change looks good overall and left some more comments.

Limiting the change to support batch only materialized view to keep the PR small.
Hybrid DVC consumer support and related features such as heartbeat  will be added in a separate PR

1. New DaVinciClientFactory APIs for creating DVC for a given view.

2. Defined a new "storeName" rule for views to be used for metrics reporting and shared DVC client.
See VeniceView.getStoreAndViewName for details. Trying to reuse much of the existing DVC code and
structure while providing the support for users to subscribe to the original store and different views
of the same store.

3. Introduced ReadOnlyMaterializedViewVersion and StoreViewBackend abstractions to minimize the
code change needed for DVC to understand and work with views. ReadOnlyMaterializedViewVersion
will provide view sepcific properties for a ReadOnlyVersion and StoreViewBackend will provide
view specific properties for a StoreBackend.
Taking the suggestion of adding adapter layer in the common repositories  instead of
inserting it in StoreBackend and leaking in DVC client related classes.
@xunyin8 xunyin8 force-pushed the MaterializedViewDVCConsumer branch from c8ff803 to 40c5489 Compare January 31, 2025 19:33
Copy link
Contributor

@gaojieliu gaojieliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@xunyin8 xunyin8 merged commit df28d1e into linkedin:main Jan 31, 2025
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants