Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

registry-api needs to use the new ancestry fields #353

Closed
tloubrieu-jpl opened this issue May 23, 2023 · 16 comments
Closed

registry-api needs to use the new ancestry fields #353

tloubrieu-jpl opened this issue May 23, 2023 · 16 comments
Assignees

Comments

@tloubrieu-jpl
Copy link
Member

💡 Description

@alexdunnjpl
Copy link
Contributor

Specifically, members/member-of et al. need to be updated to leverage the new metadata fields.

Groom ticket later

@tloubrieu-jpl tloubrieu-jpl changed the title registry-api needs to use the new anscestry fields registry-api needs to use the new ancestry fields May 23, 2023
@alexdunnjpl alexdunnjpl transferred this issue from NASA-PDS/registry Jun 21, 2023
@alexdunnjpl alexdunnjpl added the s.high High severity label Jun 21, 2023
@alexdunnjpl
Copy link
Contributor

alexdunnjpl commented Jun 21, 2023

s.high via proxy #352

@jordanpadams @tloubrieu-jpl actually, are the membership-related API endpoints even desirable in a post-ancestry-metadata world?

Seems like (for example) products/{id}/member-of is made redundant by products/q="parent_bundle_lidvid eq {id}"

@jordanpadams
Copy link
Member

@alexdunnjpl it is redundant, but the /products/{id}/member-of provides a more "user-friendly" API access points to the query. The entire /classes endpoint is redundant, but is more a shortcut for users, especially when we start talking API client libraries.

@alexdunnjpl
Copy link
Contributor

@jordanpadams part of the reason I ask is because I'm not sure how user-friendly the member-of and member-of-of endpoints actually are.

If we're going to keep them, I'd tentatively suggest replacing them with parent-collection(s) and parent-bundle(s), respectively. I'd question whether a user ever wants to ask "what is the aggregate product one step up in the hierarchy from this product" vs "what collection(s) contain this product"

@jordanpadams
Copy link
Member

@alexdunnjpl it's definitely feasible your recommendation is more user-friendly, but we went back and forth about this a lot, discussed with our data modeling team, and came up with these terms to try to generalize the relationship vs. using "collections"/"bundles" which are very PDS-specific. In the end, I think we just march forward with what we have, for now, and then wait for users to tell us it's wrong. I will surely expect an "I told you so" when the community comes clamoring for those endpoints.

@alexdunnjpl alexdunnjpl added s.medium Medium level severity and removed s.high High severity labels Jun 21, 2023
@alexdunnjpl
Copy link
Contributor

@alexdunnjpl alexdunnjpl self-assigned this Jun 21, 2023
@tloubrieu-jpl
Copy link
Member Author

When done, tell @dscholes who needs that feature.

@alexdunnjpl
Copy link
Contributor

@tloubrieu-jpl what is implementation of this ticket blocking for their needs? The functionality exists currently, and while this ticket should improve performance, the test calls I'm making against prod seem performant.

@alexdunnjpl
Copy link
Contributor

alexdunnjpl commented Nov 22, 2023

RefLogicAny

  • Refactor everything but GroupConstraint construction in member()/memberOf() from individual subclasses into RefLogicAny since it's just duplicated everywhere

RefLogicBundle

  • member
  • member (two-step)

RefLogicCollection

  • parents
  • childrenAll
  • childrenLatest

RefLogicNonAggregateProduct (may need Document/Observational as well - need to check)

  • member-of
  • member-of (two-step)

The following may be broken out into additional ticket(s) to avoid completion/merge delays

Post-feature cleanup

  • implement use of PdsProductIdentifier in all referencing logic classes, as well as Pagination and PaginationLidvidBuilder
  • implement search-after pagination in PaginationLidvidBuilder (see below - feature will be broken without this)
  • nuke Pagination/PaginationLidvidBuilder
  • nuke children/grandchildren/parents/grandparents from all files
  • finish Fundamental querying logic construct appears incorrect #392 fix
  • Rework RefLogic inheritance - Should be RefLogicBase into [RefLogicBundle, RefLogicCollection, RefLogicNonAggregate, RefLogicAny], with RefLogicNonAggregate split further into it's more-specific components. RefLogicAny is not a valid base class.

@alexdunnjpl
Copy link
Contributor

After initial implementation, it will be necessary to design an approach for how the API-internal pagination class (PaginationLidvidBuilder) will implement search-after pagination, as index-based pagination is now impossible to infer from the request parameters, and PaginationLidvidBuilder cannot apply arbitrary search-after behaviour as it only has visibility over LIDVIDs, not document content.

The simplest and possibly only approach would be to enforce sorting/search-after on lidvid, and communicating this requirement to users in documentation and via explanatory HTTP400 responses in the event that unexpected values are provided in the requests.

@tloubrieu-jpl
Copy link
Member Author

Thanks @alexdunnjpl , I like the idea of an explanatory HTTP400: when user use search-after parameter but sort parameter is not used we should tell the user, that it is mandatory with search-after and we can suggest lidvid or timestamp (modification_date). But I don't think we made the detailed specification for the sort parameter. Should I create a ticket for that ?

@alexdunnjpl
Copy link
Contributor

@tloubrieu-jpl modification to the sort qparam doco is included as part of the search-after PR, so no need unless you're referring to something else

@alexdunnjpl
Copy link
Contributor

Current sticky issue:

Search-after pagination is hard to work into the internal pager PaginationLidvidBuilder, which introduces additional complexity besides, and the point of all these abstractions is to re-use functionality (in this case, handling of qparams), so given that membership is expressible in both directions via a terms query (actually one per step of relation), it's desirable to have the ReferencingLogic construct a GroupConstraint yielding the output hits, then treat the request like any other (ex. a user query or request for a particular product type).

This is complicated by the fact that any route under /products/{identifier} causes a single product identifier to be written to the immutable URIParameters context, which then enforces an additional must term clause, resulting in no hits (because, obviously, the members do not have a lidvid matching the parent lidvid).

It might be possible to break the application of this clause by rebuilding the URIParameters in a not-too-janky fashion with URIParametersBuilder, but that's still pretty nasty.

@al-niessner do you have any suggestions for a better approach to implementing the behaviour in the first paragraph?

@al-niessner
Copy link
Contributor

@alexdunnjpl

Do you still need the pagination builder? My understanding is that we no longer need to do double lookups such that the hits returned by opensearch are the hits needed. Hence, pagination builder is no longer necessary or if removing it is a pain trivial in that it simply echoes opensearch.

URIParameters implements one or more of the various interfaces/facades so that if you need a new one like lidvid just use a tiny implementation of the right interface(s) and pass it around. Nothing outside of controller should know about URIParameters. Kind of the point of the interfaces. Used it heavily in double look ups to say this is the lidvid given by user but this is the lidvid that needs worked on now. The interfaces should allow one to be the user at any stage and get what is needed as though asked by the user but without URIParameters.

In other words, I am not understanding the sticky bit.

@alexdunnjpl
Copy link
Contributor

alexdunnjpl commented Nov 29, 2023

@al-niessner The sticky bit is that when the request is being processed (i.e. when the ReferencingLogic is doing its thing), the UserContext, which informs necessary stuff like "how many results to get", "what fields to get", "what search-after values to use" is a URIParameters instance, with a value of URIParameters.identifier set equal to the identifier for which you're getting references.

That value is then applied as a must term query clause, preventing any hits from being matched.

Since that instance is immutable, it's not possible to just null out the identifier value during execution of the ReferencingLogic code.

The workaround of rebuilding a new UserContext instance without the problematic value is the only path forward I can see which allows removal of PaginationLidvidBuilder, but I figured you might have a better view of the forest (of abstractions) as it were, and be able to suggest something cleaner.

@al-niessner
Copy link
Contributor

@alexdunnjpl

Yes. It is because of the double lookup. Are you getting rid of the double lookup?

If you are getting rid of the double lookup, then all of the implementations of ReferencingLogic should be rewritten anyway to where that lidvid is not required. For instance, for a bundle ID wanting its collections this is being done:

SearchRequest request =
new SearchRequestFactory(RequestConstructionContextFactory.given(idContext.getLidVid()),
ctlContext.getConnection())
.build(
RequestBuildContextFactory.given(false, fields,
ReferencingLogicTransmuter.Bundle.impl().constraints()),
ctlContext.getConnection().getRegistryIndex());

but instead of doing that search, you will want call:

public static RequestConstructionContext given(String key, String value, boolean asTerm) {

You may need the id at that point depending on what is in the new field. The pagination it would create would then just be an echo of opensearch result if pagination is set correctly through whatever interface.

Again, I do not understand the sticky bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🏁 Done
Development

No branches or pull requests

4 participants