-
Notifications
You must be signed in to change notification settings - Fork 3
Conversation
up to date
Merge from parent repo
Updated the pom to use the correct version of ANTLR. Updated ElasticSearchRegistrySearchRequestBuilder to make log output clearer for isolating current problems. Reworked Antlr4SearchListener to get the wildcard request right. Gave it state and a stack which it processes more or less uniformly. Groups are the same as the parent such that if there are no () there are no sub boolean query groups. Then added the groups which add the current stack level onto the group and then shift up one when the groups ends. The and/or expressions similarly handle the conjunction in use the same way. However their stack levels need be the same. This allows (x) to be processed the same as (x and y and (w or u or v)) without and special checking or branching. Antlr4SearchListenerTest was added to verify that the stack works correctly with respect to tranforming the grammar to a set of QueryBuilder(s). The only one being tested currently is the wildcard but many more should be added.
I have to start the testing but the wildcard testing works and I will then work through the rest to see if it comes together correctly. I will also add the fail tests for the acceptance criteria back on the original ticket. However, this will allow you to see the changes in the code. |
Behavior:
Questions:
|
Fix erroneous stack handling in Antlr4SearchListener. Fix up the second test to show that the group does work.
@al-niessner How does that work with " around the string value with wildcard ? That might be unrelated but the grammar was able to parse it before. There was no complaint on the API side although you know that does not mean the request was actually understood, but I guess the grammar parsed it since there is no error in the previous version of the code. |
Tested the bulk of the grammar but need to do nested groups and find a way to verify the and/or statements are correctly stacked.
Actually, the previous code has same problem but it fails silently in the sense that the parser does not throw an exception but does not alter the boolean query object either. Unless the actual content of the search is verified, the request likely succeeded from an http perspective but might have all the wrong content. |
Spaces are required after ( and before ) as defined in the 'group' parser rule. If we want to get rid of them, then going to have to do a big rework of the grammar. I think we can change queryTerm to be: "queryTerm: andStatement | comparison | group | orStatement ;" The parser rule 'expersion' can be removed and the parser rule 'group' would become: "group: GROUP ;" requiring the new lexer rule: "GROUP: '(' .* ')' ;"' While this is probably not perfect and will require tweaking to make it actually work, it gives an idea on the scope. However, if the user will never be typing these items and an interface will generate it, then these changes are going to cost a lot for no gain. |
That would make sense to me to keep the space around terms so that the parser can separate them without us having to define a specific lexer. By any chance, did you also try to have explicit whitespaces in the parser rule, like: group : WS* NOT? WS* LPAREN WS* expression WS* RPAREN WS*; with: The WS expression was not used in the parser initially which might be wrong. I found that on https://tomassetti.me/antlr-mega-tutorial/#chapter23 The example is: |
Completed the testing with nested conditionals and a not group. Also added negative checks that should result in bad parsing. They are handled by forcing ANTLR to bail when there is a problem rather than doing the best it can.
Completed the unit testing only in the sense that it seems to cover the big bits like wildcarding, escaping the wildcarding, grouping, nested grouping, nots, and all operators. Here is the output:
|
Added catch block to change any parsing error into an HTTP 422 (unprocessable entity). Searching on the web indicated that this error is the most appropriate in that that the syntax is sufficient to pass all URL tests and get routed to the proper processing unit (the lexer) but it cannot understand it semantically (from the standpoint of the URL). Changed the testing to JUnit test.
Please add anyone else that should review this. To show that the acceptance criteria are met, developed and included verify/issue_54.py. The standard test database contains this family of lidvids:
The simple script does two checks:
It then checks that the correct number of items is returned and that they contain the wildcards being looked for. The output from the script is:
Hence, it should be clear that the acceptance criteria has been met. |
src/main/java/gov/nasa/pds/api/engineering/controllers/MyProductsApiBareController.java
Outdated
Show resolved
Hide resolved
@tloubrieu-jpl where are we at with testing this? should we get Eugene to take a look if you don't have the time? |
@jordanpadams I made some comments to @al-niessner but did not get feedback yet. I will email him. I have my test suite in postman that I would like to test against the changes. Eugene could also test but I would like to keep an eye on that. |
Fixing "a eq b" required updating the api-search-query-lexer that had some fallout here. Maining that ctx.FIELD() now requires a parameter for which of them is being requested. The quoting option of a eq "b" should not carry quotes into search request.
Should be fixed with a lexer/parser update. Your requests should now work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @al-niessner ,
The request I was trying before (mercury target) seems to work. Now I am trying a request on a different property.
It shows as follow in one of the record in my elastissearch database:
... "properties": { "ops:Data_File_Info.ops:file_size": "4395", ...
Si I am trying:
curl --location --request GET 'http://localhost:8080/collections?q=ops:Data_File_Info.ops:file_size%20eq%204395'
or
curl --location --request GET 'http://localhost:8080/collections?q=ops:Data_File_Info.ops:file_size%20eq%20%224395%22'
The status is 200 but I don't get a result. I am expecting at least the one I found in my database.
request elasticSearch :SearchRequest{searchType=QUERY_THEN_FETCH, indices=[registry], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":0,"size":100,"timeout":"60s","query":{"bool":{"must":[{"match":{"ops:Data_File_Info.ops:file_size":{"query":"4395","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},{"term":{"product_class":{"value":"Product_Collection","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}}}}
The error comes from the fact that the API properties must be transformed by replacing the '.' by a '/' in the elasticsearch properties.
Thanks,
Thomas
What dot to what slash? There are lots of dots and I cannot tell what you may or may not be trying to show as now a slash. Do we always turn dots to slash or is this a particular identifiable case? In other words if a user ever types a dot, like foo.bar, how does one know it should or should not be foo./bar? |
When I look my instance of test data from harvest I find this: 'ops:Data_File_Info/ops:file_size'. Are you saying that we want to turn / to . to become 'ops:Data_File_Info.ops:file_size'? Or are you saying the 'ops:Data_File_Info.ops:file_size' needs to become 'ops:Data_File_Info/ops:file_size' in the search? What is the rule for doing the substitution? Always when in a label? There are labels that have dots in the name so how does one know to use . over / in an algorithmic sense? Why is the user not typing what they are really looking for? Are we (PDS) doing an irreversible / to . conversion for display and you are trying to make it easy? Since dots are used in other labels it makes / to . conversion irreversible as dots are boson not fermion. |
If I type it right it works so I am really confused:
|
Hi @al-niessner , That is the second case: For the conversion, you should use the methods available in /pds-api-service/src/main/java/gov/nasa/pds/api/engineering/elasticsearch/ElasticSearchUtil.java, line 23, jsonPropertyToElasticProperty Although the conversion is super-simple, this is safest to use this method in case, we want to change the syntax of properties again (this has been discussed). |
One question, which label with . are you refering to in elasticSearch ? The conversion should be reversible, let me know where you think it is an issue. Thanks. Regarding the request you are testeing, in my case they returned status 200 which is fine but the result is empty whereas I should have one result. |
For all FIELDs? All values too? What part of the Search.g4 comparison should be converted, as in left side of operator, all FIELDs independent which side of the operator it appears on, anything the user types, etc? When, as in always, just when it is a property but how is that determined, etc? |
@al-niessner Let me know if that is still unclear. Thanks, |
If I understood all of this correctly, the conversion is now being done. |
Hi @al-niessner , do you have a working example of request with "not" operator ? I tried curl --location --request GET 'http://localhost:8080/products?q=not(%20ops:Data_File_Info.ops:file_size%20le%201050%20)%20' But that did not work, Thanks, Thomas |
Yes, there is a test for not. 'not( a eq b )' does not work. It has to be 'not ( a eq b )'. It is that whole white space parenthesis thing. Use curl --location --request GET 'http://localhost:8080/products?q=not%20(%20ops:Data_File_Info.ops:file_size%20le%201050%20)%20' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@al-niessner thanks. The new antlr4 listener looks good and more importantly it works. I updated my postman test suite.
I appreciate the work you've done on that.
I still have one question on the management of the right term of the query (in my code comments), let me know what you think about that.
Thanks
src/main/java/gov/nasa/pds/api/engineering/elasticsearch/Antlr4SearchListener.java
Show resolved
Hide resolved
src/main/java/gov/nasa/pds/api/engineering/controllers/MyProductsApiBareController.java
Show resolved
Hide resolved
src/main/java/gov/nasa/pds/api/engineering/elasticsearch/Antlr4SearchListener.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your latest update does not work, you can remove it. It is not something we need now. Thanks
src/main/java/gov/nasa/pds/api/engineering/controllers/MyProductsApiBareController.java
Show resolved
Hide resolved
Yeah, well, what I thought was fix is not. Ah well fix it on another ticket. |
@tloubrieu-jpl comment added. |
Closes pds-api-54