Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] dandi search command #1126

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft

Conversation

djarecka
Copy link
Member

@djarecka djarecka commented Oct 6, 2022

edited

I've added draft of the dandi search command. I've added some test, but below you will show some example of usage:

  • using query from a file:
(dandicli_dev_py39) MacBook-Pro-2:dandi dorota$ dandi search -F q1.txt 
2022-10-05 22:15:01,573 [ WARNING] A newer version (0.46.3) of dandi/dandi-cli is available. You are using 0.14.2+1335.ge021829.dirty
                                            apr
0                           behavioral approach
1                 electrophysiological approach
2                          optogenetic approach
3  microscopy approach; cell population imaging
2022-10-05 22:15:02,080 [    INFO] Logs saved in /Users/dorota/Library/Logs/dandi-cli/20221006021501Z-58106.log
(dandicli_dev_py39) MacBook-Pro-2:dandi dorota$ cat q1.txt 
SELECT DISTINCT ?apr WHERE 
{
    ?as dandi:approach / schema:name ?apr
}
  • using a simplified version of query for dandisets:
    • providing one field for --select_fields (or -s):
dandi search -t dandisets --select_fields approach

                             apr                                                  d
0            behavioral approach      http://dandiarchive.org/dandiset/000003/draft
1  electrophysiological approach      http://dandiarchive.org/dandiset/000003/draft
2            behavioral approach  http://dandiarchive.org/dandiset/000003/0.2108...
3  electrophysiological approach  http://dandiarchive.org/dandiset/000003/0.2108...
4  electrophysiological approach      http://dandiarchive.org/dandiset/000004/draft
5  electrophysiological approach  http://dandiarchive.org/dandiset/000004/0.2201...
6  electrophysiological approach  http://dandiarchive.org/dandiset/000004/0.2201...
7  electrophysiological approach      http://dandiarchive.org/dandiset/000005/draft
8           optogenetic approach      http://dandiarchive.org/dandiset/000005/draft
9  electrophysiological approach  http://dandiarchive.org/dandiset/000005/0.2201...
  • providing multiple fields for select_fields can be done in two ways:
dandi search -t dandisets --select_fields approach --select_fields species_name
dandi search -t dandisets --select_fields approach,species_name
  • adding filtering with --filter_fields (or -f):
dandi search -t dandisets --select_fields approach,species_name --filter_fields species_name Human

                             apr    snm                                                  d
0  electrophysiological approach  Human      http://dandiarchive.org/dandiset/000004/draft
1  electrophysiological approach  Human  http://dandiarchive.org/dandiset/000004/0.2201...
2  electrophysiological approach  Human  http://dandiarchive.org/dandiset/000004/0.2201...
3  electrophysiological approach  Human      http://dandiarchive.org/dandiset/000012/draft
4  electrophysiological approach  Human      http://dandiarchive.org/dandiset/000019/draft
5  electrophysiological approach  Human  http://dandiarchive.org/dandiset/000019/0.2201...
6            behavioral approach  Human      http://dandiarchive.org/dandiset/000055/draft
7  electrophysiological approach  Human      http://dandiarchive.org/dandiset/000055/draft
8  electrophysiological approach  Human  http://dandiarchive.org/dandiset/000055/0.2201...
9            behavioral approach  Human  http://dandiarchive.org/dandiset/000055/0.2201...
  • adding multiple values for filtering with --filter_fields (or -f):
dandi search -t dandisets --select_fields approach,species_name --filter_fields species_name "Human,House mouse"
  • using a simplified version of query for assets:
    • most things are similar to dandisets search, e.g.
dandi search -t assets --select_fields format -f format application/x-nwb,application/json
  • allowing for range in filters
dandi search -t assets --select_fields size -f size (7*10e9, 9*10e9)
dandi search -t assets --select_fields size -f size "(, 9*10e9)"
dandi search -t assets --select_fields size -f size "(9*10e9,)"

@satra @bendichter - let me know what don you think about these examples

@satra
Copy link
Member

satra commented Oct 10, 2022

@djarecka - you may want to consider adopting the slurm sacct command syntax for specifying fields.

also let's focus on asset based search as that's what was seen as most useful by the participants of the workshop.

@yarikoptic yarikoptic marked this pull request as draft October 10, 2022 21:06
@yarikoptic
Copy link
Member

note: converted to draft since WiP etc

@djarecka
Copy link
Member Author

@djarecka - you may want to consider adopting the slurm sacct command syntax for specifying fields.
I don't understand your comment. What is specific about the sacct command?

@satra
Copy link
Member

satra commented Oct 12, 2022

see the field syntax of the sacct command.

@djarecka
Copy link
Member Author

@satra - I did...
you just mean that check_fields should allow for multiple fields as format in saact..?

@satra
Copy link
Member

satra commented Oct 12, 2022

you just mean that check_fields should allow for multiple fields as format in saact..?

yes

… fields in select_fields, allowing for multiple values and ranges in filter fields; adding tests
@codecov
Copy link

codecov bot commented Nov 7, 2022

Codecov Report

Base: 87.91% // Head: 65.90% // Decreases project coverage by -22.01% ⚠️

Coverage data is based on head (447c805) compared to base (e021829).
Patch coverage: 20.65% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1126       +/-   ##
===========================================
- Coverage   87.91%   65.90%   -22.02%     
===========================================
  Files          73       75        +2     
  Lines        8549     8845      +296     
===========================================
- Hits         7516     5829     -1687     
- Misses       1033     3016     +1983     
Flag Coverage Δ
unittests 65.90% <20.65%> (-22.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandi/cli/tests/test_search.py 18.18% <18.18%> (ø)
dandi/cli/cmd_search.py 20.95% <20.95%> (ø)
dandi/cli/command.py 48.75% <100.00%> (+0.64%) ⬆️
dandi/support/pyout.py 71.66% <100.00%> (-18.34%) ⬇️
dandi/support/tests/test_iterators.py 13.15% <0.00%> (-86.85%) ⬇️
dandi/support/tests/test_digests.py 15.38% <0.00%> (-84.62%) ⬇️
dandi/tests/test_pynwb_utils.py 20.00% <0.00%> (-77.78%) ⬇️
dandi/cli/tests/test_formatter.py 22.22% <0.00%> (-77.78%) ⬇️
dandi/cli/tests/test_ls.py 22.38% <0.00%> (-77.62%) ⬇️
dandi/tests/test_organize.py 18.40% <0.00%> (-76.75%) ⬇️
... and 57 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@bendichter
Copy link
Member

Looks good!

  • What if you wanted to filter a dandiset by contributors? Is that possible?
  • How would you query for age? My thinking is you would first need to convert an ISO 8601 duration to seconds and to the comparison in seconds.

@djarecka
Copy link
Member Author

@bendichter - everything is possible, it will just require adding specific queries, see example for adding age to assets fields. This part unfortunately has to be added for each field.

You have a good point that when searching for age you will have everything in ISO:

(dandicli_dev_py39) MacBook-Pro-2:dandi-cli dorota$ dandi search -t assets --select_fields age
                                                path          d_id                                              asset           age
0  sub-YutaMouse54/sub-YutaMouse54_ses-YutaMouse5...  DANDI:000003  http://dandiarchive.org/asset/036cce35-9518-45...  P174DT0H0M0S
1  sub-YutaMouse42/sub-YutaMouse42_ses-YutaMouse4...  DANDI:000003  http://dandiarchive.org/asset/06cdaf32-1e4d-4a...  P389DT0H0M0S
2  sub-YutaMouse44/sub-YutaMouse44_ses-YutaMouse4...  DANDI:000003  http://dandiarchive.org/asset/0ac87693-8d7f-4f...  P407DT0H0M0S
3  sub-YutaMouse39/sub-YutaMouse39_ses-YutaMouse3...  DANDI:000003  http://dandiarchive.org/asset/0b4f40f9-36ff-48...  P344DT0H0M0S
...

Perhaps for every field there could be an optional wrapper function that convert it to something else. Or we can have age and age_sec and the second one will call additional function to convert output of the query to second.

@bendichter
Copy link
Member

@djarecka sorry I was unclear. I meant I wanted to filter by age. How would I filter for a mouse between the ages of 20 and 30 days?

@djarecka
Copy link
Member Author

My filtering function does not work for ranges in ISO right now, I have to add it

@satra satra added the enhancement New feature or request label Jun 11, 2023
Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @djarecka I have just got to look at it. I would love to see dandi search so would be great to finish it up, and also work it out in "collaboration" with all the other searches. When should we do it? see some comments spread out through codebase

"--search_type",
help="Type of the search.",
type=click.Choice(["dandisets", "assets"]),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR we have

download
  --download [assets,dandiset.yaml,all]
                                  Comma-separated list of elements to download
                                  [default: all]
ls
  --metadata [api|all|assets]

I wonder if we should unify somehow

@click.option(
"-f",
"--filter_fields",
help="Field name for dandisets search",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

help is the same as for -s , so what is the difference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we have in ls

  -F, --fields TEXT               Comma-separated list of fields to display.
                                  An empty value to trigger a list of
                                  available fields to be printed out
  -f, --format [auto|pyout|json|json_pp|json_lines|yaml]
                                  Choose the format/frontend for output. If
                                  'auto', 'pyout' will be used in case of
                                  multiple files, and 'yaml' for a single
                                  file.

I think here we should be uniform with that -- with dandi/dandi-archive#1659 in mind, we do adhere to MVC pattern here too, and clearly separate out the model from rendering and thus have similar -f|--format option which might be just an output of those "search result records". Well -- you already have --format below, so we are on the right track.
May be here we should even consider RF/reusing ls code -- we just need a list of records to render. they might come from different search engines etc.

)
@click.option(
"-t",
"--search_type",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and everywhere below use --long-option, not --long_option as we do everywhere else (run git grep -A5 '@click.option')

"-t",
"--search_type",
help="Type of the search.",
type=click.Choice(["dandisets", "assets"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

familiarize with #1357 and RF similarly here

else:
raise NotImplementedError

endpoint = "https://search.dandiarchive.org:5820/dandisets_new/query"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be just 1 type of search. We already have a few more (even if not exposed via API): https://llmsearch.dandiarchive.org/, https://dandiarchive.org/search etc.

I think we should generalize this dandi search [options] [query] interface into dandi search [common-options] HOW [how-specific-options-if-any] [query] or dandi search [options] [--how HOW] [query] interface, so we could expand etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants