Catching up with the new version of the API #42

Open
wants to merge 42 commits into
base: master

mustberuss
Collaborator

This is the remaining code to catch up with the current state of the API:

  • paging by the primary key only (and sometimes a secondary key) and applying the user's sort in R; deprecating the page and per_page parameters of search_pv(); padding the after value for patent_id when patent_id is the sort field; exporting the padding function and adding after as a search_pv() parameter for custom paging; removing the paging limits
  • accepting group names in the field parameter (API's new shorthand); get_fields() returning group names as fields rather than fully qualifying (group.member notation) each member of the group
  • unnesting and casting changes now that the endpoint names (singular, some nested) and returned entities (mostly plural) aren't interchangeable; dealing with the two endpoints that return rel_app_texts (from patent/rel_app_text and publication/rel_app_text)
  • switching to httr2; httr describes itself as superseded, with only the changes necessary to keep it on CRAN being made
  • there's an additional include_pk boolean on get_fields() and an encoded_url boolean on retrieve_linked_data()
  • there is a new qry_fun, in_range(), that isn't strictly needed: it doesn't front anything in the API, but it generates the tedious _and/_gte/_lte that would otherwise be needed, e.g. qry_funs$in_range(patent_date = c("1976-01-01", "1983-02-28"))
  • field name/endpoint name changes in the examples
  • bug fix on posts when fields or sort wasn't specified
  • bug fix when there was more than a primary sort
  • bug fix to avoid a coercion warning
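
To make the in_range() bullet concrete, here is a minimal sketch of the nested _and/_gte/_lte structure such a helper could produce. The function name in_range_sketch() is hypothetical and this is not the package's implementation; it only assumes the query language's usual operator-to-field-to-value nesting.

```r
# Hypothetical helper (not the package's qry_funs$in_range): builds the
# _and/_gte/_lte list for an inclusive range on a single field.
in_range_sketch <- function(field, range) {
  stopifnot(is.character(field), length(field) == 1, length(range) == 2)

  # One comparison, e.g. list(`_gte` = list(patent_date = "1976-01-01"))
  bound <- function(op, value) {
    stats::setNames(list(stats::setNames(list(value), field)), op)
  }

  list(`_and` = list(bound("_gte", range[[1]]), bound("_lte", range[[2]])))
}

qry <- in_range_sketch("patent_date", c("1976-01-01", "1983-02-28"))
str(qry)
```

Serializing qry to JSON (for example with jsonlite) would be a separate step; the sketch stops at the list structure.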

@mustberuss
Collaborator Author

Are you OK with us removing the functionality of adding in additional fields to the query for situations where the user doesn't include that field in their field list?

I think it's worth keeping. The real reason is that we may have to add a primary sort key, and in some cases a secondary one, to the fields parameter for paging to work. I like the idea of removing the fields we add, which isolates users from having to understand this. I just pushed the secondary sort code. There is also info about it here
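
The idea can be sketched roughly as follows, with a hand-built data frame standing in for an API response. The helper names add_paging_fields() and drop_added_fields() are hypothetical, and the real code in this PR may be organized quite differently.

```r
# Hypothetical helpers illustrating the idea; not the code in this PR.

# Fields actually sent to the API: the user's fields plus any paging keys.
add_paging_fields <- function(fields, paging_keys) {
  union(fields, paging_keys)
}

# Keep only the columns the user asked for.
drop_added_fields <- function(df, requested_fields) {
  df[, intersect(names(df), requested_fields), drop = FALSE]
}

requested <- c("patent_title", "patent_date")
sent <- add_paging_fields(requested, "patent_id")

# Stand-in for rows returned by the API (made-up values), paged by patent_id.
response <- data.frame(
  patent_id = c("10000002", "10000001"),
  patent_title = c("B", "A"),
  patent_date = c("2018-06-19", "2018-06-19"),
  stringsAsFactors = FALSE
)

# Apply the user's sort in R, then remove the column that was added for paging.
sorted <- response[order(response$patent_title), , drop = FALSE]
result <- drop_added_fields(sorted, requested)
```

The user never sees patent_id in result unless it was in their own field list, which is the isolation described above.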

@crew102
Collaborator

crew102 commented Jan 11, 2025

Hey Russ, in looking over the sorting stuff I noticed a few things. Hoping you can help.

Re: the build, don't worry about the failing status for now. I'm at my wits' end as to why it keeps thinking that you're the actor triggering the build when I trigger it. I'll look into it later.

library(patentsview)

query <- qry_funs$gt(patent_year = 2010)

# Can't sort by patent title (patent abstract is also problematic)
search_pv(
  query,
  fields = c("patent_id", "patent_title"),
  sort = c("patent_title" = "asc")
)
#> Error in `httr2::req_perform()` at patentsview/R/search-pv.R:87:3:
#> ! HTTP 400 Bad Request.
#> • Internal Server Error

# Can't sort by nested field
search_pv(
  query,
  fields = c("patent_id", "patent_title", "inventors.inventor_country"),
  sort = c("inventors.inventor_country" = "asc")
)
#> Error in `httr2::req_perform()` at patentsview/R/search-pv.R:87:3:
#> ! HTTP 400 Bad Request.
#> • Invalid field: inventors.inventor_country

Created on 2025-01-10 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       macOS Sonoma 14.4
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-01-10
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.3   2024-06-21 [2] CRAN (R 4.4.0)
#>  curl          5.2.1   2024-03-01 [2] CRAN (R 4.4.0)
#>  digest        0.6.36  2024-06-23 [2] CRAN (R 4.4.0)
#>  evaluate      0.24.0  2024-06-10 [2] CRAN (R 4.4.0)
#>  fansi         1.0.6   2023-12-08 [2] CRAN (R 4.4.0)
#>  fastmap       1.2.0   2024-05-15 [2] CRAN (R 4.4.0)
#>  fs            1.6.4   2024-04-25 [2] CRAN (R 4.4.0)
#>  glue          1.7.0   2024-01-09 [2] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [2] CRAN (R 4.4.0)
#>  httr2         1.0.2   2024-07-16 [2] CRAN (R 4.4.0)
#>  jsonlite      1.8.8   2023-12-04 [2] CRAN (R 4.4.0)
#>  knitr         1.48    2024-07-07 [2] CRAN (R 4.4.0)
#>  lifecycle     1.0.4   2023-11-07 [2] CRAN (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.4.0)
#>  patentsview * 0.3.0   2025-01-11 [1] local
#>  pillar        1.9.0   2023-03-22 [2] CRAN (R 4.4.0)
#>  R6            2.5.1   2021-08-19 [2] CRAN (R 4.4.0)
#>  rappdirs      0.3.3   2021-01-31 [2] CRAN (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [2] CRAN (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [2] CRAN (R 4.4.0)
#>  rmarkdown     2.27    2024-05-17 [2] CRAN (R 4.4.0)
#>  rstudioapi    0.16.0  2024-03-24 [2] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.4.0)
#>  utf8          1.2.4   2023-10-22 [2] CRAN (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [2] CRAN (R 4.4.0)
#>  withr         3.0.1   2024-07-31 [2] CRAN (R 4.4.0)
#>  xfun          0.46    2024-07-18 [2] CRAN (R 4.4.0)
#>  yaml          2.3.10  2024-07-26 [2] CRAN (R 4.4.0)
#> 
#>  [1] /Users/cbaker/Library/R/arm64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@mustberuss
Collaborator Author

For the build, maybe try using triggering_actor? That article mentions

You might notice that I use github.triggering_actor rather than github.actor. This is what allows us to run tests from forks if the job is re-run by someone with the correct permission

On the code, I'm not sure I'm following. I opened a bug with the API team about not being able to sort on some of the fields, including patent_title. It's about 40 fields in all. I suspect they are the full_text fields, but the API team didn't confirm that. There was a test that checked all fields, but it took quite a while to run, so now it just checks a handful of fields. That test should fail once all the sample sorts work. All the tests in test-api-bugs are set to fail when the bugs are fixed.

On the sorting on nested fields, I don't think that is something supported by the API. I'm not sure if it's documented anywhere but I'd imagine that's why it's saying it's an invalid field.
