More intuitive CLI commands #19

Open
lsg551 opened this issue Apr 28, 2024 · 3 comments
Labels: breaking change, enhancement (Enhancement or improvement to existing features)

Comments

Owner

lsg551 commented Apr 28, 2024

Description

See #3 for some details on what Matricula hosts and how things are organized as well as terminology.

The following command scrapes all parishes available on Matricula (subject to the optional search/filter parameters):

$ matricula-online-scraper fetch location -e csv

This returns a list with more than 8000 entries. Here's the head of the output:

country  ,region             ,name          ,url
Slovenia ,Nadškofija Maribor ,001 Apače     ,https://data.matricula-online.eu/en/slovenia/maribor/apace/
Slovenia ,Nadškofija Maribor ,002 Artiče    ,https://data.matricula-online.eu/en/slovenia/maribor/artice/
Slovenia ,Nadškofija Maribor ,004 Bele Vode ,https://data.matricula-online.eu/en/slovenia/maribor/bele-vode/
Slovenia ,Nadškofija Maribor ,005 Beltinci  ,https://data.matricula-online.eu/en/slovenia/maribor/beltinci/
Slovenia ,Nadškofija Maribor ,006 Bizeljsko ,https://data.matricula-online.eu/en/slovenia/maribor/bizeljsko/

The URLs in the output of the first command can then be passed to the second command, which scrapes all available sources of a parish. For 001 Apače:

$ matricula-online-scraper fetch parish -e csv --url https://data.matricula-online.eu/en/slovenia/maribor/apace/

This returns a list of all available digitized sources of that parish. Here's the head of the output:

name                     ,url                                                               ,accession_number ,date      ,register_type            ,date_range_start ,date_range_end
Krstna knjiga / Taufbuch ,https://data.matricula-online.eu/en/slovenia/maribor/apace/00001/ ,           00001 ,1673-1689 ,Krstna knjiga / Taufbuch ,"Jan. 1, 1673"   ,"Dec. 31, 1689"
Krstna knjiga / Taufbuch ,https://data.matricula-online.eu/en/slovenia/maribor/apace/00002/ ,           00002 ,1728-1742 ,Krstna knjiga / Taufbuch ,"Jan. 1, 1728"   ,"Dec. 31, 1742"
Krstna knjiga / Taufbuch ,https://data.matricula-online.eu/en/slovenia/maribor/apace/00003/ ,           00003 ,1742-1760 ,Krstna knjiga / Taufbuch ,"Jan. 1, 1742"   ,"Dec. 31, 1760"
Krstna knjiga / Taufbuch ,https://data.matricula-online.eu/en/slovenia/maribor/apace/00004/ ,           00004 ,1760-1804 ,Krstna knjiga / Taufbuch ,"Jan. 1, 1760"   ,"Dec. 31, 1804"
Krstna knjiga / Taufbuch ,https://data.matricula-online.eu/en/slovenia/maribor/apace/00005/ ,           00005 ,1804-1820 ,Krstna knjiga / Taufbuch ,"Jan. 1, 1804"   ,"Dec. 31, 1820"
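The two-step workflow above can be sketched roughly as follows. This is a minimal illustration, not part of the tool: the helper names `parish_urls` and `fetch_parish_command` are hypothetical, and the code assumes the column padding shown in the output above may be present in the CSV (stripping it is harmless if it is not).

```python
import csv
import io

# First rows of the `fetch location -e csv` output quoted above.
PARISHES_CSV = """\
country  ,region             ,name          ,url
Slovenia ,Nadškofija Maribor ,001 Apače     ,https://data.matricula-online.eu/en/slovenia/maribor/apace/
Slovenia ,Nadškofija Maribor ,002 Artiče    ,https://data.matricula-online.eu/en/slovenia/maribor/artice/
"""


def parish_urls(csv_text: str) -> list[str]:
    """Return the values of the `url` column, without alignment padding."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [cell.strip() for cell in next(reader)]
    url_index = header.index("url")
    return [row[url_index].strip() for row in reader if row]


def fetch_parish_command(url: str) -> list[str]:
    """Build the argument list for the current `fetch parish` command."""
    return ["matricula-online-scraper", "fetch", "parish", "-e", "csv", "--url", url]


if __name__ == "__main__":
    # Print one second-step command per parish URL from the first step.
    for url in parish_urls(PARISHES_CSV):
        print(" ".join(fetch_parish_command(url)))
```

Each printed line is the second command from above with one parish URL filled in; the argument list could equally be handed to `subprocess.run`.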

I propose renaming the subcommands so that they match the entities of Matricula more closely, which should make them more intuitive:

  1. fetch location becomes list parishes, which can be used like list parishes --all or list parishes --filter-place "name"
  2. fetch parish becomes list sources, which can be used like list sources --parish … --parish …
  3. A new command for fetching the sources of a parish (feat: support scraping sources #3) will be get source, which can be used like get source --url … --url …
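To make the proposed command tree concrete, here is a small sketch using `argparse` from the Python standard library. It only mirrors the names listed above; the project's real CLI framework may be different, and details such as `--parish` and `--url` being repeatable and required, or the `command` and `entity` attribute names, are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the proposed layout: `list parishes`, `list sources`, `get source`."""
    parser = argparse.ArgumentParser(prog="matricula-online-scraper")
    commands = parser.add_subparsers(dest="command", required=True)

    # `list` groups the commands that enumerate entities.
    list_cmd = commands.add_parser("list")
    list_sub = list_cmd.add_subparsers(dest="entity", required=True)

    parishes = list_sub.add_parser("parishes")
    parishes.add_argument("--all", action="store_true")
    parishes.add_argument("--filter-place", metavar="NAME")

    sources = list_sub.add_parser("sources")
    # Assumption: the option may be repeated, as in `--parish … --parish …`.
    sources.add_argument("--parish", action="append", required=True)

    # `get` groups the commands that download a single entity.
    get_cmd = commands.add_parser("get")
    get_sub = get_cmd.add_subparsers(dest="entity", required=True)

    source = get_sub.add_parser("source")
    # Assumption: the option may be repeated, as in `--url … --url …`.
    source.add_argument("--url", action="append", required=True)

    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this layout, repeated `--parish` or `--url` options are collected into a list, so one invocation can cover several parishes or sources.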

Affected Versions

All versions, including the most recent one, v0.3.0.

This proposes a breaking change!

@lsg551
Owner Author

lsg551 commented Apr 28, 2024

  • Also rename the filters: add a prefix to each filtering option, e.g. --place -> --filter-place
  • Rename the option --silent to --show-scrapy-log or something similar
  • Rename the option --log-level to --scrapy-log-level or something similar
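Since these renames break existing invocations, one possible way to soften the transition is a small translation layer that accepts the old option names for a while and warns about them. This is only a sketch and not part of the proposal: `translate_legacy_options` is a hypothetical helper, the new names are the tentative ones from the list above, and `--silent` is left out because its suggested replacement appears to invert its meaning.

```python
import warnings

# Old option name -> tentative new name (taken from the list above).
RENAMED_OPTIONS = {
    "--place": "--filter-place",
    "--log-level": "--scrapy-log-level",
}


def translate_legacy_options(argv: list[str]) -> list[str]:
    """Replace renamed options in argv and warn about each old name used."""
    translated = []
    for arg in argv:
        # Handle both `--opt value` and `--opt=value` spellings.
        name, sep, value = arg.partition("=")
        new_name = RENAMED_OPTIONS.get(name)
        if new_name is None:
            translated.append(arg)
            continue
        warnings.warn(
            f"{name} is deprecated; use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        translated.append(new_name + sep + value)
    return translated
```

The function would run on `sys.argv[1:]` before the real argument parser sees it, so the parser itself only needs to know the new names.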

lsg551 added the enhancement, test, refactoring, and breaking change labels and removed the test and refactoring labels on Apr 29, 2024
lsg551 changed the title from "refactor(cli): more intuitive commands" to "More intuitive CLI commands" on Apr 29, 2024
Owner Author

lsg551 commented Apr 30, 2024

  • When fetching all available locations/parishes (more than 8000), prompt the user with a warning and recommend excluding coordinates etc.
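A confirmation step along these lines might look like the sketch below. The function name `confirm_unfiltered_fetch`, its parameters, and the message wording are all hypothetical; the only fact taken from this issue is that an unfiltered fetch returns more than 8000 parishes.

```python
from typing import Callable


def confirm_unfiltered_fetch(
    has_filter: bool,
    include_coordinates: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the fetch should proceed.

    Filtered fetches proceed without a prompt. An unfiltered fetch asks the
    user first, because it covers all parishes (more than 8000).
    """
    if has_filter:
        return True
    message = "No filter given: this fetches all parishes (more than 8000)."
    if include_coordinates:
        message += " Consider excluding coordinates."
    answer = ask(message + " Continue? [y/N] ")
    # Anything other than an explicit yes counts as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

Passing the prompt function as `ask` keeps the check testable without a terminal, for example `confirm_unfiltered_fetch(False, True, ask=lambda _: "y")`.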

@lsg551
Owner Author

lsg551 commented May 5, 2024

  • Make CSV the default.
  • Put latitude before longitude.
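The column order could be pinned in one place, as in this sketch. The field names `latitude` and `longitude` and the helper `to_csv` are assumptions (the output quoted earlier in this issue shows no coordinate columns), and the coordinate values in the usage example are placeholders, not real data.

```python
import csv
import io

# Assumed column order: the four columns shown in this issue,
# followed by latitude and then longitude.
FIELD_ORDER = ["country", "region", "name", "url", "latitude", "longitude"]


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows as CSV with a fixed column order, whatever the dict order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELD_ORDER,
        extrasaction="ignore",  # silently drop keys not listed in FIELD_ORDER
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder coordinates, for illustration only.
    row = {
        "longitude": "0.0",
        "latitude": "0.0",
        "name": "001 Apače",
        "country": "Slovenia",
        "region": "Nadškofija Maribor",
        "url": "https://data.matricula-online.eu/en/slovenia/maribor/apace/",
    }
    print(to_csv([row]), end="")
```

Because `DictWriter` writes columns in `fieldnames` order, latitude always precedes longitude even when the scraped items list them the other way round.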

lsg551 self-assigned this on Nov 25, 2024