As a user, I want to exclude files from harvest run based upon directory path #218

jordanpadams · 2024-11-14T18:53:55Z

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I can skip directories I do not want harvest to load

📖 Additional Details

From user:

We have non-archive files in our service directories to support serving the data. Things like DOI landing pages, for example, and pre-made bulk download files. If the loader is going to assume that everything in a directory is either a PDS4 label or something pointed to by a PDS4 label, it's going to choke. We could make an exclusion list of directory names, file names, and file extensions to ignore, if that would help.
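To make the exclusion-list idea concrete, here is a minimal sketch of how such a list could be checked against a file path. It is not harvest's implementation: the list contents and the `is_excluded` helper are hypothetical names chosen for illustration, and paths are assumed to be relative to the directory being harvested.

```python
from pathlib import PurePosixPath

# Hypothetical exclusion lists; the entries are illustrative only.
EXCLUDED_DIR_NAMES = {"doi", "bulk_download"}
EXCLUDED_FILE_NAMES = {"index.html"}
EXCLUDED_EXTENSIONS = {".zip", ".html"}


def is_excluded(relative_path: str) -> bool:
    """Return True if the path matches any of the three exclusion lists."""
    path = PurePosixPath(relative_path)
    # Any parent directory with an excluded name excludes the file.
    if any(part in EXCLUDED_DIR_NAMES for part in path.parts[:-1]):
        return True
    # Exact file-name match.
    if path.name in EXCLUDED_FILE_NAMES:
        return True
    # Extension match, compared case-insensitively.
    return path.suffix.lower() in EXCLUDED_EXTENSIONS
```

With these lists, `doi/landing.xml` and `data/bundle.ZIP` would be skipped, while `data/product.xml` would still be considered for loading.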

Acceptance Criteria

Given a bundle_root/ directory with sub-directories root/subdir1 and root/subdir2, all containing PDS4 XML products
When I perform harvest run with dataPath = bundle_root/ and excludePath = root/subdir2
Then I expect all the data from bundle_root/ and root/subdir1 to be loaded into the Registry, but NOT data from root/subdir2
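The criterion above could be sketched as a directory walk that prunes excluded sub-directories before descending into them. This is only an illustration under stated assumptions: `find_labels` is a hypothetical helper, not part of harvest, exclude paths are assumed to be relative to the data path, and any `.xml` file is treated as a candidate PDS4 label.

```python
import os
from pathlib import Path


def find_labels(data_path, exclude_paths=()):
    """List .xml files under data_path, skipping excluded sub-directories.

    Assumption: exclude_paths are given relative to data_path.
    """
    root = Path(data_path).resolve()
    excluded = {root / p for p in exclude_paths}
    labels = []
    for current, dirnames, filenames in os.walk(root):
        current_path = Path(current)
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(
            d for d in dirnames if (current_path / d) not in excluded
        )
        for name in sorted(filenames):
            if name.lower().endswith(".xml"):
                labels.append((current_path / name).relative_to(root).as_posix())
    return labels
```

Pruning `dirnames` in place means the excluded tree is never read at all, which matters when the skipped directories hold large bulk-download files.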

⚙️ Engineering Details

No response

🎉 I&T

No response

jordanpadams added requirement needs:triage labels Nov 14, 2024

jordanpadams self-assigned this Nov 14, 2024

jordanpadams added this to EN Portfolio Backlog Nov 14, 2024

github-project-automation bot moved this to ToDo in EN Portfolio Backlog Nov 14, 2024

jordanpadams added p.must-have B15.1 and removed needs:triage labels Nov 14, 2024

jordanpadams added this to B15.1 Nov 14, 2024

github-project-automation bot moved this to Sprint Backlog in B15.1 Nov 14, 2024

jordanpadams added the needs:triage label Nov 14, 2024

jordanpadams removed their assignment Nov 14, 2024

jordanpadams added needs:scheduling and removed needs:triage labels Nov 14, 2024

jordanpadams mentioned this issue Nov 14, 2024

The productFilter/excludeClass option in the harvest job configuration does not work #214

Closed

jordanpadams assigned al-niessner Nov 14, 2024

jordanpadams added p.should-have and removed p.must-have labels Nov 14, 2024

jordanpadams unassigned al-niessner Nov 14, 2024

jordanpadams added p.could-have icebox and removed p.should-have B15.1 needs:scheduling labels Nov 14, 2024
