
SCC-4347 - Add scripts/update-property-by-csv script #104

Merged: 4 commits, merged on Feb 6, 2025


@nonword (Member) commented Dec 17, 2024

Add an `update-property-by-csv` script for updating a single property in many ES documents based on a CSV of any length. For example, it can update the `recordTypeId` of every bib in the index, given a CSV with the columns `id`, `nyplSource`, and `recordTypeId`.

https://newyorkpubliclibrary.atlassian.net/browse/SCC-4347
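For illustration, here is a minimal sketch of how CSV rows could become `_bulk` update operations. It assumes a header row, simple comma-separated values with no quoted fields, and that the `id` column holds each document's `_id`. `buildUpdateOperations` and its parameters are hypothetical names, not the script's actual code, and the sketch ignores the `nyplSource` column because the PR doesn't show how that column is used.

```javascript
/**
 * Hypothetical helper (not from the PR): converts CSV text into alternating
 * `_bulk` action/doc entries that each update one property on one document.
 * Assumes a header row and simple comma-separated values (no quoted fields).
 */
function buildUpdateOperations (csvText, indexName, propertyName) {
  const [headerLine, ...rows] = csvText.trim().split(/\r?\n/)
  const headers = headerLine.split(',').map((h) => h.trim())
  const idCol = headers.indexOf('id')
  const valueCol = headers.indexOf(propertyName)
  if (idCol === -1 || valueCol === -1) {
    throw new Error(`CSV must contain "id" and "${propertyName}" columns`)
  }

  const operations = []
  for (const row of rows) {
    if (!row.trim()) continue // skip blank lines
    const cells = row.split(',').map((c) => c.trim())
    // Action entry: identifies the document to update
    operations.push({ update: { _index: indexName, _id: cells[idCol] } })
    // Partial document: the single property to set on that document
    operations.push({ doc: { [propertyName]: cells[valueCol] } })
  }
  return operations
}
```

A production script would more likely use a streaming CSV parser, both to handle quoted fields and to avoid loading a very large file into memory at once.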

```javascript
/**
 * ...
 * - errored {object[]} - Array of objects representing non-404 errors, one per document
 * - missing {object[]} - Array of objects representing 404 errors, one per document
 **/
const parseErroredDocuments = (bulkResponse, operations) => {
  // ...
}
```
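The excerpt only shows the signature and the end of the doc comment. One possible body is sketched below as a hypothetical illustration, not the PR's implementation. It assumes `bulkResponse` is the parsed `_bulk` response body, whose `items` array lists results in the same order as the action entries in `operations`, and that an update to a nonexistent document comes back with status 404 (ES reports this as `document_missing_exception`). The fields on each returned object (`id`, `status`, `error`, `doc`) are made up for illustration.

```javascript
/**
 * Hypothetical sketch (not the PR's code). Splits failed items in an ES _bulk
 * response into non-404 errors and 404s ("document missing").
 * Assumes bulkResponse.items is in the same order as the action entries in
 * `operations`, which alternates action/doc entries.
 */
const parseErroredDocuments = (bulkResponse, operations) => {
  const errored = []
  const missing = []

  bulkResponse.items.forEach((item, index) => {
    // Each item is keyed by its action type, e.g. { update: { _id, status, error } }
    const [action] = Object.keys(item)
    const result = item[action]
    if (!result.error) return // this document was updated successfully

    // Item i corresponds to operations[2i] (action) and operations[2i + 1] (doc)
    const entry = {
      id: result._id,
      status: result.status,
      error: result.error,
      doc: operations[index * 2 + 1]
    }
    if (result.status === 404) missing.push(entry)
    else errored.push(entry)
  })

  return { errored, missing }
}
```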
Contributor


What is `operations` here?

Member Author


Added documentation. It's the array of objects we send to the ES `_bulk` API. Its entries alternate between an action object that identifies the document to update and a `doc` object holding the new property value. E.g.:

```javascript
[
  { update: { _index: 'index-name', _id: 'b1' } },
  { doc: { myProp: 'new prop value 1' } },
  { update: { _index: 'index-name', _id: 'b2' } },
  { doc: { myProp: 'new prop value 2' } },
  // ...
]
```

Sending a stream of these in one call to the ES `_bulk` API is much faster than issuing a separate POST for each document ID.
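A minimal sketch of that single-call approach follows. It assumes Node 18+ (for the global `fetch`) and an ES endpoint without authentication. The `_bulk` API expects newline-delimited JSON: one object per line, with the body ending in a newline. `toNdjson`, `batchOperations`, `sendBulk`, and `esUrl` are hypothetical names. Splitting a large CSV's operations into batches is also an assumption, one common way to keep each request a reasonable size, not something this PR states.

```javascript
/**
 * Hypothetical helpers (not from the PR). Serialize _bulk operations to the
 * newline-delimited JSON body that _bulk expects, and split a long operations
 * array into batches so a very large CSV isn't sent in a single request.
 */

// Each object becomes one line; the body must end with a trailing newline.
const toNdjson = (operations) =>
  operations.map((op) => JSON.stringify(op)).join('\n') + '\n'

// Split into batches of `docsPerBatch` documents. Each document is an
// action/doc pair, so every batch holds an even number of entries.
const batchOperations = (operations, docsPerBatch) => {
  if (!(docsPerBatch > 0)) throw new RangeError('docsPerBatch must be > 0')
  const size = docsPerBatch * 2
  const batches = []
  for (let i = 0; i < operations.length; i += size) {
    batches.push(operations.slice(i, i + size))
  }
  return batches
}

// Send one batch using the global fetch (Node 18+). `esUrl` is a placeholder
// such as 'http://localhost:9200'; authentication is omitted.
const sendBulk = async (esUrl, operations) => {
  const response = await fetch(`${esUrl}/_bulk`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-ndjson' },
    body: toNdjson(operations)
  })
  if (!response.ok) throw new Error(`_bulk request failed: ${response.status}`)
  return response.json()
}
```

`_bulk` generally responds with HTTP 200 even when some items fail. Per-item failures appear in the response's `items`, with `errors: true` at the top level, and that is the kind of response `parseErroredDocuments` is meant to inspect.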

@nonword merged commit 5afa05c into main on Feb 6, 2025. 3 checks passed.