Skip to content
This repository has been archived by the owner on Jan 27, 2025. It is now read-only.

Consistent handling of repeated fields / arrays #65

Closed
fsteeg opened this issue Nov 5, 2021 · 14 comments
Closed

Consistent handling of repeated fields / arrays #65

fsteeg opened this issue Nov 5, 2021 · 14 comments
Assignees

Comments

@fsteeg
Copy link
Member

fsteeg commented Nov 5, 2021

Explicit arrays (e.g. in JSON input) vs. explicit / repeated values (e.g. in XML input).

Field names like:

metadata.mods.identifier.value vs.
metadata.mods.identifier.*.value vs.
metadata.mods.identifier[].*.value

See:

@blackwinter
Copy link
Member

Copying previous discussion for better overview:


I have to admit, I find this behaviour highly debatable. What's the rationale? I guess I want to have full control over the output I'm building, not have Metafix decide that it knows better what I wanted?

Granted, I don't have an example at hand right now. Maybe I'm overreacting. Certainly feels iffy, though...

Originally posted by @blackwinter in #60 (comment)


Yeah, the whole topic of repeated values / arrays needs work. I've opened #65 for that.

Originally posted by @fsteeg in #60 (comment)

@fsteeg
Copy link
Member Author

fsteeg commented Nov 10, 2021

From discussion:

  • Don't add [] automatically, leave it to the user to set that
  • Add index number as entity names for arrays of objects
  • Support * as placeholder for the index number (general wildcard support in separate issue)

@fsteeg
Copy link
Member Author

fsteeg commented Nov 10, 2021

Functional review: @TobiasNx, code review: @blackwinter / @fsteeg

@TobiasNx
Copy link
Collaborator

TobiasNx commented Nov 11, 2021

There is one difficulty that pop in my head.

If we have multiple records and a field is repeatable. But in a couple of records the field exist 1-time and in other n+1 times.
The fix would have to handle this differently since it would have to operate once with a list bind and one with the direct operation on the field since the notation would be different: test.*.feld (when n+1 occurences) vs. test.feld (1 occurence)

record 1:

<record>
   <test>
      <feld>Ich bin der Inhalt</feld>
   </test>
</record>

record2:

<record>
   <test>
      <feld>Ich bin der Inhalt</feld>
      <feld>Noch mehr Inhalt</feld>
      <feld>Und noch mehr Inhalt</feld>
   </test>
</record>

@blackwinter
Copy link
Member

Indeed, that's what I'm concerned about too. But you can always iterate over the parent element, can't you? (do list(path:test, var:loop) => do something with loop.feld)

@TobiasNx
Copy link
Collaborator

Indeed, that's what I'm concerned about too. But you can always iterate over the parent element, can't you? (do list(path:test, var:loop) => do something with loop.feld)

do list only works with arrays so unfortunately this would not work. the bind to repeat things on a hash/element is do each

Also if you want to select the path for an array in an other field/array you need the index-number or wildcard https://github.com/LibreCat/Catmandu/wiki/Paths#wildcards:

record.test.feld.*

@TobiasNx
Copy link
Collaborator

TobiasNx commented Nov 16, 2021

It seems that catmandu is behaving the same way:
https://github.com/TobiasNx/Catmandu-Testing/tree/master/repeatable_fields_xml

It could be that the specific importer change tha behaviour if in a specific format like MODS a field is repeatable or not. That could be managed in an handler.

@TobiasNx
Copy link
Collaborator

TobiasNx commented Nov 19, 2021

I investigated this a little further: https://github.com/TobiasNx/Catmandu-Testing/tree/master/repeatable_fields_xml

Catmandu converts general formats like XML as follow:

  • single field => key-value-pair
  • repeated field => list

But specific importers like MODS or the OAI importer that converts Dublin Core create lists by default for fields that are defined as repeatable fields in the format/schema. Even if they only occure once.

tibdevelopment pushed a commit to TIBHannover/oersi-etl that referenced this issue Nov 25, 2021
`./gradlew run --args 'data/experimental/duepub-metafacture.flux'`

See metafacture/metafacture-fix#65
@fsteeg fsteeg removed their assignment Nov 25, 2021
fsteeg added a commit that referenced this issue Nov 30, 2021
fsteeg added a commit that referenced this issue Nov 30, 2021
tibdevelopment pushed a commit to TIBHannover/oersi-etl that referenced this issue Nov 30, 2021
`./gradlew run --args 'data/experimental/duepub-metafacture.flux'`

See metafacture/metafacture-fix#65
@fsteeg
Copy link
Member Author

fsteeg commented Nov 30, 2021

Pre-published current state as 0.2.0-rc6, used in oersi-etl!99. See basic checklist from our meeting in #65 (comment). I think this issue is too broad and should be split up into / replaced by smaller issues (e.g. #66, #80).

Assigning @TobiasNx for functional review; see also the discussion at #78 (comment) about a possible better process for providing and implementing specific use cases.

@TobiasNx
Copy link
Collaborator

I think it will be best if I will create small use cases/test cases that need to be handled in https://gitlab.com/oersi/oersi-etl/-/tree/FunctionalReview/data/experimental

Perhaps we could have a separate metafacture instance only for testing where we could collect test scenarios besides the OERSI context?

@TobiasNx
Copy link
Collaborator

@blackwinter what should I do here?

@blackwinter
Copy link
Member

what should I do here?

I don't know. This issue has status "Review" and you're the reviewer. If the status is no longer appropriate it should be updated, I guess.

@TobiasNx
Copy link
Collaborator

I think this is resolved in context of: #109

@TobiasNx
Copy link
Collaborator

I close this ticket, we can reopen it, if we need.

Repository owner moved this from Review to Done in Metafacture Mar 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

3 participants