Skip to content
This repository has been archived by the owner on Jan 27, 2025. It is now read-only.

Emit internal _ fields #63

Closed
fsteeg opened this issue Nov 5, 2021 · 11 comments · Fixed by #84
Closed

Emit internal _ fields #63

fsteeg opened this issue Nov 5, 2021 · 11 comments · Fixed by #84
Assignees

Comments

@fsteeg
Copy link
Member

fsteeg commented Nov 5, 2021

Came up in #60. Internal _ fields should be emitted if explicitly listed in retain.

@blackwinter
Copy link
Member

Copying previous discussion for better overview:


"don't emit _ fields" (24d72f0): Do you really think this is such a great idea? Why prevent anyone from mapping arbitrary fields? Who's to say there can't be a legitimate use case for this?

It should at least be possible to bypass this filter by somehow marking desired fields. E.g., all fields explicitly mapped by the user and/or listed in retain(). I guess it's okay to filter internal fields (such as _id), unless specifically requested for output.

Originally posted by @blackwinter in #60 (comment)

@fsteeg
Copy link
Member Author

fsteeg commented Nov 10, 2021

From the discussion: emit _id if specified in retain().

Functional review: @TobiasNx
Code review: @fsteeg

@TobiasNx
Copy link
Collaborator

Since I saw this in the catmandu documentation. It always seems to emit fields that have _ and does not need additional specification in retain():

See the example here _id:
https://librecat.org/Catmandu/#fixes

Also I think that retain should not be the function by choice since you need to specify all fields that you want to retain. All fields that are not stated are deleted.

I think we should mimik the behaviour that keeps the field by default as in catmandu.

Test case:
I want to change something small in a record but keep the rest. Keeping fields with _ by default would emit all fields.
If I needed to expicitly state "_id" in retain I would also need to state all the other fields at least the parent/root element of them that I need additionally. Otherwise I would loose all the other fields but "_id" oder that I forget.

If this is only about the metamorph internal fields (like "_id" are there even more?) and not all fields with _
I still would recommend a seperate function like keep() additionally to retain() so that we do not have the deletion behaviour that is connected to retain()

blackwinter added a commit that referenced this issue Dec 1, 2021
Virtual fields can be accessed like regular metadata fields, but aren't emitted by default.
blackwinter added a commit that referenced this issue Dec 1, 2021
Don't mark internal fields with underscore prefix; instead, make `_id` a virtual field.

NOTE: Inlined `Value.merge()` in `Value.Hash.add()` in order to prevent subtle bugs with virtual fields in the future.
blackwinter added a commit that referenced this issue Dec 3, 2021
Virtual fields can be accessed like regular metadata fields, but aren't emitted by default.
blackwinter added a commit that referenced this issue Dec 3, 2021
Don't mark internal fields with underscore prefix; instead, make `_id` a virtual field.

NOTE: Inlined `Value.merge()` in `Value.Hash.add()` in order to prevent subtle bugs with virtual fields in the future.
@blackwinter blackwinter removed their assignment Dec 3, 2021
@blackwinter
Copy link
Member

blackwinter commented Dec 3, 2021

@TobiasNx: Ready for review.

blackwinter added a commit that referenced this issue Dec 6, 2021
@blackwinter
Copy link
Member

Oops, I may have been too quick with merging the pull request. Is the functional review still coming?

@TobiasNx
Copy link
Collaborator

TobiasNx commented Dec 6, 2021

To be honest, I was planing to, but I realized that I do not know how to review new MR for Fix, this seems different from metafacture when building from source.

@fsteeg deploid his new commit to the test playground. But I think that this is not the solution. How should I do this

@TobiasNx TobiasNx assigned blackwinter and fsteeg and unassigned TobiasNx Dec 6, 2021
@blackwinter
Copy link
Member

I do not know how to review new MR for Fix

In your Metafacture project (oersi-etl?), you can add a source dependency on the metafacture-fix branch in question:

diff --git build.gradle build.gradle
index 2724310..67b2bcc 100644
--- build.gradle
+++ build.gradle
@@ -44,7 +44,7 @@ dependencies {
     implementation 'org.metafacture:metafacture-elasticsearch:5.3.0-rc2'
     implementation 'org.metafacture:metafacture-xml:5.3.0-rc2'
     implementation 'org.metafacture:metafacture-fix:0.2.0-rc2'
-    implementation 'org.metafacture:metafix:0.2.0-rc6'
+    implementation('org.metafacture:metafix') { version { branch = '63-emitUnderscoreFields' } }
     implementation('org.metafacture:metafacture-biblio:5.3.0-rc2') {
         exclude group: 'xml-apis', module: 'xml-apis'
     }
diff --git settings.gradle settings.gradle
index 10ca9b3..26490b5 100644
--- settings.gradle
+++ settings.gradle
@@ -8,3 +8,9 @@
  */
 
 rootProject.name = 'oersi-etl'
+
+sourceControl {
+    gitRepository('https://github.com/metafacture/metafacture-fix.git') {
+        producesModule('org.metafacture:metafix')
+    }
+}

This depends on the particular system/configuration, of course.

@fsteeg
Copy link
Member Author

fsteeg commented Dec 6, 2021

Right, two issues here (colliding with comment by @blackwinter, but posting anyway):

  1. In our process, as currently documented, the PR should have only been assigned after successful functional review
  2. How should @TobiasNx actually do the functional review? Maybe for now the easiest way is to use oersi-etl in the metafix branch? In build.gradle, set the metafix dependency to 'org.metafacture:metafix:0.2.0-SNAPSHOT and run ./gradlew publishToMavenLocal in your local metafacture-fix.

@blackwinter
Copy link
Member

In our process, as currently documented, the PR should have only been assigned after successful functional review

True, but I was trying to save time by parallelizing the process. Maybe unsuccessfully :/ Sorry!

@blackwinter blackwinter removed their assignment Dec 6, 2021
tibdevelopment pushed a commit to TIBHannover/oersi-etl that referenced this issue Dec 8, 2021
@TobiasNx
Copy link
Collaborator

TobiasNx commented Dec 8, 2021

+1

I tested it.
https://gitlab.com/oersi/oersi-etl/-/tree/FunctionalReview/data/experimental/emit_fields

What does not seem to work though is to keep "_id" additionally to other fields if you do not want to use retain. I stated the problem above.

@blackwinter
Copy link
Member

Thanks for the review!

What does not seem to work though is to keep "_id" additionally to other fields if you do not want to use retain.

You should be able to map it onto itself: copy_field("_id", "_id")

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants