This repository has been archived by the owner on Jan 27, 2025. It is now read-only.

Add module or function to delete all empty/null values #34

Closed
TobiasNx opened this issue Dec 2, 2020 · 8 comments

@TobiasNx
Collaborator

TobiasNx commented Dec 2, 2020

When using Fix to map a data set, one has to guard every field that could have an empty or null value with an additional not_equals function.

e.g.
https://gitlab.com/oersi/oersi-etl/-/blob/develop/data/production/edu-sharing.fix#L44

do map(node.description, description)
  not_equals(string: '')
end

The problem with that is that you have to know all fields that could have empty values, and for each of them you have to add the not_equals function. This would expand the Fix quite a bit with all the exceptions.

A Flux module or a Fix function that deletes all metadata fields with empty values, per record, would be good, so that no additional exceptions are necessary. Since different data sets mark an empty value differently (e.g. as null or as ""), it would be good to add an attribute to that function so one can determine what counts as an empty value.

@blackwinter
Member

For null values, there's org.metafacture.mangling.NullFilter. I'm not aware of anything for filtering literals based on their content (org.metafacture.strings.StringFilter/filter-strings works on record level). So maybe an org.metafacture.mangling.StringFilter needs to be added (with a configurable pattern). To metafacture-core, though, right?

@fsteeg
Member

fsteeg commented Dec 3, 2020

So maybe an org.metafacture.mangling.StringFilter needs to be added (with a configurable pattern). To metafacture-core, though, right?

Yes, this sounds good. We'd have two StringFilter classes then, though (one for the record level, one for the literal level). Maybe we can make NullFilter configurable instead? One could argue that conceptually it's always about null values, but they are represented differently: sometimes as an actual null, sometimes as strings like "NULL" or "nil", sometimes as the empty string. Or we could give the new class a different name, like StringLiteralFilter.
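
To make the idea of a configurable "null" representation concrete, here is a minimal plain-Java sketch. It is not Metafacture code: the class name `EmptyValueFilter`, its constructor argument, and the use of a `Map` as a stand-in for a record are all assumptions made for illustration. It only shows the logic of dropping every field whose value is a Java null or matches a configurable pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of metafacture-core: drops every field whose
// value is Java null or fully matches a configurable "empty" pattern.
final class EmptyValueFilter {

    private final Pattern emptyPattern;

    // emptyRegex describes which strings count as "no value".
    EmptyValueFilter(String emptyRegex) {
        this.emptyPattern = Pattern.compile(emptyRegex);
    }

    // True if the value is null or the whole string matches the pattern.
    boolean isEmpty(String value) {
        return value == null || emptyPattern.matcher(value).matches();
    }

    // Returns a copy of the record without empty fields; field order is kept.
    Map<String, String> filter(Map<String, String> record) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            if (!isEmpty(field.getValue())) {
                result.put(field.getKey(), field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Treat the empty string, "NULL" and "nil" as empty values.
        EmptyValueFilter filter = new EmptyValueFilter("(?:NULL|nil)?");

        Map<String, String> record = new LinkedHashMap<>();
        record.put("title", "Open Education");
        record.put("description", "");
        record.put("license", "NULL");
        record.put("creator", null);

        System.out.println(filter.filter(record)); // prints {title=Open Education}
    }
}
```

Passing the pattern in from outside is what would let each data set declare its own notion of "empty", whichever class name ends up being chosen.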

@dr0i
Member

dr0i commented Dec 3, 2020

+1 for enhancing NullFilter. It's highly arguable how to identify a null and how/if to treat it.

@blackwinter
Member

Well, that's a bit of a stretch ;) It was explicitly designed for (Java) null values so downstream receivers won't choke on them.

But I won't oppose expanding NullFilter's scope to include semantically "null" values.

@blackwinter
Member

Will this issue be resolved by #60? It adds the vacuum() function (998da3d).

@TobiasNx
Collaborator Author

TobiasNx commented Nov 5, 2021

If I remember correctly, vacuum() only deletes empty strings, not null values. But I might be wrong.

Also, in the context of OERSI, we have the module filter-null-values: https://gitlab.com/oersi/oersi-etl/-/blob/master/data/production/hoou-to-oersi.flux#L14

@blackwinter
Member

If I remember correctly, vacuum() only deletes empty strings, not null values.

That's correct. But it seems that the new Metafix implementation ignores null values automatically, although I haven't verified it to the full extent.

If you believe that's sufficient, you can add #60 under "Linked pull requests". Otherwise, just leave this ticket open.

@fsteeg
Member

fsteeg commented Nov 10, 2021

Closing, supported by vacuum & filter-null-values.

@fsteeg fsteeg closed this as completed Nov 10, 2021