Fix for #123 - Add support for preservation of unknown properties #124

aseovic · 2018-05-10T03:16:10Z

Had to revert back to Java 1.7 in order for Travis tests to pass, but it will work with Java 8 as well, once #122 is merged.

EugenCepoi

Looks good! Left just a few comments that are here more for discussion.

EugenCepoi · 2018-05-11T06:33:36Z

genson/src/main/java/com/owlike/genson/reflect/BeanDescriptor.java

@@ -122,6 +136,8 @@ public void deserialize(T into, ObjectReader reader, Context ctx) {
        } else {
          reader.skipValue();
        }
+      } else if (unknownPropertyHandler != null) {
+        unknownPropertyHandler.onUnknownProperty(into, propName, ctx.genson.deserialize(UNKNOWN, reader, ctx));


This means that if a property handler is registered it is responsible of being able to handle missing properties for all targeted types.
Another option could be to actually implement failOnMissingProperty as a special UnknownPropertyHandler, and chain them. That would however complicate a bit UnknownPropertyHandler as it would need to "signal" if it did handle the missing property or not. Just food for thought.

I see handler and failOnMissingProperty as mutually exclusive options, or rather, the latter is just a special case of the former, which we want to leave as-is for backwards compatibility, but isn't strictly necessary -- anyone can write a handler that fails immediately, if that's what they want to.

More likely than not, people will use either one UPH (probably the default one we provide) or none, and if they do use one they will not want to fail on missing properties, so I really don't see a point of complicating the design. In the unlikely event that they do need chained/composite UPH, they can easily write one themselves that does exactly what they need.

Because it's a special case of unknown property handling, is the main reason why I was saying it could be reimplemented that way. The advantage I see to it is, unified logic and removal of unnecessary code.

Preserving some of the compatibility could be achieved by translating the boolean failOnMissingProperty into a handler at the builder level (and remove it from the other places like Genson and BeanDescriptor, which are less likely to be directly used).

I agree that the chaining part is not necessary.

I see... Yes, that would probably make sense.

We just need to document that behavior, but considering that they are already mutually exclusive, we should probably document it anyway.

Let me rework the code along those lines and I'll update the PR.

EugenCepoi · 2018-05-11T06:43:55Z

genson/src/test/java/com/owlike/genson/reflect/UnknownPropertyHandlerTest.java

+        Map<String, Object> unknownProperties();
+    }
+
+    static class EvolvableHandler implements UnknownPropertyHandler {


Worth providing a simple impl like that as part of the lib.
In that case I think it might be better to not mutate the result of unknownProperties (for ex. this would then fail for an immutable map) and instead just define a addProperty method. Also some alternative namings could be PropertyBag, WithUnknownProperties, UnknownProperties.

I (mostly) agree :-)

The reason I didn't do it is because any useful implementation must make certain assumptions about how the data class is implemented (ie. it must implement Evolvable and expose a holder map). If Java had mix-ins this would be a no-brainer, but unfortunately it doesn't, so it's really a question of whether we think an interface provides enough value to include it, and to create a default UPH implementation. If you feel we should, I'm not against it.

As for not mutating property map directly, it can get complicated. I originally had what you are suggesting, a pair of addUnknownProperty/getUnknownProperties methods. However, in that case you'd end up with the unknown properties serialized twice -- once as unknownProperties map, because of the getter, and once by the BeanDescriptor because they are, well, unknown properties.

In the end I decided that the simplest solution was to expose the mutable map via a single method that won't impact normal serialization. Now, the issue is still there -- if the field-based serialization was enabled, we'd have to mark the field itself with @JsonIgnore or make it transient. Again, a lot of this depends on personal preferences, and for test purposes I just did the easiest thing that worked. If we decide to add a standard implementation into Genson, we should probably discuss how it should look like and what the external contract and semantics should be.

I think that a default impl would be useful if we were able to address the points you mention.
That way the user would only have to implement the contract without needing to also implement the handler and we could mitigate the double ser. problem.

To address the later problem we could provide an abstract class instead of an interface (mark the field as transient and annotate the methods). This would still be annoying for people that can't extend the class but would work in every case for all the others.

OR/AND (we can do both)

Have an interface with JsonIgnore (or use names that don't match std get/set patterns like you did in this test) and as a bonus fix this issue Merge JsonIgnore like annotations #85 to merge JsonIgnore with subclasses.

Fixing this issue is not trivial, so that could be done in a second time later on. The main problem is due to the fact that exclusion of properties happens before merging of annotations on all discovered properties. This is an useful feature to have...

OR

Do something like this hacky idea from https://stackoverflow.com/a/25907755/840441. Though this is probably too hacky for what we need :)

We can certainly do 1) and 2) and provide both an interface and an abstract base class that implements it.

I'm not a big fan of base classes, as most Java developers are not happy when you force them to extend something and take away their only opportunity for inheritance, but I guess providing it as an option and not something that's required softens the blow a bit -- they can always choose not to use our base class and implement the interface directly in their own class(es).

The option 3) scares me... ;-)

aseovic · 2018-05-12T02:41:59Z

There is one issue with this PR that we still need to address.

At the moment, I am reading and deserializing unknown properties just as I would if the were known, but there is a subtle difference. With unknown properties, there is a chance that an older version of an app will attempt to read it, and find @class property that references the class that doesn't exist. For example, we could've added address property to Person v2, and implemented Address class along with it, but the older version of the app doesn't have either, in which case deserialization would fail with ClassNotFoundException.

I see two ways around it:

To read unknown JSON value as opaque blob/string, without trying to deserialize its content into something meaningful. Effectively, JsonReader.skipValue that returns skipped value as string. Considering that it is not meaningful to the older version of the class anyway, this is my preference.
To read it as generic javax.json.JsonValue, without taking class metadata into account if it's there. That way we avoid ClassNotFoundException, but we are parsing meaningless data into unnecessary data structure.

Thoughts?

EugenCepoi · 2018-05-14T00:49:08Z

Wouldn't option 1) make it close to useless from an usage point of view? As it's unlikely that the user will be able to do anything with it. Also if the user was to programmatically do something with the unknwon properties, it will be hard for him to know if this is a simple string or actually a raw json value.

Also it feels weird that the same document would be deserialized to different target types depending if the class is present or not.

Maybe unknwon properties should always be deserialized to a Map<String, JsonValue> irrespective of @Class annotation ? Or they could be deserialized to a Map<String, Object> (where object would be string/long/list/map/etc), in which case you would need a bit more work to make sure the class metadata converter doesn't kick in (there are a few options to do so).

This makes me think that none of that would be customizable at the handler level. Perhaps this is where it would make more sense to do all this?

aseovic · 2018-05-20T03:36:27Z

Well, in case of option 1), if you don't know what the data is and aren't even aware it exists, there probably isn't much you can do with it anyway, so it's not as big of an issue as it seems and completely avoids deserializing something you likely don't care about into potentially large object graph that is nothing but garbage. A simple String or even byte[] tend to work just fine for temporary storage.

That said, I'm not quite sure how to implement it. It seems like the best option would be to have JsonReader.readUnknown and JsonWriter.writeUnknown, but I need to look closer and see how involved this would actually be.

I am not opposed to option 2) either, but in that case what options do you have in mind for preventing converters from kicking in?

Finally, I do agree with the last comment -- it would be nice if we could let the handler itself decide how to read and write unknown properties, instead of us making the decision for it.

Maybe something along the lines of:

public interface UnknownPropertyHandler {
    /**
     * Called whenever a property is encountered in a JSON document
     * that doesn't have a corresponding {@link PropertyMutator}.
     * <p>
     * Typically, the implementation of this interface concerned
     * with schema evolution will handle this event by storing
     * property value somewhere so it can be written by the
     * {@link #writeUnknownProperties} method.
     *
     * @param target    the object we are deserializing JSON into
     * @param propName  the name of the unknown property
     * @param reader    the ObjectReader to read property value from
     * @param ctx       deserialization context                 
     */
    void onUnknownProperty(Object target, String propName, ObjectReader reader, Context ctx);

    /**
     * Write unknown properties encountered during deserialization.
     * <p>
     * This method can be optionally implemented by {@code UnknownPropertyHandler}s
     * that want to write unknown properties during serialization. The default 
     * implementation is a no-op.
     * 
     * @param source  the object we are serializing into JSON
     * @param writer  the ObjectReader to read property value from
     * @param ctx     serialization context                 
     *
     * @return a map of unknown properties
     */
    default void writeUnknownProperties(Object source, ObjectWriter writer, Context ctx) {
    }
}

Thoughts?

aseovic · 2018-05-21T07:41:25Z

I made the changes based on the discussion above, and will submit a new PR.

Closing this one as obsolete.

EugenCepoi · 2018-05-21T07:48:31Z

The low level streaming code is probably the less friendly part in Genson. Initially the impl was much simpler, but then I went to a few extremes trying to improve speed to match up with Jackson. Things that were optimizations a couple years back are probably less useful today...

I think 2) will be the easiest to achieve. The few options that come to mind would be to:

use directly a converter like JsonValueConverter from JSR353Bundle
or implement a similar converter to the one above but that deals with plain java map/list/primitives/etc and use it directly (this type of converter could be useful in general)
or use the default genson instance that doesn't use the metadata and deser to a Map which should use DefaultConverters.UntypedConverter.

For the last point the target object might not be yet constructed. Also I'm not sure about not writing back unknown properties, it seems like a nice property to be able to round trip data.
Making the ser/de part customizable is a nice to have but probably not that important and could be added later on if needed.

EugenCepoi · 2018-05-21T07:50:17Z

BTW you don't have to open a new PR for the changes. You can just update your branch. Force pushing to it is fine as anyway it's your work branch that no one else is pushing to.

aseovic · 2018-05-21T10:03:16Z

Yeah, I know... The main reason I created a new PR was because I originally committed fix to my master instead of a branch by accident, which makes it hard to sync with the upstream master or work on other, unrelated issues.

Anyway, it's all cleaned up and moved now, see #127 for details.

aseovic added 2 commits May 9, 2018 23:11

Fix for owlike#123 - Add support for preservation of unknown properties

f484b07

Fix for owlike#123 - Add support for preservation of unknown properties

25456ed

EugenCepoi reviewed May 11, 2018

View reviewed changes

aseovic closed this May 21, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for #123 - Add support for preservation of unknown properties #124

Fix for #123 - Add support for preservation of unknown properties #124

aseovic commented May 10, 2018 •

edited

Loading

EugenCepoi left a comment

EugenCepoi May 11, 2018

aseovic May 12, 2018

EugenCepoi May 14, 2018

aseovic May 20, 2018

EugenCepoi May 11, 2018

aseovic May 12, 2018

EugenCepoi May 14, 2018

aseovic May 20, 2018 •

edited

Loading

aseovic commented May 12, 2018

EugenCepoi commented May 14, 2018

aseovic commented May 20, 2018 •

edited

Loading

aseovic commented May 21, 2018

EugenCepoi commented May 21, 2018

EugenCepoi commented May 21, 2018

aseovic commented May 21, 2018 •

edited

Loading

Fix for #123 - Add support for preservation of unknown properties #124

Fix for #123 - Add support for preservation of unknown properties #124

Conversation

aseovic commented May 10, 2018 • edited Loading

EugenCepoi left a comment

Choose a reason for hiding this comment

EugenCepoi May 11, 2018

Choose a reason for hiding this comment

aseovic May 12, 2018

Choose a reason for hiding this comment

EugenCepoi May 14, 2018

Choose a reason for hiding this comment

aseovic May 20, 2018

Choose a reason for hiding this comment

EugenCepoi May 11, 2018

Choose a reason for hiding this comment

aseovic May 12, 2018

Choose a reason for hiding this comment

EugenCepoi May 14, 2018

Choose a reason for hiding this comment

aseovic May 20, 2018 • edited Loading

Choose a reason for hiding this comment

aseovic commented May 12, 2018

EugenCepoi commented May 14, 2018

aseovic commented May 20, 2018 • edited Loading

aseovic commented May 21, 2018

EugenCepoi commented May 21, 2018

EugenCepoi commented May 21, 2018

aseovic commented May 21, 2018 • edited Loading

aseovic commented May 10, 2018 •

edited

Loading

aseovic May 20, 2018 •

edited

Loading

aseovic commented May 20, 2018 •

edited

Loading

aseovic commented May 21, 2018 •

edited

Loading