In this article and the accompanying examples, we look at the problem of handling polymorphic JSON structures in the Go programming language.
Dealing with polymorphic serialized JSON data is a recurring topic in the Go community. Polymorphism can be found in existing APIs and is sometimes needed for new ones. Unfortunately, the standard encoding/json package does not handle polymorphism out of the box, hence the motivation for this work.
Let us first understand what polymorphism is and how it maps to the Go language.
According to Wikipedia:
In programming languages and type theory, polymorphism is the provision of a single interface to entities of different types
In JSON serialization formats, polymorphism is used to select one of multiple data structure types. Those types may form a flat enumeration or represent a hierarchy, such as the classification of living things or an exception hierarchy.
In this work we detail the more complex case of a deep hierarchical structure. The same approach applies to a flat enumeration of data structure types.
One example of a polymorphic JSON API is GeoJSON, defined in RFC 7946, where the "type" member tells which geometry follows:
{
  "type": "Point",
  "coordinates": [102.0, 0.5]
}
// OR
{
  "type": "Polygon",
  "coordinates": [
    [
      [100.0, 0.0],
      [101.0, 0.0],
      [101.0, 1.0],
      [100.0, 1.0],
      [100.0, 0.0]
    ]
  ]
}
In this article we look at a simple error model, a common chore for API authors and consumers alike:
{
  "errors": [
    {
      "Kind": "Fault",
      "Message": "Something went wrong.",
      "Cause": { "Kind": "Fault", "Message": "Missing file" }
    }, {
      "Kind": "RuntimeFault",
      "Message": "Unexpected error"
    }, {
      "Kind": "NotFound",
      "Message": "The cat Lucie is missing.",
      "Obj": "Lucie",
      "ObjKind": "Cat"
    }
  ]
}
The first task is to define the data objects in Go. This is a rather straightforward job:
type Fault struct {
	Message string
	Cause   *Fault
}

type RuntimeFault struct {
	Fault
}

type NotFound struct {
	RuntimeFault
	ObjKind string
	Obj     string
}
Note how embedding can be used to construct more complex types without repeating the definitions. This is similar to the allOf construct in JSON Schema.
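The allOf analogy also holds on the wire: encoding/json flattens the exported fields of embedded structs into the enclosing JSON object, and promoted fields can be read directly. A minimal sketch using the struct definitions above, before any custom marshaling is added (newLucie and encodeNotFound are hypothetical helpers for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Fault struct {
	Message string
	Cause   *Fault
}

type RuntimeFault struct {
	Fault
}

type NotFound struct {
	RuntimeFault
	ObjKind string
	Obj     string
}

// newLucie builds the NotFound value from the sample error document.
func newLucie() NotFound {
	return NotFound{
		RuntimeFault: RuntimeFault{Fault{Message: "The cat Lucie is missing."}},
		ObjKind:      "Cat",
		Obj:          "Lucie",
	}
}

// encodeNotFound marshals a NotFound; the fields of the embedded structs are
// flattened into one JSON object, in declaration order.
func encodeNotFound(nf NotFound) (string, error) {
	data, err := json.Marshal(nf)
	return string(data), err
}

func main() {
	nf := newLucie()
	// Promoted field: nf.Message is shorthand for nf.RuntimeFault.Fault.Message.
	fmt.Println(nf.Message)
	s, err := encodeNotFound(nf)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

The second line printed is {"Message":"The cat Lucie is missing.","Cause":null,"ObjKind":"Cat","Obj":"Lucie"}: one flat object, with no trace of the nesting used in the Go types.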
We have now defined a schema of three objects that extend each other. However, we do not get polymorphic behavior. For example, the following does not compile:
func printFault(f *Fault) {
	fmt.Println("The fault message is:", f.Message)
}

func main() {
	// Does not compile: *RuntimeFault cannot be used as *Fault
	printFault(&RuntimeFault{Fault{Message: "test"}})
}
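We cannot use a RuntimeFault or a NotFound where a Fault is needed. Embedding is composition, not subtyping, so as far as the compiler is concerned *RuntimeFault and *Fault are unrelated types.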
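The embedded Fault is an ordinary field named after its type, so it can be selected explicitly. A minimal sketch (describeFault is a hypothetical string-returning variant of printFault) shows that this compiles, but the function then only ever sees the inner Fault, so nothing polymorphic happens:

```go
package main

import "fmt"

type Fault struct {
	Message string
	Cause   *Fault
}

type RuntimeFault struct {
	Fault
}

// describeFault is a string-returning variant of printFault; it accepts only *Fault.
func describeFault(f *Fault) string {
	return "The fault message is: " + f.Message
}

func main() {
	rf := &RuntimeFault{Fault{Message: "test"}}
	// describeFault(rf) would not compile, because *RuntimeFault is not *Fault.
	// Selecting the embedded field works, but only the inner Fault is passed.
	fmt.Println(describeFault(&rf.Fault))
}
```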
Let's see how to solve this.
The Go language provides the concept of interfaces, implemented through methods on various types. Interfaces are what let different data structures be used interchangeably and exhibit polymorphic behavior.
We can apply interfaces to our example in several ways to get the desired polymorphic behavior. The first way uses getters and setters; it is simpler to understand but is not idiomatic Go. The second approach defines interfaces that provide access to the underlying data structure; it is less obvious but produces cleaner and more concise Go code.
Applying interfaces to the example above produces working code:
type FaultInterface interface {
	GetMessage() string
}

func (f *Fault) GetMessage() string {
	return f.Message
}

func printFault(f FaultInterface) {
	fmt.Println("The fault message is:", f.GetMessage())
}

func main() {
	// *RuntimeFault gets GetMessage promoted from the embedded Fault,
	// so it satisfies FaultInterface
	printFault(&RuntimeFault{Fault{Message: "test"}})
}
In a similar way, an interface can be used as a return type, as a member of another structure, or as the element type of a slice.
In addition to the data model, we need to define an interface for each type to get polymorphic behavior.
I prefer to keep things tidy, so the models and their implementations live in the models package, while the interface definitions reside in the interfaces package. This gives the following interface definitions:
package interfaces

type Fault interface {
	GetMessage() string
	SetMessage(string)
	GetCause() Fault
	SetCause(Fault)
}

type RuntimeFault interface {
	Fault
}

type NotFound interface {
	RuntimeFault
	GetObjKind() string
	SetObjKind(string)
	GetObj() string
	SetObj(string)
}
The code is still DRY: there is no need to repeat the members of generic abstractions in the specialized ones, because Go lets us reuse existing definitions through embedding.
Of course, we need to implement the methods in the models package at least once, on the most generic type they appear on. Some methods can also be implemented again on more specific types when there is business logic specific to that level of abstraction.
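The sketch below approximates the models side in a single file. Because one file cannot hold both the models and the interfaces packages, the interfaces above are renamed with an I suffix (FaultI, RuntimeFaultI, NotFoundI), and Cause is declared with the interface type. The getters and setters are written once on the most generic type, and compile-time assertions confirm that embedding promotes them:

```go
package main

import "fmt"

// Stand-ins for the interfaces package: the I suffix replaces the package qualifier.
type FaultI interface {
	GetMessage() string
	SetMessage(string)
	GetCause() FaultI
	SetCause(FaultI)
}

type RuntimeFaultI interface {
	FaultI
}

type NotFoundI interface {
	RuntimeFaultI
	GetObjKind() string
	SetObjKind(string)
	GetObj() string
	SetObj(string)
}

// Models: Cause uses the interface type so any fault can be a cause.
type Fault struct {
	Message string
	Cause   FaultI
}

type RuntimeFault struct {
	Fault
}

type NotFound struct {
	RuntimeFault
	ObjKind string
	Obj     string
}

// Implemented once on the most generic type; embedding promotes them.
func (f *Fault) GetMessage() string  { return f.Message }
func (f *Fault) SetMessage(m string) { f.Message = m }
func (f *Fault) GetCause() FaultI    { return f.Cause }
func (f *Fault) SetCause(c FaultI)   { f.Cause = c }

func (n *NotFound) GetObjKind() string  { return n.ObjKind }
func (n *NotFound) SetObjKind(k string) { n.ObjKind = k }
func (n *NotFound) GetObj() string      { return n.Obj }
func (n *NotFound) SetObj(o string)     { n.Obj = o }

// Compile-time checks: each model satisfies its interface.
var (
	_ FaultI        = &Fault{}
	_ RuntimeFaultI = &RuntimeFault{}
	_ NotFoundI     = &NotFound{}
)

func main() {
	var nf NotFoundI = &NotFound{}
	nf.SetMessage("The cat Lucie is missing.")
	nf.SetObj("Lucie")
	nf.SetCause(&Fault{Message: "Missing file"})
	fmt.Println(nf.GetMessage(), nf.GetObj(), nf.GetCause().GetMessage())
}
```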
Rendering JSON from a Go structure, even one referred to through an interface, is straightforward. In our case we want to add the class name, or discriminator, to the JSON payload so that recipients know what type of fault they are receiving. The following simple pattern works in Go:
func (nfo *NotFound) MarshalJSON() ([]byte, error) {
	type marshalable NotFound
	return json.Marshal(struct {
		Kind string
		marshalable
	}{
		Kind:        "NotFound",
		marshalable: marshalable(*nfo),
	})
}
Here is what this method does:
- It declares a method for marshalling NotFound objects. Go discovers it dynamically by checking whether the object implements encoding/json.Marshaler.
- A new type, marshalable, is declared with the same fields but without the methods declared on NotFound, so it has no marshaling logic of its own. This is needed to avoid infinite recursion.
- An anonymous struct type is declared that provides an additional field for the discriminator.
- An object of the new type is created that embeds the current NotFound instance data and sets the Kind field to NotFound.
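It is a good idea to put the Kind field on top. encoding/json writes fields in declaration order, so Kind goes first on the wire and clients can read the discriminator before the rest of the payload, which lets them consume the output more efficiently.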
After we add such a method to each of our error types, serialization starts to work:
func main() {
	var fi FaultInterface = &RuntimeFault{Fault{Message: "test"}}
	data, err := json.Marshal(fi)
	if err != nil {
		fmt.Println("Cannot write JSON", err)
		return
	}
	fmt.Println("JSON:", string(data))
}
prints:
{"Kind":"RuntimeFault","Message":"test","Cause":null}
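The method has to exist on every type in the hierarchy, not only on the base type. Here is a self-contained sketch of that, assuming all three types follow the MarshalJSON pattern above and that Cause already has an interface type (FaultI is a hypothetical stand-in for the Fault interface, and encode is a hypothetical helper). A nested cause keeps its own Kind:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FaultI stands in for the Fault interface in this single-file sketch.
type FaultI interface{ GetMessage() string }

type Fault struct {
	Message string
	Cause   FaultI
}

type RuntimeFault struct{ Fault }

type NotFound struct {
	RuntimeFault
	ObjKind string
	Obj     string
}

func (f *Fault) GetMessage() string { return f.Message }

// Every type declares its own MarshalJSON. If one were missing, the method of
// the embedded type would be promoted and would emit the wrong Kind.
func (f *Fault) MarshalJSON() ([]byte, error) {
	type marshalable Fault // same fields, no MarshalJSON: avoids recursion
	return json.Marshal(struct {
		Kind string
		marshalable
	}{Kind: "Fault", marshalable: marshalable(*f)})
}

func (rf *RuntimeFault) MarshalJSON() ([]byte, error) {
	type marshalable RuntimeFault
	return json.Marshal(struct {
		Kind string
		marshalable
	}{Kind: "RuntimeFault", marshalable: marshalable(*rf)})
}

func (nf *NotFound) MarshalJSON() ([]byte, error) {
	type marshalable NotFound
	return json.Marshal(struct {
		Kind string
		marshalable
	}{Kind: "NotFound", marshalable: marshalable(*nf)})
}

// encode marshals any fault held through the interface.
func encode(f FaultI) (string, error) {
	data, err := json.Marshal(f)
	return string(data), err
}

func main() {
	nf := &NotFound{
		RuntimeFault: RuntimeFault{Fault{
			Message: "The cat Lucie is missing.",
			Cause:   &RuntimeFault{Fault{Message: "Missing file"}},
		}},
		ObjKind: "Cat",
		Obj:     "Lucie",
	}
	s, err := encode(nf)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

This prints {"Kind":"NotFound","Message":"The cat Lucie is missing.","Cause":{"Kind":"RuntimeFault","Message":"Missing file","Cause":null},"ObjKind":"Cat","Obj":"Lucie"}. If RuntimeFault had no MarshalJSON of its own, (*Fault).MarshalJSON would be promoted to *RuntimeFault and the cause would be labeled "Kind":"Fault".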
Here is the evolving example.
Reading the JSON values back is slightly more convoluted. The biggest obstacle is that Go does not allow methods on interface types, so we cannot implement an unmarshal method on our interfaces. Instead we need to handle unmarshaling of interfaces one layer above them. There are several scenarios where we need to unmarshal interfaces:
- Read a top-level JSON object into an interface
- Read a member field of an object into an interface
- Read an array element into an interface
Reading a JSON document like the one above into an interface is the simplest case. We need a function that receives a []byte and returns an interfaces.Fault. Let us call it UnmarshalFault:
func UnmarshalFault(in []byte) (interfaces.Fault, error) {
	d := &struct {
		Kind string
	}{}
	// Double pointer detects null values: a JSON null sets d to nil
	err := json.Unmarshal(in, &d)
	if err != nil {
		return nil, err
	}
	if d == nil {
		return nil, nil
	}
	var res interfaces.Fault
	switch d.Kind {
	case "NotFound":
		res = &NotFound{}
	case "RuntimeFault":
		res = &RuntimeFault{}
	default: // Error on default or try to use base type?
		res = &Fault{}
	}
	// Decode the full payload into the selected concrete type
	if err := json.Unmarshal(in, res); err != nil {
		return nil, err
	}
	return res, nil
}
The basic operation of the function is as follows:
- Scan the incoming JSON bytes for the discriminator field Kind.
- Depending on its value, initialize the matching struct type and decode the full payload into it.
This pretty much does the trick for basic interface unmarshaling.
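A runnable round trip helps check the function. This sketch assumes the struct definitions from earlier (with Cause still a *Fault) and uses a hypothetical FaultI interface in place of interfaces.Fault, since a single file cannot hold both packages:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FaultI is a hypothetical stand-in for interfaces.Fault.
type FaultI interface{ GetMessage() string }

type Fault struct {
	Message string
	Cause   *Fault
}

type RuntimeFault struct{ Fault }

type NotFound struct {
	RuntimeFault
	ObjKind string
	Obj     string
}

func (f *Fault) GetMessage() string { return f.Message }

// UnmarshalFault peeks at Kind, then decodes the whole payload into the matching struct.
func UnmarshalFault(in []byte) (FaultI, error) {
	var d *struct{ Kind string } // stays nil for a JSON null
	if err := json.Unmarshal(in, &d); err != nil {
		return nil, err
	}
	if d == nil {
		return nil, nil
	}
	var res FaultI
	switch d.Kind {
	case "NotFound":
		res = &NotFound{}
	case "RuntimeFault":
		res = &RuntimeFault{}
	default:
		res = &Fault{}
	}
	if err := json.Unmarshal(in, res); err != nil {
		return nil, err
	}
	return res, nil
}

func main() {
	in := []byte(`{"Kind":"NotFound","Message":"The cat Lucie is missing.","Obj":"Lucie","ObjKind":"Cat"}`)
	f, err := UnmarshalFault(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A type switch recovers the concrete type selected by Kind.
	switch v := f.(type) {
	case *NotFound:
		fmt.Println("NotFound:", v.Message, v.ObjKind, v.Obj)
	default:
		fmt.Printf("%T: %s\n", v, v.GetMessage())
	}
}
```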
Here is the sample, getting bigger.
The last challenge is reading interfaces when they are members of an object or elements of an array.
To achieve this, each object that has a field of polymorphic type needs custom unmarshaling logic: it first reads the input into a temporary type with placeholder fields, which are then converted into the proper types.
For example:
type Container struct {
	FaultField interfaces.Fault
}

// Compile-time check that Container implements json.Unmarshaler
var _ json.Unmarshaler = &Container{}

func (c *Container) UnmarshalJSON(in []byte) error {
	// Deserialize into temp object
	temp := struct {
		FaultField json.RawMessage
	}{}
	err := json.Unmarshal(in, &temp)
	if err != nil {
		return err
	}
	c.FaultField = nil
	if temp.FaultField != nil {
		c.FaultField, err = UnmarshalFault(temp.FaultField)
		if err != nil {
			return err
		}
	}
	return nil
}
Our goal in the code above is to unmarshal the Container structure. To achieve it, we first unmarshal into a special structure that has a json.RawMessage field instead of the Fault interface type. We then use the UnmarshalFault function to convert the json.RawMessage to a Fault. In this way we can easily read our Container object from a JSON file and get the correct Fault instance:
c := Container{}
err := json.Unmarshal(b, &c) // b holds the JSON document
Working with arrays is similar to working with object fields. We first read into a special slice of json.RawMessage elements and then build the proper slice of Fault interface implementations. You can see the code in array_test.go:
type ArrayContainer struct {
	Faults []interfaces.Fault
}

var _ json.Unmarshaler = &ArrayContainer{}

func (c *ArrayContainer) UnmarshalJSON(in []byte) error {
	// Deserialize into temp object of utility class
	temp := struct {
		Faults []json.RawMessage
	}{}
	err := json.Unmarshal(in, &temp)
	if err != nil {
		return err
	}
	c.Faults = nil
	if temp.Faults != nil {
		c.Faults = make([]interfaces.Fault, 0, len(temp.Faults))
		for _, rawFault := range temp.Faults {
			if rawFault == nil {
				c.Faults = append(c.Faults, nil)
				continue // nothing to decode for this element
			}
			fault, err := UnmarshalFault(rawFault)
			if err != nil {
				return err
			}
			c.Faults = append(c.Faults, fault)
		}
	}
	return nil
}
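One detail worth checking in the loop above: encoding/json hands a JSON null element to json.RawMessage's own UnmarshalJSON, so the element holds the four bytes null rather than being a nil slice. A small sketch (rawElements is a hypothetical helper, and the document is an arbitrary example) makes this visible:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawElements decodes the Faults array without interpreting its elements.
func rawElements(doc string) ([]json.RawMessage, error) {
	var temp struct{ Faults []json.RawMessage }
	err := json.Unmarshal([]byte(doc), &temp)
	return temp.Faults, err
}

func main() {
	raws, err := rawElements(`{"Faults":[{"Kind":"Fault","Message":"a"},null]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, r := range raws {
		fmt.Printf("%d: %s (nil: %v)\n", i, string(r), r == nil)
	}
}
```

Both elements report nil: false, and the second one is the text null. In practice, a null array element is therefore handled by UnmarshalFault, which returns a nil Fault for it, and the rawFault == nil branch acts only as a safety net.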
The Fault object contains a Cause field that links to a related Fault object. To allow polymorphic behavior, we need to change that field from a pointer-to-struct type to the interface type. Furthermore, we need to add UnmarshalJSON methods to all types in the hierarchy so that they deserialize correctly. The method on each type needs to handle all of its fields and cannot delegate to the embedded types: a type without its own UnmarshalJSON would get the embedded type's method promoted, which silently skips the outer type's extra fields.
func (nfo *NotFound) UnmarshalJSON(in []byte) error {
	pxy := &struct {
		Message string
		Cause   json.RawMessage
		ObjKind string
		Obj     string
	}{}
	err := json.Unmarshal(in, pxy)
	if err != nil {
		return err
	}
	var cause interfaces.Fault
	if pxy.Cause != nil {
		cause, err = UnmarshalFault(pxy.Cause)
		if err != nil {
			return err
		}
	}
	nfo.Message = pxy.Message
	nfo.Cause = cause
	nfo.Obj = pxy.Obj
	nfo.ObjKind = pxy.ObjKind
	return nil
}
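To see the recursive decoding of Cause work end to end, here is a compact single-file sketch. It assumes Cause has the interface type, uses FaultI as a hypothetical stand-in for interfaces.Fault, and factors the shared Message and Cause handling into a hypothetical decodeBase helper. Every type still has its own UnmarshalJSON covering all of its fields, although NotFound decodes its input twice for brevity:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FaultI stands in for interfaces.Fault in this single-file sketch.
type FaultI interface{ GetMessage() string }

type Fault struct {
	Message string
	Cause   FaultI // interface-typed, so any fault can be a cause
}

type RuntimeFault struct{ Fault }

type NotFound struct {
	RuntimeFault
	ObjKind string
	Obj     string
}

func (f *Fault) GetMessage() string { return f.Message }

// UnmarshalFault picks the concrete type from Kind; JSON null yields nil.
func UnmarshalFault(in []byte) (FaultI, error) {
	var d *struct{ Kind string }
	if err := json.Unmarshal(in, &d); err != nil || d == nil {
		return nil, err
	}
	var res FaultI
	switch d.Kind {
	case "NotFound":
		res = &NotFound{}
	case "RuntimeFault":
		res = &RuntimeFault{}
	default:
		res = &Fault{}
	}
	if err := json.Unmarshal(in, res); err != nil {
		return nil, err
	}
	return res, nil
}

// decodeBase (hypothetical helper) reads the shared fields and resolves Cause recursively.
func decodeBase(in []byte) (Fault, error) {
	var p struct {
		Message string
		Cause   json.RawMessage // kept raw until its Kind is known
	}
	if err := json.Unmarshal(in, &p); err != nil {
		return Fault{}, err
	}
	f := Fault{Message: p.Message}
	if len(p.Cause) > 0 { // an absent Cause stays nil
		cause, err := UnmarshalFault(p.Cause)
		if err != nil {
			return Fault{}, err
		}
		f.Cause = cause
	}
	return f, nil
}

func (f *Fault) UnmarshalJSON(in []byte) (err error) {
	*f, err = decodeBase(in)
	return err
}

func (rf *RuntimeFault) UnmarshalJSON(in []byte) (err error) {
	rf.Fault, err = decodeBase(in)
	return err
}

func (nf *NotFound) UnmarshalJSON(in []byte) error {
	base, err := decodeBase(in)
	if err != nil {
		return err
	}
	var extra struct{ ObjKind, Obj string }
	if err := json.Unmarshal(in, &extra); err != nil {
		return err
	}
	nf.Fault, nf.ObjKind, nf.Obj = base, extra.ObjKind, extra.Obj
	return nil
}

func main() {
	in := []byte(`{"Kind":"NotFound","Message":"The cat Lucie is missing.","ObjKind":"Cat","Obj":"Lucie",
		"Cause":{"Kind":"RuntimeFault","Message":"Missing file"}}`)
	f, err := UnmarshalFault(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	nf := f.(*NotFound)
	fmt.Printf("%s caused by %T: %s\n", nf.Message, nf.Cause, nf.Cause.GetMessage())
}
```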
Using deep hierarchies in Go may be problematic, because Go decides whether a value satisfies an interface purely by the presence of methods. In our example any Fault object works just as well as a RuntimeFault, since the RuntimeFault interface declares no methods beyond those of Fault. This differs from C++ and Java, where types with no extra members still provide type safety and classification. That behavior can be emulated by adding a synthetic marker method for each interface type in a hierarchy. For example, the following prevents a Fault from being converted to a RuntimeFault:
// In package interfaces
type RuntimeFault interface {
	Fault
	ZzRuntimeFault()
}

// In package models
// ZzRuntimeFault is a marker; it exists only to tell RuntimeFault apart from Fault
func (rf *RuntimeFault) ZzRuntimeFault() {}
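A quick check of the marker technique, as a single-file sketch in which FaultI and RuntimeFaultI stand in for the interfaces package types and isRuntimeFault is a hypothetical helper. Because ZzRuntimeFault is declared on the RuntimeFault struct, NotFound picks it up through embedding while a plain Fault does not:

```go
package main

import "fmt"

type FaultI interface{ GetMessage() string }

// RuntimeFaultI adds a marker method, so a plain Fault no longer satisfies it.
type RuntimeFaultI interface {
	FaultI
	ZzRuntimeFault()
}

type Fault struct{ Message string }

type RuntimeFault struct{ Fault }

type NotFound struct {
	RuntimeFault
	ObjKind string
	Obj     string
}

func (f *Fault) GetMessage() string { return f.Message }

// ZzRuntimeFault is a marker; NotFound inherits it through embedding.
func (rf *RuntimeFault) ZzRuntimeFault() {}

func isRuntimeFault(f FaultI) bool {
	_, ok := f.(RuntimeFaultI)
	return ok
}

func main() {
	fmt.Println(isRuntimeFault(&Fault{}))        // false
	fmt.Println(isRuntimeFault(&RuntimeFault{})) // true
	fmt.Println(isRuntimeFault(&NotFound{}))     // true
}
```

Without the marker method in RuntimeFaultI, all three calls would return true.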
Fields of type RuntimeFault are also polymorphic, but RuntimeFault is not the root of the hierarchy. One option is to have UnmarshalRuntimeFault rely on the root type, Fault, by invoking the UnmarshalFault function and checking the result. This saves code size, since in a large hierarchy a separate unmarshal function with its own switch for every level may become too big.
func UnmarshalRuntimeFault(in []byte) (interfaces.RuntimeFault, error) {
	fault, err := UnmarshalFault(in)
	if err != nil {
		return nil, err
	}
	if fault == nil { // JSON null
		return nil, nil
	}
	if runtimeFault, ok := fault.(interfaces.RuntimeFault); ok {
		return runtimeFault, nil
	}
	return nil, fmt.Errorf("cannot unmarshal %T as RuntimeFault", fault)
}
This article and the sample code illustrate the basic handling of polymorphic JSON in Go. Out-of-the-box support is lacking, yet a little creativity gets us close to a native experience with polymorphic unmarshaling in Go.
The article and sample code leave out some details.
One area to discuss is how this work maps onto polymorphic OpenAPI schemas. The combination of the allOf and discriminator constructs, as used by OpenAPI Generator for Java, provides a good base.
When thousands of classes exist in a hierarchy, the string switch statement in UnmarshalFault may need an optimized implementation, for example a state machine that iterates over the characters to discern the different options and assert valid sequences.
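Before writing a state machine by hand, a plainer option worth benchmarking (a different technique from the one suggested above) is to replace the switch with a map from Kind to a constructor. A lookup is an average constant-time hash operation regardless of how many kinds are registered, and new types can be added without editing UnmarshalFault. The registry and register names and the trimmed two-type hierarchy are hypothetical, and the sketch assumes every kind is registered during package initialization:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type FaultI interface{ GetMessage() string }

type Fault struct{ Message string }

type RuntimeFault struct{ Fault }

func (f *Fault) GetMessage() string { return f.Message }

// registry maps each Kind to a constructor for its concrete type (hypothetical helper).
var registry = map[string]func() FaultI{}

func register(kind string, ctor func() FaultI) { registry[kind] = ctor }

func init() {
	register("Fault", func() FaultI { return &Fault{} })
	register("RuntimeFault", func() FaultI { return &RuntimeFault{} })
}

// UnmarshalFault looks up the constructor instead of switching on Kind.
func UnmarshalFault(in []byte) (FaultI, error) {
	var d *struct{ Kind string }
	if err := json.Unmarshal(in, &d); err != nil || d == nil {
		return nil, err
	}
	ctor, ok := registry[d.Kind]
	if !ok {
		ctor = registry["Fault"] // fall back to the base type, like the switch's default
	}
	res := ctor()
	if err := json.Unmarshal(in, res); err != nil {
		return nil, err
	}
	return res, nil
}

func main() {
	f, err := UnmarshalFault([]byte(`{"Kind":"RuntimeFault","Message":"Unexpected error"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%T: %s\n", f, f.GetMessage())
}
```

Whether this beats the compiled switch for a particular hierarchy is something to measure rather than assume.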
This article builds on a number of previous implementations. Some of those are listed below: