Refactor meta section of pds4+json and pds4+xml to use ops namespace #154

Closed
jordanpadams opened this issue Feb 3, 2022 · 19 comments

@jordanpadams
Member

jordanpadams commented Feb 3, 2022

💡 Description

The flattened ops attributes are in the metadata (e.g. ops:Data_File_Info.ops:creation_date_time).

Then it looks like it is being translated / renamed into the API, e.g. for pds4+json response:

      "meta": {
        "node_name": "PDS_ENG",
        "label_file": {
          "file_name": "bundle_ovirs.xml",
          "file_ref": "/data/pds4/test-data/registry/orex.ovirs/bundle_ovirs.xml",
          "creation_date": "2021-02-22T22:57:56Z",
          "file_size": "11302",
          "md5_checksum": "b7405a8e071d84a521e73c451399a2a5"
        }

Instead, we should keep that namespace as-is and create the JSON objects to mirror our ops data model. This should not be custom code for each attribute, but something generic that parses out the ops: attributes and outputs them into the meta object. We may add, remove, or rename ops: attributes over time, so we don't want to have to update the API code every time that happens. For example:

      "ops:Data_File_Info.ops:md5_checksum",
      "ops:Label_File_Info.ops:creation_date_time",
      "ops:Foo.ops:bar"

becomes

      "meta": {
        "ops:Data_File_Info": {
             "ops:md5_checksum": [ "b7405a8e071d84a521e73c451399a2a5" ]
        },
        "ops:Label_File_Info": {
             "ops:creation_date_time": [ "2021-02-22T22:57:56Z" ]
        },
        "ops:Foo": {
             "ops:bar": [ "baz" ]
        }
      }
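
A minimal sketch of the generic approach described above, written in Python for illustration only (the API itself is not written in Python). The function name `unflatten_ops`, the input dictionary shape, and the extra `lid` field are assumptions made for the example, not part of the registry or API code. It splits each flattened key on `.` and nests the pieces, so no per-attribute code is needed:

```python
import json


def unflatten_ops(flat_fields, prefix="ops:"):
    """Group flattened 'ops:Class.ops:attribute' fields into nested objects."""
    meta = {}
    for key, value in flat_fields.items():
        parts = key.split(".")
        # Skip any field that is not entirely inside the ops namespace.
        if not all(part.startswith(prefix) for part in parts):
            continue
        node = meta
        # Walk (or create) one nested object per leading path segment.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return meta


# Flattened fields as they might come back from the registry (assumed shape).
flat = {
    "ops:Data_File_Info.ops:md5_checksum": ["b7405a8e071d84a521e73c451399a2a5"],
    "ops:Label_File_Info.ops:creation_date_time": ["2021-02-22T22:57:56Z"],
    "ops:Foo.ops:bar": ["baz"],
    "lid": ["urn:nasa:pds:orex.ovirs"],  # hypothetical non-ops field, ignored
}

meta = unflatten_ops(flat)
print(json.dumps({"meta": meta}, indent=2))
```

Because the function only looks at the key structure, adding or renaming an ops: attribute in the registry would change the output without any change to this code.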

Similarly, we should map this to XML in the pds4+xml response.

Related requirements

🦄 NASA-PDS/registry-api#440
🦄 NASA-PDS/registry-api#450

@al-niessner
Contributor

@jordanpadams @tloubrieu-jpl

I think this means that you want to modify the swaggerhub spec (yaml) to have Pds4Metadata represent what you are looking for. Currently it is:

    pds4Metadata:
      type: object
      properties:
        node_name:
          type: string
        label_file:
          type: object
          properties:
            file_name:
              type: string
            file_ref:
              type: string
            creation_date:
              type: string
            file_size:
              type: string
            md5_checksum:
              type: string        
        data_files:
          type: array
          items:
            type: object
            properties:
              file_name:
                type: string
              file_ref:
                type: string
              creation_date:
                type: string
              file_size:
                type: string
              md5_checksum:
                type: string
              mime_type:
                type: string

Now you want it to be more like:

    pds4Metadata:
      type: object
      properties:
        ops.Label_File_Info:
          type: object
          properties:
            ops.file_name:
              type: string
            ops.file_ref:
              type: string
            ops.creation_date:
              type: string
            ops.file_size:
              type: string
            ops.md5_checksum:
              type: string        
        ops.Data_File_Info:
          type: array
          items:
            type: object
            properties:
              ops.file_name:
                type: string
              ops.file_ref:
                type: string
              ops.creation_date:
                type: string
              ops.file_size:
                type: string
              ops.md5_checksum:
                type: string
              ops.mime_type:
                type: string

Is that what you are looking for or did you want something else?

Not sure how ops.Foo maps into swaggerhub. Is that the node_name that got lost?

@jordanpadams
Member Author

jordanpadams commented Feb 7, 2022

@al-niessner ops.Foo is a generic example showing that we can continue to expand the ops namespace with more content. I actually propose that we generalize this section of the output, similar to what we do with the pds4 objects in the pds4+json and pds4+xml responses, e.g.

        meta:
          type: object
          description: | 
              Meta information for the PDS product including file system 
              metadata, tracking information, etc.

I think down the road we can more strongly type what appears in the section by auto-generating the YAML from the data model, but I don't think we are there yet.

Thoughts? @tloubrieu-jpl

Also, it looks like in the swagger spec this object is currently named metadata, but the output from the API response is actually meta. Is that a bug?

@tloubrieu-jpl
Member

@jordanpadams will add an example for XML

@tloubrieu-jpl
Member

@jordanpadams, @al-niessner, since PR NASA-PDS/registry-api#82 the content of the meta tag has no namespace prefix (and therefore no defined namespace). This is not fully right, but we did not mean to improve that before this ticket does.

@jordanpadams
Member Author

@tloubrieu-jpl @al-niessner where are we with this one? Am I on the hook for something?

@jordanpadams
Member Author

jordanpadams commented Feb 23, 2022

As a clarification of the above, per conversations we had at a breakout, we want this for the meta section for both JSON and XML:

    pds4Metadata:
      type: object
      properties:
        ops:Label_File_Info:
          type: object
          properties:
            ops:file_name:
              type: string
            ops:file_ref:
              type: string
            ops:creation_date:
              type: string
            ops:file_size:
              type: string
            ops:md5_checksum:
              type: string        
        ops:Data_File_Info:
          type: array
          items:
            type: object
            properties:
              ops:file_name:
                type: string
              ops:file_ref:
                type: string
              ops:creation_date:
                type: string
              ops:file_size:
                type: string
              ops:md5_checksum:
                type: string
              ops:mime_type:
                type: string

@jordanpadams
Member Author

So the XML example would be something like:

<pds_api:meta>
   <ops:Label_File_Info>
      <ops:file_name></ops:file_name>
...

@al-niessner
Contributor

The Jackson tool that maps XML namespaces needs a URI for each namespace name. For instance, pds_api has a URI defined in the first tag. When the Jackson mapper converts a bean to XML, it wants the URI rather than the name, and it then translates the URI to pds_api. What is the URI for ops?

@jordanpadams
Member Author

@al-niessner new namespace (which doesn't exist yet but don't tell anyone...):

https://pds.nasa.gov/pds4/ops/v1
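
To illustrate the URI-to-prefix relationship discussed above, here is a small sketch using Python's standard `xml.etree.ElementTree` rather than Jackson. Elements are created with the namespace URI, and the serializer translates the URI back to the registered prefix. The ops URI is the one proposed in this thread; the `PDS_API_URI` value and the helper `qname` are placeholders invented for the example, since the real pds_api URI is not quoted here:

```python
import xml.etree.ElementTree as ET

OPS_URI = "https://pds.nasa.gov/pds4/ops/v1"  # proposed in this thread
PDS_API_URI = "http://example.org/pds_api"    # placeholder, NOT the real URI

# Tell the serializer which prefix to emit for each namespace URI.
ET.register_namespace("ops", OPS_URI)
ET.register_namespace("pds_api", PDS_API_URI)


def qname(uri, local):
    """Build ElementTree's '{uri}local' qualified-name form."""
    return f"{{{uri}}}{local}"


# Elements are named by URI; the prefix only appears on serialization.
meta = ET.Element(qname(PDS_API_URI, "meta"))
label = ET.SubElement(meta, qname(OPS_URI, "Label_File_Info"))
ET.SubElement(label, qname(OPS_URI, "file_name")).text = "bundle_ovirs.xml"

xml_text = ET.tostring(meta, encoding="unicode")
print(xml_text)
```

The serialized root element carries the `xmlns:ops` and `xmlns:pds_api` declarations, and the children come out as `ops:Label_File_Info` and `ops:file_name`.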

@al-niessner
Contributor

al-niessner commented Feb 25, 2022

@jordanpadams @tloubrieu-jpl

Sorry, but this ticket has become really confusing (contradictory). I thought you wanted all 'ops.' to become 'ops:' where 'ops:' is a proper XML namespace with URI given above as https://pds.nasa.gov/pds4/ops/v1.

However, when I look at the pds4+xml output this is what I see:

  <pds_api:meta>
    <pds_api:node_name>PSA</pds_api:node_name>
    <pds_api:label_file>
      <pds_api:file_name>ns_inst.xml</pds_api:file_name>
      <pds_api:file_ref>/var/local/harvest/archive/document/ns_inst.xml</pds_api:file_ref>
      <pds_api:creation_date>2022-01-24T20:08:23Z</pds_api:creation_date>
      <pds_api:file_size>3589</pds_api:file_size>
      <pds_api:md5_checksum>a8d09cca0a01728db50c15052c2736cf</pds_api:md5_checksum>
    </pds_api:label_file>
    <pds_api:data_files>
      <pds_api:data_files>
        <pds_api:file_name>ns_inst.pdf</pds_api:file_name>
        <pds_api:file_ref>/var/local/harvest/archive/document/ns_inst.pdf</pds_api:file_ref>
        <pds_api:creation_date>2022-01-24T20:08:23Z</pds_api:creation_date>
        <pds_api:file_size>138172</pds_api:file_size>
        <pds_api:md5_checksum>8103f20c13a3c321dac4a193aba19d16</pds_api:md5_checksum>
        <pds_api:mime_type>application/pdf</pds_api:mime_type>
      </pds_api:data_files>
    </pds_api:data_files>
  </pds_api:meta>

There are no 'ops.' prefixes. Do you want the 'pds_api:' to become 'ops:'?

Oh, and no, I never removed the 'ops.' in any of my code. Maybe it gets removed in the general PDS4 processing that I inherited.

@al-niessner
Contributor

al-niessner commented Feb 25, 2022

@jordanpadams @tloubrieu-jpl

Um, the same goes for pds4+json:

  "meta": {
    "node_name": "PSA",
    "label_file": {
      "file_name": "ns_inst.xml",
      "file_ref": "/var/local/harvest/archive/document/ns_inst.xml",
      "creation_date": "2022-01-24T20:08:23Z",
      "file_size": "3589",
      "md5_checksum": "a8d09cca0a01728db50c15052c2736cf"
    },
    "data_files": [
      {
        "file_name": "ns_inst.pdf",
        "file_ref": "/var/local/harvest/archive/document/ns_inst.pdf",
        "creation_date": "2022-01-24T20:08:23Z",
        "file_size": "138172",
        "md5_checksum": "8103f20c13a3c321dac4a193aba19d16",
        "mime_type": "application/pdf"
      }
    ]
  },

@al-niessner
Contributor

@jordanpadams @tloubrieu-jpl

What I am seeing does not match the earliest descriptions of what I should expect to see. Is what now exists the real desire, or should there be changes based on what is now coming out of pds4+*?

@jordanpadams
Member Author

jordanpadams commented Feb 26, 2022

@al-niessner apologies for the confusion. The issue from the start is that what you see in there now is wrong:

Current pds4+xml - NOT QUITE RIGHT

  <pds_api:meta>
    <pds_api:node_name>PSA</pds_api:node_name>
    <pds_api:label_file>
      <pds_api:file_name>ns_inst.xml</pds_api:file_name>
      <pds_api:file_ref>/var/local/harvest/archive/document/ns_inst.xml</pds_api:file_ref>
      <pds_api:creation_date>2022-01-24T20:08:23Z</pds_api:creation_date>
      <pds_api:file_size>3589</pds_api:file_size>
      <pds_api:md5_checksum>a8d09cca0a01728db50c15052c2736cf</pds_api:md5_checksum>
    </pds_api:label_file>
    <pds_api:data_files>
      <pds_api:data_files>
        <pds_api:file_name>ns_inst.pdf</pds_api:file_name>
        <pds_api:file_ref>/var/local/harvest/archive/document/ns_inst.pdf</pds_api:file_ref>
        <pds_api:creation_date>2022-01-24T20:08:23Z</pds_api:creation_date>
        <pds_api:file_size>138172</pds_api:file_size>
        <pds_api:md5_checksum>8103f20c13a3c321dac4a193aba19d16</pds_api:md5_checksum>
        <pds_api:mime_type>application/pdf</pds_api:mime_type>
      </pds_api:data_files>
    </pds_api:data_files>
  </pds_api:meta>

This should not be translated to some custom names and elements. I would like them to be more closely tied to what is in the registry. So instead of the mapping above, we would have something like:

<pds_api:meta>
   <ops:Label_File_Info>
      <ops:file_name></ops:file_name>
    ...
   <ops:Data_File_Info>
    ...
    etc.
    ...
    with everything ops:* translated into the XML

The same goes for the JSON:
Current pds4+json - NOT QUITE RIGHT

  "meta": {
    "node_name": "PSA",
    "label_file": {
      "file_name": "ns_inst.xml",
      "file_ref": "/var/local/harvest/archive/document/ns_inst.xml",
      "creation_date": "2022-01-24T20:08:23Z",
      "file_size": "3589",
      "md5_checksum": "a8d09cca0a01728db50c15052c2736cf"
    },
    "data_files": [
      {
        "file_name": "ns_inst.pdf",
        "file_ref": "/var/local/harvest/archive/document/ns_inst.pdf",
        "creation_date": "2022-01-24T20:08:23Z",
        "file_size": "138172",
        "md5_checksum": "8103f20c13a3c321dac4a193aba19d16",
        "mime_type": "application/pdf"
      }
    ]
  },

The better solution we want:

  "meta": {
    "ops:Label_File_Info": {
      "ops:file_name": "ns_inst.xml",
    ...
    }
    ... 
    "ops:Data_File_Info": {
      "ops:file_name": "ns_inst.pdf",
    ...
    }
    ...
    etc.
    ...
    with everything ops:* translated into the JSON

As a note: the flattened ops:Label_File_Info.ops:file_name value in the registry should map to the unflattened JSON version above.

Note: I may be off in some of the specifics of the syntax. I don't have a working registry / API available so I can't see what the API output looks like now.

Hope that makes some sense?

@al-niessner
Contributor

@jordanpadams @tloubrieu-jpl

Thanks, that makes my steps clear.

  1. Update swagger.yml to use the new names that match what is in the registry.
  2. Correct the existing pds4 handling so it does not lose the ops.* prefix.
  3. In the XML and JSON mapping, change ops. to ops: where it exists:
    a. XML: use the URL for a proper namespace.
    b. JSON: just do string substitution, as JSON has no namespacing.
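
Step 3b could be sketched as follows. This is a Python illustration of the string-substitution idea only, not the actual API code; the function name `to_namespaced` and the sample input (including the `ops.`-prefixed key names) are assumptions for the example. It rewrites a leading `ops.` to `ops:` on every key, recursing through nested objects and arrays and leaving other keys such as `node_name` alone:

```python
def to_namespaced(value, old="ops.", new="ops:"):
    """Recursively rename keys that start with `old` so they start with `new`."""
    if isinstance(value, dict):
        return {
            (new + key[len(old):] if key.startswith(old) else key):
                to_namespaced(item, old, new)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [to_namespaced(item, old, new) for item in value]
    # Scalars (strings, numbers) are values, not keys, so leave them untouched.
    return value


# Assumed internal representation with 'ops.' prefixes.
raw_meta = {
    "node_name": "PSA",
    "ops.Label_File_Info": {"ops.file_name": "ns_inst.xml"},
    "ops.Data_File_Info": [{"ops.file_name": "ns_inst.pdf"}],
}

json_meta = to_namespaced(raw_meta)
print(json_meta)
```

Only keys are rewritten, so a value that happens to contain the text `ops.` (a file path, for instance) is not altered.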

@jordanpadams
Member Author

@al-niessner 👍 looks good to me. I will allow @tloubrieu-jpl to chime in here in case something else is missing

@al-niessner
Contributor

@tloubrieu-jpl @jordanpadams

Is this what you want (i.e., did I miss anything)?

XML output:

  <pds_api:meta>
    <node_name>PSA</node_name>
    <ops:Label_File_Info>
      <ops:file_name>ns_inst.xml</ops:file_name>
      <ops:file_ref>/var/local/harvest/archive/document/ns_inst.xml</ops:file_ref>
      <ops:creation_date>2022-01-24T20:08:23Z</ops:creation_date>
      <ops:file_size>3589</ops:file_size>
      <ops:md5_checksum>a8d09cca0a01728db50c15052c2736cf</ops:md5_checksum>
    </ops:Label_File_Info>
    <ops:Data_Files>
      <ops:Data_Files>
        <ops:file_name>ns_inst.pdf</ops:file_name>
        <ops:file_ref>/var/local/harvest/archive/document/ns_inst.pdf</ops:file_ref>
        <ops:creation_date>2022-01-24T20:08:23Z</ops:creation_date>
        <ops:file_size>138172</ops:file_size>
        <ops:md5_checksum>8103f20c13a3c321dac4a193aba19d16</ops:md5_checksum>
        <ops:mime_type>application/pdf</ops:mime_type>
      </ops:Data_Files>
    </ops:Data_Files>
  </pds_api:meta>

and JSON

  "meta": {
    "node_name": "PSA",
    "ops:Label_File_Info": {
      "ops:file_name": "ns_inst.xml",
      "ops:file_ref": "/var/local/harvest/archive/document/ns_inst.xml",
      "ops:creation_date": "2022-01-24T20:08:23Z",
      "ops:file_size": "3589",
      "ops:md5_checksum": "a8d09cca0a01728db50c15052c2736cf"
    },
    "ops:Data_Files": [
      {
        "ops:file_name": "ns_inst.pdf",
        "ops:file_ref": "/var/local/harvest/archive/document/ns_inst.pdf",
        "ops:creation_date": "2022-01-24T20:08:23Z",
        "ops:file_size": "138172",
        "ops:md5_checksum": "8103f20c13a3c321dac4a193aba19d16",
        "ops:mime_type": "application/pdf"
      }
    ]
  },

@al-niessner
Contributor

@jordanpadams

You got your wish. If you add new items to swagger.yml and then fill it, the name will come out right.

@jordanpadams
Member Author

@al-niessner is node_name also an ops attribute? Can you email me all the available ops properties? I will then try to do a mapping. It looks like we are missing some of the harvest info attributes.

@jordanpadams
Member Author

This is done. Additional tickets have been created to address remaining concerns for the overarching requirement.


3 participants