Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalog resolution problems when running on AWS Lambda #2

Closed
abdelazer opened this issue Jan 10, 2020 · 7 comments
Closed

Catalog resolution problems when running on AWS Lambda #2

abdelazer opened this issue Jan 10, 2020 · 7 comments

Comments

@abdelazer
Copy link

Thanks for all your work on this family of libraries. Please let me know if you'd prefer this Issue was filed inside another project.

I'm trying to create an environment to run the idml2xml pipeline inside AWS Lambda and have found the configuration process quite challenging. The latest problem: in some cases catalog-powered URI resolution errors when it gets an encoded %2F value instead of the / it expects.

I would like your guidance on how to continue debugging this issue.

Here's the details I think I've established:

  • When the Lambda function that runs this pipeline is invoked locally (sam local invoke --event events/S3SampleEvent.json), the filesystem is configured as expected (by something) and completes the Lambda function invocation as expected. Good.

  • When the function runs on the real Lambda environment, the configuration is somehow different. Bad. A %2F value is sometimes passed as a valid $catalog-resolved-uri value and the system (Saxon) errors (extra <cx:message> added by me):

2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG SetAttributes:169 - Running set-attributes add-orig-href
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG Sink:169 - Running sink sink1
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.22.2.12.2
18:59:07
Message: Calling resolve-uri() with $catalog-resolved-uri value: -%2F- and @local-href value: --
18:59:07
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b ERROR Main:140 - Pipeline failed: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b ERROR Main:143 - Underlying exception: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 18:59:07 55382016-1591-418c-a3d3-c846cf7e784b DEBUG Main:145 - An empty sequence is not allowed as the second argument of fn:resolve-uri()
18:59:08
net.sf.saxon.s9api.SaxonApiUncheckedException: An empty sequence is not allowed as the second argument of fn:resolve-uri()
18:59:08
at net.sf.saxon.s9api.XPathSelector.iterator(XPathSelector.java:203) ~[Saxon-HE-9.9.1-5.jar:?]
  • I traced the issue as far as analyze filename in xproc-util/file-uri/xpl/file-uri.xpl, where I added the <cx:message> shown above:
      <p:otherwise>
        <p:documentation>Other protocol or relative filename. We don’t support other protocols/notations, so we assume it to be
          a relative path.</p:documentation>
        <!-- KHF -->
        <cx:message>
          <p:with-option name="message"
            select="concat('Calling resolve-uri() with $catalog-resolved-uri value: -', $catalog-resolved-uri, '- and @local-href value: -', /c:result/@local-href,  '-')"/>
        </cx:message>
        <tr:file-uri name="resolved-uri">
          <p:with-option name="filename" select="resolve-uri($catalog-resolved-uri, /c:result/@local-href)">
            <p:pipe port="result" step="cwd-uri"/>
          </p:with-option>
        </tr:file-uri>
        <tr:unescape-uri attribute-names="os-path"/>
      </p:otherwise>
  • Next I tried to modify xproc-util/file-uri/xpl/file-uri.xpl to add a case just for this:
      <p:when test="matches($catalog-resolved-uri, '^%2F$')">
        <p:documentation>Encoded Unix Root</p:documentation>
        <!-- KHF -->
        <cx:message>
          <p:with-option name="message"
            select="concat('KHF: Correcting $catalog-resolved-uri value: -', $catalog-resolved-uri)"/>
        </cx:message>
        <p:add-attribute match="/*" attribute-name="local-href">
          <p:with-option name="attribute-value" select="'file:/'"/>
        </p:add-attribute>
        <p:add-attribute match="/*" attribute-name="os-path">
          <p:with-option name="attribute-value" select="$catalog-resolved-uri"/>
        </p:add-attribute>
        <tr:unescape-uri attribute-names="os-path"/>
      </p:when>
  • Adding that code did work, although it's not a particularly informed solution:
19:53:03
2020-01-10 19:53:03 825efa0b-9f27-48b3-9e20-dc1d74c12ef7 DEBUG SetAttributes:169 - Running set-attributes add-orig-href
19:53:03
2020-01-10 19:53:03 825efa0b-9f27-48b3-9e20-dc1d74c12ef7 DEBUG Sink:169 - Running sink sink1
19:53:03
2020-01-10 19:53:03 825efa0b-9f27-48b3-9e20-dc1d74c12ef7 DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.22.2.5.2
19:53:03
Message: KHF: Correcting $catalog-resolved-uri value: -%2F
19:53:03
2020-01-10 19:53:03 825efa0b-9f27-48b3-9e20-dc1d74c12ef7 DEBUG AddAttribute:169 - Running add-attribute !1.22.2.5.3
19:53:03
2020-01-10 19:53:03 825efa0b-9f27-48b3-9e20-dc1d74c12ef7 DEBUG AddAttribute:169 - Running add-attribute !1.22.2.5.4
...
(then it happens at least two more times)
  • My JAVA_TOOL_OPTIONS include -Dxml.catalog.files=/var/task/calabash/xmlcatalog/catalog.xml and that part seems (?) to be working:
Picked up JAVA_TOOL_OPTIONS: -Dxml.catalog.files=/var/task/calabash/xmlcatalog/catalog.xml -Dlog4j.configurationFile=log4j2.xml
START RequestId: Version: $LATEST
2020-01-10 19:52:31 INFO  IDMLHandler:32 - Handling S3 PUT to source-idml-bucket-758045473916/StoneRiverCrossing.idml
2020-01-10 19:52:34 INFO  IDMLHandler:65 - Downloading IDML file from S3 (arn:aws:s3:::source-idml-bucket-758045473916/StoneRiverCrossing.idml) to /tmp/input.idml
2020-01-10 19:52:34 INFO  IDMLHandler:47 - Invoking XProc pipline (/var/task/xpl/idml2xml-frontend.xpl) with config /var/task/transpect-config.xml
2020-01-10 19:52:37 DEBUG XProcConfiguration:426 - XML Calabash searching for Saxon extension functions on the class path
2020-01-10 19:52:37 DEBUG XProcRuntime:170 - XML Calabash version 1.1.30 (for Saxon 9.9.1.5/HE)
2020-01-10 19:52:40 INFO  Catalog:338 - Resolved: http://transpect.io/idml2xml/xpl/idml2hub.xpl
2020-01-10 19:52:41 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/unzip-extension/unzip-declaration.xpl
2020-01-10 19:52:41 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/recursive-directory-list/xpl/recursive-directory-list.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 19:52:42 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:43 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 19:52:44 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:45 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 19:52:45 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-hub-xml-model.xpl
2020-01-10 19:52:47 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 19:52:48 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/mathtype-extension/xpl/mathtype2mml-declaration.xpl
2020-01-10 19:52:48 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:48 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 19:52:55 DEBUG XSLT:169 - Running xslt !1.2.7.3.3
2020-01-10 19:52:58 DEBUG Store:169 - Running store !1.2.7.3.4: href=/tmp/CB-a6b8a81d-6aa9-4a83-8dd2-200e58b2b931_idml2hub-start.txt
2020-01-10 19:52:58 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 19:52:58 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 19:52:58 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 19:53:02 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 19:53:02 DEBUG Sink:169 - Running sink !1.17
2020-01-10 19:53:02 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 19:53:02 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 19:53:02 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 19:53:03 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 19:53:03 DEBUG Sink:169 - Running sink !1.17
2020-01-10 19:53:03 DEBUG Identity:169 - Running identity !1.19.1.2
2020-01-10 19:53:03 DEBUG SetAttributes:169 - Running set-attributes add-orig-href
2020-01-10 19:53:03 DEBUG Sink:169 - Running sink sink1
2020-01-10 19:53:03 DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.22.2.5.2
Message: KHF: Correcting $catalog-resolved-uri value: -%2F
2020-01-10 19:53:03 DEBUG AddAttribute:169 - Running add-attribute !1.22.2.5.3
2020-01-10 19:53:03 DEBUG AddAttribute:169 - Running add-attribute !1.22.2.5.4
2020-01-10 19:53:03 DEBUG XSLT:169 - Running xslt unescape
2020-01-10 19:53:03 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/hex/xsl/hex.xsl
2020-01-10 19:53:04 DEBUG AddAttribute:169 - Running add-attribute lastpath
2020-01-10 19:53:04 DEBUG AddAttribute:169 - Running add-attribute lastpath-os

What would you recommend to debug this further?

Apologies on the length of this report, but I wanted to add the context I have. My partially working code is here, although I've (clumsily, given my lack of proficiency with Java) tried to do as little as possible to get Calabash and idml2xml-frontend running as possible: https://github.com/codeburl/transpect-on-lambda

@gimsieke
Copy link
Contributor

gimsieke commented Jan 10, 2020

Hi Keith,

Sorry that getting it to run caused so much trouble so far.

I see that you are using Calabash for Saxon 9.9. There have been some issues with Saxon 9.9. (and base URIs in particular) that I didn’t manage to look into for some time now.

On the other hand idml2xml-frontend is configured to use a calabash-frontend branch as a submodule, and it seems to use a Saxon 9.9 Calabash, so I guess that it should be working in principle.

I see that the steps can be located using catalog resolution.

I wonder why $catalog-resolved-uri has the value %2F though. A catalog resolved URI should at least be a URI, starting with file:. Can you uncomment the message that outputs the attributes of the pos:info result and tell me what it reports?

@gimsieke
Copy link
Contributor

Maybe there is no cwd on Lambda, that’s what I suspect or at least factor in as a possible cause. Since I didn’t anticipate that, unpredictable results may ensue.

@abdelazer
Copy link
Author

Thank you for the suggestions. I've uncommented the message you suggested and included the output logs here (it does seem to have a cwd):

2020-01-10T23:59:31.995000 START RequestId: Version: $LATEST
2020-01-10T23:59:31.999000 2020-01-10 23:59:31 INFO  IDMLHandler:32 - Handling S3 PUT to source-idml-bucket-y/x.idml
2020-01-10 23:59:32 INFO  IDMLHandler:66 - Downloading IDML file from S3 (arn:aws:s3:::source-idml-bucket-y/x.idml) to /tmp/input.idml
2020-01-10 23:59:32 INFO  IDMLHandler:48 - Invoking XProc pipline (/var/task/xpl/idml2xml-frontend.xpl) with config /var/task/transpect-config.xml
2020-01-10 23:59:32 DEBUG XProcConfiguration:426 - XML Calabash searching for Saxon extension functions on the class path
2020-01-10 23:59:32 DEBUG XProcRuntime:170 - XML Calabash version 1.1.30 (for Saxon 9.9.1.5/HE)
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/idml2xml/xpl/idml2hub.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/unzip-extension/unzip-declaration.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/recursive-directory-list/xpl/recursive-directory-list.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/file-uri/xpl/file-uri.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:32 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xslt-mode/xpl/xslt-mode.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-xml-model.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/xml-model/xpl/prepend-hub-xml-model.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/simple-progress-msg/xpl/simple-progress-msg.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/calabash-extensions/mathtype-extension/xpl/mathtype2mml-declaration.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xproc-util/store-debug/xpl/store-debug.xpl
2020-01-10 23:59:33 DEBUG XSLT:169 - Running xslt !1.2.7.3.3
2020-01-10 23:59:33 DEBUG Store:169 - Running store !1.2.7.3.4: href=/tmp/CB-7a076f5d-ab17-457a-8f43-d27a7aadf219_idml2hub-start.txt
2020-01-10 23:59:33 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 23:59:33 DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.16
2020-01-10T23:59:33.693000 Message: pos:info file-separator=/ path-separator=: os-architecture=amd64 os-name=Linux os-version=4.14.138-99.102.amzn2.x86_64 cwd=/ user-name=sbx_user1051 user-home=/home/sbx_user1051 
2020-01-10 23:59:33 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 23:59:33 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 23:59:33 DEBUG Info:169 - Running {http://exproc.org/proposed/steps/os}info info
2020-01-10 23:59:33 DEBUG Message:169 - Running {http://xmlcalabash.com/ns/extensions}message !1.16
2020-01-10T23:59:33.914000 Message: pos:info file-separator=/ path-separator=: os-architecture=amd64 os-name=Linux os-version=4.14.138-99.102.amzn2.x86_64 cwd=/ user-name=sbx_user1051 user-home=/home/sbx_user1051 
2020-01-10 23:59:33 DEBUG AddAttribute:169 - Running add-attribute empty-result
2020-01-10 23:59:33 DEBUG XSLT:169 - Running xslt catalog-resolve
2020-01-10 23:59:33 INFO  Catalog:338 - Resolved: http://transpect.io/xslt-util/resolve-uri/xsl/resolve-uri.xsl
2020-01-10 23:59:34 DEBUG Identity:169 - Running identity !1.20.1.2
2020-01-10 23:59:34 DEBUG Sink:169 - Running sink !1.18
2020-01-10 23:59:34 DEBUG Sink:169 - Running sink sink1
2020-01-10 23:59:34 DEBUG SetAttributes:169 - Running set-attributes add-orig-href
2020-01-10 23:59:34 ERROR Main:140 - Pipeline failed: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 23:59:34 ERROR Main:143 - Underlying exception: An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10 23:59:34 DEBUG Main:145 - An empty sequence is not allowed as the second argument of fn:resolve-uri()
2020-01-10T23:59:34.174000 net.sf.saxon.s9api.SaxonApiUncheckedException: An empty sequence is not allowed as the second argument of fn:resolve-uri()

@gimsieke
Copy link
Contributor

Apparently tr:file-uri doesn’t like '/' as the filename option value. I’m going to fix it.

gimsieke added a commit to transpect/xproc-util that referenced this issue Jan 11, 2020
gimsieke added a commit to transpect/xproc-util that referenced this issue Jan 11, 2020
@gimsieke
Copy link
Contributor

I committed something. Can you pull xproc-util and xslt-util manually and see whether it will improve things?

@abdelazer
Copy link
Author

The issue is resolved after updating those three files you modified in xproc-util and xslt-util yesterday and I'm now getting Hub XML out of Lambda. Thanks!

I appreciate you jumping on this issue so quickly.

@gimsieke
Copy link
Contributor

I appreciate you jumping on this issue so quickly.

Despite a cold and an imminent wedding party, I might add!

Filename to URI conversion will be a standard function of XProc 3.0.

We will eventually migrate all transpect libraries to XProc 3.0, keeping the current modules in maintenance mode. There will be a migration tool for 1.0 pipelines.

If you use non-XProc tools to postprocess the flat, DocBook-based XML that comes out of idml2xml, you won’t need to migrate that many pipelines though. You just wait until idml2xml-frontend gets migrated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants