Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carriage Return characters (\r) are incorrectly replaced with Newline characters (\n) #512

Open
5 of 6 tasks
ericdrobinson opened this issue Oct 29, 2022 · 10 comments
Open
5 of 6 tasks
Labels
Pending Pending to be confirmed by user/author for some check/update/implementation

Comments

@ericdrobinson
Copy link

ericdrobinson commented Oct 29, 2022

  • Are you running the latest version?
  • Have you included sample input, output, error, and expected output?
  • Have you checked if you are using correct configuration?
  • Did you try online tool?

Description

Carriage Return characters (\r) found within parsed XML are incorrectly converted to newline characters (\n).

I did a quick scan of the source code and found a likely culprit:

const parseXml = function(xmlData) {
  xmlData = xmlData.replace(/\r\n?/g, "\n"); //TODO: remove this line

That pretty clearly replaces any \r character (and possibly \r\n pair) with \n.

The online tool does not exhibit this issue because browser node access APIs encode Carriage Return "strings" (the character \ followed by r) as \\r. The regular expression no longer matches. See:

> "1) \r 2) \\r 3) \n 4) \\n 5) \r\n 6) \\r\\n".replace(/\r\n?/g, "\n")
  '1) \n 2) \\r 3) \n 4) \\n 5) \n 6) \\r\\n'

Input

The XML that exhibits this issue is of the form:

<properties object="" engine="">
    <property type="string" name="x" state="changed">
        <![CDATA[This is a carriage return \r...]]>
    </property>
    <property type="string" name="y" state="changed">
        <![CDATA[\r]]>
    </property>
</properties>

Code

The code is pretty straightforward.

const XML_OPTIONS_NO_TAG_PARSE: fastXMLParser.X2jOptionsOptional = {
    attributeNamePrefix: "@",
    ignoreAttributes: false,
    parseAttributeValue: false,
    parseTagValue: false,
    textNodeName: "#value",
};
const XML_PARSER_NO_TAG_PARSE = new fastXMLParser.XMLParser(XML_OPTIONS_NO_TAG_PARSE);

// ...

const parsed = XML_PARSER_NO_TAG_PARSE.parse(xmlData);

After that code runs, the parsed text node content has \n instead of the expected \r.

Output

Running the above results in the following JSON:

{
    "properties": {
        "property": [
            {
                "#value": "This is a carriage return \n...",
                "@type": "string",
                "@name": "x",
                "@state": "changed"
            },
            {
                "#value": "\n",
                "@type": "string",
                "@name": "y",
                "@state": "changed"
            },
        ],
        "@object": "",
        "@engine": ""
    }
}

Expected Data

I expect the following output:

{
    "properties": {
        "property": [
            {
                "#value": "This is a carriage return \r...",
                "@type": "string",
                "@name": "x",
                "@state": "changed"
            },
            {
                "#value": "\r",
                "@type": "string",
                "@name": "y",
                "@state": "changed"
            },
        ],
        "@object": "",
        "@engine": ""
    }
}

Would you like to work on this issue?

  • Yes
  • No
@github-actions
Copy link

I'm glad you find this repository helpful. I'll try to address your issue ASAP. You can watch the repo for new changes or star it.

@amitguptagwl amitguptagwl added the Pending Pending to be confirmed by user/author for some check/update/implementation label Jan 7, 2023
@ericdrobinson
Copy link
Author

I have just confirmed that this line:

const parseXml = function(xmlData) {
  xmlData = xmlData.replace(/\r\n?/g, "\n"); //TODO: remove this line

is responsible for the bug. (Verified by stepping through with a debugger.)

@ericdrobinson
Copy link
Author

ericdrobinson commented Jan 13, 2023

I have also determined that the reason the online tool doesn't exhibit this issue is because the XML string returned by browser node access APIs (used internally by CodeMirror) returns carriage return characters as \\r instead of \r.

Depending upon the purpose of the problematic line of code that I identified, this may actually cause a bug in the online tool.

Regardless, this is clearly not operating as expected.

@nkappler
Copy link

this seems to only happen when trimValues is turned on, at least in my use case, I get a lot more new lines from the text node when I turn it on.

@amitguptagwl
Copy link
Member

can you check with v5 if it works?

@nkappler
Copy link

is v5 not available as an npm module? How do I install it and where do I get it?

@yelly
Copy link

yelly commented Jun 18, 2024

This issue is also affecting us as we need to parse XML files which do not encode carriage returns.
We can't really move to v5 because we are also building XML files which isn't supported yet (I guess we could parse with v5 and build with v4 but that seems a bit cumbersome).
I know that replacing all CRLFs or CRs with LFs is in the XML spec, but it would be nice to be able to turn this off (at least within values, between tags newlines are ignored anyway).
In the meantime as a workaround we are manually encoding all CRs before calling parse and then decoding them in the tagValueProcessor which may lead to bugs in some strange edge cases.

@amitguptagwl
Copy link
Member

@yelly , I can understand the trouble you're experiencing. But unfortunately, I'll not be able to do any changes in the code for next 2-3 months. I would take v5 ahead for new changes and build the other components like validator, builder etc for v5. I just wanted to take a feedback on v5 before I do any big change.

@livehigh
Copy link

will it be fixed in v4?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pending Pending to be confirmed by user/author for some check/update/implementation
Projects
None yet
Development

No branches or pull requests

5 participants