Clarify handling of invalid codepoint escape sequences #164
@kasei - I'm guessing you are referring to the text … What alternative readings do you see? Turtle handles this differently from SPARQL: it has …
@afs –
I think it's unclear what "processed" means here. Should the example I gave raise an error? That is, does the mere appearance of …
Agreed that the Turtle handling is better, and also that "we are where we are." So I'm just looking to start a discussion about which of the two possibilities I noted above is the expected behavior (if we can reach consensus on that), and hoping we can add some text stating that expectation.
An alternative is to develop some rdf-tests, so that the discussion reaches practitioners. Here are some more examples to add to the collection:
Hex x41 is `A`. There is an argument that SPARQL should switch to Turtle-style handling on security grounds, because of the obfuscation possibilities.
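To make the obfuscation concern concrete: SPARQL 1.1 processes codepoint escapes over the whole query string before parsing, whereas Turtle only recognizes them inside IRIs and string literals. Below is a minimal Python sketch of the SPARQL-style pre-pass. `decode_codepoint_escapes` is a hypothetical helper, not part of any spec or library; for brevity it handles only the four-digit `\u` form and leaves anything malformed untouched.

```python
import re

# SPARQL-1.1-style pre-pass: a \uXXXX escape (4 hex digits) is decoded
# anywhere in the query text, before tokenizing or parsing.
# For brevity this sketch ignores the 8-digit \U form.
_U4 = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_codepoint_escapes(query: str) -> str:
    """Hypothetical helper: replace each well-formed 4-digit escape with its character.

    Malformed sequences (a backslash-u followed by non-hex characters) do not
    match the pattern and are left as they are.
    """
    return _U4.sub(lambda m: chr(int(m.group(1), 16)), query)


if __name__ == "__main__":
    obfuscated = r"\u0053ELECT * WHERE { ?s ?p ?o }"
    print(decode_codepoint_escapes(obfuscated))  # SELECT * WHERE { ?s ?p ?o }
```

Because the pre-pass runs before the grammar sees the text, an escape can disguise a keyword just as easily as literal content; a Turtle-style tokenizer would only decode escapes once it is already inside a string or IRI.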
Agreed. I can try to work on a PR with some tests in this area (covering both approaches), and we could solicit feedback from implementers.
@kasei - thank you for the tests. I think some specific points in the current spec text are "errata":
Yes, we can raise these issues as errata. Hopefully we can discuss them in the WG and get clarity, so that we can also address them in 1.2.
I think the current spec text is ambiguous about how codepoint escape sequences should be handled if they are invalid. For example:
I think we might want to consider adding text (either normative or best practice) about how this case should be handled. It seems that several systems (including my own, and Jena) ignore invalid sequences, causing the above query to have a literal that starts with an escaped backslash, followed by the four characters "000Z". Other systems might see the `\u` with invalid trailing characters and raise an error. Clarity on the expected behavior here would be good.
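The two behaviors described above (ignore invalid sequences, or reject them) can be sketched as a single pre-parse pass with a policy flag. This is an illustration under assumptions, not spec text or any implementation's actual code: `decode_escapes` and `InvalidEscapeError` are hypothetical names, and treating a `\U` escape above U+10FFFF as invalid is this sketch's own choice.

```python
import re

# A well-formed escape: \u + 4 hex digits, or \U + 8 hex digits.
_WELL_FORMED = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


class InvalidEscapeError(ValueError):
    """Hypothetical error raised by the strict policy."""


def decode_escapes(text: str, strict: bool = False) -> str:
    """Decode codepoint escapes in text before parsing.

    strict=False: copy invalid sequences through unchanged (lenient).
    strict=True:  raise InvalidEscapeError on the first invalid sequence.
    """
    out = []
    i = 0
    while i < len(text):
        m = _WELL_FORMED.match(text, i)
        if m:
            cp = int(m.group(1) or m.group(2), 16)
            # chr() cannot represent code points above U+10FFFF.
            if cp <= 0x10FFFF:
                out.append(chr(cp))
                i = m.end()
                continue
        if text.startswith(("\\u", "\\U"), i):
            # A \u or \U here did not form a valid escape.
            if strict:
                raise InvalidEscapeError(f"invalid codepoint escape at offset {i}")
        # Lenient path and ordinary characters: copy one character and move on.
        out.append(text[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    query = r'SELECT * WHERE { ?s ?p "\u000Z" }'
    print(decode_escapes(query))  # printed unchanged
    try:
        decode_escapes(query, strict=True)
    except InvalidEscapeError as err:
        print("strict:", err)
```

In lenient mode the `"\u000Z"` literal passes through untouched, because `Z` is not a hex digit and the pattern never matches; what happens next then depends on the string-literal rules of the parser. In strict mode the same query is rejected before parsing starts.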