test: hostname format check fails on empty string #677
Assert that hostname format validation fails gracefully on empty strings. This is especially relevant for the Python `jsonschema` library, which raises an unexpected `ValueError` exception on the `hostname` check (python-jsonschema/jsonschema#1121).

Adds a similar test for:

* draft3: host-name
* draft4: hostname
* draft6: hostname
* draft7: hostname, idn-hostname
* draft2019-09: hostname, idn-hostname
* draft2020-12: hostname, idn-hostname
* draft-next: hostname, idn-hostname
Doing a hostname format check on an empty string seems to raise a `ValueError`:

```pycon
>>> from jsonschema.validators import validator_for
>>> schema = {"$schema": "https://json-schema.org/draft/2020-12/schema", "type": "string", "format": "hostname"}
>>> vcls = validator_for(schema)
>>> validator = vcls(schema, format_checker=vcls.FORMAT_CHECKER)
>>> list(validator.iter_errors(""))
...
  File "lib/python3.10/site-packages/jsonschema/_format.py", line 276, in is_host_name
    return FQDN(instance).is_valid
  File "lib/python3.10/site-packages/fqdn/__init__.py", line 44, in __init__
    raise ValueError("fqdn must be str")
ValueError: fqdn must be str
```

Fix by adding `raises=ValueError` to the related `@_checks_drafts` decorator call.

See also json-schema-org/JSON-Schema-Test-Suite#677. Fixes python-jsonschema#1121.
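The proposed fix makes the checker treat a listed exception as a failed check rather than letting it escape from `iter_errors`; jsonschema's public `FormatChecker.checks` decorator takes a comparable `raises` argument. The library-free sketch below only illustrates that idea under stated assumptions: `conforms` and `strict_hostname_check` are hypothetical names, and the latter merely imitates the `fqdn` behavior seen in the traceback.

```python
from typing import Callable, Tuple, Type


def strict_hostname_check(instance: str) -> bool:
    """Hypothetical checker: like the fqdn call in the traceback, it raises
    ValueError on input it refuses to handle instead of returning False."""
    if not instance:
        raise ValueError("hostname must be a non-empty str")
    # Deliberately minimal: only the overall length limit is checked here.
    return len(instance) <= 253


def conforms(
    check: Callable[[str], bool],
    instance: str,
    raises: Tuple[Type[BaseException], ...] = (),
) -> bool:
    """Run `check`, turning any exception listed in `raises` into a plain
    validation failure (False). Unlisted exceptions still propagate."""
    try:
        return bool(check(instance))
    except raises:
        return False


print(conforms(strict_hostname_check, "example.com", raises=(ValueError,)))  # True
print(conforms(strict_hostname_check, "", raises=(ValueError,)))  # False
```

Without `raises=(ValueError,)`, the empty-string call lets the `ValueError` propagate, which mirrors the behavior reported above.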
So, reading the spec(s), I'm not sure this is correct actually! In particular, excluding draft 2020-12, which I'll get back to, earlier drafts all say the equivalent of:
(see e.g. here, though as I say, you can flip through all the drafts and they say more or less the same thing). Looking at that section, it says:
which seems to say that the empty string is indeed allowed and, in the context of DNS, refers to the root. Draft 2020-12, on the other hand, was changed to reference RFC 1123 rather than RFC 1034 (some context is here):
where RFC 1123 in that section seems to simply pass things along to RFC 952:
where the grammar in the latter seems to say:
which also seems to allow the empty string. Have a look, and if you have some existing domain knowledge let me know, but yeah, my first read is that these are actually valid under the quoted specs.
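Under the RFC 1034 model referenced above, a domain name is the sequence of labels from a node up to the root, and the root's label is the zero-length (null) label, which is why an empty printed name can be read as "just the root". A minimal sketch of that model follows; `to_labels` is a hypothetical helper, and treating `""` and `"."` identically (and ignoring the relative/absolute distinction) is an assumption made for illustration, not something the JSON Schema drafts state.

```python
from typing import List


def to_labels(name: str) -> List[str]:
    """Split a printed domain name into RFC 1034-style labels.

    The result always ends with the zero-length root label. Assumption for
    this sketch: relative and absolute (trailing-dot) names are treated the
    same, and both "" and "." are read as the root alone.
    """
    if name in ("", "."):
        return [""]
    body = name[:-1] if name.endswith(".") else name
    labels = body.split(".")
    # Only the root may be zero-length; an empty label elsewhere is an error.
    if "" in labels:
        raise ValueError(f"empty non-root label in {name!r}")
    return labels + [""]


print(to_labels("example.com"))  # ['example', 'com', '']
print(to_labels(""))  # ['']
```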
Does this mean we should keep the tests, but ensure that validation passes instead?
To my read, yes.
Uhh, what a can of worms. ;) Some docs here and there limit DNS labels to 1-63 octets (and some to 0-63), and the full name to a maximum of 253. But there's no mention of allowing or denying a name with no labels at all. And then there's the usual "use a single dot for the root". I wonder how various real-life tools behave.
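The length limits mentioned above (labels of 1-63 octets, or 0-63 in some sources, and at most 253 for the whole name) can be sketched as a simple check. `within_length_limits` is a hypothetical helper, not taken from any implementation discussed here; it counts characters, so it assumes an ASCII hostname, and the `allow_empty` flag exists precisely because those sources are silent on a name with no labels.

```python
MAX_LABEL_LEN = 63  # per-label limit mentioned in the thread
MAX_NAME_LEN = 253  # whole-name limit mentioned in the thread


def within_length_limits(
    hostname: str, min_label: int = 1, allow_empty: bool = False
) -> bool:
    """Check label and total lengths of an ASCII hostname.

    `min_label` is 1 or 0 depending on which source you follow, and
    `allow_empty` decides the case those sources leave open: no labels at all.
    """
    # A single trailing dot (the root) is not counted as an extra label.
    name = hostname[:-1] if hostname.endswith(".") else hostname
    if name == "":
        return allow_empty
    if len(name) > MAX_NAME_LEN:
        return False
    return all(min_label <= len(label) <= MAX_LABEL_LEN for label in name.split("."))


print(within_length_limits("example.com"))  # True
print(within_length_limits("a" * 64 + ".com"))  # False
```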
Yes, well, no good deed goes unpunished, certainly. The authoritative thing here is the spec(s)/RFCs, not what tools do, for better or worse, though looking through them can certainly be insightful anyhow -- I did see one allowing an empty string.
I'll reverse the check and reword the commit and PR later today. Is it worth adding any other checks?
None that I can immediately think of. Let's see if anyone else chimes in between now and then, and regardless, thanks for raising the PR!
I'll give this a run through my implementation tomorrow.
Looks like in Python The check uses
(That's not surprising to me -- the question is usually twofold --
In that case, I'm also somewhat trusting the JSON Schema spec which says "all hostnames are valid idn-hostnames" though that too may be wrong! Tricky tricky...)
FWIW, my implementation does not accept empty string:
...which is just a wrapper around https://metacpan.org/pod/Data::Validate::Domain#is_domain($domain,-\%options) which says it implements the RFCs.
Weirdly, my implementation says Ah! For
I don't have time to work on this at the moment. I can see that there's some interest, and in theory the tests should be strict, but I'll leave that for others to decide. Feel free to take over and add the required tests. For what it's worth, I added additional
As far as I can tell, for anyone picking this back up, there are no additional tests to add beyond the ones already in this PR, so the work is simply to flip the `false`s to `true` as the expected result, or otherwise to show where in the RFCs I quoted there is a restriction on empty strings.
(And thanks @jvtm for getting this started!)
In the context of DNS, the root is an empty fragment to allow for absolute paths. The above behaviour is a gross security issue, so the default configuration for most implementations of DNS resolvers is to use the search path for path length of 1. (e.g. My interpretation of this:
About the change from RFC 1034 to RFC 1123:

Considering this grammar in RFC 952:
A name consists of a These character sets are not defined in the grammar, but referring to the assumptions in section 1:
so a name is represented by the regular expression and a hname (hostname) is 1 or more names joined with a

A further complication is DNS service records: their names explicitly begin with an underscore to distinguish them from hostnames.
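Transcribing the RFC 952 grammar (`<hname> ::= <name>*["."<name>]` and `<name> ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]`) into a regular expression makes the argument concrete: every `<name>` starts with a mandatory `<let>`, so an empty string can never be an `<hname>`, and underscore-prefixed service labels fail too. This is a sketch, not any implementation's actual check; `hname_regex` and `is_hname` are hypothetical names, the `rfc1123` flag applies RFC 1123's relaxation of the first character to a letter or digit, and length limits are deliberately left out.

```python
import re

LET = "A-Za-z"  # <let>: letters (RFC 952 treats case as insignificant)
LET_DIG = "A-Za-z0-9"  # <let-or-digit>


def hname_regex(rfc1123: bool = False) -> "re.Pattern[str]":
    """Build a regex for RFC 952's <hname>.

    <name>  ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
    <hname> ::= <name>*["."<name>]
    With rfc1123=True, the first character may also be a digit
    (RFC 1123, section 2.1). Length limits are not modeled here.
    """
    first = LET_DIG if rfc1123 else LET
    name = f"[{first}](?:[{LET_DIG}-]*[{LET_DIG}])?"
    return re.compile(rf"{name}(?:\.{name})*")


def is_hname(candidate: str, rfc1123: bool = False) -> bool:
    # fullmatch: the whole string must be an <hname>; "" never matches
    # because every <name> needs at least one leading character.
    return hname_regex(rfc1123).fullmatch(candidate) is not None


print(is_hname("example.com"))  # True
print(is_hname(""))  # False
print(is_hname("_sip._tcp.example.com"))  # False
```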
I had a comment written out about how I had read your comment two or three times and couldn't fully follow it, because it included outside information that didn't seem relevant to answering "what is the spec's behavior, regardless of what you want to use it for and whether it's appropriate for that use case" -- but then, the fourth time reading it, I noticed quite a simple error (on my part) which I think explains the behavior, namely:
I think I carelessly assumed and/or forgot that that BNF syntax in RFCs puts the

So I'm at least personally convinced now that they indeed are not allowed in the grammar. Honestly, I'm not sure anyone else was actually on either side here; as usual, I find "my implementation does X" much less helpful than "I understand the behavior to be X because that's what it says". If anyone does have some understanding and wants to speak up on either side, feel free. Otherwise, if it was just me as a holdout, I'm happy to see the merge conflicts fixed and this merged with empty strings being invalid under these lines in the spec.
Sorry, I realize now I wrote that whole comment out the second time without saying "thank you! clearly your analysis was helpful for getting to the bottom of this now it seems"!