Impl test report #540
Conversation
Woohoo! This will be awesome. Will have a look tomorrow or so, but thanks, this is a big deal.
To add your implementation, you'll need to start by creating a CLI application that takes the following parameters:
- `--schema <PATH#POINTER>` - The relative URI to the schema, e.g. `../tests/draft2019-09/additionalItems.json#/0/schema`.
The implication here is that the instance and schema are expected to be files in JSON format. Are any other encoding formats allowed or expected to be supported? (e.g. YAML?) Must these be files on the local filesystem, or are network-reachable URIs possibly going to be used as well?
e.g. is this legal? `--schema https://json-schema.org/draft/2020-12/schema`
These URIs point directly to the file and schema/instance inside of it. The docker-compose is copying the test suite as it exists in this repo directly into the container. Your CLI will need to find the file, open it, and navigate to the location indicated by the pointer fragment. The resulting JSON data will be the schema or instance as appropriate.
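To make that resolution step concrete, here's a minimal sketch in Python (purely illustrative; an implementation's CLI can of course be written in any language). It assumes the `PATH#POINTER` shape shown in the examples above, with the path relative to the CLI's working directory:

```python
import json
from urllib.parse import unquote


def resolve(uri: str):
    """Split a PATH#POINTER argument, load the file, and walk the JSON Pointer."""
    path, _, pointer = uri.partition("#")
    with open(path) as f:
        node = json.load(f)
    # An empty pointer refers to the whole document; otherwise walk each segment.
    for segment in unquote(pointer).split("/")[1:]:
        segment = segment.replace("~1", "/").replace("~0", "~")
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    return node


schema = resolve("../tests/draft2019-09/additionalItems.json#/0/schema")
instance = resolve("../tests/draft2019-09/additionalItems.json#/0/tests/0/data")
```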
> e.g. is this legal? `--schema https://json-schema.org/draft/2020-12/schema`
I expect that could be accepted by your CLI, but the test suite won't submit that. It'll only submit schemas that exist in the test suite.
> These URIs point directly to the file and schema/instance inside of it. The docker-compose is copying the test suite as it exists in this repo directly into the container. Your CLI will need to find the file, open it, and navigate to the location indicated by the pointer fragment.
Great, please add that to the README.
I had intended to add a "how it works" section that would include this.
What do you think about my idea of passing the data and schemas to the implementation directly via STDIN? This may make it easier for you to generate randomized tests on the fly (and prevent the consuming implementation from potentially cheating, which it can do if it is allowed to see the entire file that the data comes from and therefore see what the correct valid/invalid response should be).
Your CLI can support whatever, but it MUST support what I've laid out. If you want to make it multipurpose so that you could use it outside of this test suite, that's fine. The idea I'm presenting here is that we want to exercise the implementation against the tests in this repo. If a dev wants to make their CLI do more, then that's up to them. Such extra functionality won't be exercised here.

Regarding cheating, the CLI isn't given the metadata of the tests. It's only given the schema and the data, and from that it has to provide an outcome. The script puts the test metadata (including the expected result) and the actual result into the result file.
That's not what I'm saying. I'm suggesting that if you add to the requirements that the implementation accepts data on STDIN, it would make it easier to supply a wider variety of test data, and on an ad hoc basis.
It shouldn't limit itself to that.
It is, if you provide a filename and a reference point within it that obviously corresponds to the standard layout of the files in the test suite. If this is what you require, I can provide you with a tool that does nothing but look for a 'valid' property adjacent to 'data' and provide that response. That's why I think only providing data via files is a mistake -- it restricts too strongly what structure is used for testing, which means the tool's use is limited.
For the purposes of this runner, that's not necessary. This runner will only ever run the tests present in the suite. If your intent is just to see how implementations might behave with a new scenario before you add it to the suite, that could be done locally easily enough.
Why not? We only have the tests that are present in these files.
I think requiring this tells the implementors that we don't trust them. It's not really the signal we want to send. I see no real problem with pointing the CLI at the test case within the files. Truthfully, an enterprising cheater could hard-code all the test cases into the tool with their expected result, which would still give a 100%. I'm not really worried about cheating. If we're concerned, we can ask to see the source for the tool.
Additionally, this is a first draft of a runner. It doesn't need to be perfect.
Personally I don't think we need to spend any effort on making sure an implementation isn't gaming the runner; such a thing is likely to be quickly discovered (by the implementation's users) and the consequences will be bad. It's just high risk, no reward to me.
Of course I agree with this, but another thing that might be useful for a second pass, which I've already been thinking about, is that I'd like to have a JSON-based file format which represents skips of tests in the suite. I.e. it should be possible to state in JSON what tests from the suite you want to run (across required and optional) and what you want to flag as skipped (or known failing). Today I do this in my implementation in Python code, but even on my own I will eventually move to JSON in my implementation, because it's easier to work with -- standardizing such a thing, though, may make things easier for this runner (even if we don't formally standardize it, and just invent something for this repo).
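Purely as a hypothetical illustration of the idea (nothing below is an agreed format; the key names and file paths are invented for the sketch), such a file could be consumed like this:

```python
import json

# Invented shape for this sketch only, e.g.:
# {
#   "skip": [
#     {"file": "optional/bignum.json", "reason": "arbitrary precision not supported"}
#   ],
#   "known_failing": [
#     {"file": "optional/format/email.json", "reason": "tracked in issue tracker"}
#   ]
# }


def load_skips(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def should_run(skips: dict, test_file: str) -> bool:
    """A test file runs unless it appears in the 'skip' list."""
    return all(entry["file"] != test_file for entry in skips.get("skip", []))
```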
Yup, absolutely! I do think it would be a useful feature to have though, and it might be easier if we allow for that from the start rather than updating Docker containers multiple times.
Yes, being able to generate tests programmatically would be very useful. The test data could still be written out to a file as an intermediary, but that seems like an unnecessary step when we could just stream them. Even if the data comes from a file, we should not presume that the structure is identical to the existing test suite -- and providing examples that give json pointers that exactly match the test suite structure does imply that. How about the example saying something like
Yes, that's one reason why generated tests are useful.
It's not about cheating specifically (although we have talked about this in the past, which is when we first started talking about randomly generating tests and feeding them to implementations blind, last year), but also about generating a wider variety of tests that we may not want to commit to the test suite directly. If you recall, I submitted a more comprehensive set of tests last year (PR #385) and they were rejected because of the size, with the suggestion that we could programmatically generate these tests and send them directly to implementations when such a capability was available. I'd still like to be able to do that.
I would suggest we devise some kind of indexing system. It could help with the report as well. The index for a given test would need to remain consistent as the suite changes.
The pointers submitted to the CLI don't need to match the test files. The runner is reading from the files, so in this case they do. The pointer could point anywhere.
There's nothing that's preventing this from working. Any file and any location within it is fine.
We're not doing this right now. I want to focus on what we're doing. When we figure out how to generate tests, we can have people update their CLI tools. This is the more iterative approach and will get us running something faster.
I'll take a shot at writing a hook for my own implementation this weekend, I think; I may have some feedback on the implementer-facing instructions after doing so (though they seem good already).
- `--schema <PATH#POINTER>` - The relative URI to the schema, e.g. `../tests/draft2019-09/additionalItems.json#/0/schema`.
- `--instance <PATH#POINTER>` - The relative URI to the data instance, e.g. `../tests/draft2019-09/additionalItems.json#/0/tests/0/data`.
I think it'd be easier (and more future-compatible) if the interface was JSON in and out, rather than CLI arguments, no? Specifically, implementers wouldn't need to do their own URI resolution if we instead had the runner harness pipe a JSON object with `schema` and `instance` in it to their stdin.
And on the result side, we may at some point need information back from the CLI, so rather than using the exit code, maybe we should instead immediately use `{"result": "valid"}` and `{"result": "invalid"}`; this way, at some point we could have `{"result": "invalid", "known_bug": "myrepo/issues/123"}` or whatever.
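As a rough sketch of what that could look like on the implementation side (illustrative only; `validate` is a stand-in for whatever validator library an implementer actually uses):

```python
import json
import sys


def validate(schema, instance) -> bool:
    # Stand-in for the implementation's real validation call.
    raise NotImplementedError


def main() -> None:
    request = json.load(sys.stdin)  # e.g. {"schema": {...}, "instance": {...}}
    ok = validate(request["schema"], request["instance"])
    # Emit a JSON result rather than relying on the exit code, so the shape
    # can grow extra fields (like "known_bug") later without breaking anything.
    print(json.dumps({"result": "valid" if ok else "invalid"}))


if __name__ == "__main__":
    main()
```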
Doing it this way would also mean we could specify that CLIs should accept JSON terminated by line delimiters so we don't pay the startup cost of each CLI repeatedly.
> Doing it this way would also mean we could specify that CLIs should accept JSON terminated by line delimiters so we don't pay the startup cost of each CLI repeatedly.
We'd have to minify all of the test files to remove line delimiters. I'm not sure I can do that in a bash script.
Given that I'm learning as I go, all of this is a tall order.
Do you mean newlines?
Two NLs in a row could be used to signal the end of a JSON blob, rather than removing all newlines and passing the data as a single line.
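For what it's worth, the suite files themselves wouldn't need to be minified for that: whatever drives the CLI can re-serialize each schema/instance pair compactly so that each request occupies exactly one line. A small illustrative sketch (Python here just for brevity; the test cases and command name are placeholders, not anything this PR defines):

```python
import json
import subprocess

# Placeholder test cases; the runner would extract these from the suite files.
cases = [
    {"schema": {"type": "integer"}, "instance": 12},
    {"schema": {"type": "integer"}, "instance": "12"},
]

# json.dumps emits no newlines by default, so each request is a single line
# regardless of how the source files are formatted.
payload = "".join(json.dumps(case) + "\n" for case in cases)

# "some-validator-cli" is a made-up command for the sketch.
completed = subprocess.run(
    ["some-validator-cli"], input=payload, capture_output=True, text=True
)
results = [json.loads(line) for line in completed.stdout.splitlines() if line]
```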
- `--schema <PATH#POINTER>` - The relative URI to the schema, e.g. `../tests/draft2019-09/additionalItems.json#/0/schema`.
- `--instance <PATH#POINTER>` - The relative URI to the data instance, e.g. `../tests/draft2019-09/additionalItems.json#/0/tests/0/data`.
- `--spec-version <VERSION>` - The draft/version for the schema, e.g. `draft2020-12`. The value for this parameter will match a folder name under `/tests/` in this repo.
<!-- Do we need this parameter? -->
I can't see why we'd need it, so maybe kill it until someone asks for it, though adding it would be easier with the above.
We may need it to configure the optional `format` tests. Some validators might not have that enabled, and we could use this as a signal to the implementation to enable it.
For draft 2020-12 we could use a custom format-required meta-schema, but that wouldn't work for the other drafts.
For an initial proof-of-concept I'd leave out all optional tests and stick to just what the specification requires.
Later on, we'll have to think about what config options would be required (and how to signal them to the application) for the various optional tests, as there's a mixed bag of a bunch of them in there (or, we could take the opportunity to rethink our organization of optional tests, since this keeps coming up as a point of confusion and we're being rather informal about the whole thing right now). E.g. some tests require format validation to be supported and turned on; some tests require bignum support; some tests require non-standard regex syntax, etc. We may also want to add more optional tests later on for testing annotation collection, or the operation of the `content*` keywords, etc.
- `docker-image` - The name of your docker image. The `latest` tag will be used.
- `command` - The command to run your CLI.
- `versions` - An array of versions/drafts supported by your implementation. (To save time, unsupported versions will be skipped.)
It seems like this may cause a rush of PRs when new drafts are released as each person adds the new draft to the file.
Maybe instead we should indeed send the draft version, and specify that the implementer's CLI needs to return an exit status indicating that the draft should be skipped if the CLI doesn't recognize it.
That sounds good to me. It would be nice for things to adapt automatically as we add a new spec version or an implementation extends support to a new version in their 'latest' image.
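A tiny sketch of that idea from the CLI's side (the supported set and the exit-code value below are made up for illustration; nothing here has been agreed):

```python
import sys

SUPPORTED_VERSIONS = {"draft7", "draft2019-09", "draft2020-12"}  # example set only
EXIT_UNSUPPORTED_VERSION = 2  # hypothetical code meaning "skip this draft"


def requested_version(argv: list) -> str:
    """Return the value following --spec-version, or an empty string if absent."""
    if "--spec-version" in argv:
        i = argv.index("--spec-version")
        if i + 1 < len(argv):
            return argv[i + 1]
    return ""


version = requested_version(sys.argv)
if version and version not in SUPPORTED_VERSIONS:
    sys.exit(EXIT_UNSUPPORTED_VERSION)  # the runner would record this draft as skipped
```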
[
  {
    "docker-image": "gregsdennis/json-everything",
    "command": "dotnet json-everything.dll",
This should probably be an array to avoid the off-chance of needing shell quoting.
I don't understand.
(I deleted my previous "I agree that this doesn't make sense" comment, after thinking about it harder) :)
If a command is provided as a single string, then the implication is that it's passed to the shell, which may parse some characters unless they're escaped; if the command is passed as an array of words, then we can bypass the shell entirely and just hand it to `exec` (or equivalent).
E.g. perl's `exec` and `system` functions pass the command through the shell if one argument is provided, but skip that step if multiple arguments are provided, with the presumption that tokenization has already been done (and without having to fuss with escaping characters that the shell might do something with). https://perldoc.perl.org/functions/exec
Okay. Again, I'm learning as I go here, so feel free to show an example or add a commit. I don't know how to translate a string array into a bash command.
That's what I was referring to yeah -- and no worries at all! I can leave some commits showing what I mean -- it's true, shell scripting indeed doesn't make some of this trivial.
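For what it's worth, the same distinction shows up in most languages. A hedged sketch in Python (rather than the bash the runner actually uses) of how a pre-tokenized command array avoids the shell entirely:

```python
import subprocess

# The "command" entry from the implementations file, already split into words.
command = ["dotnet", "json-everything.dll"]

# Arguments following the interface described in this PR.
args = [
    "--schema", "../tests/draft2019-09/additionalItems.json#/0/schema",
    "--instance", "../tests/draft2019-09/additionalItems.json#/0/tests/0/data",
]

# Passing a list means no shell is involved, so nothing needs quoting or escaping.
subprocess.run(command + args)

# A single string, by contrast, only works via the shell, where special
# characters in any argument would have to be escaped:
# subprocess.run("dotnet json-everything.dll --schema ...", shell=True)
```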
@gregsdennis are you OK with me closing this as superseded by Bowtie (which does this now at least as much as the PR did)? (Thanks, you def get a ton of the credit for finally pushing for this to happen...)
Yep. Bowtie looks great.
Based on a conversation with @karenetheridge in Slack, this is the first step toward building an implementation comparison/support site.
The scripts in this PR will run a given implementation against the entire test suite and generate a report. This report can be used to generate site content.
Still have some work to do: