-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow StringType and BytesType to have undefined byte lengths #413
Conversation
@adrianisk @mjperrone @cpard @gwukelic @alexdemeo PTAL Also, the spec change: gabledata/recap-website#13 |
cb8860b
to
9db1f36
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Having great existing tests and test frameworks makes this pretty straightforward 💯
recap/types.py
Outdated
if not self.variable and self.bytes_ is None: | ||
raise ValueError("Fixed length bytes must have a length set") | ||
if self.bytes_ is not None and self.bytes_ < 1: | ||
raise ValueError("Bytes must be greater than or equal to 1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you dropping the max length constraint here and for Bytes below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, good question. I'm not 100% sure what I was thinking here. The spec still says 64-bit signed integer | null
, so if bytes_ is set it should be enforced as such. Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I pushed an updated PR here (force push squash). LMK if this looks OK.
- Went back to LONG_MAX for string/bytes/list length
- Moved list length min from 0 to 1 (in accordance with spec)
- Updated 0.3.0.json to reflect maximum=int_max
- Left 0.3.0.md as is since it already reflected these rules
@@ -20,7 +20,7 @@ | |||
|
|||
VALID_SCHEMA_DIR = "tests/spec/valid" | |||
INVALID_SCHEMA_DIR = "tests/spec/invalid" | |||
RECAP_SPEC_JSON_HTTP = "https://recap.build/specs/type/0.2.0.json" | |||
RECAP_SPEC_JSON_HTTP = "https://recap.build/specs/type/0.3.0.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this is why the tests are failing, the website change hasn't landed yet. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! PR for that is here:
9db1f36
to
859777b
Compare
Prior to this commit, Recap StringType and BytesType both required `bytes_` to be set. If unset, a default of 65536 was used. This configuration allowed developers to skip the `bytes` field in the concrete syntax tree (CST), but still required `bytes_` to be set in the abstract syntax tree (AST). @adrianisk and I converged on this design in #284. Recently, @mjperrone pointed out that requiring `bytes_` in the AST is still awkward for JSON. In the absence of an undefined byte_ option, you have to set *something* for JSON--either a magic number or INT/LONG_MAX. Thus far, we defaulted to LONG_MAX. But a defined LONG_MAX and an undefined length are actually two separate states. Allowing a converter to specify an undefined string/byte length permits other converters to decide how to deal with the undefined byte length. I made the change and it fit quite naturally, so I think we're on the right track with this. JSON and Hive metastore both benefited; both support undefined lengths. I've left the other converters alone for now. This means they'll continue to barf if you go from JSON/HMS to Recap to Avro/Proto. I think that's fine for now. We can revisit this later if we want to.
859777b
to
aeaf2fa
Compare
Prior to this commit, Recap StringType and BytesType both required
bytes_
to be set. If unset, a default of 65536 was used. This configuration allowed developers to skip thebytes
field in the concrete syntax tree (CST), but still requiredbytes_
to be set in the abstract syntax tree (AST). @adrianisk and I converged on this design in#284.
Recently, @mjperrone pointed out that requiring
bytes_
in the AST is still awkward for JSON. In the absence of an undefined byte_ option, you have to set something for JSON--either a magic number or INT/LONG_MAX. Thus far, we defaulted to LONG_MAX. But a defined LONG_MAX and an undefined length are actually two separate states. Allowing a converter to specify an undefined string/byte length permits other converters to decide how to deal with the undefined byte length.I made the change and it fit quite naturally, so I think we're on the right track with this. JSON and Hive metastore both benefited; both support undefined lengths.
I've left the other converters alone for now. This means they'll continue to barf if you go from JSON/HMS to Recap to Avro/Proto. I think that's fine for now. We can revisit this later if we want to.