Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

starlarkjson: a standard json module for Starlark #179

Merged
merged 1 commit into from
Jun 11, 2020
Merged

Conversation

alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Mar 29, 2019

This is a sketch of a standard module for JSON encoding/decoding for Starlark.

It is intended to subsume, generalize, and eventually
replace the ill-conceived struct.to_json method.

See related issues:
bazelbuild/bazel#7896
https://b/23962735 (Googlers only)
https://b/70210417 (Googlers only)
bazelbuild/bazel#7879 (comment)
bazelbuild/bazel#5542

@alandonovan
Copy link
Contributor Author

@laurentlb

# Exercise JSON string coding by round-tripping a string with every 16-bit code point.
def codec(x):
return json.decode(json.encode(x))
codepoints = ''.join(['%c' %c for c in range(65536)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add space between % operator and c identifier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny! Fixed.

func quote(buf *bytes.Buffer, s string) {
if goQuoteIsSafe(s) {
var quoteSpace [128]byte
buf.Write(strconv.AppendQuote(quoteSpace[:0], s))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to avoid memory allocation with this case? I'm not sure how inlining and escape analysis will interact, but it seems like quoteSpace may escape, and you'll end up making a copy on the heap anyway.

Perhaps instead, narrow the optimized case to strings that don't contain anything that needs to be escaped (adding " and \ to the list of bad runes). Then you could use buf.WriteString(s).

Copy link
Contributor Author

@alandonovan alandonovan Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite right; this was reduced from code in herb's JSON encoder which reused the quoteSpace variable as an object field across many calls, but it makes no sense here.

I've moved quote and quoteSpace inside encode, so the closure acts like an object and quoteSpace like a field.

@jmillikin-stripe
Copy link
Contributor

I have an outstanding request with Bazel to add JSON support in the Java implementation (bazelbuild/bazel#3732). To see if we can settle on a reasonable shared JSON API, I sent bazelbuild/starlark#83 as a proposed standard JSON module in the Starlark spec.

Copy link
Contributor Author

@alandonovan alandonovan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jay, I have finally gotten back to this CL (motivated by yet more bugs in struct.to_json). I've rewritten the decode function and added more tests. Care to take another look?

thanks
A

@jayconrod
Copy link
Collaborator

I'll take a look on Monday if that's all right. I'm just heading out now, and taking a vacation day tomorrow.

@alandonovan
Copy link
Contributor Author

Is something about this CL not giving you a sense of utmost urgency? ; )

Enjoy your vacation.

"go.starlark.net/starlarkstruct"
)

// Module is a Starlark module of JSON-related functions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module json

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// which it converts to JSON by cases:
// - A Starlark value that implements Go's standard json.Marshal
// interface defines its own JSON encoding.
// - None, True, and False are converted their corresponding JSON atoms.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are converted to...

Though maybe just: "null, true, and false, respectively."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

// isASCII reports whether s contains only ASCII.
func isASCII(s string) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: rename to isPrintableASCII?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// In principle, we could parameterize it to allow the caller to
// control the returned types, but there's no compelling need yet.

type failure string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief comment here would be useful. It's not obvious why this is needed without scrolling much further down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return starlark.String(r)

case 'n':
if strings.HasPrefix(s[i:], "null") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given a string like nulll, I think this will read null, then report an unexpected character l after that.

Error messages would be a little clearer if you scan a run of identifier characters, then check if they are one of null, true, false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way so that the parser never consumes more than it needs, as we may want a stream decoder in the future (like json.NewDecoder in Go), from which you can read adjacent values such as "truefalse".

i = j

// Unlike most C-like languages,
// JSON disallows a leading before a digit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a leading 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// Unlike most C-like languages,
// JSON disallows a leading before a digit.
if z := boolint(num[0] == '-'); num[z] == '0' && z+1 < len(num) && isdigit(num[z+1]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think num == "-" is possible here, so this might panic.

Also, the boolint logic is a little confusing. Maybe drop boolint and spell it out here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed, simplified, and tested.

Copy link
Contributor Author

@alandonovan alandonovan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

if isASCII(s) {
buf.Write(strconv.AppendQuote(quoteSpace[:0], s))
} else {
data, _ := json.Marshal(s)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just learned about RFC 8259, which mandates UTF-8 encodings for JSON. I haven't looked at the details yet, but perhaps this would simplify the encoder here.

"go.starlark.net/starlarkstruct"
)

// Module is a Starlark module of JSON-related functions.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// which it converts to JSON by cases:
// - A Starlark value that implements Go's standard json.Marshal
// interface defines its own JSON encoding.
// - None, True, and False are converted their corresponding JSON atoms.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

// isASCII reports whether s contains only ASCII.
func isASCII(s string) bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// In principle, we could parameterize it to allow the caller to
// control the returned types, but there's no compelling need yet.

type failure string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return starlark.String(r)

case 'n':
if strings.HasPrefix(s[i:], "null") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way so that the parser never consumes more than it needs, as we may want a stream decoder in the future (like json.NewDecoder in Go), from which you can read adjacent values such as "truefalse".

i = j

// Unlike most C-like languages,
// JSON disallows a leading before a digit.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// Unlike most C-like languages,
// JSON disallows a leading before a digit.
if z := boolint(num[0] == '-'); num[z] == '0' && z+1 < len(num) && isdigit(num[z+1]) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed, simplified, and tested.

This change defines a standard Starlark module for JSON encoding and decoding.
See json.go for documentation.

It is intended to subsume, generalize, and eventually
replace Bazel's ill-conceived struct.to_json method.

The json module is predeclared in the Starlark REPL environment.

See related issues:
bazelbuild/bazel#7896
https://buganizer.corp.google.com/issues/23962735
https://buganizer.corp.google.com/issues/70210417
bazelbuild/bazel#7879 (comment)
bazelbuild/bazel#5542
bazelbuild/bazel#10176
bazelbuild/starlark#83
bazelbuild/bazel#3732

Change-Id: I297ffaee9349eedeeb52f5a88f40636a4095f997
@alandonovan alandonovan merged commit ac23acb into master Jun 11, 2020
@alandonovan alandonovan deleted the json branch June 11, 2020 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants