Add a Starlark set data type #290
Modeled on the Python 3 set type and the existing implementations in Go, Rust, and the proposed one for Java, with the following differences:

* set literals and set comprehensions *not* supported (unlike python3)
* `copy()` method *not* supported (unlike python3) because we do not have it on lists or dictionaries.
* comparison operators *not* supported (unlike starlark-go and python3)
* `update()` method supported (unlike starlark-go)
* `isdisjoint()`, `intersection_update()`, `difference_update()`, `symmetric_difference_update()` methods supported (unlike starlark-go and starlark-rust)
* multiple-argument form of `union()`, `intersection()`, `difference()` and corresponding _update methods supported (unlike starlark-go and starlark-rust).

Fixes bazelbuild#264
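As a rough illustration of the multiple-argument forms listed above, here is a minimal Python 3 sketch. It assumes the proposed Starlark methods behave like their Python counterparts, which is this PR's stated model, not something verified against a Starlark interpreter. Sets are built with `set(...)` calls because the proposal omits set literals, and all variable names are illustrative.

```python
# Python 3 sketch of the multiple-argument set methods; names are illustrative.
base = set([1, 2, 3, 4])

# Each method accepts any number of iterables, not only sets.
merged = base.union([5], (6, 7))
common = base.intersection([1, 2, 3], (2, 3, 9))
remaining = base.difference([1], (2,))

# The _update variants mutate the receiver in place.
# The proposal has no copy(), so a copy is made with the set() constructor.
acc = set(base)
acc.difference_update([1], [4])
```

The non-mutating calls leave `base` untouched, while `acc` ends up holding only the elements not named in either argument.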
Can you add a rationale for the differences you're listing?
Do we need multiple arguments? They are used maybe once a year and may not be worth the extra complexity. But this is not a big deal.
@comius - rationales added.
@stepancheg - for
Thanks! Very nice.
Nice. Most of my comments are based on comparison with the Starlark-Go spec.
-composed from hashable values. Most mutable values, such as lists
-and dictionaries, are not hashable, unless they are frozen.
+composed from hashable values. Most mutable values, such as lists,
+dictionaries, and sets, are not hashable, unless they are frozen.
Pre-existing -- and don't change it in this PR -- but I think we changed this behavior so that unhashable types remain unhashable even after freezing. That's currently the behavior in the Java interpreter for Dict, for example. Something we can look at later.
Acknowledged.
All the justifications of differences from prior implementations sound good to me.
I feel like "{} is a dict, not a set" is just one of those things you learn when you're dabbling in a Python dialect. But I suppose you could also go the other way, and say that even a non-empty set literal makes it harder to distinguish between sets and dicts on first sight. In either case, even if we wanted a literal syntax, it would be a separate extension.
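For readers less familiar with the point about `{}`, a short Python 3 sketch follows. It shows only Python behavior; the assumption is that the proposed Starlark `set()` built-in is the sole way to construct a set, as the PR description states.

```python
# Python 3 sketch: empty braces always produce a dict, never a set.
empty_braces = {}
empty_set = set()

kind_of_braces = type(empty_braces).__name__
kind_of_set = type(empty_set).__name__

# Without literal syntax, a non-empty set is built from an iterable;
# duplicate elements collapse.
from_list = set([1, 2, 2, 3])
```

Here `kind_of_braces` is `"dict"` and `kind_of_set` is `"set"`, which is the ambiguity the comment refers to.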
LGTM!
Following python3 behavior, as explained by brandjon@
@brandjon - following the behavior python3 clearly aspires towards (but doesn't quite attain), I've added a requirement for the argument to set methods to not contain unhashable values.
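A minimal Python 3 sketch of the requirement described above, using two methods where Python does reject unhashable elements. The helper `raises_type_error` is hypothetical and exists only for this illustration; whether every proposed Starlark set method fails in the same way is an assumption based on the comment, not something checked here.

```python
def raises_type_error(fn):
    """Return True if calling fn() raises TypeError (illustrative helper)."""
    try:
        fn()
    except TypeError:
        return True
    return False


s = set([1, 2])

# A list is unhashable, so an argument containing one is rejected.
union_rejected = raises_type_error(lambda: s.union([[3]]))
update_rejected = raises_type_error(lambda: s.update([[3]]))
```

Both calls fail before adding anything, so `s` still holds only `1` and `2` afterwards.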
spec.md (outdated)
@@ -4070,7 +4076,7 @@ not have any values in common, and `False` otherwise.

This is equivalent to `not S.intersection(x)`.

`isdisjoint` fails if `x` is not an iterable sequence.
Did you drop the "not an iterable sequence" wording because it's implied/obvious from context?
Yes - the first line already states that `x` must be an iterable sequence.
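The equivalence quoted in the spec excerpt for this thread (`isdisjoint` versus `not S.intersection(x)`) can be checked with a small Python 3 sketch. This assumes the Starlark methods mirror Python here; the sample inputs are illustrative.

```python
# Python 3 sketch: isdisjoint(x) agrees with `not s.intersection(x)`.
s = set([1, 2, 3])
cases = [[4, 5], (3, 4), []]

# Each pair holds (isdisjoint result, negated-intersection result).
results = [(s.isdisjoint(x), not s.intersection(x)) for x in cases]

# A non-iterable argument is an error rather than a False result.
try:
    s.isdisjoint(5)
    non_iterable_rejected = False
except TypeError:
    non_iterable_rejected = True
```

Each pair in `results` holds two equal booleans, and the integer argument is rejected because it cannot be iterated.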
spec.md (outdated)
element, use [`discard`](#set·discard) instead. If you need to remove multiple
elements from a set, see [`difference_update`](#set·difference_update) or the
[`-=`](#sets) augmented assignment operation.
`remove` fails if the set does not contain `x` (in particular, if `x` is
Nit: Do you mean "in particular, even if `x` is unhashable"? Without the "even" I'm not sure what "in particular" is emphasizing.
I put the "in particular" to indicate that "`x` is unhashable" is a potentially non-obvious subcase of "the set doesn't contain `x`".
Reworded to be less confusing.
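To make the `remove` / `discard` / `difference_update` distinction from the spec excerpt concrete, here is a hedged Python 3 sketch. Python signals a missing element with `KeyError`; the Starlark spec text only says that `remove` fails, so the exception type is a Python detail, not part of the proposal.

```python
# Python 3 sketch of the element-removal options named in the spec text.
s = set([1, 2, 3, 4, 5])

s.remove(1)            # element present: removed

try:
    s.remove(99)       # element absent: remove fails
    remove_failed = False
except KeyError:
    remove_failed = True

s.discard(99)          # element absent: discard is a silent no-op

s.difference_update([2, 3])   # remove several elements at once
s -= set([4])                 # augmented assignment needs a set on the right
```

After these steps only `5` remains in `s`.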
Discussed offline but FTR: Apparently the behavior in Python, that [1] Source: reference doc quoted in first post of this thread.
Modeled on the Python 3 set type and the existing implementations in Go, Rust, and the proposed one for Java, with the following differences:

* set literals and set comprehensions not supported (unlike python3)
  * [...] `{}` can be confusing (empty set or empty dict?)
* `copy()` method not supported (unlike python3)
* `update()` method supported (unlike starlark-go)
* `isdisjoint()`, `intersection_update()`, `difference_update()`, `symmetric_difference_update()` methods supported (unlike starlark-go and starlark-rust)
  * [...] `s.intersection_update(rhs)`, and `rhs` was a non-set sequence, we'd need to instead do `s &= set(rhs)`, which would mean allocating an unnecessary temporary set for rhs's elements.
* multiple-argument form of `union()`, `intersection()`, `difference()` and corresponding _update methods supported (unlike starlark-go and starlark-rust)
* `|`, `&`, `-`, and `^` operators (and their augmented forms) require both sides to be sets if lhs is a set (unlike starlark-go)
  * [...] `|` operator already requires both sides to be dicts if lhs is a dict.

Fixes #264
@adonovan @stepancheg FYI
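The operator-versus-method rationale in the description can be sketched in Python 3, which has the same split: operators insist on sets, while the named methods take any iterable. This is an illustration under the assumption that the proposed Starlark behavior matches Python on these points; variable names are illustrative.

```python
# Python 3 sketch: operators require sets, methods accept any iterable.
s = set([1, 2, 3])
rhs = [2, 3, 4]

try:
    s & rhs                      # list on the right-hand side is rejected
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# Operator route: rhs must first be copied into a temporary set.
via_operator = s & set(rhs)

# Method route: rhs is consumed directly, with no temporary set.
in_place = set(s)
in_place.intersection_update(rhs)

# The dict `|` operator is similarly strict about its right-hand side.
try:
    {"a": 1} | [("b", 2)]
    dict_operator_accepts_list = True
except TypeError:
    dict_operator_accepts_list = False
```

Both routes produce the same elements; the method form simply skips the intermediate `set(rhs)` allocation mentioned in the description.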