Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return views instead of lists in various operations #203

Open
brandjon opened this issue Jul 15, 2021 · 7 comments
Open

Return views instead of lists in various operations #203

brandjon opened this issue Jul 15, 2021 · 7 comments

Comments

@brandjon
Copy link
Member

Examples include dict.keys, dict.values, dict.items, enumerate, and zip. These operations return lists like in Python 2, but they could be made to return views like in Python 3.

Today, the fact that they return lists means that subscripting with integers is possible, e.g. mydict.keys()[0] to get the first key in the iteration over mydict. This is indeed allowed in Python 2 and not 3. We could make an incompatible change to prohibit this kind of operation. Alternatively, since Starlark has a well-defined iteration order for dictionaries, we could continue to support indexing, though it might not be constant time.

This issue came up during discussion of how to implement efficient views for Bazel's native.existing_rules operation. @tetromino

@stepancheg
Copy link
Contributor

Please elaborate:

for k in d.keys():
  d[1] = 2

Would that be allowed? I. e. iteration over keys should:

  • lock the dict, not the view object, or
  • keys is a sequence type of type which is not list, which may be implemented as COW but this would be an implementation detail, not part of the spec?

@laurentlb
Copy link
Contributor

fyi, the range function returns a view (not a list). You can do the same for native.existing_rules.

@tetromino
Copy link
Collaborator

Re @stepancheg - if we switch to views, I would suggest not allowing integer indexing of such views (e.g. no d.keys()[1], like in Python 3). So this would be an incompatible change.

On the other hand, we do want to optimize membership checking (e.g. we want if "foo" in d.keys() to be O(1) time).

@ndmitchell
Copy link
Contributor

@tetromino note that @stepancheg is asking a different question - if you are iterating over a view, can you modify the underlying object? With views, you might expect the answer is no. Without views, you'd expect it to be yes.

@tetromino
Copy link
Collaborator

Ah, thanks for the explanation. I don't think that there is any reasonable way to support modifying the set of keys of the underlying dict. (And prohibiting it would mirror python3 behavior, where adding or deleting a dict's key during iteration throws a RuntimeError.)

I don't have a strong opinion about allowing modifying the values. But it would be pretty easy to support value modifications imho (maintain a modification counter in the dict, copy the counter into the view; if the counters don't match, the view self-materializes into a list).

@tetromino
Copy link
Collaborator

@brandjon pointed out an implementation concern: once the dict is frozen, the view might be used from multiple starlark threads, so materializing the a view into a list would need to be done in a thread-safe manner.

@haxorz
Copy link

haxorz commented Nov 30, 2023

I just came across some bzl code in Google's codebase that was running into this performance issue. Working around the issue reduced the package loading wall time of a single extreme package (for which this issue was a hotspot) from ~115s to ~16s!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants